Lecture 1

Introduction to Databases

A modern database system is a complex software system whose task is to manage a large, complex collection of data. The central idea was to have one system take care, once and for all, of the tedious work of storing and retrieving data.

A bit of history

In the early days, database applications were built directly on top of file systems. However, this approach suffers from many issues, including:

Data redundancy and inconsistency
Difficulty in accessing data
Data isolation
Integrity problems
Atomicity of updates
Concurrent access by multiple users
Security problems
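As an illustration of the "atomicity of updates" point: a database system can group several changes into one transaction that either takes effect completely or not at all, which a plain file does not give you for free. The minimal sketch below uses Python's standard sqlite3 module; the account table, its columns, and the transfer function are hypothetical, chosen only for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic unit of work (hypothetical helper)."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            # Simulated failure halfway through the two updates.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)        # succeeds: balances become 70 and 80
try:
    transfer(conn, 1, 2, 500)   # fails after the first UPDATE
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # {1: 70, 2: 80}
```

Because the failed transfer is rolled back as a whole, the first account is not left debited without the second being credited.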

1980s:

Research relational prototypes evolve into commercial systems
- SQL becomes industrial standard
Parallel and distributed database systems
- Wisconsin, IBM, Teradata
Object-oriented database systems

1990s:

Large decision support and data-mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce

2000s:

Big data storage systems
- Google BigTable, Yahoo PNUTS, Amazon
- NoSQL systems
Big data analysis: beyond SQL
- Map Reduce, etc.

2010s:

SQL reloaded
- SQL front end to Map Reduce systems
- Massively parallel database systems
- Multi-core main-memory databases

Relational Database

A relational database is based on the relational model of data. It:

organizes data into one or more tables
- rows are also called records or tuples
- columns are also called attributes

Each column stores one piece of data, and each row as a whole represents one "known fact". For example, a table of people has this layout:

         | Name | Birthday | Birthplace
Person1  |      |          |
Person2  |      |          |
Person3  |      |          |

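All the pieces of data in a row are related to one another. Strictly speaking, the name "relational" comes from the mathematical term relation: a table is a relation, that is, a set of tuples.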
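A minimal sketch of such a table using Python's standard sqlite3 module. The table name person and the sample values are hypothetical; the point is only that each row holds related pieces of data and comes back as a tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table whose columns (attributes) mirror the example layout above.
conn.execute("CREATE TABLE person (name TEXT, birthday TEXT, birthplace TEXT)")
# Each row (record / tuple) states one known fact about one person.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Alice", "1990-01-01", "Paris"),
     ("Bob", "1985-06-15", "Tokyo")],
)
rows = conn.execute("SELECT name, birthday, birthplace FROM person ORDER BY name").fetchall()
for row in rows:
    print(row)  # each row comes back as a Python tuple
```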

Keys

The relational model prohibits duplicate records. Since no two rows are identical, there must be something that differentiates one row from another. It may be one column or a set of columns, known collectively as a key.
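To make the idea of a key concrete, the hypothetical helper below tests whether a chosen set of columns tells every row apart. Note that it only checks the rows at hand; a real key is a rule that must hold for every row the table may ever contain, so it is decided by the meaning of the data rather than by one sample.

```python
def is_key(rows, columns):
    """Return True if the given columns uniquely identify every row in `rows`."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False  # two rows agree on these columns
        seen.add(value)
    return True

# Hypothetical sample data.
people = [
    {"name": "Alice", "birthday": "1990-01-01", "birthplace": "Paris"},
    {"name": "Alice", "birthday": "1985-06-15", "birthplace": "Tokyo"},
    {"name": "Bob",   "birthday": "1990-01-01", "birthplace": "Paris"},
]
print(is_key(people, ["name"]))              # False: two rows named Alice
print(is_key(people, ["name", "birthday"]))  # True
```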

Several different keys (candidate keys) may be available. One of them is (arbitrarily) singled out and called the primary key; we usually choose the simplest one. In practice, to keep things simple, we often add a numeric attribute (typically an auto-incrementing integer) as the primary key.
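A sketch of the surrogate-key practice in SQLite, assuming the same hypothetical person table. In SQLite, a column declared INTEGER PRIMARY KEY is given an integer automatically (typically the next one) when no value is supplied; the UNIQUE constraint still keeps the natural key (name, birthday) from being duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,  -- surrogate key, filled in by SQLite
        name       TEXT NOT NULL,
        birthday   TEXT NOT NULL,
        birthplace TEXT,
        UNIQUE (name, birthday)          -- natural key still enforced
    )
""")
# No id is supplied, so SQLite assigns one.
conn.execute("INSERT INTO person (name, birthday, birthplace) VALUES (?, ?, ?)",
             ("Alice", "1990-01-01", "Paris"))
conn.execute("INSERT INTO person (name, birthday, birthplace) VALUES (?, ?, ?)",
             ("Bob", "1985-06-15", "Tokyo"))
ids = [row[0] for row in conn.execute("SELECT id FROM person ORDER BY id")]
print(ids)  # [1, 2]

# A second row with the same natural key is rejected as a duplicate.
rejected = False
try:
    conn.execute("INSERT INTO person (name, birthday) VALUES (?, ?)",
                 ("Alice", "1990-01-01"))
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```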

Normalization

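One common problem with databases is that data may be written at different times by different people. If everyone writes data however they like, retrieval becomes very difficult, because when searching, the computer compares data literally. So we need to standardize how the data is stored, a process known as normalization. More precisely, normalization organizes tables so that each fact is stored in exactly one place, which removes redundant copies that could otherwise drift apart; the normal forms below are successive levels of this discipline.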
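A small illustration of the problem described above, using plain Python with hypothetical data: literal string comparison misses entries that were typed differently, while storing the value once and referring to it by key avoids the inconsistency.

```python
# Hypothetical entries typed by different people at different times.
orders = [("o1", "New York"), ("o2", "new york"), ("o3", "NYC")]

# Strings are compared literally, so this search finds only one of the three orders.
literal_hits = [oid for oid, city in orders if city == "New York"]
print(literal_hits)  # ['o1']

# Store each city once and let the orders refer to it by key.
cities = {1: "New York"}
orders_by_key = [("o1", 1), ("o2", 1), ("o3", 1)]
ny = next(cid for cid, name in cities.items() if name == "New York")
key_hits = [oid for oid, cid in orders_by_key if cid == ny]
print(key_hits)  # ['o1', 'o2', 'o3']
```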

First Normal Form

First Normal Form (1NF): each column should only contain one piece of information.

Every Column Should Have Single Values
All Values in a Column Should Be of the Same Type
Every Column Must Have a Unique Name
The Order of Data Doesn’t Matter
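A sketch of bringing a table into 1NF, with a hypothetical students example: a column that packs several phone numbers into one string violates "every column should have single values", so the numbers are split out into one row each in a separate relation.

```python
# Not in 1NF: the third column packs several phone numbers into one string.
students = [
    ("s1", "Alice", "555-1234, 555-9876"),
    ("s2", "Bob",   "555-1111"),
]

# 1NF version: a student relation plus a student_phone relation
# holding exactly one phone number per row.
student = [(sid, name) for sid, name, _ in students]
student_phone = [
    (sid, phone.strip())
    for sid, _, phones in students
    for phone in phones.split(",")
]
print(student_phone)
# [('s1', '555-1234'), ('s1', '555-9876'), ('s2', '555-1111')]
```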

Second Normal Form

Second Normal Form (2NF): Every non-key attribute must provide a fact about the whole key. A table is in 2NF if:

It is already in 1NF.
It has no partial dependency: Every non-key attribute must depend on the entire primary key, not just on part of it. (This can only be violated when the key consists of more than one column.)
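A sketch of a partial dependency, assuming a hypothetical enrollment table whose primary key is (student, course). The helper fd_holds (also hypothetical) checks whether a functional dependency is consistent with the sample rows; as with keys, real dependencies come from the meaning of the data, and a sample can only refute them.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs is consistent with these rows."""
    mapping = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for `left` and returns it afterwards.
        if mapping.setdefault(left, right) != right:
            return False
    return True

# Key: (student, course). course_title depends on course alone: a partial dependency.
enrollment = [
    {"student": "s1", "course": "CS101", "grade": "A", "course_title": "Databases"},
    {"student": "s2", "course": "CS101", "grade": "B", "course_title": "Databases"},
    {"student": "s1", "course": "CS102", "grade": "A", "course_title": "Networks"},
]
print(fd_holds(enrollment, ["course"], ["course_title"]))  # True, so not in 2NF

# 2NF decomposition: move the partially dependent attribute into its own table.
enrollment_2nf = sorted({(r["student"], r["course"], r["grade"]) for r in enrollment})
course = sorted({(r["course"], r["course_title"]) for r in enrollment})
print(course)  # [('CS101', 'Databases'), ('CS102', 'Networks')]
```

After the split, each course title is stored once, no matter how many students enroll in the course.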

Third Normal Form

Third Normal Form (3NF): Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. A table is in 3NF if:

It is already in 2NF.
It has no transitive dependency: No non-key attribute should depend on another non-key attribute. All non-key attributes must depend only on the primary key.
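A sketch of a transitive dependency and its 3NF fix, assuming a hypothetical employee table keyed by emp_id, where emp_id determines dept_id and dept_id in turn determines dept_name.

```python
# Key: emp_id. emp_id -> dept_id -> dept_name, so dept_name depends on the key
# only through the non-key attribute dept_id (a transitive dependency).
employee = [
    ("e1", "Alice", "d1", "Sales"),
    ("e2", "Bob",   "d1", "Sales"),
    ("e3", "Carol", "d2", "Research"),
]

# 3NF decomposition: dept_name moves to a table keyed by dept_id.
employee_3nf = [(eid, name, dept) for eid, name, dept, _ in employee]
department = dict(sorted({(dept, dname) for _, _, dept, dname in employee}))

# Renaming a department now changes exactly one stored value.
department["d1"] = "Marketing"

def dept_name_of(emp_id):
    """Recover an employee's department name by joining the two tables."""
    for eid, _, dept in employee_3nf:
        if eid == emp_id:
            return department[dept]
    raise KeyError(emp_id)

print(dept_name_of("e2"))  # Marketing
```

Before the decomposition, the same rename would have meant editing every row of that department, with the risk of leaving inconsistent copies behind.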

Principles of Database Systems (H) Lecture 1 Notes