Principles of Database Systems (H) Lecture 6 Notes
Postprocessing Queries
Sorting
We have a single construct in SQL that sorts the result set:
1 | SELECT title, year_released |
DESC and ASC to control it: 1 | SELECT c.country_name, m.title, m.year_released |
The specific ordering depends on the data type, e.g. the strings are
sorted lexicographically. But it can be troublesome when
NULLs are in the comparison. In this case, the ordering
depends on specific behaviour of DBMS:
NULLis smaller than everything: SQL Server, MySQL, SQLiteNULLis greater than everything: DB2, Oracle, PostgreSQL
Sometimes we may found the orderings of texts vary across different languages or cultures. To solve this, we need to use collation:
1 | -- For PostgreSQL, SQL Server, MySQL |
Another problem is that when we want orderings other than the ascending and descending behavior. For example, to order the credit of people in movies:
1 | ORDER BY |
Another problem is that we may need to retrieve only a “slice” of
sorted data, e.g. successive pages on websites. We can leverage the
LIMIT construct to achieve this:
1 | -- PostgreSQL, MySQL, SQLite |
OFFSET
keyword: 1 | -- PostgreSQL, MySQL, SQLite |
There are also many cases when plain ordering isn’t satisfying. For example, messages in a forum thread can form a very complex hierarchy, i.e. tree-like data. DBMS is known to struggle when dealing with data like this. This is the well-known BOM (Bill Of Materials) problem.
One way to solve this problem is by using the “materialized path”, turing the “ancestor” into a attribute.
Oracle has implemented another solution by dynamic ordering:
1 | SELECT message |
Most DBMSs handle hierarchies through recursive queries.
Window Functions
We have seen so far two kinds of functions:
- Scalar functions that operate on values in the current row
- Aggregate function that operate on sets of rows.
The problem with aggregate functions is that they ignore details of data. If I ask for the year of the oldest movie per country, the database will only return a country and a year, and nothing else.
If we want more details with aggregate functions, the only option is to join their output to the same table:
1 | SELECT a.country, a.title, a.year_released |
1 | SELECT country, title, year_released, |
- the release year
- earliest movie for the same country was released
But this will return many unnecessary results. We need to limit output to those movies for which the year of release happens to be the same as the earliest one for their country:
1 | SELECT m.country, m.title, m.year_released |
1 | SELECT m.country, m.title, m.year_released |
1 | SELECT m.country, m.title, m.year_released |
Also note that we cannot “flatten” the nested query into one query. This is mainly because the logical processing order of SQL:
FROM/JOINWHEREGROUP BYHAVING- Window Functions
SELECTORDER BY
Then if we try to move the window function to the outer query:
1 | SELECT country, title, year_released |
WHERE clause, this will cause the WHERE clause
depends on unavailable data!
Window functions always operate against rows that belong to a result
set. One related characteristics is that they can only appear after the
SELECT, not in the WHERE clause, and there is
nothing with them similar to HAVING with aggregate
functions.
Window functions+OVER can be rewrite by using
GROUP BY+JOIN:
1 | SELECT a.country, a.title, a.year_released |
Similar to that we can have an aggregate function without a
GROUP BY clause when we want only one result for the whole
table, we can have an empty OVER clause to indicate that
the window function should compute the result over all rows selected.
This is frequently used in operations such as computing a value as a
percentage of the total:
1 | SELECT country_name, cnt as number_of_movies, |
Tip
When there is an ORDER BY clause we cannot start
returning rows before you have seen all of them, so we may count them
too when sorting, and the marginal cost of the window function is near
zero.
We can rewrite the query by using a CROSS JOIN:
1 | SELECT country_name, cnt as number_of_movies, |
There are a special window functions, called the ranking window functions:
row_number()rank()dense_rank()
With a ranking window function you must have a ORDER BY
clause in the OVER(). You can combine it with a
PARTITION BY to order with groups.
The three functions mainly differs in the ways of dealing with ties:
row_number()assigns distinct, sequential numbers to everyonerank()assigns the same number to ties, but leave a corresponding gap in ranksdense_rank()also assigns the same number to ties, with no gap in ranks
- Title: Principles of Database Systems (H) Lecture 6 Notes
- Author: Zhige Chen
- Created at : 2025-10-21 22:03:57
- Updated at : 2025-11-01 20:46:10
- Link: https://nofe1248.github.io/2025/10/21/dbh-06/
- License: This work is licensed under CC BY-NC-SA 4.0.