[DatabaseSystem] Advanced SQL

CMU 15465 Lecture2. Advanced SQL

SQL History

SQL is a kinf of relational languages, in which users only need to specify the answer that they want, not how to compute it. The DBMS is reponsible for efficient evaluate and optimize the query.

SQL is a live language within the continuous new features these years. Noboday actually follows the SQL standard exactly, but the minimum requirement is SQL-92.

  1. DML: How to operate the data
  2. DDL: How to define the data
  3. DCL: How to keep data privilige

SQL is based on bags not sets. Bag means list, which is ordered data structure. Also bag allows duplicate, while set does not allow duplicate.

Aggregates/GroupBy

Return a value from a bag of tuples. Most frequently used operator is AVG,COUNT,MAX,MIN. Also multiple aggregates are frequently used in SQL.

COUNT,SUM,AVG support DISTINCT.Following command is not defined.

SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid

The cmd above calculates the average GPA of all students in all courses. Therefore, e.cid(course id of enrolled student) does not make sense. It is imporant to note that users need to be careful to guarantee the select object lies in the same hierarchy.

The corrected query is,

SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid

Having vs. Where
It depends on the filter based on aggregation result. The aggregated attributes could not be used in where condition judgement. Following query is wrong!

SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
AND avg_gpa > 3.9
GROUP BY e.cid

Having provides the function to filter based on aggregation result. The corrected query would be:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid
HAVING avg_gpa>3.9

String Operation

%% matches any substring, including the empty string.
_ matches any character
SQL also supports quite a lot of string operations, but some times they differs from one implementation to another implementation.

Data/Time Operations

The syntax varies wildly.
Example
Mysql:

SELECT DATEDIFF(DATE('2018-08-29'),DATE('2018-01-01')) AS days

SQLite

SELECT CAST((julianday(CURRENT_TIMESTAMP) - julianday('2018-01-01')) AS INT) AS days;

SQLite is the most frequently used db in the world, the reson for which is that SQLite does not have the copyright. Nearly each device in the world now has the sqlite internally.

Output Redirection

User could store the query result into a table.
Build a new table for the query table. Note that the new created table must not be already defined.

CREATE TABLE CourseIds (
SELECT DISTINCT cid FROM enrolled);

Or DBMS could insert tuples from query into another table. To keep the insert correct, user need to keep the query result and inserted table have exactly the same schema. Also, DBMS has different options on what to do with duplicates.

INSERT INTO CourseIds
(SELECT DISTINCT cid FROM enrolled);

Some control sequence (ORDER BY,LIMIT,OFFSET) could be used:

SELECT sid FROM enrolled
WHERE cid = '15-721'
ORDER BY grade DESC, sid ASC
SELECT sid, name FROM student
WHERE login LIKE '%@cs'
LIMIT 20 OFFSET 10

Nested Queries

Nested queries contains other queries. They are often difficult to optimize, inner queries could appear almost anywhere in query.

SELECT name FROM student WHERE
sid IN (SELECT sid FROM enrolled)

Query above has an inner query and an outer query. For this example, SQL would usually simpify it as a join operation.
Following query gets the names of students in ’15-445′

SELECT name FROM student
WHERE sid = ANY(
  SELECT sid FROM enrolled
  WHERE cid = '15-445'
)

The keyword any means that select any sid existing in the tuples from inner query result. User could also use other keywords to bridge inner query and outter query.
1. ALL→ Must satisfy expression for all rows in subquery
2. ANY→ Must satisfy expression for at least one row in sub-query.
3. IN→ Equivalent to ‘=ANY()’ .
4. EXISTS→ At least one row is returned.

Window Function

Window function performs a calculation across a set of tuples that related to a single row. It is like an aggregatation but tuple are not grouped into a single output tuples.

Following query is a typical window function, which uses the MAX function without group by

SELECT MAX(e.sid) FROM enrolled AS e

Window function provides any aggregation functions we have discussed earlier, and it also provides some special window functions:
1. ROW_NUMBER(): NO of the current row
2. RANK(): Order position of the current row

SELECT cid, sid,
ROW_NUMBER() OVER (PARTITION BY cid)
FROM enrolled
ORDER BY cid

It looks like the group by aggregation, but which is more flexible. There is no prerequisite to keep the group.
Assume we have 5 rows and 3 groups, let’s take the query as an example. The result of window function gives 5 lines, where each line is corresponding to the row. While the group function gives 3 lines, where each line is corresponding to a group.

Common Table Expression

Common Table Expression(cte) provides a way to write auxiliary statements for use in a larger query, which could be regarded as a temp table only used once in the query.

CTE could help users to simplify a large and complex query, and divide the query into different substeps. It is an alternative to nested queries and views.

WITH cteSource (maxId) AS (
SELECT MAX(sid) FROM enrolled
)
SELECT name FROM student, cteSource
WHERE student.sid = cteSource.maxId

Queries above creates a temp table cteSource which used in following queries.



Leave a Reply

Your email address will not be published. Required fields are marked *