Query syntax
ScopeDB supports querying using a pipelined relational query language in the following basic syntax:
[ FROM ... [ SAMPLE ... ] | VALUES ... ]
SELECT ...
[ WHERE ... ]
[ [ GOURP BY ... ] AGGREGATE ... ]
[ ORDER BY ... ]
[ LIMIT ... ]
SELECT
SELECT defines the set of columns returned by a query, corresponding to a projection in relational algebra.
Examples
Setting up the data for the examples
CREATE TABLE employee_table (
employee_ID INT,
last_name STRING,
first_name STRING,
department_ID INT,
);
VALUES
(101, 'Montgomery', 'Pat', 1),
(102, 'Levine', 'Terry', 2),
(103, 'Comstock', 'Dana', 2)
INSERT INTO employee_table;
Selecting all columns in the table
FROM employee_table SELECT *;
+-------------+------------+------------+---------------+
| employee_ID | last_name | first_name | department_ID |
+-------------+------------+------------+---------------+
| 101 | Montgomery | Pat | 1 |
| 102 | Levine | Terry | 2 |
| 103 | Comstock | Dana | 2 |
+-------------+------------+------------+---------------+
Selecting a single column by name
FROM employee_table SELECT last_name;
+------------+
| last_name |
+------------+
| Montgomery |
| Levine |
| Comstock |
+------------+
Selecting a column by position
FROM employee_table SELECT $2;
+-------+
| $2 |
+-------+
| Pat |
| Terry |
| Dana |
+-------+
Extending a column to all columns in the output
FROM employee_table SELECT employee_ID + 1000 as new_employee_ID, *;
+-----------------+-------------+------------+------------+---------------+
| new_employee_ID | employee_ID | last_name | first_name | department_ID |
+-----------------+-------------+------------+------------+---------------+
| 1101 | 101 | Montgomery | Pat | 1 |
| 1102 | 102 | Levine | Terry | 2 |
| 1103 | 103 | Comstock | Dana | 2 |
+-----------------+-------------+------------+------------+---------------+
FROM
FROM reads all the data from a table.
Examples
Create a table and load data into it:
CREATE TABLE ftable1 (retail_price FLOAT, wholesale_cost FLOAT, description STRING);
VALUES (14.00, 6.00, 'bling') INSERT INTO ftable1;
Here is a basic example of using the FROM clause:
FROM ftable1;
+--------------+----------------+-------------+
| retail_price | wholesale_cost | description |
+--------------+----------------+-------------+
| 14.0 | 6.0 | bling |
+--------------+----------------+-------------+
This example is identical to the previous example, but specifies the table name qualified by the schema for the table:
FROM public.ftable1;
+--------------+----------------+-------------+
| retail_price | wholesale_cost | description |
+--------------+----------------+-------------+
| 14.0 | 6.0 | bling |
+--------------+----------------+-------------+
This example is identical to the previous example, but specifies an alias to the table name:
FROM public.ftable1 AS t1 SELECT t1.wholesale_cost;
+-------------------+
| t1.wholesale_cost |
+-------------------+
| 6.0 |
+-------------------+
This example queries a sample of 10% of the data in the table:
FROM sales SAMPLE 10 PERCENT;
VALUES
VALUES allows the specification of a set of constants to be used to form a finite set of rows.
Examples
The VALUES clause is useful for generating a fixed, known set of rows:
VALUES (1, 'one'), (2, 'two'), (3, 'three');
+----+-------+
| $0 | $1 |
+----+-------+
| 1 | one |
| 2 | two |
| 3 | three |
+----+-------+
VALUES (1, 'one'), (2, 'two'), (3, 'three') SELECT $0 AS col0, $1;
+------+-------+
| col0 | $1 |
+------+-------+
| 1 | one |
| 2 | two |
| 3 | three |
+------+-------+
SAMPLE
Returns a subset of rows sampled randomly from the specified table. The following sampling methods are supported:
- Sample a fraction of a table, with a specified probability for including a given row. The number of rows returned depends on the size of the table and the requested probability.
Syntax
FROM ... SAMPLE <probability> PERCENT
Parameters
<probability> PERCENT
Specifies the percentage probability to use for selecting the sample. Can be any number between 0 (no rows selected) and 100 (all rows selected) inclusive.
WHERE
The WHERE clause specifies a condition that acts as a filter. You can use the WHERE clause to:
- Filter the result of a FROM clause or a SELECT statement.
- Specify which rows to operate on in an UPDATE or DELETE.
Syntax
...
WHERE <predicate>
[ ... ]
Parameters
<predicate>
A boolean expression. The expression can include logical operators, such as AND
, OR
, and NOT
.
ORDER BY
Specifies an ordering of the rows of the result table.
Syntax
FROM ...
[ ... ]
ORDER BY orderItem [ , orderItem , ... ]
[ ... ]
Where:
orderItem ::= { <column_alias> | <position> | <expr> } [ { ASC | DESC } ] [ NULLS { FIRST | LAST } ]
Parameters
<column_alias>
Column alias appearing in the SELECT list.
<position>
Position of an expression in the SELECT list.
<expr>
Any expression on tables in the current scope.
{ ASC | DESC }
Optionally returns the values of the sort key in ascending (lowest to highest) or descending (highest to lowest) order.
Default: ASC
NULLS { FIRST | LAST }
Optionally specifies whether NULL values are returned before/after non-NULL values, based on the sort order (ASC
or DESC
).
Default: Depends on the sort order (ASC
or DESC
); see the usage notes below for details
Usage notes
- Unless specified otherwise, NULL values are considered to be higher than any non-NULL values. As a result, the ordering for NULLS depends on the sort order:
- If the sort order is
ASC
, NULLS are returned last; to force NULLS to be first, useNULLS FIRST
. - If the sort order is
DESC
, NULLS are returned first; to force NULLS to be last, useNULLS LAST
.
- If the sort order is
LIMIT
Constrains the maximum number of rows returned by a statement or subquery.
Syntax
FROM ...
[ ... ]
[ ORDER BY ... ]
LIMIT <count> [ OFFSET <start> ]
[ ... ]
Parameters
<count>
The number of rows returned. Must be a non-negative integer constant.
OFFSET <start>
The row number after which the limited/fetched rows are returned. Must be a non-negative integer constant.
If OFFSET is omitted, the output starts from the first row in the result set.
AGGREGATE
Returns one or more columns that indicates aggregation results.
Syntax
FROM ...
[ GROUP BY ... ] AGGREGATE <function_calls>
Examples
Setting up the data for the examples:
CREATE TABLE sales (
product_ID INT,
retail_price FLOAT,
quantity INT,
city STRING,
state STRING,
);
VALUES
(1, 2.00, 1, 'SF', 'CA'),
(1, 2.00, 2, 'SJ', 'CA'),
(2, 5.00, 4, 'SF', 'CA'),
(2, 5.00, 8, 'SJ', 'CA'),
(2, 5.00, 16, 'Miami', 'FL'),
(2, 5.00, 32, 'Orlando', 'FL'),
(2, 5.00, 64, 'SJ', 'PR')
INSERT INTO sales;
Group by one column:
FROM sales
GROUP BY product_ID AGGREGATE SUM(retail_price * quantity) AS gross_revenue;
+------------+---------------+
| product_ID | gross_revenue |
+------------+---------------+
| 1 | 6.0 |
| 2 | 620.0 |
+------------+---------------+
Group by multiple columns:
FROM sales
GROUP BY state, city AGGREGATE SUM(retail_price * quantity) AS gross_revenue;
+-------+---------+---------------+
| state | city | gross_revenue |
+-------+---------+---------------+
| CA | SF | 22.0 |
| CA | SJ | 44.0 |
| FL | Miami | 80.0 |
| FL | Orlando | 160.0 |
| PR | SJ | 320.0 |
+-------+---------+---------------+
No group by qualifier:
FROM sales
AGGREGATE SUM(retail_price * quantity) AS gross_revenue;
+---------------+
| gross_revenue |
+---------------+
| 626.0 |
+---------------+
DISTINCT ON
The DISTINCT ON
clause allows you to select unique rows based on specific columns, with optional ordering control. When used with the BY
clause, it keeps the first row for each set of duplicates according to the specified ordering criteria.
When used without the BY
clause, it returns unique rows based on the specified columns, the rest part will be output with undetermined order.
Syntax
FROM ...
DISTINCT ON expr [, expr, ...]
[ BY <order_item> [, <order_item>, ...] ]
Parameters
expr
One or more expressions that define the uniqueness criteria. Only the first row encountered for each unique combination of these expressions will be returned.
order_item
The same syntax as the ORDER BY
clause.
Examples
Setting up the data for the examples:
CREATE TABLE t (
a INT,
b INT,
);
VALUES
(1, 2),
(2, 2),
(2, 2),
(2, 3)
INSERT INTO t;
FROM t DISTINCT ON a BY b DESC;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
| 2 | 3 |
+---+---+
This query returns unique rows based on column a
, keeping the row with the highest value of b
for each value of a
.
FROM t DISTINCT ON a BY b ASC;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
| 2 | 2 |
+---+---+
This query returns unique rows based on column a
, keeping the row with the lowest value of b
for each value of a
.
FROM t DISTINCT ON a, b;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |
+---+---+
This query returns rows that are unique across the combination of both columns a
and b
.
Usage Notes
- The
BY
clause can contain multiple ordering expressions to provide fine-grained control over which rows to keep when there are duplicates. - Each ordering expression in the
BY
clause can have its own sort direction (ASC or DESC). - When used without a
BY
clause,DISTINCT ON
returns the first encountered row for each unique combination of the specified expressions. This may result in undetermined query results. - The
BY
clause is particularly useful when you want to select specific representative rows (e.g., the most recent by multiple criteria) from each group of duplicate values.