Query Evaluation Flow¶
Definitions¶
Clause¶
We analyze Expression Trees and label tree nodes with Clauses.
A Clause is a boolean condition that can be applied to a data subset (i.e., an object).
For example, given the expression
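A minimal sketch of a Clause as a boolean condition evaluated against one object's metadata. The names `ObjectMetadata` and `MaxGreaterThan`, the metadata layout, and the sample values are illustrative assumptions, not the library's API.

```python
from dataclasses import dataclass


@dataclass
class ObjectMetadata:
    """Hypothetical per-object metadata: the maximum value of each indexed column."""
    name: str
    max_values: dict


@dataclass(frozen=True)
class MaxGreaterThan:
    """Hypothetical clause: true if the object's maximum for `column` exceeds `value`."""
    column: str
    value: float

    def evaluate(self, meta: ObjectMetadata) -> bool:
        # Without metadata for the column the object cannot be ruled out.
        if self.column not in meta.max_values:
            return True
        return meta.max_values[self.column] > self.value


# Made-up objects for illustration only.
objects = [
    ObjectMetadata("part-0", {"temperature": 95}),
    ObjectMetadata("part-1", {"temperature": 120}),
]

clause = MaxGreaterThan("temperature", 101)
relevant = [o.name for o in objects if clause.evaluate(o)]
print(relevant)  # ['part-1']
```

An object for which the clause evaluates to false cannot contain a matching row, so it is a candidate for skipping.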
Filter¶
The labeling process of Expression Trees is done using filters.
An algorithm A is a filter if it performs the following action: when given an expression tree
For example, given the expression
A filter MaxClause:
In MaxClause(c, >, v), c is the column name and v is the value. Since MaxClause(temperature, >, 101)
represents the node to which it was applied,
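As a rough illustration of how a filter might label an expression tree, the sketch below walks a tree and attaches a `MaxClause` to every `column > literal` node on an indexed column. The `Node` class, the `min_max_filter` function, and the tree encoding are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical expression tree node, e.g. Node('>', ['temperature', 101])."""
    op: str
    children: list = field(default_factory=list)
    clauses: list = field(default_factory=list)  # labels attached by filters


@dataclass(frozen=True)
class MaxClause:
    column: str
    op: str
    value: float


def min_max_filter(node: Node, indexed_columns: set) -> None:
    """Label each `column > literal` node on an indexed column with a MaxClause."""
    if node.op == ">" and len(node.children) == 2:
        column, value = node.children
        if isinstance(column, str) and column in indexed_columns:
            node.clauses.append(MaxClause(column, ">", value))
    # Recurse into sub-expressions; leaves (column names, literals) are skipped.
    for child in node.children:
        if isinstance(child, Node):
            min_max_filter(child, indexed_columns)


tree = Node(">", ["temperature", 101])
min_max_filter(tree, {"temperature"})
print(tree.clauses)
```

Nodes on columns without an index are left unlabelled, which is why several filters can be applied one after another to the same tree.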
Clause Translator¶
A component which translates a Clause to a specific implementation according to the metadata store type.
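One way to picture a Clause Translator is a lookup from store type to a translation function. The sketch below assumes two hypothetical store types: a SQL-backed store with `<column>_max` columns, and an in-memory store keyed by `(column, "max")`. Neither layout comes from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxClause:
    column: str
    op: str
    value: float


def to_sql_predicate(clause: MaxClause) -> str:
    """Translate for a hypothetical SQL-backed store with `<column>_max` columns."""
    return f"{clause.column}_max {clause.op} {clause.value}"


def to_memory_predicate(clause: MaxClause):
    """Translate for a hypothetical in-memory store; only '>' is handled here."""
    def predicate(meta: dict) -> bool:
        return meta[(clause.column, "max")] > clause.value
    return predicate


# Registry of translators, one per (assumed) metadata store type.
TRANSLATORS = {"sql": to_sql_predicate, "memory": to_memory_predicate}


def translate(clause: MaxClause, store_type: str):
    return TRANSLATORS[store_type](clause)


clause = MaxClause("temperature", ">", 101)
print(translate(clause, "sql"))  # temperature_max > 101
```

The same abstract clause yields a different concrete form per store, so supporting a new store type only requires registering another translation function.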
Query Evaluation Flow¶
Query evaluation is done in 2 phases:
- A query's Expression Tree is labelled using a set of clauses.
  - The clauses are combined to provide a single clause which represents the Expression Tree.
  - The labelling process is extensible, allowing for new index types and UDFs.
- The clause is translated to a form that can be applied at the metadata store to filter out objects which can be skipped during query run time.
A simple example¶
For example, given the query:
SELECT *
FROM employees
WHERE salary > 5 AND
name IN ('Danny', 'Moshe', 'Yossi')
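The Expression Tree can be visualized as follows: an AND node at the root, with salary > 5 and name IN ('Danny', 'Moshe', 'Yossi') as its two children.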
Assume we have a MinMax index on the salary column (storing the minimum and maximum values for each object) and a ValueList index on the name column (storing the distinct list of values for each object).
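- Applying the MinMax filter labels the salary > 5 node with MaxClause(salary, >, 5).
- Applying the ValueList filter on the results of the previous filter labels the name IN ('Danny', 'Moshe', 'Yossi') node with ValueListClause(name, ('Danny', 'Moshe', 'Yossi')).
- Finally we generate a combined Abstract Clause: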
AND(MaxClause(salary, >, 5),ValueListClause(name, ('Danny', 'Moshe', 'Yossi')))
This clause will be translated to a form that can be applied at the metadata store to filter out objects which can be skipped during query run time.
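The example above can be sketched end to end as follows. The class definitions, the metadata layout (`"max"` and `"values"` keys), and the three sample objects are assumptions for illustration; only the clause names and the combined clause text come from the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MaxClause:
    column: str
    op: str
    value: float

    def evaluate(self, meta: dict) -> bool:
        # Assumed semantics: the object's maximum must exceed the value.
        return meta["max"][self.column] > self.value

    def __str__(self) -> str:
        return f"MaxClause({self.column}, {self.op}, {self.value})"


@dataclass(frozen=True)
class ValueListClause:
    column: str
    values: Tuple[str, ...]

    def evaluate(self, meta: dict) -> bool:
        # Assumed semantics: at least one requested value appears in the object.
        return any(v in meta["values"][self.column] for v in self.values)

    def __str__(self) -> str:
        return f"ValueListClause({self.column}, {self.values})"


@dataclass(frozen=True)
class And:
    clauses: tuple

    def evaluate(self, meta: dict) -> bool:
        return all(c.evaluate(meta) for c in self.clauses)

    def __str__(self) -> str:
        return "AND(" + ",".join(str(c) for c in self.clauses) + ")"


combined = And((
    MaxClause("salary", ">", 5),
    ValueListClause("name", ("Danny", "Moshe", "Yossi")),
))
print(combined)

# Made-up per-object metadata, as the two indexes might store it.
objects = {
    "obj-0": {"max": {"salary": 4}, "values": {"name": {"Danny", "Ruth"}}},
    "obj-1": {"max": {"salary": 9}, "values": {"name": {"Noa"}}},
    "obj-2": {"max": {"salary": 7}, "values": {"name": {"Moshe", "Noa"}}},
}
to_scan = [name for name, meta in objects.items() if combined.evaluate(meta)]
print(to_scan)  # ['obj-2']
```

Here `obj-0` is skipped because its maximum salary is not above 5, and `obj-1` is skipped because none of the requested names occur in it, so only `obj-2` would need to be read at query run time.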