Extensible Data Skipping
Xskipper introduces a layer of abstraction that separates index creation and query evaluation from the metadata store implementation, which is specific to each metadata store.
This abstraction operates in two main areas:
- Indexing Flow - abstract metadata is generated during index creation and refresh by analyzing a DataFrame.
- Query Evaluation Flow - abstract metadata clauses are generated by filters that analyze the Catalyst (Spark optimizer) expression tree for pushdown predicates and identify subtrees that can be mapped to a metadata Clause.
These abstract structures (MetaData and Clauses) are then translated to a representation that matches a specific metadata store. Metadata Clauses can then be applied to MetaDataTypes in an efficient manner.
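
To make the two flows concrete, the following is a minimal, dependency-free sketch of the idea. It is not Xskipper's API: every name in it (`MinMaxMetaData`, `GreaterThanClause`, `InMemoryStore`, and the helper functions) is hypothetical, rows are plain dictionaries instead of a DataFrame, and a tuple such as `(">", "temp", 30)` stands in for a subtree of the Catalyst expression tree. It only illustrates how store-independent metadata and clauses might be translated into one store's own representation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


# --- Abstract layer: independent of any metadata store (hypothetical names) ---

@dataclass(frozen=True)
class MinMaxMetaData:
    """Abstract metadata for one column of one object (for example, a file)."""
    column: str
    min_value: Any
    max_value: Any


@dataclass(frozen=True)
class GreaterThanClause:
    """Abstract clause: the object may contain rows where column > value."""
    column: str
    value: Any


def index_object(rows: List[Dict[str, Any]], column: str) -> MinMaxMetaData:
    """Indexing flow: derive abstract metadata by analyzing an object's rows."""
    values = [row[column] for row in rows]
    return MinMaxMetaData(column, min(values), max(values))


def clause_from_predicate(predicate: tuple) -> Optional[GreaterThanClause]:
    """Query evaluation flow: map a predicate to an abstract clause.

    Returns None when the predicate cannot be mapped, in which case
    no objects can be skipped based on it.
    """
    op, column, value = predicate
    if op == ">":
        return GreaterThanClause(column, value)
    return None


# --- Store-specific layer: one possible metadata store (hypothetical) ---

class InMemoryStore:
    """Keeps one flat record per object, e.g. {"temp_min": 10, "temp_max": 25}."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def translate_metadata(self, md: MinMaxMetaData) -> Dict[str, Any]:
        # Abstract metadata -> this store's record layout.
        return {f"{md.column}_min": md.min_value, f"{md.column}_max": md.max_value}

    def translate_clause(
        self, clause: GreaterThanClause
    ) -> Callable[[Dict[str, Any]], bool]:
        # Abstract clause -> a test this store can run on its own records.
        key = f"{clause.column}_max"
        return lambda record: record[key] > clause.value

    def put(self, object_name: str, md: MinMaxMetaData) -> None:
        self._records[object_name] = self.translate_metadata(md)

    def relevant_objects(self, clause: GreaterThanClause) -> List[str]:
        test = self.translate_clause(clause)
        return sorted(name for name, rec in self._records.items() if test(rec))


store = InMemoryStore()
store.put("a.parquet", index_object([{"temp": 10}, {"temp": 25}], "temp"))
store.put("b.parquet", index_object([{"temp": 28}, {"temp": 41}], "temp"))

clause = clause_from_predicate((">", "temp", 30))
kept = store.relevant_objects(clause)
print(kept)  # a.parquet is skipped because its maximum (25) is not above 30
```

In this sketch, supporting a different metadata store would mean writing another class with its own `translate_metadata` and `translate_clause`, while `index_object` and `clause_from_predicate` stay unchanged. That separation is the point of the abstraction described above.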