A library for creating, managing and deploying data skipping indexes with Apache Spark
Boost performance and reduce cost by skipping over irrelevant data
Supports Parquet, CSV, JSON, ORC and Avro data, as well as Hive tables
Supports MinMax, ValueList and BloomFilter indexes out of the box
Add your own index types and support data skipping for your favourite User Defined Functions
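
To make the idea of data skipping concrete, here is a minimal, library-independent sketch in plain Python. It is not this library's API: the `FileStats` class, the `files_to_scan` function, and the file and column names are all hypothetical, chosen only to illustrate how per-file MinMax and ValueList metadata can rule out files before any data is read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata, similar in spirit to what a
    MinMax index and a ValueList index could record for each file."""
    path: str
    min_temp: float       # MinMax metadata for a numeric column
    max_temp: float
    cities: frozenset     # ValueList metadata: distinct values in the file


def files_to_scan(stats, temp_above=None, city=None):
    """Return the paths of files that might contain matching rows.

    A file is skipped only when its metadata proves that no row can match.
    """
    selected = []
    for s in stats:
        # MinMax check: if even the largest value is not above the
        # threshold, no row in the file satisfies `temp > temp_above`.
        if temp_above is not None and s.max_temp <= temp_above:
            continue
        # ValueList check: if the value never occurs in the file,
        # no row satisfies `city == city`.
        if city is not None and city not in s.cities:
            continue
        selected.append(s.path)
    return selected


# Illustrative metadata for three made-up files.
index = [
    FileStats("part-0.parquet", 10.0, 25.0, frozenset({"Haifa", "Oslo"})),
    FileStats("part-1.parquet", 20.0, 41.0, frozenset({"Cairo"})),
    FileStats("part-2.parquet", 30.0, 45.0, frozenset({"Haifa"})),
]

print(files_to_scan(index, temp_above=28.0))
print(files_to_scan(index, temp_above=28.0, city="Haifa"))
```

In this sketch the first query only needs to read two of the three files, and the second only one. A BloomFilter index serves the same purpose as the value list check, but with a compact probabilistic structure that may report false positives and never false negatives, so a file is still skipped only when it is safe to do so.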