An Extensible Data Skipping Framework

A library for creating, managing and deploying data skipping indexes with Apache Spark


Performance

Boost query performance and reduce cost by skipping over data that is irrelevant to the query.

Data Format Agnostic

Supports Parquet, CSV, JSON, ORC and Avro data. Hive tables are also supported.

Built-in Index Types

Supports MinMax, ValueList and BloomFilter indexes out of the box.
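
To make the idea behind these index types concrete, here is a minimal, library-independent sketch in plain Python. It is not the framework's API: the FileStats class, the files_to_scan function, the file names and the column names (temp, city) are all hypothetical, chosen only to illustrate how per-file MinMax and ValueList metadata can rule out files before they are read.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class FileStats:
    """Hypothetical per-file metadata (an assumption, not the library's storage format)."""
    path: str
    min_temp: float   # MinMax index: smallest value of `temp` in the file
    max_temp: float   # MinMax index: largest value of `temp` in the file
    cities: Set[str]  # ValueList index: distinct values of `city` in the file


def files_to_scan(index: List[FileStats], temp_above: float, city: str) -> List[str]:
    """Return the files that may hold rows with temp > temp_above AND city == city."""
    relevant = []
    for stats in index:
        # MinMax check: if the file's maximum is not above the threshold, no row can match.
        if stats.max_temp <= temp_above:
            continue
        # ValueList check: if the city never appears in the file, no row can match.
        if city not in stats.cities:
            continue
        relevant.append(stats.path)
    return relevant


# Illustrative metadata for three files (made-up values).
index = [
    FileStats("part-0.parquet", 5.0, 18.0, {"Oslo", "Bergen"}),
    FileStats("part-1.parquet", 12.0, 31.0, {"Madrid", "Seville"}),
    FileStats("part-2.parquet", 20.0, 35.0, {"Oslo", "Madrid"}),
]

# part-0 is skipped by the MinMax check, part-1 by the ValueList check.
print(files_to_scan(index, temp_above=30.0, city="Oslo"))
```

A Bloom filter plays the same role as the value list but stores a compact probabilistic summary: it may report a value as present when it is not, so a file can be scanned unnecessarily, but it never reports a present value as absent, so no matching file is skipped.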

Extensible

Add your own index types and support data skipping for your favourite user-defined functions (UDFs).