Encrypting Indexes
When using the Parquet Metadata Store, the metadata can optionally be encrypted using Parquet Modular Encryption (PME). This is possible because the metadata itself is stored as a Parquet dataset, so PME can be used to encrypt it. This feature applies to all input formats; for example, a dataset stored in CSV format can have its metadata encrypted using PME.
Note
In the following sections, unless stated otherwise, references to footers, columns, etc. are with respect to the metadata objects, not the objects in the indexed dataset.
Modularity
Index encryption is modular, with the following granularity:
- Each Index can either be encrypted (with a per-index key granularity) or left plaintext.
- Footer + object name column + partition key values:
  - The footer of the metadata object (in itself a Parquet file) contains (among other things):
    - Schema of the metadata object, which reveals the types, parameters and column names for all indexes collected (for example, one can learn that a Bloom Filter Index is defined on column city with false-positive probability 0.1)
    - Full path of the original dataset (or table name in case of a Hive Metastore table)
  - The object name column stores the names of all indexed objects, and their modification timestamps at indexing time.
  - The partition key values (for example, when Hive Style partitioning is used in the indexed dataset) are automatically stored by Xskipper, each virtual column in its own dedicated column.
The footer, the object name column, and the partition columns are encrypted using the same key, "the footer key". If plaintext footer is specified, the object name column and the partition columns are still encrypted, but the footer itself is left in plaintext (that is, the plaintext footer mode of PME is activated). If at least one index is encrypted, the footer key must be set.
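The rules above can be summarized in a small, self-contained sketch. This is not the Xskipper API: the function plan_encryption, its parameters, and the part names it returns are hypothetical and exist only to illustrate which key label would protect each part of a metadata object under the rules described in this section.

```python
from typing import Dict, Optional


def plan_encryption(index_keys: Dict[str, Optional[str]],
                    footer_key: Optional[str] = None,
                    plaintext_footer: bool = False) -> Dict[str, Optional[str]]:
    """Map each part of a metadata object to the key label protecting it.

    Hypothetical helper for illustration only (not part of Xskipper).
    index_keys maps an index name to its key label, or None for plaintext.
    A value of None in the result means the part is left in plaintext.
    """
    # Rule: if at least one index is encrypted, the footer key must be set.
    any_encrypted = any(label is not None for label in index_keys.values())
    if any_encrypted and footer_key is None:
        raise ValueError(
            "footer key must be set when at least one index is encrypted")

    plan: Dict[str, Optional[str]] = {}
    # Rule: in plaintext footer mode only the footer itself stays plaintext.
    plan["footer"] = None if plaintext_footer else footer_key
    # Rule: object names and partition columns always use the footer key.
    plan["object_name_column"] = footer_key
    plan["partition_columns"] = footer_key
    # Rule: each index is either encrypted with its own key or left plaintext.
    for name, label in index_keys.items():
        plan["index:" + name] = label
    return plan


if __name__ == "__main__":
    # "k1" and "k2" are key labels, never key material.
    example = plan_encryption(
        {"minmax(temp)": "k1", "bloomfilter(city)": None},
        footer_key="k2",
        plaintext_footer=True,
    )
    for part, label in example.items():
        print(part, "->", label if label is not None else "plaintext")
```

In this sketch, an index left in plaintext still has its name and parameters visible in the schema only when the footer is plaintext; with an encrypted footer, the schema is protected by the footer key.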
Usage Flow
Danger
When using index encryption, whenever a "key" is configured in any Xskipper API, it is always the key's label, NEVER the key itself.