the (full) name of the MetaDataType class used by this index
the name of the index
Takes a DataFrame row and extracts the raw metadata needed by the index.
the row to be indexed
the raw metadata needed by the index, or null if the row contains a null value
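The null-handling rule above can be illustrated with a minimal sketch. It assumes a simplified row modeled as a plain Map instead of a Spark Row; `RowMetadataSketch` and `rawMetadata` are hypothetical names used only for illustration and are not part of the library.

```scala
object RowMetadataSketch {
  // Hypothetical row model: column name -> value, where a missing value is null.
  // Returns the values of the indexed columns, or null if any of them is null.
  def rawMetadata(row: Map[String, Any], indexCols: Seq[String]): Seq[Any] = {
    val values = indexCols.map(c => row.getOrElse(c, null))
    if (values.contains(null)) null else values
  }
}
```

Returning null for the whole row, rather than a partial result, lets the caller skip the row without inspecting individual columns.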
Takes a DataFrame and checks whether it is valid for the index. There is no need to check column existence, as it is checked by the index builder.
the DataFrame to be checked
a map containing the column names (as they appear in the object) and their data types; the key is the column name in lower case
[[XskipperException]] with the reason if the DataFrame is invalid
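A validity check of this kind might look like the sketch below. It is only an illustration: the schema map uses type names as strings instead of Spark data types, the set of supported types is an assumption, and `InvalidIndexException` is a hypothetical stand-in for the exception named above.

```scala
// Hypothetical exception type standing in for the one named in the text.
final class InvalidIndexException(reason: String) extends Exception(reason)

object ValidationSketch {
  // Assumed set of types this illustrative index can handle.
  private val supported = Set("int", "long", "double", "string")

  // schemaMap: lower-case column name -> (name as it appears in the object, type name).
  // Column existence is not checked here; the text says the index builder does that.
  def validate(schemaMap: Map[String, (String, String)], indexCols: Seq[String]): Unit =
    indexCols.foreach { col =>
      val (name, dataType) = schemaMap(col)
      if (!supported.contains(dataType))
        throw new InvalidIndexException(s"column $name has unsupported type $dataType")
    }
}
```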
Reduces two accumulated metadata values into one.
updated metadata for the index
Given accumulated metadata and a new value, processes the new value and returns the updated accumulated metadata.
the accumulated metadata created by processing all values before curr
new value to be processed
updated metadata for the index
The "zero" value of the index, used for the first comparison against the object's row data (null by default).
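The two reduce variants and the zero value can be pictured as a fold over the values of a file. The sketch below assumes a min/max style of metadata over Int values; `MinMax` and `MinMaxSketch` are hypothetical names, not the library's types.

```scala
// Hypothetical stand-in for a metadata type; not part of the library.
final case class MinMax(min: Int, max: Int)

object MinMaxSketch {
  // "Zero" value: null means no value has been processed yet.
  val zero: MinMax = null

  // Accumulated metadata + new value -> updated accumulated metadata.
  def reduce(acc: MinMax, curr: Int): MinMax =
    if (acc == null) MinMax(curr, curr)
    else MinMax(math.min(acc.min, curr), math.max(acc.max, curr))

  // Two accumulated metadata values -> one merged metadata value.
  def reduce(md1: MinMax, md2: MinMax): MinMax =
    if (md1 == null) md2
    else if (md2 == null) md1
    else MinMax(math.min(md1.min, md2.min), math.max(md1.max, md2.max))
}
```

For example, `Seq(5, 2, 9).foldLeft(MinMaxSketch.zero)((acc, v) => MinMaxSketch.reduce(acc, v))` yields `MinMax(2, 9)`, and the two-argument merge variant allows partial results computed separately to be combined afterwards.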
Generates the column map according to a given schema.
a map containing the column names (as they appear in the object) and their data types; the key is the column name in lower case
the index columns (in lower case)
the columns on which the index is defined
the index params map
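As a rough illustration of the lower-case keying described above, the following sketch builds a schema map and selects the entries for the index columns. Type names are plain strings here instead of Spark data types, and `ColumnMapSketch`, `buildSchemaMap`, and `columnsFor` are hypothetical names.

```scala
import java.util.Locale

object ColumnMapSketch {
  // schema: (column name as it appears in the object, type name).
  // Result: lower-case column name -> (original name, type name).
  def buildSchemaMap(schema: Seq[(String, String)]): Map[String, (String, String)] =
    schema.map { case (name, dataType) =>
      (name.toLowerCase(Locale.ROOT), (name, dataType))
    }.toMap

  // Keeps only the entries for the (lower-case) index columns that exist in the schema.
  def columnsFor(schemaMap: Map[String, (String, String)],
                 indexCols: Seq[String]): Map[String, (String, String)] =
    indexCols.flatMap(c => schemaMap.get(c).map(v => (c, v))).toMap
}
```

Keying by the lower-case name makes lookups case-insensitive while the value still preserves the name as it appears in the object.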
For some formats there may be an optimized way to collect the metadata. This function enables that by receiving the entire file DataFrame instead of processing it row by row. For example, in Parquet the min/max values can be read from the footer.
the path of the file that is being processed
a DataFrame with the file data
the format of the file
the options that were used to read the file
the collected MetadataType or null if no metadata was collected
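One way a caller could use the null return value is to fall back to row-by-row processing when no optimized metadata is available. The sketch below assumes that fallback behavior; it models footer statistics as a plain Map and does not read real Parquet files. All names in it are hypothetical.

```scala
object OptimizedCollectionSketch {
  // Hypothetical optimized path: returns (min, max) from precomputed footer
  // statistics for Parquet, or null when no shortcut is available.
  def optimized(format: String,
                footerStats: Map[String, (Int, Int)],
                column: String): (Int, Int) =
    if (format == "parquet") footerStats.get(column).orNull else null

  // Row-by-row path: scans all values of the column.
  def rowByRow(values: Seq[Int]): (Int, Int) =
    if (values.isEmpty) null else (values.min, values.max)

  // Assumed caller behavior: try the optimized path first, then fall back.
  def collect(format: String,
              footerStats: Map[String, (Int, Int)],
              column: String,
              values: Seq[Int]): (Int, Int) = {
    val fast = optimized(format, footerStats, column)
    if (fast != null) fast else rowByRow(values)
  }
}
```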
Represents an abstract index for metadata on a file.