org.apache.spark.sql.SparkSession instance for processing
the URI of the dataset
a MetadataHandle instance to be used for saving the metadata
Collects the indexes and uploads them to the metadatastore using the MetadataHandle
the format to be used when reading each object
the options to be used when reading each object. Note: all objects are assumed to have the same options and format.
a sequence of indexes that will be applied on the indexed dataset
a sequence of (String, String) where the first string is the file name and the second is the fileID
(optional) the expected schema (since objects are read one at a time, the schema can be provided so that it matches the full DataFrame)
indicates whether the operation is a refresh
Collects the list of files that need to be indexed. A file needs to be indexed if: 1. It is a new file that was not indexed before. 2. It is an indexed file that has changed since it was indexed.
the list of files that will be compared against the existing indexed files
indicates whether this is a refresh operation; if it is not, it is assumed that no indexed files exist
a sequence of (String, String) for all new or modified files, where the first string is the file name and the second is the fileID, together with a sequence of files to be removed from the metadatastore (since they were updated)
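The selection rule described above can be illustrated with a minimal Scala sketch. It is not the library's implementation: the object name `FilesToIndexSketch`, the method `filesToIndex`, and the representation of the already indexed files as a `Map` from file name to fileID are all assumptions made for illustration.

```scala
// Hypothetical sketch; names and the Map-based view of indexed files are assumptions.
object FilesToIndexSketch {

  /** Returns (new or modified files as (name, fileID), names of files whose old metadata should be removed). */
  def filesToIndex(
      files: Seq[(String, String)],   // (file name, fileID) of the files to compare
      indexed: Map[String, String],   // assumed: file name -> fileID already in the metadatastore
      isRefresh: Boolean): (Seq[(String, String)], Seq[String]) = {
    if (!isRefresh) {
      // Not a refresh: assume nothing is indexed yet, so every file is indexed and none is removed.
      (files, Seq.empty)
    } else {
      // A file qualifies if it is unknown (new) or its stored fileID differs (modified).
      val newOrModified = files.filter { case (name, id) => !indexed.get(name).contains(id) }
      // Only modified files already have metadata that must be removed before re-indexing.
      val toRemove = newOrModified.collect { case (name, _) if indexed.contains(name) => name }
      (newOrModified, toRemove)
    }
  }
}
```

Under these assumptions, a file whose name and fileID both match an indexed entry is skipped, a file with a known name but a different fileID is returned for indexing and also listed for removal, and a file with an unknown name is returned for indexing only.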
Removes the metadata for a given list of files
the list of files to remove metadata for
A helper class that collects the indexes and uses a MetadataHandle to upload the metadata