Gets the DataSkippingFileFilter relevant for the given table identifier (tid), FileIndex and backend
the fileIndex for which we create a DataSkippingFileFilter
the table identifier
the backend to be used to create the DataSkippingFileFilter
the spark session
whether to create an evaluation DataSkippingFileFilter, which only reports skipping stats
Injects a rule as part of the extendedOperatorOptimizationRule
Gets an inMemoryFileIndex and reconstructs its FileStatusCache. Because of the way the FileStatusCache is implemented in Spark, a cache cannot be shared between instances: calling FileStatusCache.getOrCreate(spark) returns an empty cache, so replacing the FileIndex with a data skipping FileIndex would require a new listing. To avoid this, the code reconstructs the cache from the existing FileIndex so that the new FileIndex can use it. Note: the FileStatusCache of the original inMemoryFileIndex cannot be retrieved directly because it is passed in through the constructor and is not declared there as a var/val, so it is not accessible once an instance of inMemoryFileIndex exists.
a spark session - used to get a new cache
the inMemoryFileIndex to construct the cache from
a FileStatusCache populated with the root paths from the given inMemoryFileIndex
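The cache reconstruction described above could look roughly like the following. This is a minimal sketch and not the project's actual implementation: the object and method names are hypothetical, it relies on Spark's internal `FileStatusCache` and `InMemoryFileIndex` classes being on the classpath, and it assumes that the root paths and the listed file paths are in the same (qualified) form so that a simple prefix match is sufficient.

```scala
import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{FileStatusCache, InMemoryFileIndex}

// Hypothetical names, used for illustration only.
object FileStatusCacheSketch {

  def rebuildFileStatusCache(
      spark: SparkSession,
      inMemoryFileIndex: InMemoryFileIndex): FileStatusCache = {
    // A new cache obtained this way starts out empty for this caller.
    val cache = FileStatusCache.getOrCreate(spark)

    // Files the existing index has already listed; reusing them avoids
    // triggering a second listing of the underlying storage.
    val knownFiles: Seq[FileStatus] = inMemoryFileIndex.allFiles()

    inMemoryFileIndex.rootPaths.foreach { root =>
      val rootStr = root.toString
      val prefix = rootStr.stripSuffix("/") + "/"
      // Assumption: root and file paths share the same qualified form,
      // so string prefix matching identifies the files under each root.
      val underRoot = knownFiles.filter { status =>
        val p = status.getPath.toString
        p == rootStr || p.startsWith(prefix)
      }
      cache.putLeafFiles(root, underRoot.toArray)
    }
    cache
  }
}
```

The returned cache can then be passed to the constructor of the new FileIndex, which is the only point at which a FileStatusCache can be supplied.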