Clears the stats for all active MetadataHandle instances in the active MetadataStoreManager. Should be called before each query to make sure the aggregated stats are cleared.
Python API Wrapper for disabling Xskipper in the given SparkSession
SparkSession object
Python API Wrapper for enabling Xskipper in the given SparkSession
SparkSession object
Retrieves the value associated with the given key in the configuration
the key to lookup
the value associated with the key or null if the key doesn't exist (null is returned so this function can be used in the python module)
Returns a map of all configurations currently set
Gets the aggregated latest query skipping stats for all active MetadataHandle instances in the current default MetadataStoreManager. In order to get reliable results it is assumed that either clearStats or clearActiveMetadataHandles was called before running the query.
This is needed because the skipping stats are aggregated by going over all active MetadataHandles of the MetadataStoreManager and combining their stats. When running multiple queries, the first query may use dataset a while the second query does not. When aggregatedStats is then called for the second query, the MetadataHandle for dataset a is still present as an active MetadataHandle, so its stats need to be cleared.
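The stale-stats scenario described above can be illustrated with a small, self-contained toy model. This is not Xskipper code: `ToyHandle`, `ToyManager`, and their methods are hypothetical stand-ins, assumed only to mimic "aggregate over every active handle".

```python
from dataclasses import dataclass, field


@dataclass
class ToyHandle:
    """Hypothetical stand-in for a per-dataset MetadataHandle."""
    skipped_objs: int = 0
    total_objs: int = 0

    def clear(self) -> None:
        self.skipped_objs = 0
        self.total_objs = 0


@dataclass
class ToyManager:
    """Hypothetical stand-in for a MetadataStoreManager."""
    handles: dict = field(default_factory=dict)

    def record(self, dataset: str, skipped: int, total: int) -> None:
        # A query touching `dataset` creates (or reuses) its handle and updates it.
        handle = self.handles.setdefault(dataset, ToyHandle())
        handle.skipped_objs += skipped
        handle.total_objs += total

    def clear_stats(self) -> None:
        # Reset every active handle, mirroring the "clear before each query" advice.
        for handle in self.handles.values():
            handle.clear()

    def aggregated(self) -> tuple:
        # Sums over every active handle, whether or not the last query used it.
        skipped = sum(h.skipped_objs for h in self.handles.values())
        total = sum(h.total_objs for h in self.handles.values())
        return skipped, total


manager = ToyManager()
manager.record("a", skipped=8, total=10)   # query 1 uses dataset a
manager.record("b", skipped=1, total=4)    # query 2 uses only dataset b
stale = manager.aggregated()               # still includes dataset a's numbers

manager.clear_stats()                      # clear before re-running query 2
manager.record("b", skipped=1, total=4)
fresh = manager.aggregated()               # reflects query 2 only
```

In this sketch `stale` mixes the numbers of both queries, while `fresh` contains only what the second query contributed, which is the reason for clearing stats before each query.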
If the API is called on a query that did not involve any index, or is called without running a query, the returned DataFrame structure is: status, reason, with status=FAILED.
If the query cannot be skipped for one of the following reasons:
1. No dataset in the query has indexed files
2. No query to the metadata store can be generated, either because of a predicate that cannot be used in skipping (or a missing metadata filter) or because the abstract query could not be translated
the returned DataFrame structure is: status, isSkippable, skipped_Bytes, skipped_Objs, total_Bytes, total_Objs, with status=SUCCESS, isSkippable=false and all other values set to -1.
Otherwise the DataFrame structure is the same as above, with isSkippable=true and the relevant stats.
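The three result shapes can be told apart with a short helper. The sketch below is an assumption-laden illustration: it works on a plain dict keyed by the column names listed in this description (for example, one collected row converted to a dict), it assumes `isSkippable` arrives as a Python bool, and `describe_stats` is a hypothetical name that is not part of Xskipper.

```python
def describe_stats(row: dict) -> str:
    """Summarise one stats row, given as a plain dict keyed by column name."""
    if row.get("status") == "FAILED":
        # No index was involved, or no query was run: only status and reason exist.
        return "no stats: " + str(row.get("reason"))
    if not row.get("isSkippable"):
        # Per the description, the byte and object counters are -1 in this case.
        return "query ran, but skipping was not possible"
    skipped, total = row["skipped_Objs"], row["total_Objs"]
    pct = 100.0 * skipped / total if total else 0.0
    return f"skipped {skipped} of {total} objects ({pct:.1f}%)"


# Illustrative rows only; the values are made up for the example.
failed = describe_stats({"status": "FAILED", "reason": "no query was run"})
unskippable = describe_stats({
    "status": "SUCCESS", "isSkippable": False,
    "skipped_Bytes": -1, "skipped_Objs": -1, "total_Bytes": -1, "total_Objs": -1,
})
skipped = describe_stats({
    "status": "SUCCESS", "isSkippable": True,
    "skipped_Bytes": 300, "skipped_Objs": 3, "total_Bytes": 1200, "total_Objs": 12,
})
```

Checking `status` first and `isSkippable` second avoids treating the -1 placeholders as real counts.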
a spark session to construct the dataframe with the latest query stats
a DataFrame object containing information about latest query stats
Python API Wrapper for checking if Xskipper is enabled
SparkSession object
true if Xskipper is enabled for the current SparkSession
Returns information about the indexed datasets
a DataFrame object containing information about the indexed datasets under the configured base path
Resets all Xskipper settings by: 1. disabling filtering 2. clearing all MetadataHandle instances in the default MetadataStoreManager 3. resetting the JVM wide configuration
the spark session to remove the rule from
Sets a specific key in the JVM wide configuration
the key to set
the value associated with the key
Updates JVM wide xskipper parameters (Only given parameters will be updated)
a map of parameters to be set
Removes a key from the configuration
the key to remove
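The configuration operations described in this section (get, get all, set one key, update several keys, remove a key) follow ordinary key-value semantics. The sketch below models them with an in-memory dict. `ToyConf`, its method names, and the `example.*` keys are hypothetical and are not Xskipper's API or real parameter names. Treating removal of a missing key as a no-op is also an assumption; the description does not say what happens in that case.

```python
from typing import Dict, Optional


class ToyConf:
    """Hypothetical in-memory stand-in for the JVM wide configuration."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        # Sets a single key.
        self._values[key] = value

    def set_conf(self, params: Dict[str, str]) -> None:
        # Only the given parameters are updated; other keys keep their values.
        self._values.update(params)

    def get(self, key: str) -> Optional[str]:
        # Returns None for a missing key, mirroring the null returned to Python.
        return self._values.get(key)

    def unset(self, key: str) -> None:
        # Assumption: removing a key that is not set does nothing.
        self._values.pop(key, None)

    def get_all(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the stored configuration.
        return dict(self._values)


conf = ToyConf()
conf.set("example.a", "1")
conf.set_conf({"example.b", "2"} and {"example.b": "2"})  # update one more key
before = conf.get_all()

conf.unset("example.a")
missing = conf.get("example.a")   # None, because the key was removed
after = conf.get_all()
```

Returning `None` rather than raising for a missing key matches the note above that null is returned so the function can be used from the Python module.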