object Xskipper

Linear Supertypes
AnyRef, Any

Value Members

  1. def clearStats(): Unit

    Clears the stats for all active MetadataHandle instances in the active MetadataStoreManager. Should be called before each query to make sure the aggregated stats are cleared.
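To illustrate why clearing before each query matters, here is a toy model of stats that are aggregated over every active handle. It does not use Xskipper itself: `HandleStats`, `StaleStatsSketch`, and the numbers are hypothetical stand-ins chosen only to show how counters left over from an earlier query leak into a later aggregate.

```scala
// Hypothetical stand-in for the stats one MetadataHandle has accumulated.
final case class HandleStats(dataset: String, skippedObjs: Long, totalObjs: Long)

object StaleStatsSketch {
  // Aggregate by going over every active handle and summing its counters.
  def aggregate(handles: Seq[HandleStats]): (Long, Long) =
    (handles.map(_.skippedObjs).sum, handles.map(_.totalObjs).sum)

  // Model of clearing: handles stay active, but their counters return to zero.
  def cleared(handles: Seq[HandleStats]): Seq[HandleStats] =
    handles.map(_.copy(skippedObjs = 0L, totalObjs = 0L))

  // State after query 1, which read only dataset "a".
  val afterQuery1: Seq[HandleStats] =
    Seq(HandleStats("a", skippedObjs = 8L, totalObjs = 10L))

  // Query 2 reads only dataset "b" and contributes these stats.
  val query2Stats: HandleStats = HandleStats("b", skippedObjs = 3L, totalObjs = 5L)

  // Without clearing, the handle for "a" still carries query 1's numbers.
  val withoutClear: (Long, Long) = aggregate(afterQuery1 :+ query2Stats)

  // With clearing before query 2, only query 2's numbers remain.
  val withClear: (Long, Long) = aggregate(cleared(afterQuery1) :+ query2Stats)
}
```

In this model `withoutClear` mixes the counters of both queries, while `withClear` reflects the second query alone.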

  2. def disable(sparkSession: SparkSession): Unit

    Python API Wrapper for disabling Xskipper in the given SparkSession

    sparkSession

    SparkSession object

  3. def enable(sparkSession: SparkSession): Unit

    Python API Wrapper for enabling Xskipper in the given SparkSession

    sparkSession

    SparkSession object

  4. def get(key: String): String

    Retrieves the value associated with the given key in the configuration

    key

    the key to look up

    returns

    the value associated with the key, or null if the key doesn't exist (null is returned so that this function can be used from the Python module)
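Because a missing key yields null, Scala callers may prefer to wrap the lookup in an Option. The sketch below is one way to do that. It is written against any function with the same contract as get, so it runs without Xskipper on the classpath; `ConfLookup`, `stubGet`, and the key `example.key` are hypothetical names used only for illustration.

```scala
object ConfLookup {
  // `lookup` stands in for a function with the documented contract of `get`:
  // it returns the value, or null when the key does not exist.
  def getOption(lookup: String => String)(key: String): Option[String] =
    Option(lookup(key)) // Option(null) evaluates to None

  // Hypothetical in-memory stand-in that mimics the null-on-missing behavior.
  private val demo: Map[String, String] = Map("example.key" -> "value")
  val stubGet: String => String = key => demo.getOrElse(key, null)
}
```

Passing the real `get` method of the Xskipper object in place of `stubGet` should give the same Option-based behavior, assuming the library is available.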

  5. def getConf(): Map[String, String]

    Returns a map of all configurations currently set

  6. def getLatestQueryAggregatedStats(sparkSession: SparkSession): DataFrame

    Gets the aggregated latest query skipping stats for all active MetadataHandle instances in the current default MetadataStoreManager. In order to get reliable results it is assumed that either clearStats or clearActiveMetadataHandles was called before running the query.

    This is needed because the skipping stats are aggregated by going over all active MetadataHandles of the MetadataStoreManager and aggregating their stats. When running multiple queries, the first query may use dataset a while the second query does not. When aggregatedStats is then called for the second query, the MetadataHandle for dataset a is still present as an active MetadataHandle, so its stats must have been cleared.

    If the API is called on a query that did not involve any index, or without running a query, the returned DataFrame has the structure status, reason, with status=FAILED.

    If the query cannot be skipped for one of the following reasons:
      1. No dataset in the query has indexed files.
      2. No query to the metadata store can be generated, either because a predicate cannot be used for skipping (or a metadata filter is missing) or because translating the abstract query failed.
    the returned DataFrame has the structure status, isSkippable, skipped_Bytes, skipped_Objs, total_Bytes, total_Objs, with status=SUCCESS, isSkippable=false and all other values set to -1.

    Otherwise, the DataFrame has the same structure, with isSkippable=true and the relevant stats.

    sparkSession

    a spark session to construct the dataframe with the latest query stats

    returns

    a DataFrame object containing information about latest query stats
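The three result shapes described above can be turned into a small typed view. The following is a minimal sketch, not part of Xskipper: it assumes one result row has already been collected into a map from column name to string value, and the types `StatsOutcome`, `Failed`, `NotSkippable`, `Skippable` and the helper `StatsRows.fromRow` are hypothetical. The actual column types of the DataFrame are not stated here, so the string conversion is an assumption.

```scala
// Typed view of the three documented result shapes (names are illustrative).
sealed trait StatsOutcome
final case class Failed(reason: String) extends StatsOutcome
case object NotSkippable extends StatsOutcome
final case class Skippable(skippedBytes: Long, skippedObjs: Long,
                           totalBytes: Long, totalObjs: Long) extends StatsOutcome

object StatsRows {
  // `row` is assumed to hold one result row as column name -> string value.
  def fromRow(row: Map[String, String]): StatsOutcome =
    if (row.get("status").contains("FAILED")) {
      // Shape 1: status, reason
      Failed(row.getOrElse("reason", ""))
    } else if (!row.get("isSkippable").exists(_.equalsIgnoreCase("true"))) {
      // Shape 2: isSkippable=false, the numeric columns are all -1
      NotSkippable
    } else {
      // Shape 3: isSkippable=true with the relevant stats
      Skippable(
        row("skipped_Bytes").toLong, row("skipped_Objs").toLong,
        row("total_Bytes").toLong, row("total_Objs").toLong)
    }
}
```

Matching on the resulting `StatsOutcome` keeps callers from reading the -1 placeholders as real byte or object counts.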

  7. def isEnabled(sparkSession: SparkSession): Boolean

    Python API Wrapper for checking if Xskipper is enabled

    sparkSession

    SparkSession object

    returns

    true if Xskipper is enabled for the current SparkSession

  8. def listIndexes(sparkSession: SparkSession): DataFrame

    Returns information about the indexed datasets

    returns

    a DataFrame object containing information about the indexed datasets under the configured base path

  9. def reset(sparkSession: SparkSession): Unit

    Resets all Xskipper settings by: (1) disabling filtering, (2) clearing all MetadataHandle instances in the default MetadataStoreManager, and (3) resetting the JVM-wide configuration.

    sparkSession

    the spark session to remove the rule from

  10. def set(key: String, value: String): Unit

    Sets a specific key in the JVM wide configuration

    key

    the key to set

    value

    the value associated with the key

  11. def setConf(params: Map[String, String]): Unit
  12. def setConf(params: Map[String, String]): Unit

    Updates JVM wide xskipper parameters (Only given parameters will be updated)

    params

    a map of parameters to be set
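The note that only the given parameters are updated describes a merge rather than a wholesale replacement. Below is a toy model of that behavior on plain immutable maps. It is not Xskipper's implementation, and `ConfMergeSketch` and the `example.key.*` names are hypothetical.

```scala
object ConfMergeSketch {
  // Keys present in `params` overwrite the current value or are added;
  // every other key keeps the value it already had.
  def merged(current: Map[String, String],
             params: Map[String, String]): Map[String, String] =
    current ++ params

  val before: Map[String, String] =
    Map("example.key.a" -> "1", "example.key.b" -> "2")

  // Update one existing key and add one new key; "example.key.a" is untouched.
  val after: Map[String, String] =
    merged(before, Map("example.key.b" -> "20", "example.key.c" -> "3"))
}
```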

  13. def unset(key: String): Unit

    Removes a key from the configuration

    key

    the key to remove