io.xskipper.metadatastore.parquet

ParquetMetadataStoreManager

object ParquetMetadataStoreManager extends MetadataStoreManager with Logging

A MetadataBackend backed by Parquet objects

Linear Supertypes
Logging, MetadataStoreManager, AnyRef, Any
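Given the supertype list above, ParquetMetadataStoreManager inherits the concrete handle-registry members of MetadataStoreManager (the members whose "Definition Classes" entry below is only MetadataStoreManager) and overrides the backend-specific ones. The sketch below is a minimal, self-contained model of that layout, not xskipper's actual code. ToyMetadataHandle, ToyMetadataStoreManager, ToyParquetHandle and ToyParquetManager are hypothetical stand-ins. SparkSession parameters are omitted, and implementing the get-or-create step with ConcurrentHashMap.computeIfAbsent is an assumption. The sketch assumes Scala 2.13+ for scala.jdk.CollectionConverters.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for io.xskipper's MetadataHandle.
trait ToyMetadataHandle { def tableIdentifier: String }

// Hypothetical, simplified model of the MetadataStoreManager trait (no SparkSession).
trait ToyMetadataStoreManager {
  // Concrete member every manager inherits: one cached handle per table identifier.
  val metadataHandlesInstanceMap: ConcurrentHashMap[String, ToyMetadataHandle] =
    new ConcurrentHashMap[String, ToyMetadataHandle]()

  // Abstract member: each backend decides how its handles are built.
  def createMetadataHandle(tableIdentifier: String): ToyMetadataHandle

  // Returns the cached handle for tableIdentifier, creating it on first use.
  def getOrCreateMetadataHandle(tableIdentifier: String): ToyMetadataHandle =
    metadataHandlesInstanceMap.computeIfAbsent(
      tableIdentifier,
      (tid: String) => createMetadataHandle(tid))

  // Immutable snapshot of the currently cached handles.
  def getActiveMetadataHandles(): Map[String, ToyMetadataHandle] =
    metadataHandlesInstanceMap.asScala.toMap

  // Drops every cached handle.
  def clearActiveMetadataHandles(): Unit = metadataHandlesInstanceMap.clear()
}

// Hypothetical backend-specific handle.
final case class ToyParquetHandle(tableIdentifier: String) extends ToyMetadataHandle

// Same shape as `object ParquetMetadataStoreManager extends MetadataStoreManager`.
object ToyParquetManager extends ToyMetadataStoreManager {
  override def createMetadataHandle(tableIdentifier: String): ToyMetadataHandle =
    ToyParquetHandle(tableIdentifier)
}
```

In this model, repeated calls to getOrCreateMetadataHandle for the same table identifier return the same cached instance until clearActiveMetadataHandles empties the map.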

Value Members

  1. def clearActiveMetadataHandles(): Unit

    Clear all active MetadataHandles managed by this MetadataStoreManager.

    Definition Classes
    MetadataStoreManager
  2. def clearStats(): Unit

    Clears the stats for all active MetadataHandle instances. Should be called before each query (or, alternatively, call clearActiveMetadataHandles) so that the aggregated stats are cleared.

    Definition Classes
    MetadataStoreManager
  3. def createMetadataHandle(sparkSession: SparkSession, tableIdentifier: String): MetadataHandle

    Creates a new MetadataHandle instance. Used by getOrCreateMetadataHandle to create new instances.

    Definition Classes
    ParquetMetadataStoreManager → MetadataStoreManager
  4. def getActiveMetadataHandles(): Map[String, MetadataHandle]

    Returns all active MetadataHandles managed by this MetadataStoreManager.

    Definition Classes
    MetadataStoreManager
  5. def getDataSkippingFileFilterParams(tid: String, sparkSession: SparkSession, fileIndex: FileIndex): Map[String, String]

    Returns a map of parameters to be set by the io.xskipper.search.DataSkippingFileFilter on the MetadataHandle when using this MetadataStoreManager. This map can be used to pass specific parameters depending on the type of FileIndex being replaced. For example, ParquetMetadataStoreManager can specify that the metadata location should be inferred from the table or default database properties.

    tid

    the table identifier for which the DataSkippingFileFilter will be created

    sparkSession

    the Spark session

    fileIndex

    the fileIndex that will be replaced by io.xskipper.search.DataSkippingFileFilter

    returns

    a map of parameters to be set on the MetadataStore when the DataSkippingFileFilter is created

    Definition Classes
    ParquetMetadataStoreManager → MetadataStoreManager
  6. def getLatestQueryAggregatedStats(): QueryIndexStatsResult

    Gets the aggregated latest query skipping stats for all active MetadataHandle instances. To get reliable results, either clearStats or clearActiveMetadataHandles is assumed to have been called before running the query.

    This is needed because the skipping stats are aggregated by going over all active MetadataHandles of the MetadataStoreManager. When running multiple queries, the first query may use dataset a while the second does not. The MetadataHandle for dataset a is then still active when the aggregated stats are computed for the second query, so its stats must be cleared beforehand.

    returns

    QueryIndexStatsResult instance with the latest query aggregated stats

    Definition Classes
    MetadataStoreManager
  7. def getOrCreateMetadataHandle(sparkSession: SparkSession, tableIdentifier: String): MetadataHandle

    Gets an existing MetadataHandle or, if there is no existing one, creates a new one.

    sparkSession

    the Spark session

    tableIdentifier

    the table identifier of the dataset for which the MetadataHandle instance is created

    returns

    the existing MetadataHandle instance associated with this backend, or a newly created one

    Definition Classes
    MetadataStoreManager
  8. def getParams(sparkSession: SparkSession): Map[String, String]

    returns

    a map representing relevant MetadataStoreManager parameters

    Definition Classes
    ParquetMetadataStoreManager → MetadataStoreManager
  9. def getType: MetadataStoreManagerType

    returns

    the MetadataStoreManagerType associated with this backend

    Definition Classes
    ParquetMetadataStoreManager → MetadataStoreManager
  10. def init(): Unit

    A Unit-returning function that sets up the MetadataStoreManager. It is called once, when the MetadataStoreManager is registered.

    Definition Classes
    ParquetMetadataStoreManager → MetadataStoreManager
  11. def listIndexedDatasets(sparkSession: SparkSession): Map[String, (MetadataVersionStatus.MetadataVersionStatus, Seq[Index])]

    returns

    Map of indexed datasets to their respective metadata version status and indexes

    Definition Classes
    ParquetMetadataStoreManager → MetadataStoreManager
  12. val metadataHandlesInstanceMap: ConcurrentHashMap[String, MetadataHandle]
    Definition Classes
    MetadataStoreManager
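The stale-stats scenario described under getLatestQueryAggregatedStats can be reproduced with a small self-contained model. This is a hypothetical sketch, not xskipper's implementation. ToyStats stands in for QueryIndexStatsResult and is reduced to skipped/total file counters. ToyHandle, ToyManager and TwoQueryScenario are invented names, and the file counts are illustrative only. The sketch assumes Scala 2.13+.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for QueryIndexStatsResult, reduced to two counters.
final case class ToyStats(skippedFiles: Long, totalFiles: Long)

// Hypothetical handle that accumulates skipping stats for one dataset.
final class ToyHandle {
  private var skipped = 0L
  private var total = 0L

  def record(skippedFiles: Long, totalFiles: Long): Unit = synchronized {
    skipped += skippedFiles
    total += totalFiles
  }

  def stats: ToyStats = synchronized(ToyStats(skipped, total))

  def clearStats(): Unit = synchronized {
    skipped = 0L
    total = 0L
  }
}

// Hypothetical manager mirroring the documented clearStats / aggregation contract.
object ToyManager {
  private val handles = new ConcurrentHashMap[String, ToyHandle]()

  def getOrCreateMetadataHandle(tid: String): ToyHandle =
    handles.computeIfAbsent(tid, (_: String) => new ToyHandle)

  // Resets every active handle's stats; the handles themselves stay registered.
  def clearStats(): Unit = handles.values().asScala.foreach(_.clearStats())

  // Sums over ALL active handles, including ones the latest query never touched.
  def getLatestQueryAggregatedStats(): ToyStats =
    handles.values().asScala.foldLeft(ToyStats(0L, 0L)) { (acc, h) =>
      val s = h.stats
      ToyStats(acc.skippedFiles + s.skippedFiles, acc.totalFiles + s.totalFiles)
    }
}

// Query 1 reads only dataset "a"; query 2 reads only dataset "b".
object TwoQueryScenario {
  def run(clearBetween: Boolean): ToyStats = {
    ToyManager.clearStats() // reset before query 1
    ToyManager.getOrCreateMetadataHandle("a").record(skippedFiles = 8, totalFiles = 10)
    if (clearBetween) ToyManager.clearStats() // reset before query 2
    ToyManager.getOrCreateMetadataHandle("b").record(skippedFiles = 1, totalFiles = 4)
    ToyManager.getLatestQueryAggregatedStats()
  }
}
```

With clearBetween = true the second query reports only dataset b's stats. With false, the handle for dataset a is still active, so its leftover stats from the first query are summed in as well.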