class Xskipper extends AnyRef
Main class for programmatically interacting with Xskipper
Instance Constructors
- new Xskipper(sparkSession: SparkSession, uri: String, metadataStoreManagerClassName: String)
Additional constructor for the PySpark API
- sparkSession
the SparkSession instance used for processing
- uri
the URI of the dataset / the identifier of the table on which the index is defined
- metadataStoreManagerClassName
the fully qualified class name of the MetadataStoreManager to use
- Exceptions thrown
XskipperException
if the metadataStoreManagerClassName is invalid
- new Xskipper(sparkSession: SparkSession, uri: String, metadataStoreManager: MetadataStoreManager = ParquetMetadataStoreManager)
- sparkSession
the SparkSession instance used for processing
- uri
the URI of the dataset / the identifier of the table on which the index is defined
- metadataStoreManager
the MetadataStoreManager to use (defaults to ParquetMetadataStoreManager)
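A minimal sketch of the two constructors, assuming the class is importable as io.xskipper.Xskipper. The dataset path, the table identifier, and the fully qualified class name passed to the String-based constructor are hypothetical placeholders, not values documented on this page; in particular, the package of ParquetMetadataStoreManager is an assumption to verify against your Xskipper version.

```scala
import io.xskipper.Xskipper
import org.apache.spark.sql.SparkSession

object XskipperConstruction {
  // Hypothetical placeholders used only for illustration.
  val DatasetPath = "/tmp/hypothetical/events" // a non-table dataset URI
  val TableId = "default.events"               // a table identifier

  // ASSUMPTION: fully qualified name of ParquetMetadataStoreManager.
  val ParquetManagerClassName =
    "io.xskipper.metadatastore.parquet.ParquetMetadataStoreManager"

  // Two-argument form: uses the default metadataStoreManager
  // (ParquetMetadataStoreManager).
  def forPath(spark: SparkSession): Xskipper =
    new Xskipper(spark, DatasetPath)

  // Same constructor, with a table identifier as the uri.
  def forTable(spark: SparkSession): Xskipper =
    new Xskipper(spark, TableId)

  // String-based constructor (intended for PySpark); throws
  // XskipperException if the class name is invalid.
  def forPathByClassName(spark: SparkSession): Xskipper =
    new Xskipper(spark, DatasetPath, ParquetManagerClassName)
}
```

Because the third argument is a String, the call in forPathByClassName resolves to the PySpark-oriented constructor; the two-argument calls resolve to the constructor with the defaulted metadataStoreManager.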
Value Members
- def describeIndex(): DataFrame
Describes the indexes on the URI (for a table URI)
- returns
DataFrame object containing information about the index
- Exceptions thrown
XskipperException
if the URI is not indexed
- def describeIndex(reader: DataFrameReader): DataFrame
Describes the indexes on the URI (for a non-table URI)
- reader
a DataFrameReader instance to enable reading the URI as a DataFrame
- returns
DataFrame object containing information about the index
- Exceptions thrown
XskipperException
if the URI is not indexed
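Both describeIndex overloads throw XskipperException when the URI is not indexed. The sketch below, assuming the class is importable as io.xskipper.Xskipper, guards the call with isIndexed() and picks the overload by whether a reader is supplied. The DescribeIndexExample object and its Option-based signature are hypothetical conveniences, not part of the Xskipper API.

```scala
import io.xskipper.Xskipper
import org.apache.spark.sql.{DataFrame, DataFrameReader}

object DescribeIndexExample {
  // Returns None for an unindexed URI instead of letting
  // describeIndex throw XskipperException.
  // Pass Some(reader) for a non-table URI and None for a table URI.
  def describe(xskipper: Xskipper, reader: Option[DataFrameReader]): Option[DataFrame] =
    if (!xskipper.isIndexed()) None
    else reader match {
      case Some(r) => Some(xskipper.describeIndex(r)) // non-table URI
      case None    => Some(xskipper.describeIndex())  // table URI
    }
}
```

For a Parquet dataset at a path, a call might look like DescribeIndexExample.describe(xskipper, Some(spark.read.format("parquet"))).foreach(_.show(false)).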
- def dropIndex(): Unit
Deletes the index
- Exceptions thrown
XskipperException
if the index cannot be removed
- def getLatestQueryStats(): DataFrame
Returns the latest query skipping statistics for this Xskipper instance.
The structure of the returned DataFrame depends on the outcome:
- If the API was called on a URI without an index, or before any query was run, the DataFrame has the columns status and reason, with status=FAILED.
- If the query cannot benefit from skipping because either (1) the dataset has no indexed files, or (2) no query to the metadata store can be generated (because a predicate cannot be used for skipping, possibly due to a missing metadata filter, or because the abstract query could not be translated), the DataFrame has the columns status, isSkippable, skipped_Bytes, skipped_Objs, total_Bytes and total_Objs, with status=SUCCESS, isSkippable=false and all other values set to -1.
- Otherwise, the DataFrame has the same columns as in the previous case, with isSkippable=true and the relevant statistics.
- returns
DataFrame object containing information about the latest query stats
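The three result shapes above can be told apart from the status and isSkippable columns alone. The following pure-Scala sketch classifies one result row, given as a map from column name to value; the Outcome type and its case names are hypothetical, and it assumes the counters arrive as Int or Long values.

```scala
object QueryStats {
  // One case per result shape described for getLatestQueryStats().
  sealed trait Outcome
  final case class Failed(reason: String) extends Outcome
  case object NotSkippable extends Outcome
  final case class Skippable(skippedBytes: Long, skippedObjs: Long,
                             totalBytes: Long, totalObjs: Long) extends Outcome

  // Classifies one result row (column name -> value).
  def classify(row: Map[String, Any]): Outcome =
    row.get("status") match {
      case Some("FAILED") =>
        Failed(row.get("reason").fold("")(_.toString))
      case Some("SUCCESS") if row.get("isSkippable").contains(true) =>
        Skippable(num(row, "skipped_Bytes"), num(row, "skipped_Objs"),
                  num(row, "total_Bytes"), num(row, "total_Objs"))
      case Some("SUCCESS") =>
        NotSkippable // the numeric columns are all -1 in this case
      case other =>
        throw new IllegalArgumentException(s"unexpected status: $other")
    }

  // ASSUMPTION: counters may be stored as Int or Long; normalize to Long.
  private def num(row: Map[String, Any], col: String): Long = row(col) match {
    case n: java.lang.Number => n.longValue()
    case other => throw new IllegalArgumentException(s"$col is not numeric: $other")
  }
}
```

Assuming the statistics of interest are in the first row, a caller could convert it with Spark's Row.getValuesMap, for example val r = xskipper.getLatestQueryStats().head() followed by QueryStats.classify(r.getValuesMap[Any](r.schema.fieldNames.toSeq)).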
- def indexBuilder(): IndexBuilder
Returns a helper (IndexBuilder) for configuring and building an index
- def isIndexed(): Boolean
Checks whether the URI is indexed
- returns
true if the URI is indexed
- def refreshIndex(): DataFrame
Refreshes the index for a table URI
- returns
DataFrame object containing statistics about the refresh operation
- Exceptions thrown
XskipperException
if the index cannot be refreshed
- def refreshIndex(reader: DataFrameReader): DataFrame
Refreshes the index for a non-table URI
- reader
a DataFrameReader instance that enables reading the URI as a DataFrame. Note: the reader is assumed to have all of its parameters configured; the indexing code uses reader.load(Seq(<path>)) to read each object separately.
- returns
DataFrame object containing statistics about the refresh operation
- Exceptions thrown
XskipperException
if the index cannot be refreshed
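A minimal sketch of both refreshIndex overloads, assuming the class is importable as io.xskipper.Xskipper and that the path-based dataset is stored as Parquet (an assumption for illustration). The reader is fully configured before it is passed in, since the indexing code reuses it to load each object. The RefreshIndexExample object is hypothetical; XskipperException from a failed refresh is left to propagate.

```scala
import io.xskipper.Xskipper
import org.apache.spark.sql.{DataFrame, SparkSession}

object RefreshIndexExample {
  // Non-table URI: the reader must already carry the format and any
  // options the dataset needs. Parquet is an ASSUMED format here.
  def refreshParquetPath(spark: SparkSession, xskipper: Xskipper): Option[DataFrame] =
    if (xskipper.isIndexed()) Some(xskipper.refreshIndex(spark.read.format("parquet")))
    else None // no index to refresh yet; one would be built via indexBuilder()

  // Table URI: no reader is needed.
  def refreshTable(xskipper: Xskipper): Option[DataFrame] =
    if (xskipper.isIndexed()) Some(xskipper.refreshIndex()) else None
}
```

The returned DataFrame holds the refresh statistics and can be inspected with, for example, RefreshIndexExample.refreshTable(xskipper).foreach(_.show(false)).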
- def setParams(params: Map[String, String]): Unit
Updates instance-specific MetadataHandle parameters
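setParams takes a plain Map[String, String]. The sketch below assumes the class is importable as io.xskipper.Xskipper, and the parameter key is an assumption about the Parquet metadata store that this page does not document; confirm the valid keys for the MetadataStoreManager you use. The location value is a hypothetical placeholder.

```scala
import io.xskipper.Xskipper

object SetParamsExample {
  // ASSUMPTION: metadata-location key for the Parquet metadata store;
  // the value is a hypothetical placeholder path.
  val mdParams: Map[String, String] = Map(
    "io.xskipper.parquet.mdlocation" -> "/tmp/hypothetical/xskipper_md"
  )

  // Applies the parameters to this Xskipper instance's MetadataHandle.
  def configure(xskipper: Xskipper): Unit = xskipper.setParams(mdParams)
}
```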
- val tableIdentifier: String
- val uri: String