Xskipper

Instance Constructors

new Xskipper(sparkSession: SparkSession, uri: String, metadataStoreManagerClassName: String)

Additional constructor for pySpark API
sparkSession
sparkSession instance for processing
uri
the URI of the dataset / the identifier of the table on which the index is defined
metadataStoreManagerClassName
fully qualified name of MetadataStoreManager to be used

Exceptions thrown
XskipperException if the metadataStoreManagerClassName is invalid
new Xskipper(sparkSession: SparkSession, uri: String, metadataStoreManager: MetadataStoreManager = ParquetMetadataStoreManager)

sparkSession
sparkSession instance for processing
uri
the URI of the dataset / the identifier of the table on which the index is defined
metadataStoreManager
The MetadataStoreManager to use
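
The two constructors above differ only in how the metadata store manager is supplied. The following is a minimal sketch of calling each one, assuming Spark and the Xskipper library are on the classpath and that the class lives in the io.xskipper package (this page only names "package xskipper", so the import path is an assumption). The object and method names are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import io.xskipper.Xskipper // assumed import path, not stated on this page

object XskipperConstruction {
  // Second constructor: the third argument is omitted, so the documented
  // default (ParquetMetadataStoreManager) is used.
  def withDefaultStore(spark: SparkSession, uri: String): Xskipper =
    new Xskipper(spark, uri)

  // First constructor: the manager is named by its fully qualified class
  // name. Per the documentation above, an invalid name results in an
  // XskipperException.
  def withStoreClassName(
      spark: SparkSession,
      uri: String,
      managerClassName: String): Xskipper =
    new Xskipper(spark, uri, managerClassName)
}
```

Because the third parameter of the first constructor is a String and that of the second is a MetadataStoreManager, the argument type decides which constructor is chosen when three arguments are passed.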

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def describeIndex(): DataFrame

Describes the indexes on the URI (for table URI)
returns
DataFrame object containing information about the index

Exceptions thrown
XskipperException if the URI is not indexed
def describeIndex(reader: DataFrameReader): DataFrame

Describes the indexes on the URI (for non table URI)
reader
a DataFrameReader instance to enable reading the URI as a DataFrame
returns
DataFrame object containing information about the index

Exceptions thrown
XskipperException if the URI is not indexed
def dropIndex(): Unit

Deletes the index

Exceptions thrown
XskipperException if index cannot be removed
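
Since dropIndex() throws when the index cannot be removed, callers may want to check isIndexed() first and turn a failure into a value. The sketch below shows one way to do that. To keep it runnable without Spark, the parameter is a structural type listing only the two documented members it uses; in real code the parameter type would simply be Xskipper. The names IndexHandle, DropResult and dropIfIndexed are hypothetical.

```scala
import scala.language.reflectiveCalls
import scala.util.{Failure, Success, Try}

object DropIndexSketch {
  // Structural stand-in for Xskipper: only the two documented members used here.
  type IndexHandle = {
    def isIndexed(): Boolean
    def dropIndex(): Unit
  }

  // Hypothetical result type so callers can branch without catching exceptions.
  sealed trait DropResult
  case object NothingToDrop extends DropResult
  case object Dropped extends DropResult
  final case class DropFailed(message: String) extends DropResult

  // Drops the index only if one exists; a thrown exception becomes DropFailed.
  def dropIfIndexed(handle: IndexHandle): DropResult =
    if (!handle.isIndexed()) NothingToDrop
    else
      Try(handle.dropIndex()) match {
        case Success(_) => Dropped
        case Failure(e) => DropFailed(String.valueOf(e.getMessage))
      }
}
```

Try catches only non-fatal exceptions, so errors such as OutOfMemoryError still propagate to the caller.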
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def generateDescribeIndex(df: DataFrame): DataFrame

Returns index metadata, such as the indexing scheme and skipping statistics

Attributes
protected
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def getLatestQueryStats(): DataFrame

Return latest query skipping statistics for this Xskipper instance
The structure of the returned DataFrame depends on the outcome:
If the API was called on a URI without an index, or without running a query first, the DataFrame structure is: status, reason with status=FAILED.
If the query cannot be skipped, the DataFrame structure is: status, isSkippable, skipped_Bytes, skipped_Objs, total_Bytes, total_Objs with status=SUCCESS, isSkippable=false and all other values set to -1. A query cannot be skipped in either of the following cases:
1. The dataset has no indexed files.
2. No query to the metadata store can be generated. This can be due to a predicate that cannot be used for skipping (possibly because a metadata filter is missing) or due to a failure to translate the abstract query.
Otherwise, the DataFrame structure is the same as above, with isSkippable=true and the relevant statistics.
returns
DataFrame object containing information about latest query stats
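
The three result shapes described above can be decoded into a small algebraic data type. The following is a sketch only: it works on a plain Map of column name to value so that it runs without Spark, and it assumes the numeric columns hold numbers and isSkippable holds a Boolean, which this page does not state. The names QueryStats, Failed, NotSkippable, Skipped and interpret are hypothetical.

```scala
object QueryStatsSketch {
  sealed trait QueryStats
  // status=FAILED: no index on the URI, or no query has been run yet.
  final case class Failed(reason: String) extends QueryStats
  // status=SUCCESS, isSkippable=false: all counters are documented as -1.
  case object NotSkippable extends QueryStats
  // status=SUCCESS, isSkippable=true: counters carry the real statistics.
  final case class Skipped(
      skippedBytes: Long,
      skippedObjs: Long,
      totalBytes: Long,
      totalObjs: Long) extends QueryStats {
    // Fraction of objects skipped; 0.0 when there were no objects at all.
    def skippedObjectRatio: Double =
      if (totalObjs == 0) 0.0 else skippedObjs.toDouble / totalObjs
  }

  // Reads a numeric column, falling back to -1 when it is absent or not a number.
  private def asLong(row: Map[String, Any], key: String): Long =
    row.get(key) match {
      case Some(n: Number) => n.longValue()
      case _               => -1L
    }

  // Maps one row of the documented structure to a QueryStats value.
  def interpret(row: Map[String, Any]): QueryStats =
    row.get("status") match {
      case Some("SUCCESS") =>
        if (row.get("isSkippable").contains(true))
          Skipped(
            asLong(row, "skipped_Bytes"),
            asLong(row, "skipped_Objs"),
            asLong(row, "total_Bytes"),
            asLong(row, "total_Objs"))
        else NotSkippable
      case _ =>
        Failed(row.get("reason").map(_.toString).getOrElse("unknown"))
    }
}
```

With Spark available, a Row from the returned DataFrame could be turned into such a Map with row.getValuesMap[Any](row.schema.fieldNames) before calling interpret.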
def hashCode(): Int

Definition Classes
AnyRef → Any
def indexBuilder(): IndexBuilder

Returns an IndexBuilder, a helper class for setting up and building an index
def isIndexed(): Boolean

Checks if the URI is indexed
returns
true if the URI is indexed
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def refreshIndex(): DataFrame

Refresh index operation for table URI
returns
DataFrame object containing statistics about the refresh operation

Exceptions thrown
XskipperException if index cannot be refreshed
def refreshIndex(reader: DataFrameReader): DataFrame

Refresh index operation for non table URI
reader
a DataFrameReader instance to enable reading the URI as a DataFrame
Note: The reader is assumed to have all of the parameters configured. reader.load(Seq(<path>)) will be used by the indexing code to read each object separately
returns
DataFrame object containing statistics about the refresh operation

Exceptions thrown
XskipperException if index cannot be refreshed
def setParams(params: Map[String, String]): Unit
def setParams(params: Map[String, String]): Unit

Update instance specific MetadataHandle parameters
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
val tableIdentifier: String
def toString(): String

Definition Classes
AnyRef → Any
val uri: String

the URI of the dataset / the identifier of the table on which the index is defined
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

class Xskipper extends AnyRef
