class Xskipper extends AnyRef
Main class for programmatically interacting with Xskipper
Instance Constructors
-
new
Xskipper(sparkSession: SparkSession, uri: String, metadataStoreManagerClassName: String)
Additional constructor for pySpark API
- sparkSession
sparkSession instance for processing
- uri
the URI of the dataset / the identifier of the table on which the index is defined
- metadataStoreManagerClassName
fully qualified name of MetadataStoreManager to be used
- Exceptions thrown
XskipperException
if the metadataStoreManagerClassName is invalid
-
new
Xskipper(sparkSession: SparkSession, uri: String, metadataStoreManager: MetadataStoreManager = ParquetMetadataStoreManager)
- sparkSession
sparkSession instance for processing
- uri
the URI of the dataset / the identifier of the table on which the index is defined
- metadataStoreManager
The MetadataStoreManager to use
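The two constructors follow a common Scala pattern. The primary constructor takes a `MetadataStoreManager` object and defaults to `ParquetMetadataStoreManager`. The auxiliary constructor accepts a fully qualified class name instead, presumably because a string is easier to pass across the Python/JVM boundary than a Scala object. The sketch below illustrates that pattern with hypothetical stand-in types (`StoreManager`, `DefaultStoreManager`, `SketchException`, `XskipperSketch`). It is not Xskipper's implementation, and it assumes the manager is a Scala `object` that can be resolved through its compiled `MODULE$` field.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins for MetadataStoreManager, ParquetMetadataStoreManager
// and XskipperException; these are NOT the real Xskipper types.
trait StoreManager { def name: String }
object DefaultStoreManager extends StoreManager { val name = "default" }
class SketchException(msg: String, cause: Throwable = null)
    extends RuntimeException(msg, cause)

object XskipperSketch {
  // Resolve a Scala `object` from its fully qualified name: the compiled
  // module class is `<name>$` and holds the singleton in a static MODULE$ field.
  // Assumption: the manager is a top-level Scala object.
  def loadManager(className: String): StoreManager =
    try {
      Class.forName(className + "$")
        .getField("MODULE$")
        .get(null)
        .asInstanceOf[StoreManager]
    } catch {
      case NonFatal(e) =>
        throw new SketchException(s"Invalid metadata store manager: $className", e)
    }
}

// Primary constructor with a default manager, plus a String-based auxiliary
// constructor that resolves the manager by class name before delegating.
class XskipperSketch(val uri: String, val manager: StoreManager = DefaultStoreManager) {
  def this(uri: String, managerClassName: String) =
    this(uri, XskipperSketch.loadManager(managerClassName))
}
```

In this sketch an unknown class name surfaces as `SketchException`, mirroring the documented behaviour that an invalid `metadataStoreManagerClassName` raises `XskipperException`.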
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
-
def
describeIndex(): DataFrame
Describes the indexes on the URI (for a table URI)
- returns
DataFrame object containing information about the index
- Exceptions thrown
XskipperException
if the URI is not indexed
-
def
describeIndex(reader: DataFrameReader): DataFrame
Describes the indexes on the URI (for a non-table URI)
- reader
a DataFrameReader instance to enable reading the URI as a DataFrame
- returns
DataFrame object containing information about the index
- Exceptions thrown
XskipperException
if the URI is not indexed
-
def
dropIndex(): Unit
Deletes the index
- Exceptions thrown
XskipperException
if index cannot be removed
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
generateDescribeIndex(df: DataFrame): DataFrame
Returns metadata about the index, such as the indexing scheme and skipping statistics
- Attributes
- protected
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getLatestQueryStats(): DataFrame
Returns the latest query skipping statistics for this Xskipper instance.
If the API is called on a URI without an index, or before any query has been run, the returned DataFrame has the columns status and reason, with status=FAILED.
If the query cannot be skipped for one of the following reasons:
1. the dataset has no indexed files
2. no query to the metadata store can be generated, either because a predicate cannot be used for skipping (possibly due to a missing metadata filter) or because translating the abstract query failed
then the returned DataFrame has the columns status, isSkippable, skipped_Bytes, skipped_Objs, total_Bytes, total_Objs, with status=SUCCESS, isSkippable=false, and all other values set to -1.
Otherwise, the DataFrame has the same structure, with isSkippable=true and the relevant statistics.
- returns
DataFrame object containing information about latest query stats
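To make the three outcomes concrete, the sketch below models a single stats row in plain Scala, without Spark. The types `QueryStats`, `Failed`, and `Succeeded` and the helpers `skippedObjectRatio` and `summarize` are hypothetical. In practice the values would be read from the returned DataFrame's `status`, `reason`, `isSkippable`, `skipped_Bytes`, `skipped_Objs`, `total_Bytes`, and `total_Objs` columns.

```scala
// Hypothetical, Spark-free model of one row returned by getLatestQueryStats().
sealed trait QueryStats
// status=FAILED: only a reason is available.
final case class Failed(reason: String) extends QueryStats
// status=SUCCESS: counters are -1 when isSkippable is false.
final case class Succeeded(
    isSkippable: Boolean,
    skippedBytes: Long,
    skippedObjs: Long,
    totalBytes: Long,
    totalObjs: Long
) extends QueryStats

// Fraction of objects skipped, or None when no meaningful value exists
// (status=FAILED, or isSkippable=false with all counters set to -1).
def skippedObjectRatio(stats: QueryStats): Option[Double] = stats match {
  case Succeeded(true, _, skipped, _, total) if total > 0 =>
    Some(skipped.toDouble / total)
  case _ => None
}

// Human-readable summary of the three documented outcomes.
def summarize(stats: QueryStats): String = stats match {
  case Failed(reason)               => s"FAILED: $reason"
  case Succeeded(false, _, _, _, _) => "SUCCESS: query not skippable"
  case s: Succeeded                 => s"SUCCESS: skipped ${s.skippedObjs} of ${s.totalObjs} objects"
}
```

Treating -1 as "no value" rather than as a real count avoids reporting a misleading ratio when the query could not use skipping.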
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
indexBuilder(): IndexBuilder
Returns an IndexBuilder, a helper for configuring and building an index
-
def
isIndexed(): Boolean
Checks if the URI is indexed
- returns
true if the URI is indexed
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
refreshIndex(): DataFrame
Refreshes the index (for a table URI)
- returns
DataFrame object containing statistics about the refresh operation
- Exceptions thrown
XskipperException
if index cannot be refreshed
-
def
refreshIndex(reader: DataFrameReader): DataFrame
Refreshes the index (for a non-table URI)
- reader
a DataFrameReader instance to enable reading the URI as a DataFrame. Note: the reader is assumed to have all of its parameters configured; the indexing code calls
reader.load(Seq(<path>))
to read each object separately.
- returns
DataFrame object containing statistics about the refresh operation
- Exceptions thrown
XskipperException
if index cannot be refreshed
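The note on `reader` implies that indexing reads the dataset one object at a time. The sketch below models that in plain Scala with hypothetical names: `refreshSketch`, `MinMax`, and a `load` function standing in for `reader.load(Seq(<path>))`. The min/max metadata, the rule that already-indexed objects are not re-read, and the dropping of metadata for removed objects are assumptions for illustration, not documented Xskipper behaviour.

```scala
// Hypothetical per-object metadata: a min/max pair for one numeric column.
final case class MinMax(min: Int, max: Int)

// Sketch of per-object indexing. `load` stands in for reader.load(Seq(path))
// and returns the values of the indexed column for a single object.
def refreshSketch(
    objects: Seq[String],
    existing: Map[String, MinMax],
    load: String => Seq[Int]
): Map[String, MinMax] = {
  val current = objects.toSet
  // Assumption: metadata for objects that no longer exist is dropped.
  val kept = existing.filter { case (path, _) => current(path) }
  // Assumption: only objects without metadata are read; each is loaded on its own.
  val added = objects
    .filterNot(kept.contains)
    .flatMap { path =>
      val values = load(path)
      if (values.isEmpty) None else Some(path -> MinMax(values.min, values.max))
    }
  kept ++ added
}
```

Reading each object separately keeps the metadata per object, which is what lets a query skip individual objects whose min/max range cannot match a predicate.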
-
def
setParams(params: Map[String, String]): Unit
Updates instance-specific MetadataHandle parameters
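One plausible reading of "update" is that instance-specific values are layered over session-wide defaults and merged across calls. The sketch below shows that interpretation with a hypothetical `ParamsSketch` class and made-up keys. Both the override order and the merge-rather-than-replace behaviour are assumptions, not documented Xskipper semantics.

```scala
// Hypothetical sketch of layered parameters; not Xskipper's implementation.
// Assumption: instance-specific values override session-wide defaults, and
// each setParams call merges into (rather than replaces) earlier values.
final class ParamsSketch(defaults: Map[String, String]) {
  private var instanceParams = Map.empty[String, String]

  def setParams(params: Map[String, String]): Unit =
    instanceParams = instanceParams ++ params // later values win for the same key

  // Parameters a metadata handle for this instance would see.
  def effectiveParams: Map[String, String] = defaults ++ instanceParams
}
```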
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
val
tableIdentifier: String
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
val
uri: String
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )