class IndexBuilder extends Logging
Helper class for building indexes
Instance Constructors
- new IndexBuilder(spark: SparkSession, uri: String, xskipper: Xskipper)
- spark
org.apache.spark.sql.SparkSession object
- uri
the URI of the dataset / the identifier of the table on which the index is defined
- xskipper
the Xskipper instance associated with this IndexBuilder
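In practice an IndexBuilder is usually obtained from an Xskipper instance rather than constructed directly. A minimal sketch, assuming the `io.xskipper.Xskipper` entry point and an illustrative dataset path:

```scala
import io.xskipper.Xskipper
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("indexing").getOrCreate()
val datasetLocation = "/path/to/dataset" // illustrative URI

// Create an Xskipper instance for the dataset and obtain its IndexBuilder
val xskipper = new Xskipper(spark, datasetLocation)
val builder = xskipper.indexBuilder()
```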
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def addBloomFilterIndex(col: String, fpp: Double, keyMetadata: String): IndexBuilder
Adds a BloomFilter index for the given column
- col
the column to add the index on
- fpp
the false positive rate to use
- keyMetadata
the key metadata to be used
- def addBloomFilterIndex(col: String, fpp: Double = ..., ndv: Long = ...): IndexBuilder
Adds a BloomFilter index for the given column
- col
the column to add the index on
- fpp
the false positive rate to use
- ndv
the expected number of distinct values in the bloom filter
- def addBloomFilterIndex(col: String, keyMetadata: String): IndexBuilder
Adds a BloomFilter index for the given column
- col
the column to add the index on
- keyMetadata
the key metadata to be used
- def addBloomFilterIndex(col: String): IndexBuilder
Adds a BloomFilter index for the given column
- col
the column to add the index on
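The overloads above differ only in which optional arguments are supplied, and each returns the builder, so calls can be chained. A hedged sketch, assuming an existing Xskipper instance `xskipper` and illustrative column names:

```scala
// Add BloomFilter indexes on two (assumed) columns;
// fpp tunes the false positive rate, ndv the expected distinct values
val builder = xskipper.indexBuilder()
  .addBloomFilterIndex("ip_address") // library defaults for fpp/ndv
  .addBloomFilterIndex("user_id", fpp = 0.01, ndv = 100000L)
```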
- def addCustomIndex(index: String, cols: Array[String], params: Map[String, String]): IndexBuilder
Adds a custom index (overload for Python)
- index
the index name
- cols
the sequence of columns
- params
a map of parameters to configure the index
- def addCustomIndex(index: String, cols: Array[String], params: Map[String, String], keyMetadata: String): IndexBuilder
Adds a custom index (overload for Python)
- index
the index name
- cols
the sequence of columns
- params
a map of parameters to configure the index
- keyMetadata
the key metadata to be used
- def addCustomIndex(index: Index): IndexBuilder
Adds a custom index
- index
the index instance to add
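The string-based overloads identify the index by name and configure it through the parameter map; the exact name and parameter keys are defined by the custom index plugin. A sketch in which both the index name and the parameter key are hypothetical:

```scala
// Hypothetical custom index registered under the name "regex";
// the params map carries plugin-specific configuration
builder.addCustomIndex(
  "regex",
  Array("log_line"),
  Map("io.xskipper.index.regex.pattern" -> "error.*")
)
```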
- def addMinMaxIndex(col: String, keyMetadata: String): IndexBuilder
Adds a MinMax index for the given column
- col
the column to add the index on
- keyMetadata
the key metadata to be used
- def addMinMaxIndex(col: String): IndexBuilder
Adds a MinMax index for the given column
- col
the column to add the index on
- def addValueListIndex(col: String, keyMetadata: String): IndexBuilder
Adds a ValueList index for the given column
- col
the column to add the index on
- keyMetadata
the key metadata to be used
- def addValueListIndex(col: String): IndexBuilder
Adds a ValueList index for the given column
- col
the column to add the index on
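MinMax and ValueList indexes take no tuning parameters beyond the column (and optional key metadata), so adding several is a simple chain. A sketch with illustrative column names:

```scala
// MinMax suits range predicates (e.g. temp > 30);
// ValueList suits equality/IN predicates on low-cardinality columns
val builder = xskipper.indexBuilder()
  .addMinMaxIndex("temp")
  .addValueListIndex("city")
```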
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def build(): DataFrame
Build index operation for a table URI. It is assumed that the URI used in the Xskipper definition is the identifier of a table (<db>.<table>)
- returns
a DataFrame indicating if the operation succeeded or not
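For a dataset registered as a table, the whole flow reduces to chaining index definitions and calling build() with no arguments. A hedged sketch, assuming the Xskipper instance was created with a `<db>.<table>` identifier as its URI:

```scala
// Index a table-backed dataset; build() returns a status DataFrame
val result = xskipper.indexBuilder()
  .addMinMaxIndex("temp")
  .build()
result.show(false)
```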
- def build(reader: DataFrameReader): DataFrame
Build index operation for a non-table URI
- reader
a DataFrameReader instance to enable reading the URI as a DataFrame. Note: the reader is assumed to have all of its parameters configured. reader.load(Seq(<path>)) will be used by the indexing code to read each object separately
- returns
a DataFrame indicating if the operation succeeded or not
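For a plain file-system URI the reader supplies the format and options, and the indexing code invokes load itself, so no path is passed when constructing the reader. A sketch assuming a Parquet dataset:

```scala
// The reader carries format/options only; the indexing code calls
// reader.load(...) per object under the dataset URI
val reader = spark.read.format("parquet")
val result = xskipper.indexBuilder()
  .addValueListIndex("city")
  .build(reader)
result.show(false)
```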
- def clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
- def createOrRefreshExistingIndex(df: DataFrame, indexes: Seq[Index], isRefresh: Boolean): DataFrame
Creates or refreshes an existing index given the DataFrame of the data to be indexed (assumed to be comprised of objects). This method first collects the objects that are already indexed and then indexes only the non-indexed objects
- df
the DataFrame to be indexed - can be either a dataset created by SparkSession.read on some Hadoop file system path or a Hive table on top of some Hadoop file system
- indexes
a sequence of Index that will be applied on the DataFrame
- isRefresh
whether or not this is a refresh operation. This is only required because in case of refresh we ignore index stats (instead of initializing them)
- returns
a DataFrame of the format status, #indexedFiles, #removedFiles
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val indexes: ArrayBuffer[Index]
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- val metadataProcessor: MetadataProcessor
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
Inherited from Logging
Inherited from AnyRef
Inherited from Any