Class

io.xskipper.index.execution

IndexBuilder

class IndexBuilder extends Logging

Helper class for building indexes

Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new IndexBuilder(spark: SparkSession, uri: String, xskipper: Xskipper)

    spark

    org.apache.spark.sql.SparkSession object

    uri

    the URI of the dataset / the identifier of the table on which the index is defined

    xskipper

    the Xskipper instance associated with this IndexBuilder

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addBloomFilterIndex(col: String, fpp: Double, keyMetadata: String): IndexBuilder

    Adds a BloomFilter index for the given column

    col

    the column to add the index on

    fpp

    the false positive rate to use

    keyMetadata

    the key metadata to be used

  5. def addBloomFilterIndex(col: String, fpp: Double = ..., ndv: Long = ...): IndexBuilder

    Adds a BloomFilter index for the given column

    col

    the column to add the index on

    fpp

    the false positive rate to use

    ndv

    the expected number of distinct values in the bloom filter
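    The source does not say how fpp and ndv are turned into a filter size. As a rough guide to how the two parameters trade off, here is a minimal sketch of the textbook Bloom filter sizing formulas. The object and method names are hypothetical, and this is not necessarily how Xskipper sizes its filters.

    ```scala
    // Hypothetical helper, not part of Xskipper: textbook Bloom filter sizing.
    object BloomFilterSizing {

      /** Optimal number of bits for `ndv` expected distinct values at false positive rate `fpp`. */
      def optimalNumBits(ndv: Long, fpp: Double): Long = {
        require(ndv > 0, "ndv must be positive")
        require(fpp > 0.0 && fpp < 1.0, "fpp must be in (0, 1)")
        val ln2 = math.log(2)
        // m = -n * ln(p) / (ln 2)^2
        math.ceil(-ndv * math.log(fpp) / (ln2 * ln2)).toLong
      }

      /** Optimal number of hash functions for a filter of `numBits` bits holding `ndv` values. */
      def optimalNumHashFunctions(ndv: Long, numBits: Long): Int = {
        require(ndv > 0, "ndv must be positive")
        // k = (m / n) * ln 2, rounded, and at least 1
        math.max(1, math.round(numBits.toDouble / ndv * math.log(2)).toInt)
      }
    }
    ```

    With these formulas, a false positive rate of 0.01 costs roughly 9.6 bits per distinct value, so a lower fpp or a higher ndv both lead to a larger filter.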

  6. def addBloomFilterIndex(col: String, keyMetadata: String): IndexBuilder

    Adds a BloomFilter index for the given column

    col

    the column to add the index on

    keyMetadata

    the key metadata to be used

  7. def addBloomFilterIndex(col: String): IndexBuilder

    Adds a BloomFilter index for the given column

    col

    the column to add the index on

  8. def addCustomIndex(index: String, cols: Array[String], params: Map[String, String]): IndexBuilder

    Adds a custom index (overload for Python)

    index

    the index name

    cols

    the sequence of columns

    params

    a map of parameters for the index

  9. def addCustomIndex(index: String, cols: Array[String], params: Map[String, String], keyMetadata: String): IndexBuilder

    Adds a custom index (overload for Python)

    index

    the index name

    cols

    the sequence of columns

    params

    a map of parameters for the index

    keyMetadata

    the key metadata to be used

  10. def addCustomIndex(index: Index): IndexBuilder

    Adds a custom index

    index

    the index instance to add

  11. def addMinMaxIndex(col: String, keyMetadata: String): IndexBuilder

    Adds a MinMax index for the given column

    col

    the column to add the index on

    keyMetadata

    the key metadata to be used

  12. def addMinMaxIndex(col: String): IndexBuilder

    Adds a MinMax index for the given column

    col

    the column to add the index on
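    The source does not describe how a MinMax index is used at query time. The sketch below illustrates the general idea behind min/max data skipping, under the assumption that the index keeps the smallest and largest column value per object. All names are hypothetical and this is not Xskipper's implementation.

    ```scala
    // Hypothetical illustration of min/max data skipping, not Xskipper code.
    object MinMaxSkipping {

      /** Assumed per-object metadata: the minimum and maximum of the indexed column. */
      final case class MinMax(objectName: String, min: Double, max: Double)

      /**
       * Objects that may contain a row with `col > threshold`.
       * An object whose maximum is not above the threshold cannot match and can be skipped.
       */
      def candidatesForGreaterThan(metadata: Seq[MinMax], threshold: Double): Seq[String] =
        metadata.filter(_.max > threshold).map(_.objectName)
    }
    ```

    For example, with one object covering the range 1.0 to 20.0 and another covering 15.0 to 42.0, a predicate `col > 30.0` only needs to read the second object.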

  13. def addValueListIndex(col: String, keyMetadata: String): IndexBuilder

    Adds a ValueList index for the given column

    col

    the column to add the index on

    keyMetadata

    the key metadata to be used

  14. def addValueListIndex(col: String): IndexBuilder

    Adds a ValueList index for the given column

    col

    the column to add the index on
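    As with MinMax, the source does not describe the query-time behavior of a ValueList index. The sketch below assumes the index stores the set of distinct column values per object and shows how an equality predicate could use it. The names are hypothetical and this is not Xskipper's implementation.

    ```scala
    // Hypothetical illustration of value-list data skipping, not Xskipper code.
    object ValueListSkipping {

      /** Assumed per-object metadata: the distinct values of the indexed column. */
      final case class ValueList(objectName: String, values: Set[String])

      /** Objects that may contain a row with `col = value`; all others can be skipped. */
      def candidatesForEquals(metadata: Seq[ValueList], value: String): Seq[String] =
        metadata.filter(_.values.contains(value)).map(_.objectName)
    }
    ```

    A value list suits columns with few distinct values per object, since the stored set grows with the number of distinct values.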

  15. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  16. def build(): DataFrame

    Builds the index for a table URI. It is assumed that the URI used in the Xskipper definition is the identifier of a table (<db>.<table>).

    returns

    a DataFrame indicating if the operation succeeded or not

  17. def build(reader: DataFrameReader): DataFrame

    Builds the index for a non-table URI

    reader

    a DataFrameReader instance used to read the URI as a DataFrame. Note: the reader is assumed to have all of its parameters configured; the indexing code uses reader.load(Seq(<path>)) to read each object separately.

    returns

    a DataFrame indicating if the operation succeeded or not
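    The members above can be chained, since each add method returns the IndexBuilder. The following is a minimal sketch of building indexes on a non-table URI using the constructor and methods documented on this page. The dataset path and column names are hypothetical, and the two-argument Xskipper constructor is an assumption not confirmed by this page. Any metadata store configuration that Xskipper needs is not shown.

    ```scala
    import org.apache.spark.sql.SparkSession
    import io.xskipper.Xskipper
    import io.xskipper.index.execution.IndexBuilder

    object BuildIndexExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("xskipper-index-build")
          .master("local[*]")
          .getOrCreate()

        // Hypothetical dataset location.
        val datasetUri = "/tmp/metrics"

        // Assumption: Xskipper can be constructed from a SparkSession and a URI.
        val xskipper = new Xskipper(spark, datasetUri)

        // The reader must carry every option needed to load a single object.
        val reader = spark.read.format("parquet")

        // Constructor and methods as documented on this page; column names are hypothetical.
        val status = new IndexBuilder(spark, datasetUri, xskipper)
          .addMinMaxIndex("temperature")
          .addValueListIndex("city")
          .addBloomFilterIndex("device_id")
          .build(reader)

        // The returned DataFrame reports whether the operation succeeded.
        status.show(false)
        spark.stop()
      }
    }
    ```

    For a table identifier, the same chain would end with build() instead of build(reader).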

  18. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. def createOrRefreshExistingIndex(df: DataFrame, indexes: Seq[Index], isRefresh: Boolean): DataFrame

    Creates or refreshes an existing index from the DataFrame of the data to be indexed (assumed to be comprised of objects). This method first collects the objects that are already indexed and then indexes only the objects that are not yet indexed.

    df

    the DataFrame to be indexed. It can be either a dataset created by SparkSession.read on some Hadoop file system path or a Hive table on top of some Hadoop file system

    indexes

    a sequence of Index that will be applied on the DataFrame

    isRefresh

    whether or not this is a refresh operation. This is required only because, in the case of a refresh, index stats are ignored (instead of being initialized)

    returns

    a DataFrame of the format status, #indexedFiles, #removedFiles
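    The description above says that already indexed objects are collected first and only the remaining objects are indexed. A minimal sketch of that bookkeeping as plain set arithmetic follows. The names are hypothetical, and treating indexed objects that no longer exist as removed is an assumption suggested by the #removedFiles column, not something the source states.

    ```scala
    // Hypothetical illustration of incremental indexing bookkeeping, not Xskipper code.
    object IncrementalIndexing {

      /** Objects that still need indexing and indexed objects that are no longer present. */
      final case class RefreshPlan(toIndex: Set[String], toRemove: Set[String])

      def plan(currentObjects: Set[String], alreadyIndexed: Set[String]): RefreshPlan =
        RefreshPlan(
          // Only objects without existing index metadata are indexed.
          toIndex = currentObjects -- alreadyIndexed,
          // Assumption: metadata for objects that disappeared is dropped.
          toRemove = alreadyIndexed -- currentObjects
        )
    }
    ```

    A real implementation would likely also need to detect objects that were modified after being indexed; the source does not say how that case is handled.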

  20. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  22. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  23. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  24. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  25. val indexes: ArrayBuffer[Index]

  26. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  27. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  29. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  30. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  31. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  32. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  37. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  38. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  39. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  40. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  41. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  42. val metadataProcessor: MetadataProcessor

  43. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  44. final def notify(): Unit

    Definition Classes
    AnyRef
  45. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  46. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  47. def toString(): String

    Definition Classes
    AnyRef → Any
  48. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )