object ParquetUtils extends Logging
Value Members
- def getColumnName(idx: Index, version: Long = ...): String
Returns the column name for the specified index and version.
- idx
the index for which the column name needs to be created
- version
the version number; its metadata spec determines the column name
- def getColumnNameForCols(cols: Seq[String], idxName: String, version: Long = ...): String
- def getIndexSchema(index: Index, translators: Seq[ParquetMetaDataTranslator]): Option[DataType]
Given an index and a list of schema translators, searches for the first available translation to a native DataFrame schema. Returns None if no translation is found.
- index
the index to translate
- translators
the list of available translators
- returns
the DataType associated with the translation, or None if no translator applies
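The "first available translation" lookup can be illustrated with a small, library-independent sketch. It is not the ParquetUtils implementation: the object name `FirstTranslationSketch`, the method `firstTranslation`, and the use of plain functions in place of `ParquetMetaDataTranslator` are all hypothetical and chosen only to show the pattern.

```scala
object FirstTranslationSketch {
  // Hypothetical stand-in: each "translator" is modelled as a function that
  // either produces a translation (Some) or declines (None).
  def firstTranslation[A, B](input: A, translators: Seq[A => Option[B]]): Option[B] =
    translators.iterator      // lazy: later translators run only if earlier ones decline
      .map(t => t(input))
      .collectFirst { case Some(result) => result }
}
```

With translators that return `None`, `Some(3)`, and `Some(-1)` in that order, the sketch yields the second result and never invokes the third. With an empty list it yields `None`.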
- def getMdVersionStatus(version: Long): MetadataVersionStatus.MetadataVersionStatus
- def getMdVersionStatusFromDf(df: DataFrame): MetadataVersionStatus.MetadataVersionStatus
- def getVersion(schema: StructType): Long
Retrieves the version number from a metadata DataFrame schema. Returns 0 if the version is not explicitly defined (files without a version number are implicitly version 0). The function assumes the obj_name column exists in the schema.
- schema
the schema of the metadata DataFrame
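As a rough illustration of the default-to-0 behaviour, the sketch below reads a version from Spark column metadata. It rests on two assumptions that the documentation does not confirm: that the version is stored in the metadata of the `obj_name` field, and that the metadata key is `"version"`. The object name `VersionSketch` and the method `versionOf` are hypothetical. Only standard Spark SQL types are used.

```scala
import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField, StructType}

object VersionSketch {
  // Hypothetical key name; the key actually used by ParquetUtils is not stated.
  val VersionKey = "version"

  // Looks up the obj_name field (throws if it is missing, mirroring the
  // documented assumption) and falls back to 0 when no version is recorded.
  def versionOf(schema: StructType): Long = {
    val md = schema("obj_name").metadata
    if (md.contains(VersionKey)) md.getLong(VersionKey) else 0L
  }
}
```

A schema whose `obj_name` field carries no metadata would give 0 under this sketch, while one built with `new MetadataBuilder().putLong("version", 2L).build()` would give 2.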
- def getVersion(df: DataFrame): Long
Retrieves the version number from a metadata DataFrame. Returns 0 if the version is not explicitly defined (files without a version number are implicitly version 0). The function assumes the obj_name column exists in the schema.
- def isPmeAvailable(): Boolean
Checks whether Parquet Modular Encryption (PME) is available. The check verifies that org.apache.parquet.crypto.AesCipher can be loaded, which is the case if and only if PME is loaded.
- returns
true if PME is loaded, else false
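A classpath probe of the kind described for isPmeAvailable might look like the following sketch. It is an illustration of the technique and not the library's code: the object name `PmeCheckSketch` and the helper `isClassAvailable` are hypothetical, and catching both `ClassNotFoundException` and `LinkageError` is a design choice made here.

```scala
object PmeCheckSketch {
  // Returns true when the named class can be located by this object's class
  // loader. initialize = false avoids running the class's static initializers.
  def isClassAvailable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false
    }

  // The class name is the one given in the documentation above.
  def isPmeAvailable(): Boolean =
    isClassAvailable("org.apache.parquet.crypto.AesCipher")
}
```

Probing by class name keeps the check free of any compile-time dependency on the encryption classes, so the calling code still loads when they are absent.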
- def mdFileToDF(session: SparkSession, mdPath: String): DataFrame