Parquet files can be reversed into a Metadata, as with any other technology.
This article is a quick guide to working with Parquet files.
The Parquet Component must be installed before following this guide; refer to this article for download and installation instructions.
Metadata creation and reverse
Reversing a Parquet file can be performed through the following steps.
First, create a new Metadata and choose Parquet Schemas.
Create a new "Schema" node in it:
Then, specify the file path to the Parquet File to reverse:
Finally, perform the reverse by right-clicking the schema node and choosing "Reverse":
That's it, the Parquet file is now reversed:
Additional Information about Parquet Metadata
Parquet Metadata has the following structure.
Note that you can reverse multiple Parquet schemas by creating multiple "schema" nodes inside the Metadata.
Parquet Root Node
The "Parquet" root node is just a container to store many Parquet schemas in one place.
You can define a user-friendly name to represent this Metadata, and you must define a Parquet Module to use.
Parquet Schema Node
"Schema" nodes are created under the "Parquet" root node.
A schema node is the entry point to reverse and use Parquet schemas in the Metadata.
To reverse the schema from an existing file, you must fill in the path of the Parquet file to reverse.
Note that you can also define all the attributes manually, if needed.
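Before filling in the file path on a schema node, it can help to check that the path really points to a Parquet file. The sketch below is not part of Stambia; it is a minimal, standalone Python check that relies on one fact of the Parquet format: a plain (unencrypted) Parquet file starts and ends with the 4-byte marker "PAR1". The function name looks_like_parquet is hypothetical.

```python
import os
from pathlib import Path

# 4-byte marker found at both ends of a plain (unencrypted) Parquet file
PARQUET_MAGIC = b"PAR1"

# Smallest possible layout: leading magic + 4-byte footer length + trailing magic
MIN_PARQUET_SIZE = 12


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper for illustration only: it does not parse the footer,
    so it cannot tell whether the schema itself is readable.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size < MIN_PARQUET_SIZE:
        return False
    with p.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Calling looks_like_parquet on the path you intend to enter returns False for a missing file, a file that is too short, or a file of another format. It is only a quick sanity check and does not replace the reverse itself.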
Parquet Field Nodes
Fields are defined under a schema node.
They represent all the fields of the Parquet file structure.
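The three levels described above (root node, schema nodes, field nodes) can be pictured as a small tree. The following Python sketch is only an illustration of that hierarchy, not Stambia's internal model; every class, attribute, and sample value (ParquetRoot, SchemaNode, FieldNode, the names and the path) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldNode:
    name: str
    # Nested fields, if any (empty for a simple field)
    children: List["FieldNode"] = field(default_factory=list)


@dataclass
class SchemaNode:
    name: str
    # Only needed when the schema is reversed from an existing file
    file_path: Optional[str] = None
    fields: List[FieldNode] = field(default_factory=list)


@dataclass
class ParquetRoot:
    name: str    # user-friendly name of the Metadata
    module: str  # Parquet Module to use
    schemas: List[SchemaNode] = field(default_factory=list)


def field_paths(nodes, prefix=""):
    """Yield the dotted path of every field, depth first."""
    for node in nodes:
        path = prefix + node.name
        yield path
        yield from field_paths(node.children, path + ".")


# Hypothetical sample: one Metadata holding one schema with a nested field
root = ParquetRoot(
    name="sales_files",
    module="parquet_module",
    schemas=[
        SchemaNode(
            name="orders",
            file_path="/data/orders.parquet",
            fields=[
                FieldNode("id"),
                FieldNode("customer", [FieldNode("name"), FieldNode("city")]),
            ],
        )
    ],
)

print(list(field_paths(root.schemas[0].fields)))
# prints ['id', 'customer', 'customer.name', 'customer.city']
```

A root can hold several SchemaNode entries, which mirrors the ability to reverse multiple Parquet schemas inside one Metadata.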
Field Logical Type
Each field references its logical type, chosen from a predefined list.
Notes:
- Each "time" variant is listed as a separate type, for example by precision (nano, micro) or by UTC reference.
- The same applies to signed and unsigned integers.
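To see why these lists grow quickly, the sketch below enumerates the combinations. It relies on the Parquet format itself, where a time type carries a unit (millis, micros, nanos) and a UTC flag, and an integer type carries a bit width (8, 16, 32, 64) and a sign. The labels it generates (such as TIME_MICROS_UTC or UINT_32) are hypothetical and are not the names displayed in the Stambia list.

```python
from itertools import product

TIME_UNITS = ("MILLIS", "MICROS", "NANOS")  # precisions defined by the Parquet format
INT_WIDTHS = (8, 16, 32, 64)                # bit widths defined by the Parquet format


def time_variants():
    """One label per (unit, UTC flag) combination. Labels are illustrative."""
    return [
        f"TIME_{unit}{'_UTC' if utc else ''}"
        for unit, utc in product(TIME_UNITS, (False, True))
    ]


def int_variants():
    """One label per (bit width, signedness) combination. Labels are illustrative."""
    return [
        f"{'INT' if signed else 'UINT'}_{width}"
        for width, signed in product(INT_WIDTHS, (True, False))
    ]


print(len(time_variants()), len(int_variants()))
# prints 6 8
```

Timestamps follow the same unit and UTC pattern in the Parquet format, which is why a flat predefined list contains many closely related entries.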
Field Physical Type
Each field also references its physical type, chosen from a predefined list.
Note that the "Group" primitive type is specific to Stambia.
It is set automatically when performing a reverse.
It indicates that the node contains sub-nodes.
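The rule for "Group" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stambia's implementation: the tuple lists the primitive physical types defined by the Parquet format, while the dictionary layout (keys "name", "type", "children") and the function name resolve_physical_type are hypothetical.

```python
# Primitive physical types defined by the Parquet format
PARQUET_PHYSICAL_TYPES = (
    "BOOLEAN",
    "INT32",
    "INT64",
    "INT96",
    "FLOAT",
    "DOUBLE",
    "BYTE_ARRAY",
    "FIXED_LEN_BYTE_ARRAY",
)

# Marker used for nodes that contain sub-nodes (Stambia-specific per the article)
GROUP = "Group"


def resolve_physical_type(node):
    """Return "Group" for a node with sub-nodes, else its declared physical type.

    `node` is a hypothetical dict: {"name": ..., "type": ..., "children": [...]}.
    """
    if node.get("children"):
        return GROUP
    declared = node.get("type")
    if declared not in PARQUET_PHYSICAL_TYPES:
        raise ValueError(f"unknown physical type for field {node.get('name')!r}: {declared!r}")
    return declared


customer = {"name": "customer", "children": [{"name": "city", "type": "BYTE_ARRAY"}]}
print(resolve_physical_type(customer))                 # prints Group
print(resolve_physical_type(customer["children"][0]))  # prints BYTE_ARRAY
```

When defining fields manually rather than by reverse, the same rule applies: a node that holds sub-nodes is a group, and only leaf fields carry one of the primitive physical types.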