Parquet Metadata Reader

View the metadata of your Parquet files in one click. Nothing is processed on a server; everything happens in your browser. To view the Parquet schema, check out our Parquet Schema Reader tool.

Understanding Parquet File Structure and Metadata

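Parquet files use a hierarchical structure that optimizes both storage and query performance: a file is split into row groups, each row group holds one column chunk per column, and each column chunk is stored as a sequence of pages. Metadata at each level describes the data beneath it, so a reader can locate or skip data without scanning the whole file.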

Row Groups

Horizontal partitions of data, each containing a subset of rows from the file.

Column Chunks

Data for specific columns within each row group, optimized for columnar access.

Pages

Smallest storage units in Parquet, containing encoded data values within column chunks.
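
The hierarchy above can be sketched as plain Python dataclasses. This is only an illustrative model: the class and field names (ParquetFile, RowGroup, ColumnChunk, Page) are hypothetical and do not correspond to any library's API, and the row counts are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """Smallest storage unit: a run of encoded values for one column."""
    num_values: int


@dataclass
class ColumnChunk:
    """All pages for one column within one row group."""
    path_in_schema: str
    pages: List[Page] = field(default_factory=list)

    @property
    def num_values(self) -> int:
        return sum(page.num_values for page in self.pages)


@dataclass
class RowGroup:
    """A horizontal slice of the file: one column chunk per column."""
    num_rows: int
    columns: List[ColumnChunk] = field(default_factory=list)


@dataclass
class ParquetFile:
    """A file is a list of row groups."""
    row_groups: List[RowGroup] = field(default_factory=list)

    @property
    def num_rows(self) -> int:
        return sum(rg.num_rows for rg in self.row_groups)


# Made-up example: 5 rows split into two row groups, two columns each.
example = ParquetFile(row_groups=[
    RowGroup(num_rows=3, columns=[
        ColumnChunk("id", [Page(2), Page(1)]),
        ColumnChunk("name", [Page(3)]),
    ]),
    RowGroup(num_rows=2, columns=[
        ColumnChunk("id", [Page(2)]),
        ColumnChunk("name", [Page(2)]),
    ]),
])
```

Reading a single column means touching only that column's chunk in each row group, which is why the layout suits columnar access.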

Parquet Metadata Levels

File Metadata

Overall file information, including schema, number of rows, and row group locations. It is stored in the footer at the end of the file.

Row Group Metadata

Information about each row group, such as the number of rows and column chunk locations.

Column Chunk Metadata

Details about each column chunk, including data type, encoding, compression, and statistics.

Page Header Metadata

Information about individual pages within column chunks, such as encoding and compression details.
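
As a rough illustration of where the file-level metadata lives, the sketch below locates the footer in a Parquet file's bytes. It relies on the standard layout of an unencrypted file: the 4-byte magic PAR1 at both ends, and a 4-byte little-endian footer length just before the trailing magic. The function name footer_location is hypothetical, and the sketch only finds the footer; decoding it requires a Thrift parser, which is out of scope here.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"


def footer_location(data: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the serialized file metadata in `data`.

    Assumes an unencrypted Parquet file laid out as:
    PAR1 | row group data | file metadata | 4-byte length | PAR1
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length exceeds file size")
    return footer_start, footer_len
```

A reader would typically fetch the last 8 bytes first, then read exactly the footer bytes this function points at, and only afterwards fetch the row groups it needs.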

Key Metadata Fields

file_name

Name of the Parquet file

row_group_id

Unique identifier for each row group

row_group_num_rows

Number of rows in each row group

column_id

Identifier for each column within a row group

path_in_schema

Path of the column in the file schema; for a nested column, this is the full path to the leaf field

type

Physical data type of the column (for example, INT32 or BYTE_ARRAY)

stats_min, stats_max

Minimum and maximum values in the column chunk

compression

Compression method used for the column chunk

encodings

Encoding methods used for the column chunk data

total_compressed_size

Size of the compressed column chunk, in bytes

total_uncompressed_size

Size of the uncompressed column chunk, in bytes
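
One way to use the size fields is to compute a compression ratio per column. The sketch below assumes the metadata has already been exported as one dictionary per column chunk, keyed by the field names listed above; the function name compression_ratio_by_column and the sample numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def compression_ratio_by_column(rows: Iterable[Mapping]) -> Dict[str, float]:
    """Sum sizes over all row groups and return uncompressed / compressed per column."""
    compressed = defaultdict(int)
    uncompressed = defaultdict(int)
    for row in rows:
        column = row["path_in_schema"]
        compressed[column] += row["total_compressed_size"]
        uncompressed[column] += row["total_uncompressed_size"]
    # Skip columns with a zero compressed size to avoid dividing by zero.
    return {
        column: uncompressed[column] / compressed[column]
        for column in compressed
        if compressed[column] > 0
    }


# Made-up metadata rows: two row groups, two columns.
sample_rows = [
    {"row_group_id": 0, "path_in_schema": "id",
     "total_compressed_size": 100, "total_uncompressed_size": 200},
    {"row_group_id": 0, "path_in_schema": "name",
     "total_compressed_size": 50, "total_uncompressed_size": 200},
    {"row_group_id": 1, "path_in_schema": "id",
     "total_compressed_size": 100, "total_uncompressed_size": 200},
    {"row_group_id": 1, "path_in_schema": "name",
     "total_compressed_size": 50, "total_uncompressed_size": 200},
]
ratios = compression_ratio_by_column(sample_rows)
```

A ratio close to 1 suggests the column barely compresses, which may be a hint to try a different compression method or encoding for it.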

Benefits of Understanding Parquet Metadata

Optimize query performance using statistics

Implement efficient data skipping

Understand data distribution

Plan data processing strategies

Troubleshoot performance issues
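
Data skipping with statistics can be sketched as follows. For a range filter on one column, a row group needs to be read only if its [stats_min, stats_max] range overlaps the requested range. The sketch assumes the metadata is available as one dictionary per column chunk, with stats_min and stats_max already parsed into comparable Python values (or None when statistics are missing); the function name row_groups_to_scan and the sample values are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def row_groups_to_scan(rows: Iterable[Mapping], column: str,
                       low: Any, high: Any) -> List[int]:
    """Return ids of row groups that may contain values of `column` in [low, high]."""
    keep = []
    for row in rows:
        if row["path_in_schema"] != column:
            continue
        lo, hi = row["stats_min"], row["stats_max"]
        # Without statistics we cannot prove the row group is irrelevant, so keep it.
        if lo is None or hi is None or (lo <= high and hi >= low):
            keep.append(row["row_group_id"])
    return keep


# Made-up statistics for an integer column named "id".
sample_stats = [
    {"row_group_id": 0, "path_in_schema": "id", "stats_min": 1, "stats_max": 100},
    {"row_group_id": 1, "path_in_schema": "id", "stats_min": 101, "stats_max": 200},
    {"row_group_id": 2, "path_in_schema": "id", "stats_min": None, "stats_max": None},
]
to_scan = row_groups_to_scan(sample_stats, "id", 150, 160)
```

Here row group 0 is skipped because its maximum is below the requested range, while row group 2 is kept because it has no statistics to rule it out. Skipping works best when the data is sorted or clustered on the filtered column, so that row group ranges overlap as little as possible.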