Parquet Metadata Reader
View the metadata of your Parquet files in one click. There is no server-side processing; everything happens in your browser. To view the schema instead, check out our Parquet Schema Reader tool.
Understanding Parquet File Structure and Metadata
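Parquet files use a hierarchical structure: a file is divided into row groups, each row group contains one column chunk per column, and each column chunk is made up of pages. This layout, together with the metadata stored at each level, lets readers fetch only the columns and row groups a query needs.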
Row Groups
Horizontal partitions of data, each containing a subset of rows from the file.
Column Chunks
The data for a single column within a row group, stored contiguously for columnar access.
Pages
Smallest storage units in Parquet, containing encoded data values within column chunks.
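The nesting described above can be sketched as plain Python data classes. This is only an illustration of the hierarchy, not how any Parquet library represents it; the class names and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    # Encoded values for a slice of one column chunk.
    num_values: int


@dataclass
class ColumnChunk:
    # All pages for one column inside one row group.
    path_in_schema: str
    pages: List[Page] = field(default_factory=list)

    @property
    def num_values(self) -> int:
        return sum(page.num_values for page in self.pages)


@dataclass
class RowGroup:
    # One column chunk per column, all covering the same rows.
    num_rows: int
    columns: List[ColumnChunk] = field(default_factory=list)


@dataclass
class ParquetFile:
    row_groups: List[RowGroup] = field(default_factory=list)

    @property
    def num_rows(self) -> int:
        return sum(rg.num_rows for rg in self.row_groups)


# Hypothetical file: 2 row groups, 2 flat columns each.
example = ParquetFile(
    row_groups=[
        RowGroup(
            num_rows=3,
            columns=[
                ColumnChunk("id", [Page(2), Page(1)]),
                ColumnChunk("name", [Page(3)]),
            ],
        ),
        RowGroup(
            num_rows=2,
            columns=[
                ColumnChunk("id", [Page(2)]),
                ColumnChunk("name", [Page(2)]),
            ],
        ),
    ]
)
```

Here the file's row count is the sum over its row groups, and each column chunk's value count is the sum over its pages.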
Parquet Metadata Levels
File Metadata
Overall file information stored in the file footer, including schema, number of rows, and row group locations.
Row Group Metadata
Information about each row group, such as the number of rows and column chunk locations.
Column Chunk Metadata
Details about each column chunk, including data type, encoding, compression, and statistics.
Page Header Metadata
Information about individual pages within column chunks, such as page type, number of values, encoding, and compressed and uncompressed page sizes.
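File-level metadata sits at the end of a Parquet file: the serialized metadata is followed by a 4-byte little-endian length and the magic bytes PAR1. The sketch below locates that region using only the standard library. It does not decode the Thrift-encoded metadata and does not handle files with encrypted footers. The function name footer_range and the sample bytes are hypothetical.

```python
import struct

MAGIC = b"PAR1"


def footer_range(data: bytes) -> tuple:
    """Return (offset, length) of the serialized file metadata in `data`."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic bytes)")
    # The 4 bytes before the trailing magic hold the metadata length.
    (length,) = struct.unpack("<I", data[-8:-4])
    offset = len(data) - 8 - length
    if offset < 4:
        raise ValueError("footer length exceeds file size")
    return offset, length


# Synthetic bytes shaped like a Parquet file; the "metadata" is a placeholder,
# not real Thrift-encoded metadata.
fake_metadata = b"\x00" * 10
fake_file = MAGIC + b"column data" + fake_metadata + struct.pack("<I", 10) + MAGIC
```

Because the footer holds the row group and column chunk metadata, a reader can fetch just this region first and then decide which parts of the file to read.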
Key Metadata Fields
file_name
Name of the Parquet file
row_group_id
Identifier of the row group, unique within the file
row_group_num_rows
Number of rows in each row group
column_id
Identifier for each column within a row group
path_in_schema
Path to the column in the file schema; for nested columns this includes the parent field names
type
Physical data type of the column, such as INT64 or BYTE_ARRAY
stats_min, stats_max
Minimum and maximum values in the column chunk
compression
Compression method used for the column chunk
encodings
Encoding methods used for the column chunk data
total_compressed_size
Size of the compressed column chunk, in bytes
total_uncompressed_size
Size of the uncompressed column chunk, in bytes
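As an illustration of how these fields combine, the sketch below computes a per-column compression ratio from metadata rows keyed by the field names listed above. The rows are hypothetical sample values, and the helper name compression_ratio_by_column is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical metadata rows, one per column chunk.
rows = [
    {"row_group_id": 0, "path_in_schema": "id",
     "total_compressed_size": 400, "total_uncompressed_size": 800},
    {"row_group_id": 0, "path_in_schema": "name",
     "total_compressed_size": 900, "total_uncompressed_size": 1500},
    {"row_group_id": 1, "path_in_schema": "id",
     "total_compressed_size": 600, "total_uncompressed_size": 1200},
    {"row_group_id": 1, "path_in_schema": "name",
     "total_compressed_size": 600, "total_uncompressed_size": 3000},
]


def compression_ratio_by_column(rows):
    """Return {column: uncompressed bytes / compressed bytes} across row groups."""
    compressed = defaultdict(int)
    uncompressed = defaultdict(int)
    for row in rows:
        column = row["path_in_schema"]
        compressed[column] += row["total_compressed_size"]
        uncompressed[column] += row["total_uncompressed_size"]
    return {
        column: uncompressed[column] / compressed[column]
        for column in compressed
        if compressed[column] > 0  # avoid dividing by zero for empty chunks
    }


ratios = compression_ratio_by_column(rows)
```

A ratio close to 1 suggests the chosen compression or encoding is doing little for that column.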
Benefits of Understanding Parquet Metadata
Optimize query performance using statistics
Implement efficient data skipping
Understand data distribution
Plan data processing strategies
Troubleshoot performance issues
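Data skipping with statistics can be sketched as follows: given the minimum and maximum value of a column in each row group, a reader only needs the row groups whose range overlaps the filter. The statistics and the function name row_groups_to_read are hypothetical, and the sketch assumes a numeric column with an inclusive range filter.

```python
def row_groups_to_read(stats, lower, upper):
    """Return ids of row groups whose [min, max] range overlaps [lower, upper]."""
    selected = []
    for group in stats:
        lo, hi = group["stats_min"], group["stats_max"]
        # Missing statistics give no guarantee, so the row group must be read.
        if lo is None or hi is None or (lo <= upper and hi >= lower):
            selected.append(group["row_group_id"])
    return selected


# Hypothetical per-row-group statistics for one numeric column.
stats = [
    {"row_group_id": 0, "stats_min": 1, "stats_max": 100},
    {"row_group_id": 1, "stats_min": 101, "stats_max": 200},
    {"row_group_id": 2, "stats_min": None, "stats_max": None},
    {"row_group_id": 3, "stats_min": 201, "stats_max": 300},
]

# Filter: value between 150 and 250 (inclusive).
to_read = row_groups_to_read(stats, 150, 250)
```

In this example row group 0 is skipped because its maximum is below the filter range, while the row group without statistics is kept because nothing can be ruled out.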