JSON to Parquet
Convert a JSON array to a standards-compliant CSV file optimised for Parquet ingestion pipelines. Choose your field delimiter and toggle the header row on or off to match your target schema. The output downloads as a .csv file that you can load with pandas read_csv or DuckDB read_csv_auto and write to Parquet in one line. The entire conversion happens in your browser; no data is sent to any server.
Why CSV instead of Parquet?
Parquet is a binary columnar format, and this tool does not write it directly in the browser. Download the CSV and convert it to Parquet with one line of pandas or DuckDB:
# Python / pandas (to_parquet needs pyarrow or fastparquet installed)
import pandas as pd
pd.read_csv("data.csv").to_parquet("data.parquet")
# DuckDB (SQL)
COPY (SELECT * FROM read_csv_auto('data.csv')) TO 'data.parquet' (FORMAT PARQUET);
What is the JSON to Parquet-Ready CSV Converter?
Parquet is the storage format of the modern data lake: columnar, compressed, and typically far cheaper to query than CSV, because query engines read only the columns a query references. Getting JSON data into Parquet usually involves pyarrow, DuckDB, or Apache Spark, which this page does not run for you. This tool handles the intermediate step: it converts a JSON array into a flat CSV that is ready for Parquet ingestion, with inferred column types (integer, float, boolean, string, timestamp) and optional flattening of nested JSON objects into dot-notation column names. Feed the output CSV directly to pd.read_csv(...).to_parquet(...) in Python or to DuckDB's COPY ... TO ... (FORMAT PARQUET) command. Consistently formatted values let these readers infer the correct column types instead of defaulting everything to string.
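The page does not document exactly how the converter flattens nested objects. As a rough illustration, dot-notation column names can be produced with a recursive walk over each record. This is a minimal sketch under stated assumptions: the `flatten` helper is hypothetical, and its choice to store arrays as JSON text is an assumed policy, not necessarily what the tool does.

```python
import json
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested objects into one level, joining key paths with `sep`.

    Hypothetical helper for illustration; not the converter's own code.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse: {"sensor": {"name": "x"}} becomes {"sensor.name": "x"}
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Assumed policy: a CSV cell cannot hold an array, so store it as JSON text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


record = {"id": 1, "sensor": {"name": "temp_A", "site": {"room": 4}}, "tags": ["a", "b"]}
print(flatten(record))
# {'id': 1, 'sensor.name': 'temp_A', 'sensor.site.room': 4, 'tags': '["a", "b"]'}
```

If you already use pandas, pandas.json_normalize performs a similar flattening and also uses "." as its default separator.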
How to Use
1. Paste Your JSON Array
Paste a JSON array of objects representing the dataset. Each object becomes a Parquet row and each key becomes a column name. The tool generates a Parquet-compatible CSV with inferred column types.
2. Configure Column Types
Review the auto-inferred column types (integer, float, string, boolean, timestamp). Override the type of specific columns if needed, particularly columns that look numeric but should be treated as strings (e.g., ZIP codes and phone numbers, where leading zeros matter).
3. Generate Parquet-Ready Output
Click "Convert". The tool produces a typed CSV with a header row that can be loaded directly into DuckDB, a pandas read_csv workflow, or AWS Glue as an intermediate format.
4. Load into Your Parquet Pipeline
Download the CSV and convert it to binary Parquet using DuckDB: COPY (SELECT * FROM read_csv_auto('file.csv')) TO 'file.parquet' (FORMAT PARQUET); or pandas: pd.read_csv("file.csv").to_parquet("file.parquet").
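The page does not say how its type inference works. One common approach, sketched here as an assumption, tests each column's values against progressively looser types and falls back to string, while explicit overrides always take precedence. That precedence is what keeps a ZIP code such as "02139" a string instead of an integer that loses its leading zero. The names `infer_type` and `infer_schema` and the type labels are hypothetical and used for illustration only; they are not the tool's API.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple


def _is_int(v: str) -> bool:
    try:
        int(v)
        return True
    except ValueError:
        return False


def _is_float(v: str) -> bool:
    try:
        float(v)
        return True
    except ValueError:
        return False


def _is_bool(v: str) -> bool:
    return v.lower() in {"true", "false"}


def _is_timestamp(v: str) -> bool:
    try:
        # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
        # so rewrite it as an explicit UTC offset first.
        datetime.fromisoformat(v.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


# Checked in order, most specific first; "string" is the fallback.
CHECKS: List[Tuple[str, Callable[[str], bool]]] = [
    ("integer", _is_int),
    ("float", _is_float),
    ("boolean", _is_bool),
    ("timestamp", _is_timestamp),
]


def infer_type(values: List[str]) -> str:
    """Return the first type whose check accepts every non-empty value (hypothetical)."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"  # an all-empty column gives no evidence either way
    for name, check in CHECKS:
        if all(check(v) for v in present):
            return name
    return "string"


def infer_schema(
    columns: Dict[str, List[str]], overrides: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Infer one type per column; explicit overrides always win (hypothetical)."""
    overrides = overrides or {}
    return {col: overrides.get(col, infer_type(vals)) for col, vals in columns.items()}


cols = {
    "id": ["1", "2"],
    "reading": ["23.5", "24.1"],
    "ts": ["2024-01-01T00:00:00Z", "2024-01-01T00:01:00Z"],
    "zip": ["02139", "10001"],
}
print(infer_schema(cols))
# {'id': 'integer', 'reading': 'float', 'ts': 'timestamp', 'zip': 'integer'}
print(infer_schema(cols, overrides={"zip": "string"}))
# {'id': 'integer', 'reading': 'float', 'ts': 'timestamp', 'zip': 'string'}
```

Without the override, the zip column is inferred as integer, and "02139" would be written to Parquet as 2139. The override preserves it as text. In pandas, the equivalent fix is to pass a dtype mapping to read_csv for that column.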
Common Use Cases
Data Lake Ingestion
Convert JSON API exports or log files into Parquet-compatible CSV as an intermediate step before loading them into data lake storage such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage.
Analytics Query Optimisation
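Parquet's columnar layout lets a query read only the columns it references, which can substantially reduce costs on engines that bill by data scanned, such as Amazon Athena and Redshift Spectrum. BigQuery can also load or query Parquet files directly. Converting JSON data to Parquet-ready CSV is the first step toward these columnar scans.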
Apache Spark & Pandas Input
Spark and pandas both read Parquet natively. Convert JSON datasets to Parquet-ready format to take advantage of schema enforcement, predicate pushdown, and efficient columnar reads.
Long-Term Data Archiving
Parquet's efficient compression and self-describing schema make it well suited to archiving large JSON datasets. Converting JSON exports to Parquet can reduce storage by roughly 4-10× compared with raw JSON, although the exact ratio depends on the data and the compression codec.
Conversion Examples
JSON Array → Parquet-Ready CSV
JSON data is normalised into a typed, flat CSV ready for Parquet conversion.
Input JSON
[
{"id": 1, "sensor": "temp_A", "reading": 23.5, "ts": "2024-01-01T00:00:00Z"},
{"id": 2, "sensor": "temp_B", "reading": 24.1, "ts": "2024-01-01T00:01:00Z"}
]
Output CSV
id,sensor,reading,ts
1,temp_A,23.5,2024-01-01T00:00:00Z
2,temp_B,24.1,2024-01-01T00:01:00Z
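For reference, the example conversion can be reproduced with Python's standard csv module. This is a minimal sketch that assumes flat records, so nested objects should be flattened first. The `json_to_csv` function and its `delimiter` and `header` parameters are hypothetical stand-ins for the tool's delimiter and header-row options, not the tool's own code.

```python
import csv
import io
import json
from typing import List


def json_to_csv(text: str, delimiter: str = ",", header: bool = True) -> str:
    """Convert a JSON array of flat objects into CSV text (hypothetical helper).

    Columns follow first-seen key order across all records, and a record that
    lacks a key gets an empty cell in that column.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")
    columns: List[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    # lineterminator="\n" replaces the csv module's default "\r\n";
    # the default QUOTE_MINIMAL quoting wraps cells that contain the delimiter.
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


sample = """[
  {"id": 1, "sensor": "temp_A", "reading": 23.5, "ts": "2024-01-01T00:00:00Z"},
  {"id": 2, "sensor": "temp_B", "reading": 24.1, "ts": "2024-01-01T00:01:00Z"}
]"""
print(json_to_csv(sample), end="")
# id,sensor,reading,ts
# 1,temp_A,23.5,2024-01-01T00:00:00Z
# 2,temp_B,24.1,2024-01-01T00:01:00Z
print(json_to_csv(sample, delimiter=";", header=False), end="")
# 1;temp_A;23.5;2024-01-01T00:00:00Z
# 2;temp_B;24.1;2024-01-01T00:01:00Z
```

One design caveat: JSON true and false pass through as Python's True and False, and JSON null becomes an empty cell. A production converter might normalise booleans to lowercase.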