Lessons

Learn Pandas

Pandas Read JSON

Working with JSON data is a common task in data science, machine learning, and software engineering. Whether you're receiving structured data from a web API or loading it from a configuration file, converting JSON into a structured format like a Pandas DataFrame is often a critical first step. In this article, you'll learn how to use Pandas read_json() and other tools to handle both simple and nested JSON data effectively in Python.

What is JSON?

JSON (JavaScript Object Notation) is a lightweight and widely-used data interchange format. It represents data using key-value pairs and supports nested structures like arrays and objects. JSON is language-independent and is commonly used for exchanging data between web servers and clients, or between different software applications.

For example:

python
1
2
3
4
5
{
  "name": "John",
  "age": 30,
  "city": "New York"
}

JSON to a Pandas DataFrame

To convert JSON data into a Pandas DataFrame, you can use the read_json() function. It accepts JSON strings, file paths, or URLs and converts them into DataFrame objects.

Example: Reading Simple JSON

python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
import pandas as pd
import json

json_data = {
    "name": ["John", "Jane", "Bob"],
    "age": [25, 30, 35],
    "city": ["New York", "San Francisco", "Chicago"]
}

# Convert dictionary to JSON string
json_string = json.dumps(json_data)

# Convert JSON string to DataFrame
df = pd.read_json(json_string)

print(df)

Output

1
2
3
4
   name  age           city
0  John   25       New York
1  Jane   30  San Francisco
2   Bob   35        Chicago

Handling Nested JSON Data in Pandas

JSON often comes with nested structures such as dictionaries within dictionaries or lists of dictionaries. Pandas can still read these using read_json(), but the nested parts will appear as raw Python dictionaries inside columns.

Example: Nested JSON

python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
json_data = {
    "name": ["John", "Jane", "Bob"],
    "age": [25, 30, 35],
    "city": ["New York", "San Francisco", "Chicago"],
    "children": [
        {"child_name": "Mary", "child_age": 5},
        {"child_name": "Tom", "child_age": 3},
        {"child_name": "Kate", "child_age": 2}
    ]
}

json_string = json.dumps(json_data)
df = pd.read_json(json_string)

print(df)

Output

1
2
3
4
   name  age           city                                children
0  John   25       New York  {'child_name': 'Mary', 'child_age': 5}
1  Jane   30  San Francisco   {'child_name': 'Tom', 'child_age': 3}
2   Bob   35        Chicago  {'child_name': 'Kate', 'child_age': 2}

Nested JSON Flattening Data with json_normalize()

To flatten nested JSON (like the children column above), use pandas.json_normalize() to convert nested structures into separate columns.

Example: Flatten Nested JSON

python
1
2
3
4
5
6
7
8
9
from pandas import json_normalize

# Normalize the 'children' column
children_df = json_normalize(df['children'])

# Combine the new DataFrame with the original
df_flattened = pd.concat([df, children_df], axis=1).drop('children', axis=1)

print(df_flattened)

Output

1
2
3
4
   name  age           city child_name  child_age
0  John   25       New York       Mary          5
1  Jane   30  San Francisco        Tom          3
2   Bob   35        Chicago       Kate          2

Pros and Cons of Converting JSON to Pandas

Pros

  • Easy and Efficient: read_json() simplifies conversion to a DataFrame.
  • Flexible Input: Supports strings, file paths, and URLs.
  • Nested Structure Support: Handles deeply nested data with additional tools like json_normalize.
  • Seamless Integration: Works well with other Python libraries like NumPy and Matplotlib.

Cons

  • Limited Complex Schema Support: May struggle with irregular or deeply nested schemas.
  • Memory Usage: Entire dataset is loaded into memory, which can be a concern for large files.
  • Dependency on Extra Tools: Flattening may require additional handling with external utilities.

Error Handling Tips

When working with JSON data in Pandas, consider the following common pitfalls:

  • Invalid JSON Format: Always validate your JSON before using read_json().
  • Missing Keys: Handle optional or missing fields carefully to avoid NaN issues.
  • Memory Constraints: Use chunking or filtering if dealing with large files.
  • URL Errors: When reading JSON from URLs, add exception handling for connection failures.
  • Flattening Errors: Ensure uniform structure in nested data to prevent inconsistent DataFrames.

Real-World Use Cases

You might convert JSON to DataFrames when:

  • Consuming API responses (e.g., from Twitter, GitHub, or weather APIs)
  • Parsing JSON configuration files or logs
  • Cleaning data from web scraping results
  • Preparing datasets for machine learning pipelines

Conclusion

Converting JSON data to a Pandas DataFrame is a straightforward process with the read_json() function. When dealing with nested structures, tools like json_normalize() provide powerful ways to flatten and prepare your data for analysis. Whether you're handling API responses or working with local files, mastering JSON in Pandas is essential for any data professional working with Python.

Frequently Asked Questions