Profile picture

Reading JSON Files

Last updated: November 25th, 20192019-11-25Project preview

rmotr


Reading JSON files

In this lecture we'll learn how to read JSON files (.json) into a pandas DataFrame, and how to export that DataFrame to a JSON file.

purple-divider

Hands on!

In [1]:
import pandas as pd

green-divider

The read_json method

We'll begin with the read_json method, that let us read simple JSON files into a DataFrame.

This read_json method accepts many parameters as we saw on read_csv and read_excel, such as filepath, dtype and encoding.

Full read_json documentation can be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html.

In this case we'll try to read our games.json JSON file.

This file contains records of PlayStation games in Europe with its title, price, provider and genre.

In [2]:
!head -n20 games.json
[
    {
        "title": "Call of Duty®: WWII + Destiny 2 - Lote",
        "price": "129,99 €",
        "content_rating_img": "https://cdn-a.sonyentertainmentnetwork.com/grc/images/ratings/hd/pegi/18.png",
        "release_date": "Dic 21 2018",
        "provider": "Activision Blizzard Int'l BV",
        "genre": "Shooter",
        "image": "https://store.playstation.com/store/api/chihiro/00_09_000/container/ES/es/999/EP0002-CUSA08633_00-CODWWIIANDD2BASE/1516962740000/image?w=240&h=240&bg_color=000000&opacity=100&_version=00_09_000"
    },
    {
        "title": "God of War® Digital Deluxe Edition",
        "price": "69,99 €",
        "content_rating_img": "https://cdn-a.sonyentertainmentnetwork.com/grc/images/ratings/hd/pegi/18.png",
        "release_date": "Abr 20 2018",
        "provider": "Sony Interactive Entertainment Europe",
        "genre": "Acción",
        "image": "https://store.playstation.com/store/api/chihiro/00_09_000/container/ES/es/999/EP9000-CUSA07411_00-GOW2018PREORDPSN/1517129711000/image?w=240&h=240&bg_color=000000&opacity=100&_version=00_09_000"
    },
    {
In [3]:
games = pd.read_json('games.json')
In [4]:
games.head()
Out[4]:
title price content_rating_img release_date provider genre image
0 Call of Duty®: WWII + Destiny 2 - Lote 129,99 € https://cdn-a.sonyentertainmentnetwork.com/grc... Dic 21 2018 Activision Blizzard Int'l BV Shooter https://store.playstation.com/store/api/chihir...
1 God of War® Digital Deluxe Edition 69,99 € https://cdn-a.sonyentertainmentnetwork.com/grc... Abr 20 2018 Sony Interactive Entertainment Europe Acción https://store.playstation.com/store/api/chihir...
2 Far Cry 5 69,99 € https://cdn-a.sonyentertainmentnetwork.com/grc... Mar 27 2018 UBISOFT ENTERTAINMENT SA Aventura https://store.playstation.com/store/api/chihir...
3 Far Cry 5 Edición Deluxe 79,99 € https://cdn-a.sonyentertainmentnetwork.com/grc... Mar 27 2018 UBISOFT ENTERTAINMENT SA Aventura https://store.playstation.com/store/api/chihir...
4 Far Cry 5 Edición Oro 89,99 € https://cdn-a.sonyentertainmentnetwork.com/grc... Mar 27 2018 UBISOFT ENTERTAINMENT SA Aventura https://store.playstation.com/store/api/chihir...

green-divider

 Nested JSON example

JSON documents tend not to be so straightforward to read as tables, specially when they have nested structures.

Dataset source

In [5]:
!head -n20 users.json
{
  "info": [
    {
      "id": 1,
      "name": "Leanne Graham",
      "username": "Bret",
      "email": "Sincere@april.biz",
      "address": [{
        "street": "Kulas Light",
        "suite": "Apt. 556",
        "city": "Gwenborough",
        "zipcode": "92998-3874",
        "geo": {
          "lat": "-37.3159",
          "lng": "81.1496"
        }
      }],
      "phone": "1-770-736-8031 x56442",
      "website": "hildegard.org",
      "company": {
In [6]:
df = pd.read_json('users.json')
In [7]:
df.head()
Out[7]:
info
0 {'id': 1, 'name': 'Leanne Graham', 'username':...
1 {'id': 2, 'name': 'Ervin Howell', 'username': ...
2 {'id': 3, 'name': 'Clementine Bauch', 'usernam...
3 {'id': 4, 'name': 'Patricia Lebsack', 'usernam...
4 {'id': 5, 'name': 'Chelsey Dietrich', 'usernam...

This is not the data structure we wanted, so in this cases read_json could not be the best solution.

We'll need to use the json Python module to parse our JSON file into a Python Dictionary object, to be able to index that dictionary and select nested data we want.

To do that we'll use the json.load() method, that will parse our JSON file into a Python Dictionary json_dict.

In [8]:
import json

with open('users.json') as file:
    json_dict = json.load(file)
In [9]:
json_dict
Out[9]:
{'info': [{'id': 1,
   'name': 'Leanne Graham',
   'username': 'Bret',
   'email': 'Sincere@april.biz',
   'address': [{'street': 'Kulas Light',
     'suite': 'Apt. 556',
     'city': 'Gwenborough',
     'zipcode': '92998-3874',
     'geo': {'lat': '-37.3159', 'lng': '81.1496'}}],
   'phone': '1-770-736-8031 x56442',
   'website': 'hildegard.org',
   'company': {'name': 'Romaguera-Crona',
    'catchPhrase': 'Multi-layered client-server neural-net',
    'bs': 'harness real-time e-markets'}},
  {'id': 2,
   'name': 'Ervin Howell',
   'username': 'Antonette',
   'email': 'Shanna@melissa.tv',
   'address': [{'street': 'Victor Plains',
     'suite': 'Suite 879',
     'city': 'Wisokyburgh',
     'zipcode': '90566-7771',
     'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}],
   'phone': '010-692-6593 x09125',
   'website': 'anastasia.net',
   'company': {'name': 'Deckow-Crist',
    'catchPhrase': 'Proactive didactic contingency',
    'bs': 'synergize scalable supply-chains'}},
  {'id': 3,
   'name': 'Clementine Bauch',
   'username': 'Samantha',
   'email': 'Nathan@yesenia.net',
   'address': [{'street': 'Douglas Extension',
     'suite': 'Suite 847',
     'city': 'McKenziehaven',
     'zipcode': '59590-4157',
     'geo': {'lat': '-68.6102', 'lng': '-47.0653'}}],
   'phone': '1-463-123-4447',
   'website': 'ramiro.info',
   'company': {'name': 'Romaguera-Jacobson',
    'catchPhrase': 'Face to face bifurcated interface',
    'bs': 'e-enable strategic applications'}},
  {'id': 4,
   'name': 'Patricia Lebsack',
   'username': 'Karianne',
   'email': 'Julianne.OConner@kory.org',
   'address': [{'street': 'Hoeger Mall',
     'suite': 'Apt. 692',
     'city': 'South Elvis',
     'zipcode': '53919-4257',
     'geo': {'lat': '29.4572', 'lng': '-164.2990'}}],
   'phone': '493-170-9623 x156',
   'website': 'kale.biz',
   'company': {'name': 'Robel-Corkery',
    'catchPhrase': 'Multi-tiered zero tolerance productivity',
    'bs': 'transition cutting-edge web services'}},
  {'id': 5,
   'name': 'Chelsey Dietrich',
   'username': 'Kamren',
   'email': 'Lucio_Hettinger@annie.ca',
   'address': [{'street': 'Skiles Walks',
     'suite': 'Suite 351',
     'city': 'Roscoeview',
     'zipcode': '33263',
     'geo': {'lat': '-31.8129', 'lng': '62.5342'}}],
   'phone': '(254)954-1289',
   'website': 'demarco.info',
   'company': {'name': 'Keebler LLC',
    'catchPhrase': 'User-centric fault-tolerant solution',
    'bs': 'revolutionize end-to-end systems'}},
  {'id': 6,
   'name': 'Mrs. Dennis Schulist',
   'username': 'Leopoldo_Corkery',
   'email': 'Karley_Dach@jasper.info',
   'address': [{'street': 'Norberto Crossing',
     'suite': 'Apt. 950',
     'city': 'South Christy',
     'zipcode': '23505-1337',
     'geo': {'lat': '-71.4197', 'lng': '71.7478'}}],
   'phone': '1-477-935-8478 x6430',
   'website': 'ola.org',
   'company': {'name': 'Considine-Lockman',
    'catchPhrase': 'Synchronised bottom-line interface',
    'bs': 'e-enable innovative applications'}},
  {'id': 7,
   'name': 'Kurtis Weissnat',
   'username': 'Elwyn.Skiles',
   'email': 'Telly.Hoeger@billy.biz',
   'address': [{'street': 'Rex Trail',
     'suite': 'Suite 280',
     'city': 'Howemouth',
     'zipcode': '58804-1099',
     'geo': {'lat': '24.8918', 'lng': '21.8984'}}],
   'phone': '210.067.6132',
   'website': 'elvis.io',
   'company': {'name': 'Johns Group',
    'catchPhrase': 'Configurable multimedia task-force',
    'bs': 'generate enterprise e-tailers'}},
  {'id': 8,
   'name': 'Nicholas Runolfsdottir V',
   'username': 'Maxime_Nienow',
   'email': 'Sherwood@rosamond.me',
   'address': [{'street': 'Ellsworth Summit',
     'suite': 'Suite 729',
     'city': 'Aliyaview',
     'zipcode': '45169',
     'geo': {'lat': '-14.3990', 'lng': '-120.7677'}}],
   'phone': '586.493.6943 x140',
   'website': 'jacynthe.com',
   'company': {'name': 'Abernathy Group',
    'catchPhrase': 'Implemented secondary concept',
    'bs': 'e-enable extensible e-tailers'}},
  {'id': 9,
   'name': 'Glenna Reichert',
   'username': 'Delphine',
   'email': 'Chaim_McDermott@dana.io',
   'address': [{'street': 'Dayna Park',
     'suite': 'Suite 449',
     'city': 'Bartholomebury',
     'zipcode': '76495-3109',
     'geo': {'lat': '24.6463', 'lng': '-168.8889'}}],
   'phone': '(775)976-6794 x41206',
   'website': 'conrad.com',
   'company': {'name': 'Yost and Sons',
    'catchPhrase': 'Switchable contextually-based project',
    'bs': 'aggregate real-time technologies'}},
  {'id': 10,
   'name': 'Clementina DuBuque',
   'username': 'Moriah.Stanton',
   'email': 'Rey.Padberg@karina.biz',
   'address': [{'street': 'Kattie Turnpike',
     'suite': 'Suite 198',
     'city': 'Lebsackbury',
     'zipcode': '31428-2261',
     'geo': {'lat': '-38.2386', 'lng': '57.2232'}}],
   'phone': '024-648-3804',
   'website': 'ambrose.net',
   'company': {'name': 'Hoeger LLC',
    'catchPhrase': 'Centralized empowering task-force',
    'bs': 'target end-to-end models'}}]}
In [10]:
json_dict.keys()
Out[10]:
dict_keys(['info'])
In [11]:
json_dict.values()
Out[11]:
dict_values([[{'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': [{'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}], 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}}, {'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': [{'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}], 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}}, {'id': 3, 'name': 'Clementine Bauch', 'username': 'Samantha', 'email': 'Nathan@yesenia.net', 'address': [{'street': 'Douglas Extension', 'suite': 'Suite 847', 'city': 'McKenziehaven', 'zipcode': '59590-4157', 'geo': {'lat': '-68.6102', 'lng': '-47.0653'}}], 'phone': '1-463-123-4447', 'website': 'ramiro.info', 'company': {'name': 'Romaguera-Jacobson', 'catchPhrase': 'Face to face bifurcated interface', 'bs': 'e-enable strategic applications'}}, {'id': 4, 'name': 'Patricia Lebsack', 'username': 'Karianne', 'email': 'Julianne.OConner@kory.org', 'address': [{'street': 'Hoeger Mall', 'suite': 'Apt. 692', 'city': 'South Elvis', 'zipcode': '53919-4257', 'geo': {'lat': '29.4572', 'lng': '-164.2990'}}], 'phone': '493-170-9623 x156', 'website': 'kale.biz', 'company': {'name': 'Robel-Corkery', 'catchPhrase': 'Multi-tiered zero tolerance productivity', 'bs': 'transition cutting-edge web services'}}, {'id': 5, 'name': 'Chelsey Dietrich', 'username': 'Kamren', 'email': 'Lucio_Hettinger@annie.ca', 'address': [{'street': 'Skiles Walks', 'suite': 'Suite 351', 'city': 'Roscoeview', 'zipcode': '33263', 'geo': {'lat': '-31.8129', 'lng': '62.5342'}}], 'phone': '(254)954-1289', 'website': 'demarco.info', 'company': {'name': 'Keebler LLC', 'catchPhrase': 'User-centric fault-tolerant solution', 'bs': 'revolutionize end-to-end systems'}}, {'id': 6, 'name': 'Mrs. Dennis Schulist', 'username': 'Leopoldo_Corkery', 'email': 'Karley_Dach@jasper.info', 'address': [{'street': 'Norberto Crossing', 'suite': 'Apt. 950', 'city': 'South Christy', 'zipcode': '23505-1337', 'geo': {'lat': '-71.4197', 'lng': '71.7478'}}], 'phone': '1-477-935-8478 x6430', 'website': 'ola.org', 'company': {'name': 'Considine-Lockman', 'catchPhrase': 'Synchronised bottom-line interface', 'bs': 'e-enable innovative applications'}}, {'id': 7, 'name': 'Kurtis Weissnat', 'username': 'Elwyn.Skiles', 'email': 'Telly.Hoeger@billy.biz', 'address': [{'street': 'Rex Trail', 'suite': 'Suite 280', 'city': 'Howemouth', 'zipcode': '58804-1099', 'geo': {'lat': '24.8918', 'lng': '21.8984'}}], 'phone': '210.067.6132', 'website': 'elvis.io', 'company': {'name': 'Johns Group', 'catchPhrase': 'Configurable multimedia task-force', 'bs': 'generate enterprise e-tailers'}}, {'id': 8, 'name': 'Nicholas Runolfsdottir V', 'username': 'Maxime_Nienow', 'email': 'Sherwood@rosamond.me', 'address': [{'street': 'Ellsworth Summit', 'suite': 'Suite 729', 'city': 'Aliyaview', 'zipcode': '45169', 'geo': {'lat': '-14.3990', 'lng': '-120.7677'}}], 'phone': '586.493.6943 x140', 'website': 'jacynthe.com', 'company': {'name': 'Abernathy Group', 'catchPhrase': 'Implemented secondary concept', 'bs': 'e-enable extensible e-tailers'}}, {'id': 9, 'name': 'Glenna Reichert', 'username': 'Delphine', 'email': 'Chaim_McDermott@dana.io', 'address': [{'street': 'Dayna Park', 'suite': 'Suite 449', 'city': 'Bartholomebury', 'zipcode': '76495-3109', 'geo': {'lat': '24.6463', 'lng': '-168.8889'}}], 'phone': '(775)976-6794 x41206', 'website': 'conrad.com', 'company': {'name': 'Yost and Sons', 'catchPhrase': 'Switchable contextually-based project', 'bs': 'aggregate real-time technologies'}}, {'id': 10, 'name': 'Clementina DuBuque', 'username': 'Moriah.Stanton', 'email': 'Rey.Padberg@karina.biz', 'address': [{'street': 'Kattie Turnpike', 'suite': 'Suite 198', 'city': 'Lebsackbury', 'zipcode': '31428-2261', 'geo': {'lat': '-38.2386', 'lng': '57.2232'}}], 'phone': '024-648-3804', 'website': 'ambrose.net', 'company': {'name': 'Hoeger LLC', 'catchPhrase': 'Centralized empowering task-force', 'bs': 'target end-to-end models'}}]])
In [12]:
for user in json_dict['info']:
    print(user)
{'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': [{'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}], 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}}
{'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': [{'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}], 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}}
{'id': 3, 'name': 'Clementine Bauch', 'username': 'Samantha', 'email': 'Nathan@yesenia.net', 'address': [{'street': 'Douglas Extension', 'suite': 'Suite 847', 'city': 'McKenziehaven', 'zipcode': '59590-4157', 'geo': {'lat': '-68.6102', 'lng': '-47.0653'}}], 'phone': '1-463-123-4447', 'website': 'ramiro.info', 'company': {'name': 'Romaguera-Jacobson', 'catchPhrase': 'Face to face bifurcated interface', 'bs': 'e-enable strategic applications'}}
{'id': 4, 'name': 'Patricia Lebsack', 'username': 'Karianne', 'email': 'Julianne.OConner@kory.org', 'address': [{'street': 'Hoeger Mall', 'suite': 'Apt. 692', 'city': 'South Elvis', 'zipcode': '53919-4257', 'geo': {'lat': '29.4572', 'lng': '-164.2990'}}], 'phone': '493-170-9623 x156', 'website': 'kale.biz', 'company': {'name': 'Robel-Corkery', 'catchPhrase': 'Multi-tiered zero tolerance productivity', 'bs': 'transition cutting-edge web services'}}
{'id': 5, 'name': 'Chelsey Dietrich', 'username': 'Kamren', 'email': 'Lucio_Hettinger@annie.ca', 'address': [{'street': 'Skiles Walks', 'suite': 'Suite 351', 'city': 'Roscoeview', 'zipcode': '33263', 'geo': {'lat': '-31.8129', 'lng': '62.5342'}}], 'phone': '(254)954-1289', 'website': 'demarco.info', 'company': {'name': 'Keebler LLC', 'catchPhrase': 'User-centric fault-tolerant solution', 'bs': 'revolutionize end-to-end systems'}}
{'id': 6, 'name': 'Mrs. Dennis Schulist', 'username': 'Leopoldo_Corkery', 'email': 'Karley_Dach@jasper.info', 'address': [{'street': 'Norberto Crossing', 'suite': 'Apt. 950', 'city': 'South Christy', 'zipcode': '23505-1337', 'geo': {'lat': '-71.4197', 'lng': '71.7478'}}], 'phone': '1-477-935-8478 x6430', 'website': 'ola.org', 'company': {'name': 'Considine-Lockman', 'catchPhrase': 'Synchronised bottom-line interface', 'bs': 'e-enable innovative applications'}}
{'id': 7, 'name': 'Kurtis Weissnat', 'username': 'Elwyn.Skiles', 'email': 'Telly.Hoeger@billy.biz', 'address': [{'street': 'Rex Trail', 'suite': 'Suite 280', 'city': 'Howemouth', 'zipcode': '58804-1099', 'geo': {'lat': '24.8918', 'lng': '21.8984'}}], 'phone': '210.067.6132', 'website': 'elvis.io', 'company': {'name': 'Johns Group', 'catchPhrase': 'Configurable multimedia task-force', 'bs': 'generate enterprise e-tailers'}}
{'id': 8, 'name': 'Nicholas Runolfsdottir V', 'username': 'Maxime_Nienow', 'email': 'Sherwood@rosamond.me', 'address': [{'street': 'Ellsworth Summit', 'suite': 'Suite 729', 'city': 'Aliyaview', 'zipcode': '45169', 'geo': {'lat': '-14.3990', 'lng': '-120.7677'}}], 'phone': '586.493.6943 x140', 'website': 'jacynthe.com', 'company': {'name': 'Abernathy Group', 'catchPhrase': 'Implemented secondary concept', 'bs': 'e-enable extensible e-tailers'}}
{'id': 9, 'name': 'Glenna Reichert', 'username': 'Delphine', 'email': 'Chaim_McDermott@dana.io', 'address': [{'street': 'Dayna Park', 'suite': 'Suite 449', 'city': 'Bartholomebury', 'zipcode': '76495-3109', 'geo': {'lat': '24.6463', 'lng': '-168.8889'}}], 'phone': '(775)976-6794 x41206', 'website': 'conrad.com', 'company': {'name': 'Yost and Sons', 'catchPhrase': 'Switchable contextually-based project', 'bs': 'aggregate real-time technologies'}}
{'id': 10, 'name': 'Clementina DuBuque', 'username': 'Moriah.Stanton', 'email': 'Rey.Padberg@karina.biz', 'address': [{'street': 'Kattie Turnpike', 'suite': 'Suite 198', 'city': 'Lebsackbury', 'zipcode': '31428-2261', 'geo': {'lat': '-38.2386', 'lng': '57.2232'}}], 'phone': '024-648-3804', 'website': 'ambrose.net', 'company': {'name': 'Hoeger LLC', 'catchPhrase': 'Centralized empowering task-force', 'bs': 'target end-to-end models'}}

green-divider

Using pandas from_dict method

With our Python Dictionary ready, we'll introduce another useful pandas method: from_dict().

This from_dict method will construct a new DataFrame from a dict of array-like or dicts.

Full from_dict documentation can be found here: https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.DataFrame.from_dict.html

In [13]:
pd.DataFrame.from_dict({'Fruits': ['Apple', 'Banana']})
Out[13]:
Fruits
0 Apple
1 Banana

 Changing data orientation using orient parameter

The "orientation" of the data. If the keys of the passed dictionary should be the columns of the resulting DataFrame, pass orient=columns (default behaviour). Otherwise if the keys should be rows, use orient=index.

In [14]:
pd.DataFrame.from_dict({'Fruits': ['Apple', 'Banana']}, orient='index')
Out[14]:
0 1
Fruits Apple Banana

Go ahead and use our json_dict dictionary to create a new DataFrame, but this time using the value attribute:

In [15]:
df = pd.DataFrame.from_dict(json_dict)
In [16]:
df.head()
Out[16]:
info
0 {'id': 1, 'name': 'Leanne Graham', 'username':...
1 {'id': 2, 'name': 'Ervin Howell', 'username': ...
2 {'id': 3, 'name': 'Clementine Bauch', 'usernam...
3 {'id': 4, 'name': 'Patricia Lebsack', 'usernam...
4 {'id': 5, 'name': 'Chelsey Dietrich', 'usernam...
In [17]:
df = pd.DataFrame.from_dict(json_dict['info'])
In [18]:
df.head()
Out[18]:
id name username email address phone website company
0 1 Leanne Graham Bret Sincere@april.biz [{'street': 'Kulas Light', 'suite': 'Apt. 556'... 1-770-736-8031 x56442 hildegard.org {'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...
1 2 Ervin Howell Antonette Shanna@melissa.tv [{'street': 'Victor Plains', 'suite': 'Suite 8... 010-692-6593 x09125 anastasia.net {'name': 'Deckow-Crist', 'catchPhrase': 'Proac...
2 3 Clementine Bauch Samantha Nathan@yesenia.net [{'street': 'Douglas Extension', 'suite': 'Sui... 1-463-123-4447 ramiro.info {'name': 'Romaguera-Jacobson', 'catchPhrase': ...
3 4 Patricia Lebsack Karianne Julianne.OConner@kory.org [{'street': 'Hoeger Mall', 'suite': 'Apt. 692'... 493-170-9623 x156 kale.biz {'name': 'Robel-Corkery', 'catchPhrase': 'Mult...
4 5 Chelsey Dietrich Kamren Lucio_Hettinger@annie.ca [{'street': 'Skiles Walks', 'suite': 'Suite 35... (254)954-1289 demarco.info {'name': 'Keebler LLC', 'catchPhrase': 'User-c...

This way our data looks more understandable. Each cell has a correct data structure.

green-divider

 Using json_normalize for complex nesting

In our example we still have two more complex columns, address and company.

  • address: list of dictionaries (records)
  • company: dictionary (record)

The json_normalize method will be useful to unpack and flatten that data easily.

Let's unpack the works column into a standalone dataframe. We'll also grab the flat columns so we can do analysis.

In [19]:
df.head()
Out[19]:
id name username email address phone website company
0 1 Leanne Graham Bret Sincere@april.biz [{'street': 'Kulas Light', 'suite': 'Apt. 556'... 1-770-736-8031 x56442 hildegard.org {'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...
1 2 Ervin Howell Antonette Shanna@melissa.tv [{'street': 'Victor Plains', 'suite': 'Suite 8... 010-692-6593 x09125 anastasia.net {'name': 'Deckow-Crist', 'catchPhrase': 'Proac...
2 3 Clementine Bauch Samantha Nathan@yesenia.net [{'street': 'Douglas Extension', 'suite': 'Sui... 1-463-123-4447 ramiro.info {'name': 'Romaguera-Jacobson', 'catchPhrase': ...
3 4 Patricia Lebsack Karianne Julianne.OConner@kory.org [{'street': 'Hoeger Mall', 'suite': 'Apt. 692'... 493-170-9623 x156 kale.biz {'name': 'Robel-Corkery', 'catchPhrase': 'Mult...
4 5 Chelsey Dietrich Kamren Lucio_Hettinger@annie.ca [{'street': 'Skiles Walks', 'suite': 'Suite 35... (254)954-1289 demarco.info {'name': 'Keebler LLC', 'catchPhrase': 'User-c...
In [20]:
json_dict
Out[20]:
{'info': [{'id': 1,
   'name': 'Leanne Graham',
   'username': 'Bret',
   'email': 'Sincere@april.biz',
   'address': [{'street': 'Kulas Light',
     'suite': 'Apt. 556',
     'city': 'Gwenborough',
     'zipcode': '92998-3874',
     'geo': {'lat': '-37.3159', 'lng': '81.1496'}}],
   'phone': '1-770-736-8031 x56442',
   'website': 'hildegard.org',
   'company': {'name': 'Romaguera-Crona',
    'catchPhrase': 'Multi-layered client-server neural-net',
    'bs': 'harness real-time e-markets'}},
  {'id': 2,
   'name': 'Ervin Howell',
   'username': 'Antonette',
   'email': 'Shanna@melissa.tv',
   'address': [{'street': 'Victor Plains',
     'suite': 'Suite 879',
     'city': 'Wisokyburgh',
     'zipcode': '90566-7771',
     'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}],
   'phone': '010-692-6593 x09125',
   'website': 'anastasia.net',
   'company': {'name': 'Deckow-Crist',
    'catchPhrase': 'Proactive didactic contingency',
    'bs': 'synergize scalable supply-chains'}},
  {'id': 3,
   'name': 'Clementine Bauch',
   'username': 'Samantha',
   'email': 'Nathan@yesenia.net',
   'address': [{'street': 'Douglas Extension',
     'suite': 'Suite 847',
     'city': 'McKenziehaven',
     'zipcode': '59590-4157',
     'geo': {'lat': '-68.6102', 'lng': '-47.0653'}}],
   'phone': '1-463-123-4447',
   'website': 'ramiro.info',
   'company': {'name': 'Romaguera-Jacobson',
    'catchPhrase': 'Face to face bifurcated interface',
    'bs': 'e-enable strategic applications'}},
  {'id': 4,
   'name': 'Patricia Lebsack',
   'username': 'Karianne',
   'email': 'Julianne.OConner@kory.org',
   'address': [{'street': 'Hoeger Mall',
     'suite': 'Apt. 692',
     'city': 'South Elvis',
     'zipcode': '53919-4257',
     'geo': {'lat': '29.4572', 'lng': '-164.2990'}}],
   'phone': '493-170-9623 x156',
   'website': 'kale.biz',
   'company': {'name': 'Robel-Corkery',
    'catchPhrase': 'Multi-tiered zero tolerance productivity',
    'bs': 'transition cutting-edge web services'}},
  {'id': 5,
   'name': 'Chelsey Dietrich',
   'username': 'Kamren',
   'email': 'Lucio_Hettinger@annie.ca',
   'address': [{'street': 'Skiles Walks',
     'suite': 'Suite 351',
     'city': 'Roscoeview',
     'zipcode': '33263',
     'geo': {'lat': '-31.8129', 'lng': '62.5342'}}],
   'phone': '(254)954-1289',
   'website': 'demarco.info',
   'company': {'name': 'Keebler LLC',
    'catchPhrase': 'User-centric fault-tolerant solution',
    'bs': 'revolutionize end-to-end systems'}},
  {'id': 6,
   'name': 'Mrs. Dennis Schulist',
   'username': 'Leopoldo_Corkery',
   'email': 'Karley_Dach@jasper.info',
   'address': [{'street': 'Norberto Crossing',
     'suite': 'Apt. 950',
     'city': 'South Christy',
     'zipcode': '23505-1337',
     'geo': {'lat': '-71.4197', 'lng': '71.7478'}}],
   'phone': '1-477-935-8478 x6430',
   'website': 'ola.org',
   'company': {'name': 'Considine-Lockman',
    'catchPhrase': 'Synchronised bottom-line interface',
    'bs': 'e-enable innovative applications'}},
  {'id': 7,
   'name': 'Kurtis Weissnat',
   'username': 'Elwyn.Skiles',
   'email': 'Telly.Hoeger@billy.biz',
   'address': [{'street': 'Rex Trail',
     'suite': 'Suite 280',
     'city': 'Howemouth',
     'zipcode': '58804-1099',
     'geo': {'lat': '24.8918', 'lng': '21.8984'}}],
   'phone': '210.067.6132',
   'website': 'elvis.io',
   'company': {'name': 'Johns Group',
    'catchPhrase': 'Configurable multimedia task-force',
    'bs': 'generate enterprise e-tailers'}},
  {'id': 8,
   'name': 'Nicholas Runolfsdottir V',
   'username': 'Maxime_Nienow',
   'email': 'Sherwood@rosamond.me',
   'address': [{'street': 'Ellsworth Summit',
     'suite': 'Suite 729',
     'city': 'Aliyaview',
     'zipcode': '45169',
     'geo': {'lat': '-14.3990', 'lng': '-120.7677'}}],
   'phone': '586.493.6943 x140',
   'website': 'jacynthe.com',
   'company': {'name': 'Abernathy Group',
    'catchPhrase': 'Implemented secondary concept',
    'bs': 'e-enable extensible e-tailers'}},
  {'id': 9,
   'name': 'Glenna Reichert',
   'username': 'Delphine',
   'email': 'Chaim_McDermott@dana.io',
   'address': [{'street': 'Dayna Park',
     'suite': 'Suite 449',
     'city': 'Bartholomebury',
     'zipcode': '76495-3109',
     'geo': {'lat': '24.6463', 'lng': '-168.8889'}}],
   'phone': '(775)976-6794 x41206',
   'website': 'conrad.com',
   'company': {'name': 'Yost and Sons',
    'catchPhrase': 'Switchable contextually-based project',
    'bs': 'aggregate real-time technologies'}},
  {'id': 10,
   'name': 'Clementina DuBuque',
   'username': 'Moriah.Stanton',
   'email': 'Rey.Padberg@karina.biz',
   'address': [{'street': 'Kattie Turnpike',
     'suite': 'Suite 198',
     'city': 'Lebsackbury',
     'zipcode': '31428-2261',
     'geo': {'lat': '-38.2386', 'lng': '57.2232'}}],
   'phone': '024-648-3804',
   'website': 'ambrose.net',
   'company': {'name': 'Hoeger LLC',
    'catchPhrase': 'Centralized empowering task-force',
    'bs': 'target end-to-end models'}}]}
In [21]:
from pandas.io.json import json_normalize

users = json_normalize(json_dict)

users
Out[21]:
info
0 [{'id': 1, 'name': 'Leanne Graham', 'username'...
In [22]:
users = json_normalize(json_dict['info'])
In [23]:
users.head(3)
Out[23]:
id name username email address phone website company.name company.catchPhrase company.bs
0 1 Leanne Graham Bret Sincere@april.biz [{'street': 'Kulas Light', 'suite': 'Apt. 556'... 1-770-736-8031 x56442 hildegard.org Romaguera-Crona Multi-layered client-server neural-net harness real-time e-markets
1 2 Ervin Howell Antonette Shanna@melissa.tv [{'street': 'Victor Plains', 'suite': 'Suite 8... 010-692-6593 x09125 anastasia.net Deckow-Crist Proactive didactic contingency synergize scalable supply-chains
2 3 Clementine Bauch Samantha Nathan@yesenia.net [{'street': 'Douglas Extension', 'suite': 'Sui... 1-463-123-4447 ramiro.info Romaguera-Jacobson Face to face bifurcated interface e-enable strategic applications

Now the company column was exploted in three new columns: company.name, company.catchPhrase and company.bs.

Custom separator for new columns

We can use the sep parameter to define how will nested records will be separated by.

In [24]:
users = json_normalize(json_dict['info'],
                       sep='_')
In [25]:
users.head(3)
Out[25]:
id name username email address phone website company_name company_catchPhrase company_bs
0 1 Leanne Graham Bret Sincere@april.biz [{'street': 'Kulas Light', 'suite': 'Apt. 556'... 1-770-736-8031 x56442 hildegard.org Romaguera-Crona Multi-layered client-server neural-net harness real-time e-markets
1 2 Ervin Howell Antonette Shanna@melissa.tv [{'street': 'Victor Plains', 'suite': 'Suite 8... 010-692-6593 x09125 anastasia.net Deckow-Crist Proactive didactic contingency synergize scalable supply-chains
2 3 Clementine Bauch Samantha Nathan@yesenia.net [{'street': 'Douglas Extension', 'suite': 'Sui... 1-463-123-4447 ramiro.info Romaguera-Jacobson Face to face bifurcated interface e-enable strategic applications

Unpack column with a list of records

When we have a column with a list of nested values (records, dictionary) it's difficult to deal with it.

json_normalize let us define a record_path parameter to define the path to a column with a list of records to unpack them. If this parameter is not passed, data will be assumed to be an array of records.

In our case we'll unpack the address column into a standalone DataFrame.

In [26]:
address = json_normalize(json_dict['info'],
                         sep='_',
                         record_path='address')
In [27]:
address.head()
Out[27]:
street suite city zipcode geo_lat geo_lng
0 Kulas Light Apt. 556 Gwenborough 92998-3874 -37.3159 81.1496
1 Victor Plains Suite 879 Wisokyburgh 90566-7771 -43.9509 -34.4618
2 Douglas Extension Suite 847 McKenziehaven 59590-4157 -68.6102 -47.0653
3 Hoeger Mall Apt. 692 South Elvis 53919-4257 29.4572 -164.2990
4 Skiles Walks Suite 351 Roscoeview 33263 -31.8129 62.5342

The record_path parameter could receive a list of columns indicating the nested path.

In [28]:
address = json_normalize(json_dict,
                         sep='_',
                         record_path=['info', 'address'])
In [29]:
address.head()
Out[29]:
street suite city zipcode geo_lat geo_lng
0 Kulas Light Apt. 556 Gwenborough 92998-3874 -37.3159 81.1496
1 Victor Plains Suite 879 Wisokyburgh 90566-7771 -43.9509 -34.4618
2 Douglas Extension Suite 847 McKenziehaven 59590-4157 -68.6102 -47.0653
3 Hoeger Mall Apt. 692 South Elvis 53919-4257 29.4572 -164.2990
4 Skiles Walks Suite 351 Roscoeview 33263 -31.8129 62.5342

 Adding extra columns to unpacked columns

There is another useful parameter, meta, which allow us to add fields as metadata for each record in the resulting DataFrame.

In our case we'll add the user identifier to each address.

In [30]:
address = json_normalize(json_dict['info'],
                         sep='_',
                         record_path='address',
                         meta=['id'])
In [31]:
address.head()
Out[31]:
street suite city zipcode geo_lat geo_lng id
0 Kulas Light Apt. 556 Gwenborough 92998-3874 -37.3159 81.1496 1
1 Victor Plains Suite 879 Wisokyburgh 90566-7771 -43.9509 -34.4618 2
2 Douglas Extension Suite 847 McKenziehaven 59590-4157 -68.6102 -47.0653 3
3 Hoeger Mall Apt. 692 South Elvis 53919-4257 29.4572 -164.2990 4
4 Skiles Walks Suite 351 Roscoeview 33263 -31.8129 62.5342 5

green-divider

 Save to JSON file

Finally we can save our DataFrame as a JSON file.

In [32]:
users.head()
Out[32]:
id name username email address phone website company_name company_catchPhrase company_bs
0 1 Leanne Graham Bret Sincere@april.biz [{'street': 'Kulas Light', 'suite': 'Apt. 556'... 1-770-736-8031 x56442 hildegard.org Romaguera-Crona Multi-layered client-server neural-net harness real-time e-markets
1 2 Ervin Howell Antonette Shanna@melissa.tv [{'street': 'Victor Plains', 'suite': 'Suite 8... 010-692-6593 x09125 anastasia.net Deckow-Crist Proactive didactic contingency synergize scalable supply-chains
2 3 Clementine Bauch Samantha Nathan@yesenia.net [{'street': 'Douglas Extension', 'suite': 'Sui... 1-463-123-4447 ramiro.info Romaguera-Jacobson Face to face bifurcated interface e-enable strategic applications
3 4 Patricia Lebsack Karianne Julianne.OConner@kory.org [{'street': 'Hoeger Mall', 'suite': 'Apt. 692'... 493-170-9623 x156 kale.biz Robel-Corkery Multi-tiered zero tolerance productivity transition cutting-edge web services
4 5 Chelsey Dietrich Kamren Lucio_Hettinger@annie.ca [{'street': 'Skiles Walks', 'suite': 'Suite 35... (254)954-1289 demarco.info Keebler LLC User-centric fault-tolerant solution revolutionize end-to-end systems

We can simply generate a JSON string from our DataFrame:

In [33]:
users.to_json()
Out[33]:
'{"id":{"0":1,"1":2,"2":3,"3":4,"4":5,"5":6,"6":7,"7":8,"8":9,"9":10},"name":{"0":"Leanne Graham","1":"Ervin Howell","2":"Clementine Bauch","3":"Patricia Lebsack","4":"Chelsey Dietrich","5":"Mrs. Dennis Schulist","6":"Kurtis Weissnat","7":"Nicholas Runolfsdottir V","8":"Glenna Reichert","9":"Clementina DuBuque"},"username":{"0":"Bret","1":"Antonette","2":"Samantha","3":"Karianne","4":"Kamren","5":"Leopoldo_Corkery","6":"Elwyn.Skiles","7":"Maxime_Nienow","8":"Delphine","9":"Moriah.Stanton"},"email":{"0":"Sincere@april.biz","1":"Shanna@melissa.tv","2":"Nathan@yesenia.net","3":"Julianne.OConner@kory.org","4":"Lucio_Hettinger@annie.ca","5":"Karley_Dach@jasper.info","6":"Telly.Hoeger@billy.biz","7":"Sherwood@rosamond.me","8":"Chaim_McDermott@dana.io","9":"Rey.Padberg@karina.biz"},"address":{"0":[{"street":"Kulas Light","suite":"Apt. 556","city":"Gwenborough","zipcode":"92998-3874","geo":{"lat":"-37.3159","lng":"81.1496"}}],"1":[{"street":"Victor Plains","suite":"Suite 879","city":"Wisokyburgh","zipcode":"90566-7771","geo":{"lat":"-43.9509","lng":"-34.4618"}}],"2":[{"street":"Douglas Extension","suite":"Suite 847","city":"McKenziehaven","zipcode":"59590-4157","geo":{"lat":"-68.6102","lng":"-47.0653"}}],"3":[{"street":"Hoeger Mall","suite":"Apt. 692","city":"South Elvis","zipcode":"53919-4257","geo":{"lat":"29.4572","lng":"-164.2990"}}],"4":[{"street":"Skiles Walks","suite":"Suite 351","city":"Roscoeview","zipcode":"33263","geo":{"lat":"-31.8129","lng":"62.5342"}}],"5":[{"street":"Norberto Crossing","suite":"Apt. 950","city":"South Christy","zipcode":"23505-1337","geo":{"lat":"-71.4197","lng":"71.7478"}}],"6":[{"street":"Rex Trail","suite":"Suite 280","city":"Howemouth","zipcode":"58804-1099","geo":{"lat":"24.8918","lng":"21.8984"}}],"7":[{"street":"Ellsworth Summit","suite":"Suite 729","city":"Aliyaview","zipcode":"45169","geo":{"lat":"-14.3990","lng":"-120.7677"}}],"8":[{"street":"Dayna Park","suite":"Suite 449","city":"Bartholomebury","zipcode":"76495-3109","geo":{"lat":"24.6463","lng":"-168.8889"}}],"9":[{"street":"Kattie Turnpike","suite":"Suite 198","city":"Lebsackbury","zipcode":"31428-2261","geo":{"lat":"-38.2386","lng":"57.2232"}}]},"phone":{"0":"1-770-736-8031 x56442","1":"010-692-6593 x09125","2":"1-463-123-4447","3":"493-170-9623 x156","4":"(254)954-1289","5":"1-477-935-8478 x6430","6":"210.067.6132","7":"586.493.6943 x140","8":"(775)976-6794 x41206","9":"024-648-3804"},"website":{"0":"hildegard.org","1":"anastasia.net","2":"ramiro.info","3":"kale.biz","4":"demarco.info","5":"ola.org","6":"elvis.io","7":"jacynthe.com","8":"conrad.com","9":"ambrose.net"},"company_name":{"0":"Romaguera-Crona","1":"Deckow-Crist","2":"Romaguera-Jacobson","3":"Robel-Corkery","4":"Keebler LLC","5":"Considine-Lockman","6":"Johns Group","7":"Abernathy Group","8":"Yost and Sons","9":"Hoeger LLC"},"company_catchPhrase":{"0":"Multi-layered client-server neural-net","1":"Proactive didactic contingency","2":"Face to face bifurcated interface","3":"Multi-tiered zero tolerance productivity","4":"User-centric fault-tolerant solution","5":"Synchronised bottom-line interface","6":"Configurable multimedia task-force","7":"Implemented secondary concept","8":"Switchable contextually-based project","9":"Centralized empowering task-force"},"company_bs":{"0":"harness real-time e-markets","1":"synergize scalable supply-chains","2":"e-enable strategic applications","3":"transition cutting-edge web services","4":"revolutionize end-to-end systems","5":"e-enable innovative applications","6":"generate enterprise e-tailers","7":"e-enable extensible e-tailers","8":"aggregate real-time technologies","9":"target end-to-end models"}}'

Or specify a file path where we want our generated JSON code to be saved:

In [34]:
users.to_json('out.json')
In [35]:
pd.read_json('out.json').head()
Out[35]:
id name username email address phone website company_name company_catchPhrase company_bs
0 1 Leanne Graham Bret Sincere@april.biz [{'street': 'Kulas Light', 'suite': 'Apt. 556'... 1-770-736-8031 x56442 hildegard.org Romaguera-Crona Multi-layered client-server neural-net harness real-time e-markets
1 2 Ervin Howell Antonette Shanna@melissa.tv [{'street': 'Victor Plains', 'suite': 'Suite 8... 010-692-6593 x09125 anastasia.net Deckow-Crist Proactive didactic contingency synergize scalable supply-chains
2 3 Clementine Bauch Samantha Nathan@yesenia.net [{'street': 'Douglas Extension', 'suite': 'Sui... 1-463-123-4447 ramiro.info Romaguera-Jacobson Face to face bifurcated interface e-enable strategic applications
3 4 Patricia Lebsack Karianne Julianne.OConner@kory.org [{'street': 'Hoeger Mall', 'suite': 'Apt. 692'... 493-170-9623 x156 kale.biz Robel-Corkery Multi-tiered zero tolerance productivity transition cutting-edge web services
4 5 Chelsey Dietrich Kamren Lucio_Hettinger@annie.ca [{'street': 'Skiles Walks', 'suite': 'Suite 35... (254)954-1289 demarco.info Keebler LLC User-centric fault-tolerant solution revolutionize end-to-end systems

purple-divider

Notebooks AI
Notebooks AI Profile20060