Principal Systems Engineer @ MITRE

Case-Study-01

Last updated: June 23rd, 2020

US Population By States

References:

Data Sources:

This notebook demonstrates the use of the following Python techniques:

  • Web scraping
    • requests
    • Beautiful Soup
  • Data cleansing and merging
    • Pandas
    • Pandas merge
  • Interactive data visualization with Plotly
    • Plotly/Plotly Express
    • Bar chart
    • Choropleth map
  • Interactive dashboard
    • Dash
    • Plotly/Plotly Express

Note:

A choropleth map is a type of thematic map in which areas are shaded or patterned in proportion to a statistical variable that represents an aggregate summary of a geographic characteristic within each area, such as population density or per-capita income.

From Greek: khōra ("place") + plēthos ("multitude").

In [7]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio
In [8]:
pio.renderers
Out[8]:
Renderers configuration
-----------------------
    Default renderer: 'plotly_mimetype+notebook'
    Available renderers:
        ['plotly_mimetype', 'jupyterlab', 'nteract', 'vscode',
         'notebook', 'notebook_connected', 'kaggle', 'azure', 'colab',
         'cocalc', 'databricks', 'json', 'png', 'jpeg', 'jpg', 'svg',
         'pdf', 'browser', 'firefox', 'chrome', 'chromium', 'iframe',
         'iframe_connected', 'sphinx_gallery']
In [9]:
# the default renderer is set on the renderers registry, not on the module
pio.renderers.default = "jupyterlab"
In [10]:
DATA_URL = "https://simple.wikipedia.org/wiki/List_of_U.S._states_by_population"
In [11]:
html = requests.get(DATA_URL).text
soup = BeautifulSoup(html, "lxml")
print(soup.prettify()[10000:20000])
Bureau" title="United States Census Bureau">
        United States Census Bureau
       </a>
       .  The total
       <a href="/wiki/Population" title="Population">
        population
       </a>
       of the
       <a href="/wiki/United_States" title="United States">
        United States
       </a>
       was 330,150,668 at the
       <a class="external text" href="https://www.census.gov/" rel="nofollow">
        December 2019 Census
       </a>
       .
      </p>
      <p>
       These
       <a href="/wiki/Number" title="Number">
        numbers
       </a>
       include all people other than people just
       <a class="new" href="/w/index.php?title=Visit&amp;action=edit&amp;redlink=1" title="Visit (not yet started)">
        visiting
       </a>
       : These include
       <a class="mw-redirect" href="/wiki/Citizen" title="Citizen">
        citizens
       </a>
       , non-citizen
       <a class="new" href="/w/index.php?title=Permanent_resident&amp;action=edit&amp;redlink=1" title="Permanent resident (not yet started)">
        permanent residents
       </a>
       and non-citizen long-term visitors. People in the
       <a href="/wiki/Military" title="Military">
        military
       </a>
       who are working in other
       <a href="/wiki/Country" title="Country">
        countries
       </a>
       and their
       <a href="/wiki/Family" title="Family">
        families
       </a>
       are counted in their home state.
       <sup class="reference" id="cite_ref-1">
        <a href="#cite_note-1">
         [1]
        </a>
       </sup>
       The
       <a class="mw-redirect" href="/wiki/United_States_of_America" title="United States of America">
        United States of America
       </a>
       is the third most populous country in the world, behind
       <a href="/wiki/China" title="China">
        China
       </a>
       and
       <a href="/wiki/India" title="India">
        India
       </a>
       .
      </p>
      <div class="thumb tright">
       <div class="thumbinner" style="width:302px;">
        <a class="image" href="/wiki/File:%25_Population_Chnge_by_state_from_2010_-_2017.png">
         <img alt="" class="thumbimage" data-file-height="3586" data-file-width="5400" decoding="async" height="199" src="//upload.wikimedia.org/wikipedia/commons/thumb/4/47/%25_Population_Chnge_by_state_from_2010_-_2017.png/300px-%25_Population_Chnge_by_state_from_2010_-_2017.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/4/47/%25_Population_Chnge_by_state_from_2010_-_2017.png/450px-%25_Population_Chnge_by_state_from_2010_-_2017.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/47/%25_Population_Chnge_by_state_from_2010_-_2017.png/600px-%25_Population_Chnge_by_state_from_2010_-_2017.png 2x" width="300"/>
        </a>
        <div class="thumbcaption">
         <div class="magnify">
          <a class="internal" href="/wiki/File:%25_Population_Chnge_by_state_from_2010_-_2017.png" title="Enlarge">
          </a>
         </div>
         Map of population change in U.S. states from April 5, 2010 to July 1, 2017
        </div>
       </div>
      </div>
      <table class="wikitable sortable" style="width:100%; text-align:center;">
       <tbody>
        <tr style="vertical-align: top;">
         <th style="vertical-align: middle">
          Rank in states &amp; territories, 2019
         </th>
         <th style="vertical-align: middle">
          Rank in states &amp; territories, 2010
         </th>
         <th style="vertical-align: middle">
          State
         </th>
         <th style="vertical-align: middle">
          Population estimate, July 1, 2019
          <br/>
          <sup class="reference" id="cite_ref-AnnualEstUS_2-0">
           <a href="#cite_note-AnnualEstUS-2">
            [2]
           </a>
          </sup>
         </th>
         <th style="vertical-align: middle">
          Census population, April 1, 2010
          <br/>
          <sup class="reference" id="cite_ref-3">
           <a href="#cite_note-3">
            [3]
           </a>
          </sup>
         </th>
         <th>
          Percent change, 2010–2019
          <br/>
          <sup class="reference" id="cite_ref-4">
           <a href="#cite_note-4">
            [note 1]
           </a>
          </sup>
         </th>
         <th>
          Absolute change, 2010-2019
         </th>
         <th style="vertical-align: middle">
          Total seats in the
          <a href="/wiki/United_States_House_of_Representatives" title="United States House of Representatives">
           U.S. House of Representatives
          </a>
          , 2013–2023
         </th>
         <th style="vertical-align: middle">
          Estimated population per electoral vote, 2019
          <br/>
          <sup class="reference" id="cite_ref-5">
           <a href="#cite_note-5">
            [note 2]
           </a>
          </sup>
         </th>
         <th style="vertical-align: middle">
          Estimated population per House seat, 2019
         </th>
         <th style="vertical-align: middle">
          Census population per House seat, 2010
         </th>
         <th style="vertical-align: middle">
          Percent of the total U.S. population, 2018
          <br/>
          <sup class="reference" id="cite_ref-6">
           <a href="#cite_note-6">
            [note 3]
           </a>
          </sup>
         </th>
        </tr>
        <tr>
         <td align="center">
          1
         </td>
         <td align="center">
          <span class="sortkey" style="display:none">
           7000100000000000000♠
          </span>
          1
         </td>
         <td style="text-align: left;">
          <span class="flagicon">
           <img alt="" class="thumbborder" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/23px-Flag_of_California.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/35px-Flag_of_California.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/45px-Flag_of_California.svg.png 2x" width="23"/>
          </span>
          <a href="/wiki/California" title="California">
           California
          </a>
         </td>
         <td>
          39,512,223
         </td>
         <td>
          37,254,523
         </td>
         <td>
          6.1%
         </td>
         <td>
          +2,257,700
         </td>
         <td align="center">
          <span class="sortkey" style="display:none">
           7001530000000000000♠
          </span>
          53
         </td>
         <td>
          718,404
         </td>
         <td>
          745,514
         </td>
         <td>
          702,885
         </td>
         <td>
          11.96%
         </td>
        </tr>
        <tr>
         <td align="center">
          <span class="sortkey" style="display:none">
           7000200000000000000♠
          </span>
          2
         </td>
         <td align="center">
          <span class="sortkey" style="display:none">
           7000200000000000000♠
          </span>
          2
         </td>
         <td style="text-align: left;">
          <span class="flagicon">
           <img alt="" class="thumbborder" data-file-height="720" data-file-width="1080" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Texas.svg/23px-Flag_of_Texas.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Texas.svg/35px-Flag_of_Texas.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Texas.svg/45px-Flag_of_Texas.svg.png 2x" width="23"/>
          </span>
          <a href="/wiki/Texas" title="Texas">
           Texas
          </a>
         </td>
         <td>
          28,995,881
         </td>
         <td>
          25,145,561
         </td>
         <td>
          15.3%
         </td>
         <td>
          +3,850,320
         </td>
         <td align="center">
          <span class="sortkey" style="display:none">
           7001360000000000000♠
          </span>
          36
         </td>
         <td>
          763,050
         </td>
         <td>
          805,441
         </td>
         <td>
          698,503
         </td>
         <td>
          8.68%
         </td>
        </tr>
        <tr>
         <td align="center">
          <span class="sortkey" style="display:none">
           7000300000000000000♠
          </span>
          3
         </td>
         <td align="center">
          <span class="sortkey" style="display:none">
           7000400000000000000♠
          </span>
          4
         </td>
         <td style="text-align: left;">
          <span class="flagicon">
           <img alt="" class="thumbborder" data-file-height="500" data-file-width="750" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/23px-Flag_of_Florida.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/35px-Flag_of_Florida.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/45px-Flag_of_Florida.svg.png 2x" width="23"/>
          </span>
          <a href="/wiki/Florida" title="Florida">
           Florida
          </a>
         </td>
         <td>
          21,477,737
         </td>
         <td>
          18,801,310
         </td>
         <td>
          14.2%
         </td>
         <td>
          +2,676,427
         </td>
         <td align="center">
          <span class="sortkey" style="display:none">
           7001270000000000000♠
          </span>
          27
         </td>
         <td>
          740,611
         </td>
         <td>
          795,472
         </td>
         <td>
          696,468
         </td>
         <td>
          6.44%
         </td>
        </tr>
        <tr>
         <td a
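
The request above sets no timeout and does not check the HTTP status, so a failed download would only surface later as a confusing parse error. The following is a slightly more defensive sketch; `fetch_html` and its `session` parameter are hypothetical names introduced here, not part of the notebook.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Download a page and return its HTML text.

    `session` is optional; any object with a requests-style `get` method
    (for example a `requests.Session`) can be passed in.
    """
    http = session or requests
    response = http.get(url, timeout=timeout)  # fail instead of hanging forever
    response.raise_for_status()                # raise on 4xx/5xx responses
    return response.text
```

Calling `fetch_html(DATA_URL)` would return the same string that the cell above stores in `html`, provided the request succeeds.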
In [12]:
my_table = soup.find("table",{"class":"wikitable sortable"})
print(my_table.prettify()[:3000])
<table class="wikitable sortable" style="width:100%; text-align:center;">
 <tbody>
  <tr style="vertical-align: top;">
   <th style="vertical-align: middle">
    Rank in states &amp; territories, 2019
   </th>
   <th style="vertical-align: middle">
    Rank in states &amp; territories, 2010
   </th>
   <th style="vertical-align: middle">
    State
   </th>
   <th style="vertical-align: middle">
    Population estimate, July 1, 2019
    <br/>
    <sup class="reference" id="cite_ref-AnnualEstUS_2-0">
     <a href="#cite_note-AnnualEstUS-2">
      [2]
     </a>
    </sup>
   </th>
   <th style="vertical-align: middle">
    Census population, April 1, 2010
    <br/>
    <sup class="reference" id="cite_ref-3">
     <a href="#cite_note-3">
      [3]
     </a>
    </sup>
   </th>
   <th>
    Percent change, 2010–2019
    <br/>
    <sup class="reference" id="cite_ref-4">
     <a href="#cite_note-4">
      [note 1]
     </a>
    </sup>
   </th>
   <th>
    Absolute change, 2010-2019
   </th>
   <th style="vertical-align: middle">
    Total seats in the
    <a href="/wiki/United_States_House_of_Representatives" title="United States House of Representatives">
     U.S. House of Representatives
    </a>
    , 2013–2023
   </th>
   <th style="vertical-align: middle">
    Estimated population per electoral vote, 2019
    <br/>
    <sup class="reference" id="cite_ref-5">
     <a href="#cite_note-5">
      [note 2]
     </a>
    </sup>
   </th>
   <th style="vertical-align: middle">
    Estimated population per House seat, 2019
   </th>
   <th style="vertical-align: middle">
    Census population per House seat, 2010
   </th>
   <th style="vertical-align: middle">
    Percent of the total U.S. population, 2018
    <br/>
    <sup class="reference" id="cite_ref-6">
     <a href="#cite_note-6">
      [note 3]
     </a>
    </sup>
   </th>
  </tr>
  <tr>
   <td align="center">
    1
   </td>
   <td align="center">
    <span class="sortkey" style="display:none">
     7000100000000000000♠
    </span>
    1
   </td>
   <td style="text-align: left;">
    <span class="flagicon">
     <img alt="" class="thumbborder" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/23px-Flag_of_California.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/35px-Flag_of_California.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/45px-Flag_of_California.svg.png 2x" width="23"/>
    </span>
    <a href="/wiki/California" title="California">
     California
    </a>
   </td>
   <td>
    39,512,223
   </td>
   <td>
    37,254,523
   </td>
   <td>
    6.1%
   </td>
   <td>
    +2,257,700
   </td>
   <td align="center">
    <span class="sortkey" style="display:none">
     7001530000000000000♠
    </span>
    53
   </td>
   <td>
    718,404
   </td>
   <td>
    745,514
   </td>
   <td>
    702,885
   </td>
   <td>
 
In [13]:
my_trs = my_table.find_all("tr")
my_trs[:5]
Out[13]:
[<tr style="vertical-align: top;">
 <th style="vertical-align: middle">Rank in states &amp; territories, 2019
 </th>
 <th style="vertical-align: middle">Rank in states &amp; territories, 2010
 </th>
 <th style="vertical-align: middle">State
 </th>
 <th style="vertical-align: middle">Population estimate, July 1, 2019<br/><sup class="reference" id="cite_ref-AnnualEstUS_2-0"><a href="#cite_note-AnnualEstUS-2">[2]</a></sup>
 </th>
 <th style="vertical-align: middle">Census population, April 1, 2010<br/><sup class="reference" id="cite_ref-3"><a href="#cite_note-3">[3]</a></sup>
 </th>
 <th>Percent change, 2010–2019<br/><sup class="reference" id="cite_ref-4"><a href="#cite_note-4">[note 1]</a></sup>
 </th>
 <th>Absolute change, 2010-2019
 </th>
 <th style="vertical-align: middle">Total seats in the <a href="/wiki/United_States_House_of_Representatives" title="United States House of Representatives">U.S. House of Representatives</a>, 2013–2023
 </th>
 <th style="vertical-align: middle">Estimated population per electoral vote, 2019<br/><sup class="reference" id="cite_ref-5"><a href="#cite_note-5">[note 2]</a></sup>
 </th>
 <th style="vertical-align: middle">Estimated population per House seat, 2019
 </th>
 <th style="vertical-align: middle">Census population per House seat, 2010
 </th>
 <th style="vertical-align: middle">Percent of the total U.S. population, 2018<br/><sup class="reference" id="cite_ref-6"><a href="#cite_note-6">[note 3]</a></sup>
 </th></tr>,
 <tr>
 <td align="center">1
 </td>
 <td align="center"><span class="sortkey" style="display:none">7000100000000000000♠</span>1
 </td>
 <td style="text-align: left;"><span class="flagicon"><img alt="" class="thumbborder" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/23px-Flag_of_California.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/35px-Flag_of_California.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/01/Flag_of_California.svg/45px-Flag_of_California.svg.png 2x" width="23"/> </span><a href="/wiki/California" title="California">California</a>
 </td>
 <td>39,512,223
 </td>
 <td>37,254,523
 </td>
 <td>6.1%
 </td>
 <td>+2,257,700
 </td>
 <td align="center"><span class="sortkey" style="display:none">7001530000000000000♠</span>53
 </td>
 <td>718,404
 </td>
 <td>745,514
 </td>
 <td>702,885
 </td>
 <td>11.96%
 </td></tr>,
 <tr>
 <td align="center"><span class="sortkey" style="display:none">7000200000000000000♠</span>2
 </td>
 <td align="center"><span class="sortkey" style="display:none">7000200000000000000♠</span>2
 </td>
 <td style="text-align: left;"><span class="flagicon"><img alt="" class="thumbborder" data-file-height="720" data-file-width="1080" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Texas.svg/23px-Flag_of_Texas.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Texas.svg/35px-Flag_of_Texas.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Texas.svg/45px-Flag_of_Texas.svg.png 2x" width="23"/> </span><a href="/wiki/Texas" title="Texas">Texas</a>
 </td>
 <td>28,995,881
 </td>
 <td>25,145,561
 </td>
 <td>15.3%
 </td>
 <td>+3,850,320
 </td>
 <td align="center"><span class="sortkey" style="display:none">7001360000000000000♠</span>36
 </td>
 <td>763,050
 </td>
 <td>805,441
 </td>
 <td>698,503
 </td>
 <td>8.68%
 </td></tr>,
 <tr>
 <td align="center"><span class="sortkey" style="display:none">7000300000000000000♠</span>3
 </td>
 <td align="center"><span class="sortkey" style="display:none">7000400000000000000♠</span>4
 </td>
 <td style="text-align: left;"><span class="flagicon"><img alt="" class="thumbborder" data-file-height="500" data-file-width="750" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/23px-Flag_of_Florida.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/35px-Flag_of_Florida.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/45px-Flag_of_Florida.svg.png 2x" width="23"/> </span><a href="/wiki/Florida" title="Florida">Florida</a>
 </td>
 <td>21,477,737
 </td>
 <td>18,801,310
 </td>
 <td>14.2%
 </td>
 <td>+2,676,427
 </td>
 <td align="center"><span class="sortkey" style="display:none">7001270000000000000♠</span>27
 </td>
 <td>740,611
 </td>
 <td>795,472
 </td>
 <td>696,468
 </td>
 <td>6.44%
 </td></tr>,
 <tr>
 <td align="center"><span class="sortkey" style="display:none">7000400000000000000♠</span>4
 </td>
 <td align="center"><span class="sortkey" style="display:none">7000300000000000000♠</span>3
 </td>
 <td style="text-align: left;"><span class="flagicon"><img alt="" class="thumbborder" data-file-height="640" data-file-width="1280" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Flag_of_New_York.svg/23px-Flag_of_New_York.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Flag_of_New_York.svg/35px-Flag_of_New_York.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Flag_of_New_York.svg/46px-Flag_of_New_York.svg.png 2x" width="23"/> </span><a class="mw-redirect" href="/wiki/New_York_(state)" title="New York (state)">New York</a>
 </td>
 <td>19,453,561
 </td>
 <td>19,378,102
 </td>
 <td>0.4%
 </td>
 <td>+75,459
 </td>
 <td align="center"><span class="sortkey" style="display:none">7001270000000000000♠</span>27
 </td>
 <td>670,812
 </td>
 <td>720,502
 </td>
 <td>717,707
 </td>
 <td>5.91%
 </td></tr>]
In [14]:
state_pop_list = []

for tr in my_trs[1:]:                          # skip the header row
    my_tds = tr.find_all("td")
    state = my_tds[2].text.strip()             # strip surrounding whitespace and newlines
    pop = my_tds[3].text.strip()
    state_pop = [state, pop]
    state_pop_list.append(state_pop)

state_pop_list[:4]
    
Out[14]:
[['California', '39,512,223'],
 ['Texas', '28,995,881'],
 ['Florida', '21,477,737'],
 ['New York', '19,453,561']]
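
The state and population cells happen to be clean once stripped, but other columns in the same table (the rank and House-seat cells) carry a hidden `sortkey` span whose digits would leak into `.text`. A minimal sketch of one way to drop such spans before reading a cell, using a shortened copy of the Florida row; `cell_text` and `SAMPLE_ROW` are illustrative names, and `html.parser` is used here only to keep the sketch free of the lxml dependency.

```python
from bs4 import BeautifulSoup

# Shortened copy of the Florida row from the scraped table.
SAMPLE_ROW = """
<table><tr>
 <td align="center"><span class="sortkey" style="display:none">7000300000000000000♠</span>3
 </td>
 <td style="text-align: left;"><span class="flagicon"> </span><a href="/wiki/Florida" title="Florida">Florida</a>
 </td>
 <td>21,477,737
 </td>
</tr></table>
"""


def cell_text(td):
    """Return the visible text of a <td>, ignoring hidden sort-key spans."""
    for span in td.find_all("span", class_="sortkey"):
        span.decompose()  # remove the hidden element from the parse tree
    return td.get_text(strip=True)


row = BeautifulSoup(SAMPLE_ROW, "html.parser").find("tr")
cells = [cell_text(td) for td in row.find_all("td")]
print(cells)  # ['3', 'Florida', '21,477,737']
```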
In [15]:
df = pd.DataFrame(state_pop_list, columns=["State", "Population"])
df.head()
Out[15]:
State Population
0 California 39,512,223
1 Texas 28,995,881
2 Florida 21,477,737
3 New York 19,453,561
4 Pennsylvania 12,801,989
In [16]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   State       60 non-null     object
 1   Population  60 non-null     object
dtypes: object(2)
memory usage: 1.1+ KB
In [17]:
df
Out[17]:
State Population
0 California 39,512,223
1 Texas 28,995,881
2 Florida 21,477,737
3 New York 19,453,561
4 Pennsylvania 12,801,989
5 Illinois 12,671,821
6 Ohio 11,689,100
7 Georgia 10,617,423
8 North Carolina 10,488,084
9 Michigan 9,986,857
10 New Jersey 8,882,190
11 Virginia 8,535,519
12 Washington 7,614,893
13 Arizona 7,278,717
14 Massachusetts 6,949,503
15 Tennessee 6,833,174
16 Indiana 6,732,219
17 Missouri 6,137,428
18 Maryland 6,045,680
19 Wisconsin 5,822,434
20 Colorado 5,758,736
21 Minnesota 5,639,632
22 South Carolina 5,148,714
23 Alabama 4,903,185
24 Louisiana 4,648,794
25 Kentucky 4,467,673
26 Oregon 4,217,737
27 Oklahoma 3,956,971
28 Connecticut 3,565,287
29 Utah 3,205,958
30 Iowa 3,155,070
31 Puerto Rico 3,193,694
32 Nevada 3,080,156
33 Arkansas 3,017,825
34 Mississippi 2,976,149
35 Kansas 2,913,314
36 New Mexico 2,096,829
37 Nebraska 1,934,408
38 Idaho 1,787,065
39 West Virginia 1,792,147
40 Hawaii 1,415,872
41 New Hampshire 1,359,711
42 Maine 1,344,212
43 Montana 1,068,778
44 Rhode Island 1,059,361
45 Delaware 973,764
46 South Dakota 884,659
47 North Dakota 762,062
48 Alaska 731,545
49 District of Columbia 705,749
50 Vermont 623,989
51 Wyoming 578,759
52 Guam 165,718
53 U.S. Virgin Islands 104,914
54 American Samoa 55,641
55 Northern Mariana Islands 55,194
56 Contiguous United States 325,386,357
57 The fifty states 327,533,795
58 Fifty states + D.C. 328,300,544
59 Total U.S. (including D.C. and territories) 331,814,684
In [18]:
# Remove the last 4 rows, which hold aggregate data rather than individual states

df2 = df[:-4]
df2
Out[18]:
State Population
0 California 39,512,223
1 Texas 28,995,881
2 Florida 21,477,737
3 New York 19,453,561
4 Pennsylvania 12,801,989
5 Illinois 12,671,821
6 Ohio 11,689,100
7 Georgia 10,617,423
8 North Carolina 10,488,084
9 Michigan 9,986,857
10 New Jersey 8,882,190
11 Virginia 8,535,519
12 Washington 7,614,893
13 Arizona 7,278,717
14 Massachusetts 6,949,503
15 Tennessee 6,833,174
16 Indiana 6,732,219
17 Missouri 6,137,428
18 Maryland 6,045,680
19 Wisconsin 5,822,434
20 Colorado 5,758,736
21 Minnesota 5,639,632
22 South Carolina 5,148,714
23 Alabama 4,903,185
24 Louisiana 4,648,794
25 Kentucky 4,467,673
26 Oregon 4,217,737
27 Oklahoma 3,956,971
28 Connecticut 3,565,287
29 Utah 3,205,958
30 Iowa 3,155,070
31 Puerto Rico 3,193,694
32 Nevada 3,080,156
33 Arkansas 3,017,825
34 Mississippi 2,976,149
35 Kansas 2,913,314
36 New Mexico 2,096,829
37 Nebraska 1,934,408
38 Idaho 1,787,065
39 West Virginia 1,792,147
40 Hawaii 1,415,872
41 New Hampshire 1,359,711
42 Maine 1,344,212
43 Montana 1,068,778
44 Rhode Island 1,059,361
45 Delaware 973,764
46 South Dakota 884,659
47 North Dakota 762,062
48 Alaska 731,545
49 District of Columbia 705,749
50 Vermont 623,989
51 Wyoming 578,759
52 Guam 165,718
53 U.S. Virgin Islands 104,914
54 American Samoa 55,641
55 Northern Mariana Islands 55,194
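
Dropping rows by position works only while the page keeps exactly four summary rows at the bottom. A sketch of a label-based alternative, assuming the aggregate labels stay as they appear in Out[17]; `AGGREGATE_LABELS` and `sample` are illustrative names, not from the notebook.

```python
import pandas as pd

# Labels of the summary rows as they appear in the scraped table.
AGGREGATE_LABELS = {
    "Contiguous United States",
    "The fifty states",
    "Fifty states + D.C.",
    "Total U.S. (including D.C. and territories)",
}

# Small sample with two real rows and two aggregate rows from the table.
sample = pd.DataFrame(
    {
        "State": ["Wyoming", "Guam", "The fifty states", "Fifty states + D.C."],
        "Population": ["578,759", "165,718", "327,533,795", "328,300,544"],
    }
)

# Keep only rows whose label is not one of the aggregates.
states_only = sample[~sample["State"].isin(AGGREGATE_LABELS)].reset_index(drop=True)
print(states_only["State"].tolist())  # ['Wyoming', 'Guam']
```

The trade-off is that this version depends on the exact label text instead of the row count, so it would need updating if the page renames a summary row.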
In [19]:
# Remove the "," from the population string and convert it to an integer

df2["Population"] = df2["Population"].apply(lambda pop: int(pop.replace(",","")))
df2
<ipython-input-19-0f95c1e093bc>:3: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Out[19]:
State Population
0 California 39512223
1 Texas 28995881
2 Florida 21477737
3 New York 19453561
4 Pennsylvania 12801989
5 Illinois 12671821
6 Ohio 11689100
7 Georgia 10617423
8 North Carolina 10488084
9 Michigan 9986857
10 New Jersey 8882190
11 Virginia 8535519
12 Washington 7614893
13 Arizona 7278717
14 Massachusetts 6949503
15 Tennessee 6833174
16 Indiana 6732219
17 Missouri 6137428
18 Maryland 6045680
19 Wisconsin 5822434
20 Colorado 5758736
21 Minnesota 5639632
22 South Carolina 5148714
23 Alabama 4903185
24 Louisiana 4648794
25 Kentucky 4467673
26 Oregon 4217737
27 Oklahoma 3956971
28 Connecticut 3565287
29 Utah 3205958
30 Iowa 3155070
31 Puerto Rico 3193694
32 Nevada 3080156
33 Arkansas 3017825
34 Mississippi 2976149
35 Kansas 2913314
36 New Mexico 2096829
37 Nebraska 1934408
38 Idaho 1787065
39 West Virginia 1792147
40 Hawaii 1415872
41 New Hampshire 1359711
42 Maine 1344212
43 Montana 1068778
44 Rhode Island 1059361
45 Delaware 973764
46 South Dakota 884659
47 North Dakota 762062
48 Alaska 731545
49 District of Columbia 705749
50 Vermont 623989
51 Wyoming 578759
52 Guam 165718
53 U.S. Virgin Islands 104914
54 American Samoa 55641
55 Northern Mariana Islands 55194
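
The SettingWithCopyWarning in this cell appears because `df2` was created as a slice of `df`, so pandas cannot tell whether the assignment is meant to reach the original frame. One common way to avoid it, sketched here on a three-row sample, is to take an explicit copy before assigning. The variable names mirror the notebook; the sample frame itself is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "State": ["California", "Texas", "The fifty states"],
        "Population": ["39,512,223", "28,995,881", "327,533,795"],
    }
)

# .copy() makes the slice an independent DataFrame, so the assignment
# below is unambiguous and leaves df untouched.
df2 = df[:-1].copy()
df2["Population"] = (
    df2["Population"].str.replace(",", "", regex=False).astype("int64")
)
print(df2["Population"].tolist())  # [39512223, 28995881]
```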
In [20]:
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   State       56 non-null     object
 1   Population  56 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 1.0+ KB
In [21]:
# The table is already almost in descending order (only Iowa/Puerto Rico and
# Idaho/West Virginia are swapped), so sorting is skipped here.

#df3 = df2.sort_values(by=["Population"],ascending=False)
#df3
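
Whether the rows are in descending order can be checked rather than assumed. A small sketch using four consecutive rows from the table above, where Iowa precedes Puerto Rico despite its smaller population; the `sample` frame is illustrative.

```python
import pandas as pd

# Rows 29-32 of the cleaned table, in their scraped order.
sample = pd.DataFrame(
    {
        "State": ["Utah", "Iowa", "Puerto Rico", "Nevada"],
        "Population": [3205958, 3155070, 3193694, 3080156],
    }
)

# False here, because Iowa (3,155,070) comes before Puerto Rico (3,193,694).
print(sample["Population"].is_monotonic_decreasing)

# Sorting explicitly gives a strictly ranked frame with a fresh index.
ordered = sample.sort_values(by="Population", ascending=False, ignore_index=True)
print(ordered["State"].tolist())  # ['Utah', 'Puerto Rico', 'Iowa', 'Nevada']
```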
In [22]:
fig = px.bar(df2, y="State", x="Population", orientation='h', height=800)

fig.update_layout(
    title='US Population by States',
    yaxis=dict(
        tickangle=0,
        showticklabels=True,
        type='category',
        # title='Y-axis name',
        tickmode='linear'
    )
)
fig