Environmental Enforcement Watch

Last updated: May 27th, 2020

Exploring ECHO Data in Your Area

This workbook is a way to quickly view data from EPA's Enforcement and Compliance History Online portal that is relevant just to your area. It is designed to work with the ECHO Exporter file.

How to run this notebook:

  1. In the cell below, replace the zip code with your own 5-digit U.S. ZIP code (keep it in quotation marks). For example: my_zip = "98115"
  2. Go to the Runtime menu and click "Run all".
  3. That's it! It might take a minute or two to run and generate all the reports.
In [4]:
my_zip = "92115"
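
Because the ZIP code is compared as text against the FAC_ZIP column further down, a quick sanity check can catch a missing quotation mark or a stray digit before the longer cells run. The helper below is only a sketch: is_valid_zip is a hypothetical name that is not part of this notebook, and example_zip is an illustrative value.

```python
def is_valid_zip(zip_code):
    """Return True if zip_code is a string of exactly five ASCII digits."""
    return (
        isinstance(zip_code, str)
        and len(zip_code) == 5
        and zip_code.isascii()
        and zip_code.isdigit()
    )

example_zip = "98115"  # illustrative value, taken from the instructions above

print(is_valid_zip(example_zip))  # a quoted 5-digit ZIP passes the check
print(is_valid_zip(98115))        # an unquoted number does not
```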

Below this point, everything is calculated automatically

You don't need to interact with it in order to get it to work, but if you want to dive deeper, you can use it to get started exploring!

In [5]:
data_location = "https://github.com/edgi-govdata-archiving/echo-data/blob/master/ECHO_EXPORTER.csv?raw=true" # Where the ECHO data is saved
In [6]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import folium
In [7]:
# Define columns of interest (see the echo_exporter_columns xlsx file that comes bundled with the csv download)
# This is not a comprehensive list of columns; more are available.
# This dictionary maps the column titles to their data types, to allow for faster import

# Note to self - right now mapping everything not explicitly a number as a string, might be an issue later
column_mapping = {
    "REGISTRY_ID": str,
    "FAC_NAME": str,
    "FAC_ZIP": str,
    "FAC_LAT": float,
    "FAC_LONG": float,
    "FAC_QTRS_WITH_NC": float,
    "CAA_PERMIT_TYPES": str,
    "CWA_PERMIT_TYPES": str,
    "GHG_CO2_RELEASES": float
}
In [ ]:
# Get the data
echo_data = pd.read_csv(data_location, usecols = list(column_mapping.keys()), dtype=column_mapping)
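
One reason to read FAC_ZIP as a string rather than a number: ZIP codes can begin with a zero, and a numeric column would drop it, so the later comparison with my_zip would never match. The sketch below uses a tiny made-up CSV (the two facility rows are invented for illustration and are not ECHO data) to show the difference.

```python
import io
import pandas as pd

# Two made-up rows; the ZIP codes start with a zero on purpose
sample_csv = "FAC_NAME,FAC_ZIP\nExample Plant A,02139\nExample Plant B,02140\n"

# Let pandas guess the type: the ZIP column is parsed as integers
as_number = pd.read_csv(io.StringIO(sample_csv))

# Tell pandas to keep the ZIP column as text
as_text = pd.read_csv(io.StringIO(sample_csv), dtype={"FAC_ZIP": str})

print(as_number["FAC_ZIP"].tolist())  # leading zeros are lost
print(as_text["FAC_ZIP"].tolist())    # leading zeros are kept
```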

How many facilities in your zip code are tracked in the ECHO database?

ECHO is shorthand for Enforcement and Compliance History Online, the major repository of the U.S. Environmental Protection Agency for public data on its oversight and enforcement activities. ECHO contains information on most if not all of the facilities regulated by the agency for their compliance with our major environmental laws. (EPA’s “about ECHO data” page)

In [ ]:
# Filter to just your zip code
my_echo = echo_data[echo_data["FAC_ZIP"] == my_zip]

num_facilities = my_echo.shape[0]
print("There are %s facilities in %s tracked in the ECHO database." %(num_facilities, my_zip))
In [ ]:
# Let's show a quick map of your area and the facilities in it

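def mapper(dataframe):
    # Initialize the map, centered on the average location of all facilities in the zip code
    m = folium.Map(
        location = [my_echo["FAC_LAT"].mean(), my_echo["FAC_LONG"].mean()],
        zoom_start = 11
    )

    # Add a clickable marker for each facility that has coordinates
    for index, row in dataframe.dropna(subset=["FAC_LAT", "FAC_LONG"]).iterrows():
        folium.Marker(
            location = [row["FAC_LAT"], row["FAC_LONG"]],
            popup = row["FAC_NAME"]
        ).add_to(m)

    # Show the map
    return m

map_of_facilities_in_my_area = mapper(my_echo)
map_of_facilities_in_my_area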

What permit types have been issued in this zip code?

ECHO contains data on EPA’s permitting systems, a chief means by which it administers three major national environmental laws, for clean air (the Clean Air Act, or CAA), clean water (the Clean Water Act, or CWA), and hazardous waste handling (RCRA, or the Resource Conservation and Recovery Act). Potential or actual emitters of any of these kinds of pollution have to receive a permit from the EPA before they can legally operate, setting a clear limit on the pollution they can emit. Permitted firms can still pollute, but only within the limits set by the permit. The agency is supposed to set the allowed emissions at a level that avoids threats to human health and the environment. When a facility’s emissions stay within the permitted limits, it is considered “in compliance.” When its air, water, or hazardous waste emissions exceed the terms of its permit, that’s a violation.

In [1]:

# Columns relating to permit type (limited here to the permit columns loaded in column_mapping above)
permit_cols = ["CAA_PERMIT_TYPES", "CWA_PERMIT_TYPES"]

# Get a DataFrame with just the columns relating to permit type
permits = my_echo[permit_cols]
# Count how many non-null values are in each column
counted_permit_types = permits.count()

# Print how many values are present for each column (permitting law)
print(counted_permit_types)

# Graph the number of permits by which law they correspond to
plt.bar(counted_permit_types.keys(), counted_permit_types)
plt.ylim(top = num_facilities) # so the top of the graph is the total # of facilities in the region
plt.title("Number of Permits of Various Types in %s" %(my_zip))
plt.show()

To learn more about the several types of permits offered under the Clean Air Act (operating, New Source Review, and others), see EPA’s explanation here. To learn about the major permitting system under the Clean Water Act, the National Pollutant Discharge Elimination System (NPDES), see EPA’s explanation here. For more about the EPA’s permitting program for handlers of hazardous waste, click here.

In its air and water programs, EPA draws a line between “major” polluters, which require permits, and “minor” polluters, few of which do. Air-polluting facilities, for instance, are classified as “major,” and must apply for operating permits, if they actually or potentially emit at least 100 tons of a general air pollutant per year, 10 tons of a single “hazardous” air pollutant per year, or 25 tons of a combination of hazardous air pollutants per year. Most “non-major” air polluters, releasing less than those amounts, aren’t required to have permits unless they belong to an especially dangerous industry, like smelting or chemical production.
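
To make the three thresholds concrete, here is a minimal sketch that applies them to annual emission totals. The function name is_major_air_source and the example numbers are invented for illustration; real classifications involve more detail than the three numbers quoted in the paragraph above.

```python
def is_major_air_source(general_tons, max_single_hap_tons, combined_hap_tons):
    """Apply the three annual thresholds from the text (all values in tons per year).

    general_tons: emissions of a general air pollutant
    max_single_hap_tons: emissions of the largest single hazardous air pollutant
    combined_hap_tons: emissions of all hazardous air pollutants together
    """
    return (
        general_tons >= 100
        or max_single_hap_tons >= 10
        or combined_hap_tons >= 25
    )

# Made-up facilities, purely for illustration
print(is_major_air_source(150, 0, 0))   # over the 100-ton general threshold
print(is_major_air_source(40, 12, 12))  # over the 10-ton single-pollutant threshold
print(is_major_air_source(40, 5, 9))    # under all three thresholds
```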

In [ ]:
# Drilling down into what types of permits have been issued

# Define a function for counting permit types
def count_permits(permit_law):
    # Find all the possible permit types
    permit_types_with_nan = my_echo[permit_law].unique().tolist()
    # Remove null value as a permit type
    permit_types = [i for i in permit_types_with_nan if str(i) != "nan"] # note that nan values fail to be counted even when left in
    # Define a dictionary to save counted permits
    permits_issued = {}
    # For each permit type...
    for permit_type in permit_types:
        # Count those unique values and save them corresponding to their permit type
        permits_issued[permit_type] = my_echo[my_echo[permit_law] == permit_type].shape[0]
    # Return a dictionary counting the permits issued of each type
    return permits_issued

# For each permitting law
for permit_law in permit_cols:
    counted_permits = count_permits(permit_law)
    # Print the raw data
    print(permit_law, counted_permits)
    # Skip the chart if no permits of this kind were found
    if not counted_permits:
        continue
    # Plot a pie chart breaking down type of permit in each category
    plt.pie(list(counted_permits.values()), labels = list(counted_permits.keys()))
    plt.title(permit_law)
    plt.show()
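
The same tally can also be written with pandas' built-in value_counts, which skips missing values by default. This is a sketch on a made-up column: the permit labels are placeholders, not real ECHO values.

```python
import pandas as pd

# Made-up permit column with one missing value
toy = pd.DataFrame({"CAA_PERMIT_TYPES": ["Major", "Minor", None, "Major"]})

# Count how often each permit type appears; missing values are left out
counts = toy["CAA_PERMIT_TYPES"].value_counts().to_dict()
print(counts)
```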

See above for the explanation of “major” and “non-major” air permits; similar distinctions also guide permitting in the water (CWA) and hazardous waste (RCRA, CERCLA) programs. LQG refers to “Large Quantity Generators” of hazardous waste, which generate 1,000 kilograms or more per month of hazardous waste or more than one kilogram per month of acutely hazardous waste. SQG refers to “Small Quantity Generators,” which generate less than that. VSQG refers to “Very Small Quantity Generators,” which generate 100 kilograms or less per month of hazardous waste and one kilogram or less per month of acutely hazardous waste.
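
The generator categories can be read as a simple decision rule on monthly quantities. The sketch below encodes the thresholds as described in the paragraph above; generator_category is a hypothetical helper and the example quantities are invented.

```python
def generator_category(hazardous_kg, acutely_hazardous_kg):
    """Classify a hazardous waste generator from its monthly quantities (in kilograms)."""
    # Large Quantity Generator: 1,000 kg or more, or more than 1 kg of acutely hazardous waste
    if hazardous_kg >= 1000 or acutely_hazardous_kg > 1:
        return "LQG"
    # Very Small Quantity Generator: 100 kg or less, and 1 kg or less of acutely hazardous waste
    if hazardous_kg <= 100 and acutely_hazardous_kg <= 1:
        return "VSQG"
    # Everything in between is a Small Quantity Generator
    return "SQG"

# Made-up monthly quantities, purely for illustration
print(generator_category(1200, 0))   # large generator
print(generator_category(500, 0))    # small generator
print(generator_category(50, 0.5))   # very small generator
```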

Are there facilities in my region not in compliance with their permits in the last three years?

ECHO records compliance, meaning that a facility’s emissions are within the limits allowed by its permit(s), by quarter. Facilities shown to be exceeding their permitted emissions, whether through inspections, evaluations, or monitoring data of various kinds, are considered noncompliant and in violation. It is worth noting that emissions averaged over a quarter of a year can fail to reflect the level of danger posed by a single, sudden burst, such as from a massive spill or fire.
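
A small arithmetic example shows how averaging can hide a burst. All numbers here are invented (the limit, the daily amounts, and the 90-day quarter are assumptions for illustration, not ECHO data or real permit terms).

```python
# Hypothetical limit and emissions, in arbitrary units per day
daily_limit = 10.0
daily_emissions = [5.0] * 90   # a quiet 90-day quarter...
daily_emissions[44] = 200.0    # ...with one large single-day release

quarterly_average = sum(daily_emissions) / len(daily_emissions)
peak = max(daily_emissions)

print("Quarterly average: %.2f" % quarterly_average)
print("Average is under the limit:", quarterly_average < daily_limit)
print("Peak day is over the limit:", peak > daily_limit)
```

In this made-up case the quarterly average stays below the limit even though one day's release is twenty times higher than it.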

In [ ]:
# How many facilities have been out of compliance in the last 12 quarters?

noncompliant = my_echo[my_echo["FAC_QTRS_WITH_NC"] > 0].sort_values(by="FAC_QTRS_WITH_NC", ascending=False)
num_noncompliant = noncompliant.shape[0]
plt.pie([num_noncompliant, num_facilities - num_noncompliant], labels=["Noncompliant", "Compliant"], autopct='%1.1f%%', shadow=True)

plt.title("%s of %s Total Facilities Noncompliant in %s in the last 12 qtrs" %(num_noncompliant, num_facilities, my_zip))

Where are these noncompliant facilities?

In [ ]:
map_of_noncompliant_facilities = mapper(noncompliant)
map_of_noncompliant_facilities

Which facilities are out of compliance?

In [ ]:
# Which facilities aren't in compliance?

print("Facilities in %s noncompliant in the last 12 quarters:" %my_zip)
cols_for_print = ["FAC_NAME", "FAC_QTRS_WITH_NC"]
noncompliant[cols_for_print]
In [ ]:
# More details on noncompliant facilities
print("Facilities in %s noncompliant in the last 12 quarters:" %my_zip)
noncompliant



Greenhouse Gases

In [ ]:
my_ghg = my_echo[my_echo["GHG_CO2_RELEASES"].notna()]

plt.pie([my_ghg.shape[0], num_facilities - my_ghg.shape[0]], labels=["Reporting GHG Emissions", "Not Reporting GHG Emissions"], autopct='%1.1f%%', shadow=True)

plt.title('Of the %s facilities reporting to ECHO in %s, %s report greenhouse gas emissions.' %(num_facilities, my_zip, my_ghg.shape[0]))

Stats on CO2 releases for the facilities that are reporting: Total Facility Emissions in metric tons CO2e (excluding Biogenic CO2) from the most recent reporting year.

In [ ]:
print("'Count' is the number of facilities reporting; all of the other numbers are statistics on the greenhouse gas emissions of all of the facilities in %s, measured in metric tons of CO2e." %my_zip)
my_ghg["GHG_CO2_RELEASES"].describe()

Bonus: Try entering the mean (from above) into https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator and see what an average facility in your zip code emits into the atmosphere compared to you.

In [ ]:
plt.bar(my_ghg["FAC_NAME"], my_ghg["GHG_CO2_RELEASES"])
plt.title("Total Facility Emissions from the most recent reporting year")
plt.ylabel("metric tons CO2e (excluding Biogenic CO2)")

Next questions

What other questions would you like to see added? Here are some I have:

  • What are the top 3 noncompliant facilities in the zip code and what are they violating?
  • Which types of noncompliance are we experiencing here?
  • Beyond "significant" – how much over their permits are they?
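
The first question can be started with a column this notebook already loads: FAC_QTRS_WITH_NC gives a ranking of facilities, though it does not say what each one is violating. A minimal sketch on made-up data (the facility names and counts are invented):

```python
import pandas as pd

# Made-up facilities and quarters with noncompliance, purely for illustration
toy = pd.DataFrame({
    "FAC_NAME": ["Plant A", "Plant B", "Plant C", "Plant D", "Plant E"],
    "FAC_QTRS_WITH_NC": [12.0, 3.0, 0.0, 7.0, 1.0],
})

# Keep the three facilities with the most noncompliant quarters
top_three = toy.nlargest(3, "FAC_QTRS_WITH_NC")
print(top_three)
```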

Please suggest questions (click "New Issue") on the GitHub page for this project. Maybe we can answer them together!
