Tapping into the ECHO Database
By Hubert Colas, PhD
Community water systems are defined as systems that provide water to the same population year-round; they are effectively municipal-residential systems. They differ from the two other US EPA-defined classifications, non-transient non-community water systems and transient non-community water systems, which represent locations such as schools, office buildings, hospitals and public places like gas stations and campgrounds. US EPA requires all facilities that fall under its regulation to report violations. This covers drinking water, air emissions, surface water discharges and hazardous waste. All of the information is compiled into the agency’s Enforcement and Compliance History Online (ECHO) database. As a result, the ECHO database provides a wealth of data and information on water quality.
With water quality a major issue, data analytics added to a public dashboard can provide valuable insights. Data is becoming the new currency, and technology has made it easier to collect and crunch numbers. Water professionals are required to report huge amounts of data, but reporting is not always followed by analysis. Reading the data accurately, however, can help improve water systems and deliver a better quality product.
US EPA created such a water dashboard to make it easy for the public to access the data. It provides charts of drinking water violations. Most of the data displayed is very general, but there is an option to export specific subsets of violation data. By digging deeper into the data, water professionals can gain insight into the violations affecting their population. The drinking water dashboard is filtered to display all community water systems in the US and its territories. There is no option in the dashboard to display data for only the 50 states, but data can be filtered once exported so that tribal lands and places like Puerto Rico can be removed. Filtered data can then be displayed in a user-friendly format that is easy to understand. Presented as an infographic (https://www.fluksaqua.com/en/benchmark/drinking-water-quality-violations-in-the-us/), important information on water quality can be accessed and understood not only by water professionals but by customers as well. Water quality affects everyone.
Exporting violation data through the ECHO dashboard
It is relatively straightforward to export fiscal year data from ECHO, but additional steps are required to streamline the information, and the data must be treated appropriately to yield accurate results. To pinpoint violations, the types of violations need to be identified so they can be exported. In the top right section of the ECHO dashboard, there is a bar chart of violations (see Figure 1). The drop-down menu in this section allows the user to specify whether to explore all violations, health-based violations, acute health-based violations, monitoring and reporting violations, or public notification and other violations.
Health-based violations are threats to public health, and acute health-based violations are a subset that is considered especially severe. Monitoring and reporting violations include failures to monitor and report results, while public notification violations indicate failures to inform or educate the public about the violations or their drinking water in general. The five contaminants highlighted in Figure 1 are arsenic, nitrates, radionuclides, coliforms and disinfection byproducts (DBPs). The chart is focused only on maximum contaminant level (MCL) violations because these are considered health-based violations. An MCL violation indicates that a contaminant was present in the distribution system above the legally permitted maximum value.
Once the health-based violations type is selected in the violations section, a table of the data can be accessed by clicking on a bar in the chart. Each bar corresponds to a fiscal year. (US EPA’s fiscal year runs from October 1 to September 30.) Selecting the fiscal year bar links to a new table. Two additional columns need to be added to this table to make data processing more straightforward. One of these columns is also essential in tying date data to the violations. Data can be added to the table by selecting the Analyze feature, which is accessed using the link at the bottom of the table.
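Because each bar corresponds to a fiscal year rather than a calendar year, it can help to make the boundary rule explicit before filtering anything by date. The following is a minimal sketch, assuming the usual federal convention that a fiscal year is named for the calendar year in which it ends; the function names are illustrative and are not part of ECHO.

```python
from datetime import date


def fiscal_year(d: date) -> int:
    """Return the fiscal year (October 1 to September 30) that contains d."""
    # October, November and December belong to the following fiscal year.
    return d.year + 1 if d.month >= 10 else d.year


def fiscal_year_bounds(fy: int) -> tuple[date, date]:
    """Return the first and last day of fiscal year fy."""
    return date(fy - 1, 10, 1), date(fy, 9, 30)


print(fiscal_year(date(2016, 10, 1)))   # 2017
print(fiscal_year(date(2017, 9, 30)))   # 2017
```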
It is important to note that although there is a folder containing return to compliance information (FS05), the data that can be exported here are not as complete as the detailed facility report data. Detailed facility reports are comprehensive web pages for each system in the ECHO database. These reports include detailed violation information, such as violation dates. Since there is no option to export the dates when violations occurred, the facility reports must be scraped in order to achieve accurate results. Before processing, the data must be trimmed to the 50 states so it can be used in the infographic format.
Once the table has been modified, the data can be exported. The modifications behind Figure 1 included a) removing tribal lands and territories, b) removing entries outside of the fiscal year and c) removing the rows of violations other than the 10 MCL violations for the five contaminants considered in the infographic. A CSV file is usually recommended because it removes any merged cells and other formatting that can increase file size and slow down processing. The exported file must then be opened and saved in an Excel file format.
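The three modifications listed above can also be applied to an exported CSV in a few lines of code. The sketch below is only an illustration: the column names (state, fiscal_year, category, contaminant), the "MCL" category label and the sample rows are all made up for the example and will not match the real export, which would need its own column mapping.

```python
import csv
import io

# Two-letter codes for the 50 states only; territories, tribal lands and DC are left out.
STATES = set("""AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY""".split())

# The five contaminant groups named in the article (these labels are hypothetical).
CONTAMINANTS = {"arsenic", "nitrates", "radionuclides", "coliforms", "dbps"}


def keep_row(row: dict, fiscal_year: str) -> bool:
    """Apply filters a), b) and c) to one exported row."""
    return (
        row["state"] in STATES                          # a) 50 states only
        and row["fiscal_year"] == fiscal_year           # b) requested fiscal year only
        and row["category"] == "MCL"                    # c) MCL violations only ...
        and row["contaminant"].lower() in CONTAMINANTS  # ... for the five contaminants
    )


def filter_rows(csv_text: str, fiscal_year: str) -> list[dict]:
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if keep_row(row, fiscal_year)]


# Made-up rows, used only to exercise the filter.
sample = """state,fiscal_year,category,contaminant
TX,2017,MCL,Arsenic
PR,2017,MCL,Arsenic
TX,2016,MCL,Nitrates
TX,2017,MR,Arsenic
OH,2017,MCL,Coliforms
"""
kept = filter_rows(sample, "2017")
print([(r["state"], r["contaminant"]) for r in kept])
# [('TX', 'Arsenic'), ('OH', 'Coliforms')]
```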
Processing violation counts and population affected
With the exported data modified to include only the violations of interest in the locations of interest, several processing steps are applied to convert the data into a table of violations in each state and a table of population affected in each state. The total violation count is created by simply counting the occurrences of the violations. Population affected is determined using the population-served metric for each system, which identifies the number of people who rely on the utility for drinking water. The total population served by systems with a violation is determined and then divided by the total number of people served by water systems in the state. A list of active community water systems in the US can be exported from the ECHO dashboard, and the total population served in each state can be determined from this data set.
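The two state tables described above can be sketched as follows. The data layout is hypothetical, and the sketch assumes that a system with several violations contributes its population served only once; the article does not state how repeat violations at one system were handled.

```python
from collections import Counter, defaultdict


def summarize(violations, systems):
    """Build per-state violation counts and share of population affected.

    violations: iterable of (system_id, violation_id) pairs.
    systems: dict mapping system_id -> (state, population_served),
             covering every active community water system.
    Returns {state: (violation_count, share_of_population_affected)}.
    """
    counts = Counter()
    affected_systems = defaultdict(set)
    for system_id, _violation_id in violations:
        state, _population = systems[system_id]
        counts[state] += 1
        # A set ensures each system's population is counted once (assumption).
        affected_systems[state].add(system_id)

    total_population = Counter()
    for state, population in systems.values():
        total_population[state] += population

    result = {}
    for state, total in total_population.items():
        affected = sum(systems[s][1] for s in affected_systems[state])
        result[state] = (counts[state], affected / total if total else 0.0)
    return result


# Made-up systems and violations for illustration.
systems = {"A1": ("TX", 1000), "A2": ("TX", 3000), "B1": ("OH", 500)}
violations = [("A1", "v1"), ("A1", "v2")]
print(summarize(violations, systems))
# {'TX': (2, 0.25), 'OH': (0, 0.0)}
```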
Determining violation dates and length
The ECHO dashboard does not export compliance period dates for violations. Compliance period dates describe the date range in which the violation occurred and are required for determining duration. While the dashboard does not export date data, dates are available in detailed facility reports in the database.
Detailed facility reports list all the violations for every facility regulated by US EPA, including drinking water systems. Each web page has a consistent structure, and any facility report can be accessed by modifying the system ID at the end of the URL. While the facility reports can be scraped for data, the ECHO database is not designed for large-scale data transfers or robotic queries; the agency can disable access for users who initiate robotic, programmed queries. When extracting data from the facility reports, these guidelines need to be respected.
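One way to keep retrieval modest is to pause between consecutive requests. The sketch below shows only that pacing logic, under stated assumptions: the fetch function is a placeholder supplied by the caller, no ECHO URL or page format is assumed, and the five-second default is an arbitrary value, not a figure published by US EPA. The agency's current terms of use should be checked before any automated retrieval.

```python
import time
from typing import Callable, Iterable, Iterator


def fetch_politely(
    system_ids: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[str, str]]:
    """Yield (system_id, page) pairs, pausing between consecutive requests."""
    first = True
    for system_id in system_ids:
        if not first:
            sleep(delay_seconds)  # wait before every request after the first
        first = False
        yield system_id, fetch(system_id)


# Demonstration with a stand-in fetch function and a recorded (not real) sleep.
pauses = []
pages = list(
    fetch_politely(["A", "B", "C"], lambda s: "page-" + s,
                   delay_seconds=2.0, sleep=pauses.append)
)
print(pages)
print(pauses)
```

Passing the sleep function as a parameter keeps the pacing testable without actually waiting.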
Processing duration data
Once the information is acquired from the detailed facility reports, the data can be compiled into a table of average violation durations in each state. The violations exported from the ECHO dashboard are correlated against the facility reports based on System ID and Violation ID. Once a violation is matched to a facility report violation, compliance period begin dates, compliance period end dates and return to compliance dates (if available) can be assigned to each violation.
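The matching step described above amounts to a lookup keyed on the pair (System ID, Violation ID). A minimal sketch follows; the dictionary field names are hypothetical stand-ins for whatever the export and the facility reports actually provide.

```python
def attach_dates(exported, report_records):
    """Attach facility-report dates to exported violations.

    Both arguments are lists of dicts carrying 'system_id' and 'violation_id'.
    Violations with no matching report record keep None for every date.
    """
    by_key = {(r["system_id"], r["violation_id"]): r for r in report_records}
    matched = []
    for violation in exported:
        record = by_key.get((violation["system_id"], violation["violation_id"]), {})
        matched.append({
            **violation,
            "period_begin": record.get("period_begin"),
            "period_end": record.get("period_end"),
            "returned_to_compliance": record.get("returned_to_compliance"),
        })
    return matched


# Made-up records for illustration.
exported = [
    {"system_id": "S1", "violation_id": "10"},
    {"system_id": "S2", "violation_id": "11"},
]
reports = [
    {"system_id": "S1", "violation_id": "10",
     "period_begin": "2017-01-01", "period_end": "2017-03-31"},
]
for row in attach_dates(exported, reports):
    print(row)
```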
The duration of a violation is then determined in two ways:
1) The first method is applied when a violation has a known resolved date. In these cases, the duration is the length of time between the violation occurring and the violation being resolved. The date a violation occurs is defined as the last day of the violation’s compliance period. In some cases, violations are resolved within their compliance period. When this happens, violations are classified as having a duration of zero days. In other cases, compliance period data cannot be determined. The duration then cannot be defined, and these violations are excluded from the average calculations.
2) The second method is applied when a violation has an unknown resolved date. In these cases, the duration is the length of time between when the violation occurred and the final date of the fiscal year. In some cases, the violation occurred on the final day of the fiscal year. When this happens, violations are classified as having a duration of zero days. In other cases, the compliance period end date extends beyond the final day of the fiscal year. The duration then cannot be defined, and these violations are excluded from the average calculations.
The average duration in each state is calculated as the total number of days with unresolved violations divided by the number of applicable violations. An NA entry for average duration indicates that a state had one violation and the compliance period end day extended beyond the final day of the year.
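The two duration rules and the state average can be sketched as below. This is an interpretation of the description above rather than the code used for the infographic; the function and parameter names are illustrative, and None stands in for an undefined duration or an NA average.

```python
from datetime import date
from typing import Optional


def violation_duration(period_end: Optional[date],
                       resolved: Optional[date],
                       fy_end: date) -> Optional[int]:
    """Return the duration in days, or None when it cannot be defined."""
    if period_end is None:
        return None  # compliance period data could not be determined
    if resolved is not None:
        # Method 1: resolved date known; resolution within the
        # compliance period counts as zero days.
        return max((resolved - period_end).days, 0)
    if period_end > fy_end:
        return None  # compliance period extends past the fiscal year
    # Method 2: unresolved, so measure to the final day of the fiscal year.
    return (fy_end - period_end).days


def average_duration(durations) -> Optional[float]:
    """Average over the violations with a defined duration; None means NA."""
    usable = [d for d in durations if d is not None]
    return sum(usable) / len(usable) if usable else None


# Made-up dates for one state in fiscal year 2017.
fy_end = date(2017, 9, 30)
durations = [
    violation_duration(date(2017, 6, 30), date(2017, 7, 30), fy_end),  # 30
    violation_duration(date(2017, 6, 30), date(2017, 6, 15), fy_end),  # 0
    violation_duration(date(2017, 9, 30), None, fy_end),               # 0
    violation_duration(date(2017, 12, 31), None, fy_end),              # None
    violation_duration(date(2017, 8, 31), None, fy_end),               # 30
]
print(durations)
print(average_duration(durations))  # 15.0
```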
The ECHO database contains a wealth of information, but collecting data is only useful when it is analyzed so that water professionals can identify areas for improvement. Water quality affects the entire population; creating accessible information to display the achievements of the professionals responsible for the system is of paramount importance.
About the author
Hubert Colas, Eng, PhD, has been President of the Americas at FluksAqua (an online community created by a dedicated group of water and wastewater operators for their peers) since its inception in January 2015. Prior to that, over a 21-year span, he held multiple positions at BPR (now a Tetra Tech division), including President and GM of BPR CSO from 2004 to 2013, and served as a board member. Colas initiated and coordinated the R&D project that led to the development of BPR CSO’s state-of-the-art, real-time control technology applied to wastewater systems. With three decades of experience in water management, hydrology, hydraulics and real-time control of wastewater systems, he has acted as Project Director on many projects in Canada, the US and Europe. Colas has presided over the Work Group on Real Time Control of Urban Drainage Systems of the Joint IAHR/IWA Committee on Urban Drainage, was a Water Environment Federation delegate for Réseau Environnement, and chaired the Canadian Junior Water Prize.