Visualizing Three Decades of Canada's Greenhouse Gas Emissions
Background
Human activities contribute to directly greenhouse gas emissions. The largest sources of greenhouse gases are attributed to the burning of fossil fuels for energy production. The principal byproduct that results from the combustion of fossil fuels is CO2, among other harmful gases.
As CO2 is released into the atmosphere accumulation of molecules forms greenhouse gas that absorbs and radiates heat from the earth's surface. Essentially greenhouse gases reduce the ability of the earth to "cool itself off". As a result, higher levels of pollutants circulate in the same atmosphere that humans (among other species) respire air from and impact climate.
Compared to the first industrial revolution in the early-mid 1800s, ambient CO2 levels have risen from 400 parts per million (ppm) to 421 ppm and show no sign of lowering so long as the burning of fossil fuels continues across the globe.
To keep the earth from warming beyond levels known today, many countries have set climate goals that seek to transform fossil fuel-dependent industries into cleaner more efficient, and sustainable green economies.
Canada is one such country proposing to reduce emissions by 45% from 2005 levels by the year 2030. Not only will it take a concerted effort to meet the target, but there is also the challenge of estimating carbon footprint measurements.
Here lies the importance of establishing a coordinated effort to collect, analyze, and report high-quality data to inform the public and industry regulators that oversee emissions standards.
In this blog post, I summarize the steps taken to import, transform, and visualize data. In addition to serving as a project to practice data analytics, the intended result is to arrive at a dashboard that others can use to draw their own inferences from Canada's commitment to transforming its economy.
Importing and Transforming Data
OpenData Canada provides access to Canada's official GHG emission inventory. Click the link for more information.
Should you choose to follow along or create your own dashboard download the files from my GitHub repo.
Data is saved as a .csv file format with sheets containing emissions data - one for each province. Below is table with emissions data populated for the Quebec. The wide formatted table contains is a time-series dataset with units of emissions as values by sector and sub-sector, etc.
Since the goal is to build a dashboard, cleaning, aggregating, and reshaping data are requirements prior to importing to a data visualization tool such as Tableau. This is no small task. Consider the format of column A, for example. Each sector is without a unique field with spaces used to convey or delimit the data hierarchy.
Apart from the notes in rows A55:62, there isn't a data dictionary to reference. Understanding the relationship between emissions categories is important. Using the Sum() function in Excel validates the relationship between sector categories. For example, "Natural Gas Production" is its own sub-sector whereas "Mining and Extraction" roll up into "Oil Sands".
Knowing this I created a template table for each of the 10 provinces in Canada. Rows that served the purpose of aggregating other categories were removed leaving a table with dimensions 35x37 (including date cols in wide format). Each table was appended the next with the appropriate province name assigned in col: B.
After the aggregate rows were removed I copy and pasted the emissions values (mt CO2 equivalent) from each date value into the master table.
Data Cleaning
With the emissions values populated for each province, we are ready to clean the data. The following are general notes put forward by the group responsible for preparing the results.
Notes:
- Totals may not add up due to rounding.
- 0.0 Indicates emissions of less than 0.05 Mt CO2 eq were truncated due to rounding.
- "- "Indicates no emissions
I reduced the impact of rounding errors (thereby increasing precision) by expanding each cell from 1 to 4 decimal places (0.0 --> 0.0000). Despite the notes, smaller values were hidden by formatting. The number of cells showing 0 values increased significantly as a result of this change. As an aside, when moving values to a new cell are sure to preserve the formatting. As for cells containing "-", these were treated as 0.0 by Tableau so I left them as is.
Other data cleaning steps included: trim() spaces from each of sector column in cols A:F to avoid unwanted blank_spaces, adding "NA" to sectors without subsectors, and histograms to check for outliers.
Rehshaping Data in Power Query
Wide-format data isn’t always suitable for importing into statistical software or data visualization tools, however, it can be useful for reporting tables to a reader. Unpivotoing rotates columns of a table-valued expression into column values. An example is shown below.
For a short and simple guide on how to convert data from wide to long format click here. After unpivoting the emissions dataset, the table dimensions are now 8 cols x 13,753 rows.
Data visualization: Tableau
Lastly, save and import the .csv file to [Tableau Public (https://public.tableau.com/app/discover). After you open Tableau, under Connect, click Excel. Select the Excel workbook you want to connect to, and then click Open. Double-check each field to ensure that fields and values were converted to the intended format.
Having manipulated the data into the desired format and data structure, the final step is to thoughtfully design a dashboard that does something useful. In this case, the end-user I have in mind is the layman person, that is, my fellow Canadians that are keen to use data to drive insights.
Source: Tableau Public Link





