Human-Readable ◉‿◉

Electric Vehicle Sale Trends in the US: An Exploratory Analysis

Background:

There has been plenty of 'buzz' surrounding the electric vehicle (EV) revolution and its potential to significantly reduce greenhouse gas emissions that contribute to climate change. This excitement is amid a broader market shift towards electromobility more generally. But is the hype cycle surrounding EV just that, or is there enough substance to convince potential for a just transition from the internal combustion engine (ICE).

Historically auto manufacturers (OEMs) have been reluctant to transition from ICE vehicles. Meanwhile, companies such as Tesla 'bet the house' on future EV demand.

The sentiment toward EVs has shifted since the early-to-mid 2010s. Prior to Tesla's debut of the company's flagship Model S sedan in 2012, there were plenty of myths, misconceptions, and product constraints for the EV marketing team to confront (e.g. too slow, heavy, expensive, range limited, and lacking charging infrastructure). Moreover, with Americans paying little at the pump ($) and price parity not yet achieved, owning and operating an EV will remain at most a nicety for most.

Nonetheless, the unfolding narrative of global EV adoption remains an intriguing theme to follow and continues to captivate my attention and interest in analyzing relevant data on the topic. In this post, I analyze time-series data from two datasets to explore trends in EV adoption in terms of cars or units sold in the US market. Code and data visualizations are provided throughout with a summary of key findings at the end.

Dataset:

Github repo

Data Science Tools:

Diving into the Data

Data Preparation:

In the interest of keeping this analysis to a tolerable length below is a bulleted list of steps taken to ready the dataset for analysis. Note: the .sql file contains the cleaned version.


Exploratory Analysis:

Data viz tools help paint the big picture and patterns within the dataset in a way that spreadsheets can not.

The histogram below indicates the majority of monthly sales were concentrated (right-skewed) between 0k-10k units sold.

Without taking into consideration the element of time we can assume much of demand or supply-side economics. We might jot down a few assumptions to be addressed at a later stage in the analysis.

Perhaps the slow ramp-up in production was due in part to supply-chain constraints (e.g. battery material sourcing) or low consumer market demand. Another assumption might be that traditional OEMs didn't have the same pricing leverage for EVs as ICE, where the cost of producing vehicles at a low scale meant strategically capping volume due to high-opportunity costs for low-margin sales opportunities.

Outliers:

Before analyzing the data further let's plot another bar chart to check for the presence of outliers that could skew the results if not accounted for early on.

The y-axis gives a sense of how many models were sold in aggregate over an 8-year period (including years with 0 units sold). By filtering data at the model level in ASC order we see who the 'big players are in this market segment. Another assumption might be that segment leaders with respect to units sold were also first to market with their product. Again, we'll want to check said assumptions later in the analysis.

Drilling down deeper into the dataset we can observe the mean and range for units sold per model per year over an eight-year window of 15,459 (1-300,471) calculated from data shown in the bar chart below.

[PHEV+EV units sold per model, per year]

Since we are blind to how this data was collected (including assumptions made in the process) getting our hands dirty with the finer details should pay dividends when considering the validity of the dataset and its limitations.

Let's tackle the minimum value of the range. It's suspicious that only 1 unit would be sold for a given month. Maybe selling 1 unit was sufficient to meet an incentive program by the government or the vehicle was wildly unpopular/unknown for reasons not captured in the dataset. Let's make a note of this and move on.

As for units sold at or around the max value (300,471) sales are attributed to the Tesla-branded models specifically the Model 3. The patterns of exponential growth are apparent over time and give few if any reasons to replace or remove those values from the dataset.

Data Analysis

Since vehicle sales are the focus of the dataset let's start by plotting a line graph that calculates annual sales growth by model for each year in the dataset. With more than 50 models sold the best way to preserve the readability of the visualization is to draw a distinction between PHEV and EV (motor type). Instead, we can plot separate line graphs, and include a filter to toggle between the two. See below.

A few observations are worth mentioning. This is our first insight into the first movers in this market segment - Chevrolet, Ford, and Toyota. We can infer that said companies launched 1 or more models in the year 2012 followed by steady sales growth thereafter. Analysis: Without access to company financials it's difficult to conclude if the first mover status was advantageous from a sales perspective. Certainly the idea of a favorite brand "going green" would attract headlines and brand following.

Eyeballing the x-axis from left to right we see the cumulative number of distinct (colored) lines (each representative of an OEM) increases as a function of time. Recall having mentioned laggard OEMs in the background section, well, according to the data the majority of OEMs did not sell either a PHEV or an EV until the mid-to-late-2010s. In particular, Mini, Kia, and Chrysler were slow to get to market (2017). Further analysis to address the 'why' in question would be conjecture at best.

The data implies not all of the pioneers benefited from jumping out ahead of the pack. For example, Honda and Cadillac were early, and yet both performed poorly relative to peers with sales remaining flat over the better part of the decade.

Shifting gears a bit to EV sales. Having plotted a similar chart for Make x EV we see that a total of 11 OEMs sold EVs to US-based customers.

Of those eleven companies, Tesla and Nissan quickly become the clear market leaders. From 2012 to 2017 the popular Nissan Leaf held the top spot only to be succeeded by Tesla, attributed to impressive YoY sales demand for the Model S in the year 2012, followed by the Model X (2015), and a meteoric rise in demand for the Model 3 (2017).

Of the few OEMs that were selling EVs in and around that time horizon, most have since retired those models. This is an important observation and one that speaks to the logistical, technological, financial, and even the role of politics in business that amount to complexities and constraints of transitioning manufacturing practices and overall company direction.

A great example to highlight is the now retired Jaguar I-PACE - having sold a combined 4,533 units since 2018. This disappointing performance came in light of winning 62 international awards for car of the year among other titles. Talk about a disconnect between industry expectations and product-market fit.

As discussed earlier, the data shows PHEV and EV sales growth increased over time. However, we need to be mindful of the term segment in that PHEV/EV represents a segment (or piece of the total pie, not the pie itself). Therefore, in absolute terms, EV sales growth is not all that relevant unless the rate of growth is greater than total vehicle sales in the US.

Let's join table .total_vehicle_sales. to calculate the annual growth rate of PHEV and EV, both separately and combined, and relative to the percentage of total vehicles sold in the US. These calculations will provide insight into overall market penetration.

SELECT 
	Date,
	total_ev,
	total_phev,
    (total_ev + total_phev) AS total_electric,
    TOTAL AS total_vehicles
FROM vehicle_sales
JOIN 
	(SELECT
		b.Date AS date,
        total_ev,
        total_phev
	FROM (SELECT 
			YEAR(DATE) AS Date,
			SUM(Monthly_Sales) AS total_ev
		FROM ev_sales
		WHERE Motor_Type = 'EV'
		GROUP BY YEAR(Date)) AS B
	JOIN (SELECT 
			YEAR(DATE) AS Date,
			SUM(Monthly_Sales) AS total_phev
		FROM ev_sales
		WHERE Motor_Type = 'PHEV'
		GROUP BY YEAR(Date)) AS A
	ON B.date = A.Date) AS C
ON vehicle_sales.YEAR = C.Date

The chart shows total vehicle sales peaked in 2016 and then declined modestly thereafter, whereas PHEV & EV increased significantly YoY. Moreover, in 2016 EV sales eclipsed PHEV sales for the first time. At face value, the pace of EV growth is significant, especially when considering that only several hundred units were sold in 2012 as compared to nearly 8 million in 2019.

Alternatively, we can plot a line chart to look at market penetration in percentage terms.

Hold up. At this point in the analysis, it's become apparent that the EV dataset might have more than a few flaws. In reference to the chart above, is it possible that 46% (or more than 1/2 of new cars sold) were EVs? Not likely. If not for referencing the U.S. Bureau of Economic Analysis dataset I may have omitted this insight.

According to a report by Pew Research "as of 2020 nearly 1.8 million EV were registered in the U.S... globally U.S. represents... about 17% of the world’s total stock of 10.2 million EVs". Given what we know from our exploratory analysis (e.g. 7.8mm EV sold in 2019), we can infer that the Kaggle dataset values are inaccurate.

We can conclude that the data is of low quality and therefore should be exempt from informing (real-world) decision-making. Moving on.

Best Selling Brands

Let's gain insight into the brands that came out on top and consistently so by querying the top five best-selling vehicles sold in 2015, 2017, and 2019.

(SELECT
    GROUP_CONCAT(DISTINCT(YEAR(Date))) AS date,
    GROUP_CONCAT(DISTINCT(make)) AS make,
    model,
    GROUP_CONCAT(DISTINCT(motor_type)) AS motor_type,
    SUM(monthly_sales) AS annual_sales
FROM ev_sales
WHERE YEAR(Date) = '2015'
GROUP BY model
ORDER BY SUM(monthly_sales) DESC
LIMIT 5)
UNION
(SELECT
    GROUP_CONCAT(DISTINCT(YEAR(Date))) AS date,
    GROUP_CONCAT(DISTINCT(make)) AS make,
    model,
    GROUP_CONCAT(DISTINCT(motor_type)) AS motor_type,
    SUM(monthly_sales) AS annual_sales
FROM ev_sales
WHERE YEAR(Date) = '2017'
GROUP BY model
ORDER BY SUM(monthly_sales) DESC
LIMIT 5)
UNION
(SELECT
    GROUP_CONCAT(DISTINCT(YEAR(Date))) AS date,
    GROUP_CONCAT(DISTINCT(make)) AS make,
    model,
    GROUP_CONCAT(DISTINCT(motor_type)) AS motor_type,
    SUM(monthly_sales) AS annual_sales
FROM ev_sales
WHERE YEAR(Date) = '2019'
GROUP BY model
ORDER BY SUM(monthly_sales) DESC
LIMIT 5)

For those recreating the analysis with the embedded code note that the results from MySql were reshaped in Excel prior to importing to Tableau.

In this chart, the shared x-axis lends itself to comparisons between years. Blue bars represent EVs; orange PHEVs. Over a 5-year period, we see the same brands remain at the top. A few drawbacks of this design include busy graphics and that the vehicle models featured in at least one of the three top 5 lists are also featured in the other three lists regardless of their ranking.

Breakdown of Sales by Vehicle and Motor Type

Let's explore the most recent sales data from 2019. We'll use a pie chart to investigate sales by vehicle type and motor type. Note, there are multiple permutations given the total number of combinations is six In this case, we are keen to visualize total sales from 2019 as one pie chart. Multiple pie charts (one for each subcategory) could introduce unnecessary confusion. Besides, stakeholders may want the 30,000 foot-view before diving deeper into the data.

Almost 95% of vehicles sold belong to the consumer car vehicle type. It's surprising to note that so few EV & PHEV SUVs and pick-up trucks given consumer popularity for these vehicle types. Without more insight into the auto industry, it's hard to know what is limiting the types of product offerings.

In Excel write a formula to table extract quarters for the month of the year for a given date value in table .ev_sales. The formula to accomplish this step is shown below.

Now that each date has a corresponding quarter assigned create a second table to populate and transform int --> percentiles. The main inputs we need from the previous table are: year, quarter, and sales value. In Excel apply the SUMIFS function to calculate the percentiles shown in the table below.

According to Investopedia, there are two peak seasons for auto sales - March through the end of May and Sept through the end of Nov. Even though quarterly intervals don't map onto mentioned date ranges we might expect Q2 and Q3 periods to have higher sales volumes than Q1 and Q4.

In contrast to the source mentioned above, sales in Q1 were actually lower than in Q2 and Q3, however, Q4 sales consistently outperformed.

It's fair to assume that QoQ differences may have flattened out (less lumpy) as more competitors entered the market segment. It's also possible that the sample size too large to detect [seasonality] (https://www.investopedia.com/terms/s/seasonality.asp) through visual observation alone. For example, the same descriptive analysis when conducted within a specific model (e.g. Chevy Volt) could introduce the level of specificity that makes those trends more apparent.

Conclusion

The main takeaway from this project was that identifying earlier that you have bad data, or the calculation is wrong will save time and embarrassment later. Data scrubbing techniques can yield clues that something is amiss, however, opting not to compare data to other datasets could result in inaccurate reporting and therefore bad business decision-making.

Lastly, when in doubt use good old common sense! Even if your level of domain knowledge is limited, there are certain results that just don't make sense. And remember, nothing in life is free, including Kaggle datasets.

Key Takeaways:

#Tableau #car sales #data science #electric vehicles #excel #mySql