Analyzing User Traffic and Website Performance in SQL
In this blog post I summarize my experience taking the Advanced Data Analytics self-paced course as designed and published by Maven Analytics.
Note this is paid course that is available on the Maven learning platform. For this reason, I will not share related data files and other learning materials that are otherwise proprietary. I will however feature a few examples of business problems that peaked my interest along with queries I executed to gather insights.
The course covers the following topics as outlined below:
Traffic Analysis and Optimization: Identify top traffic sources, measure their conversion rates, analyze trends, and use segmentation for bidding optimization
Website Measurement and Testing: Find the most-visited pages and top entry pages, calculate bounce rates, build conversion funnels, and analyze test
Channel Analysis and Optimization: Compare marketing channels, understand relative performance, optimize a channel portfolio, and analyze trends
Product-level Analysis: Analyze sales, build product-level conversion funnels, learn about cross-selling, and measure the impact of launching new products
User-level Analysis: Learn about behaviors of repeat visitors and purchasers, and compare new and repeat visitor website conversion patterns
Click here to view the overview of the eCommerce Db.
1. Traffic Analysis and Optimization
Where did you come from, where did you go? Where did you come from, Cotton-Eye Joe?…
Traffic source analysis is about understanding where your customers are coming from and which channels are driving the highest quality traffic. If you have engaged in online shopping know that each interaction on the web results in a trail of cookies ("breadcrumbs" of the internet) that are leveraged by businesses to gather customer behavioural insights.
*Q: While we’re on Gsearch, could you dive into nonbrand, and pull monthly sessions and orders split by device type?
Building on the prior query this question introduces more complexity by drilling down on device_type and utm_campaign. Note that the select statement contains CASE-WHEN statements. The count case pivot method is SQLs version of Excel Pivot Charts whereby unique values in one column are transformed into multiple columns in the output and performs aggregations on any remaining column values.
It can be helpful to parse a single case-when statement to shine light on how this method works and is being applied in an eCommerce context.
In this example, the CASE-WHEN statement applies conditional as informed by the question to sort data into new fields. Consider the first record appearing in the screenshot. The data reveals that user 102 entered a webpage on Mar 19' using a desktop. The NULL order_id indicates that while the user spent time on the website their actions did not result in an item(s) purchased. As a result the conversation rate (cvr) would be 0%.
The COUNT function counts all non NULL values contained in relevant fields grouped by year-month. The output indicates that traffic from non-brand desktop users is significantly more likely to convert relative to mobile.
USE mavenfuzzyfactory;
SELECT
YEAR(website_sessions.created_at) AS yr,
MONTH(website_sessions.created_at) AS mo,
-- employ count case pivot method by device type
COUNT(DISTINCT CASE WHEN device_type = 'desktop' THEN website_sessions.website_session_id ELSE NULL END) AS desktop_sessions,
COUNT(DISTINCT CASE WHEN device_type = 'desktop' THEN orders.order_id ELSE NULL END) AS desktop_orders,
COUNT(DISTINCT CASE WHEN device_type = 'desktop' THEN orders.order_id ELSE NULL END)
/COUNT(DISTINCT CASE WHEN device_type = 'desktop' THEN website_sessions.website_session_id ELSE NULL END) AS desktop_non_brand_cvr,
COUNT(DISTINCT CASE WHEN device_type = 'mobile' THEN website_sessions.website_session_id ELSE NULL END) AS mobile_sessions,
COUNT(DISTINCT CASE WHEN device_type = 'mobile' THEN orders.order_id ELSE NULL END) AS mobile_orders,
COUNT(DISTINCT CASE WHEN device_type = 'mobile' THEN orders.order_id ELSE NULL END)
/COUNT(DISTINCT CASE WHEN device_type = 'desktop' THEN website_sessions.website_session_id ELSE NULL END) AS mobile_non_brand_cvr
FROM website_sessions
LEFT JOIN orders
ON orders.website_session_id = website_sessions.website_session_id
WHERE website_sessions.created_at < '2012-11-27' -- prior to prescribed date
AND website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
GROUP BY 1,2;
Q: Could you perform Day-parting analysis to help us understand how much support staff the business should have at different times of day?
For those interested in operations analytics this is a great example of how website traffic data is leveraged to inform staffing levels.
The subquery is necessary to define (new) relevant fields transformed using aggregate functions and grouping. Once established the CASE-WHEN statements are applied.
What I found helpful was to Google how MySQL defines date & time functions. In this way the CASE-WHEN statements are applied correctly to avoid misunderstanding what values represent, e.g. WEEKDAY() output of 1 is equal to Sunday, or that a WEEKDAY() value of 7 is the last day of a week.
Lastly, applying data visualizations techniques such as conditional formatting can greatly improve readability for stakeholders.
SELECT
-- using data from daily_hrly_sessions perform count case pivot method
hr_of_day,
ROUND(AVG(website_sessions),2) AS avg_sessions,
ROUND(AVG(CASE WHEN week_day = 0 THEN website_sessions ELSE NULL END),2) AS mon,
ROUND(AVG(CASE WHEN week_day = 1 THEN website_sessions ELSE NULL END),2) AS tues,
ROUND(AVG(CASE WHEN week_day = 2 THEN website_sessions ELSE NULL END),2) AS weds,
ROUND(AVG(CASE WHEN week_day = 3 THEN website_sessions ELSE NULL END),2) AS thurs,
ROUND(AVG(CASE WHEN week_day = 4 THEN website_sessions ELSE NULL END),2) AS fri,
ROUND(AVG(CASE WHEN week_day = 5 THEN website_sessions ELSE NULL END),2) AS sat,
ROUND(AVG(CASE WHEN week_day = 6 THEN website_sessions ELSE NULL END),2) AS sun
FROM (
-- subquery at the website_sessions lvl AS daily_hrly_sessions
SELECT
DATE(created_at) AS created_at,
WEEKDAY(created_at) AS week_day,
HOUR(created_at) AS hr_of_day,
COUNT(DISTINCT website_session_id) AS website_sessions
FROM website_sessions
WHERE created_at BETWEEN '2012-09-15' AND '2012-11-15' -- prescribed window of time
GROUP BY 1,2,3 -- group by date-time field granuality i.e. date --> weekday --> hour of day
ORDER BY WEEKDAY(created_at)
) AS daily_hrly_sessions
GROUP BY 1
ORDER BY 1
2. Website Measurement and Testing
Q: It would be great to show a full conversion funnel for lander/ vs. /home and compare the two pages to orders between Jun 19 – Jul 28.
Conversion funnel analysis is about understanding and optimizing each step of your user’s experience on their journey. In this example the marketing manager wants to know if by "tweaking" the existing homepage to drive more sales for the business.
The subquery executes at the level of website sessions from gsearch non-brand traffic. CASE-WHEN statements serve to create flags or markers indicating progression through the funnel (1 = visit; 0 = no visit). In such cases where a single session results in multiple pageviews the website_session_id is repeated. Note if users revisited a previously viewed webpage within the same session the system did not track such behavior.
Thus each^session creates a matrix of 1s and 0s. Therefore, by applying group by functions to the subquery of "sessions matrices" we can collapse the matrix to a single record highlighting only the maximum values (1 > (0 = 0)). Below are two records featuring progression through the funnel. Given the nature of A/B testing there should not be a single instance of a web session that resulted in flags for both the /home and /lander1 pages.
CREATE TEMPORARY TABLE session_level_made_it_flagged
SELECT
page_view_lvl_flags.website_session_id,
MAX(homepage_flag) AS saw_homepage,
MAX(lander1_flag) AS saw_lander1,
MAX(product_flag) AS saw_product,
MAX(mrfuzzy_flag) AS saw_mrfuzzy,
MAX(cart_flag) AS saw_cart,
MAX(shipping_flag) AS saw_shipping,
MAX(billing_flag) AS saw_billing,
MAX(thankyou_flag) AS saw_thankyou
FROM
(SELECT
website_sessions.website_session_id,
website_pageviews.pageview_url,
CASE WHEN pageview_url = '/home' THEN 1 ELSE 0 END AS homepage_flag,
CASE WHEN pageview_url = '/lander-1' THEN 1 ELSE 0 END AS lander1_flag,
CASE WHEN pageview_url = '/products' THEN 1 ELSE 0 END AS product_flag,
CASE WHEN pageview_url = '/the-original-mr-fuzzy' THEN 1 ELSE 0 END AS mrfuzzy_flag,
CASE WHEN pageview_url = '/cart' THEN 1 ELSE 0 END AS cart_flag,
CASE WHEN pageview_url = '/shipping' THEN 1 ELSE 0 END AS shipping_flag,
CASE WHEN pageview_url = '/billing' THEN 1 ELSE 0 END AS billing_flag,
CASE WHEN pageview_url = '/thank-you-for-your-order' THEN 1 ELSE 0 END AS thankyou_flag
FROM website_sessions
LEFT JOIN website_pageviews
ON website_pageviews.website_session_id = website_sessions.website_session_id
WHERE
website_sessions.created_at < '2012-07-28'
AND website_sessions.created_at > '2012-06-19'
AND website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
) AS page_view_lvl_flags
GROUP BY
page_view_lvl_flags.website_session_id
;
SELECT
CASE
WHEN saw_homepage = 1 THEN 'saw_homepage'
WHEN saw_lander1 = 1 THEN 'saw_lander1'
ELSE "check for error"
END AS segment,
COUNT(DISTINCT CASE WHEN saw_product = 1 THEN website_session_id ELSE NULL END) AS prod_click_rate,
COUNT(DISTINCT CASE WHEN saw_mrfuzzy = 1 THEN website_session_id ELSE NULL END) AS mrfuzzy_click_rate,
COUNT(DISTINCT CASE WHEN saw_cart = 1 THEN website_session_id ELSE NULL END) AS cart_click_rate,
COUNT(DISTINCT CASE WHEN saw_shipping = 1 THEN website_session_id ELSE NULL END) AS shipping_click_rate,
COUNT(DISTINCT CASE WHEN saw_billing = 1 THEN website_session_id ELSE NULL END) AS billing_click_rate,
COUNT(DISTINCT CASE WHEN saw_thankyou = 1 THEN website_session_id ELSE NULL END) AS thankyou_click_rate
FROM session_level_made_it_flagged
GROUP BY
segment
...
5. User-level Analysis
Q: Could you calculate the min, max, and avg time between first and second sessions for repeat visitors to the website?
For longer data programs that require multiple subqueries temporary tables serve to keep organized improving readability amongst other benefits. In addition, a temp table is visible only within the current session, and is dropped automatically when the session is closed. Note MySQL and other dbs have a max memory usage. Be mindful of the 'active' sessions as they will stack. When querying multiple tables limit results to relevant data only.
-- DROP TEMPORARY TABLE IF EXISTS new_and_repeat_web_sessions;
CREATE TEMPORARY TABLE new_and_repeat_web_sessions
SELECT
new_sessions.user_id,
new_sessions.website_session_id AS new_sessions,
website_sessions.website_session_id AS repeat_sessions,
new_sessions.created_at AS first_visit_date,
website_sessions.created_at AS repeat_visit_date
FROM
(SELECT
created_at,
website_session_id,
user_id,
is_repeat_session
FROM website_sessions
WHERE
is_repeat_session = 0 -- first time visiting website
AND created_at >= '2014-01-01'
AND created_at < '2014-11-03'
) AS new_sessions
LEFT JOIN website_sessions
ON website_sessions.user_id = new_sessions.user_id
AND website_sessions.website_session_id > new_sessions.website_session_id
AND website_sessions.created_at >= '2014-01-01'
AND website_sessions.created_at < '2014-11-03';
The first subquery filters website_sessions by the prescribed dates, as well, to ensure results are specific to users visiting the website for the first time. Executing a COUNT() reveals 1/3 of users in the table are new. The next step is to perform a self-join to account for [new visits + return] vs. [new visits + no_return]. Take note of the highlighted record below - each record should tell a story that is grounded in the context of the question.
In the subsequent query "one and done" users are removed from the result set. Grouping is applied at the user level before calculating differences in dates. Lastly, the marketing manager requested aggregate statistical values for said users, therefore the final output should show a calculation of the min, max, and avg time to revisit the website.
-- DROP TEMPORARY TABLE IF EXISTS days_bw_first_second_sessions;
CREATE TEMPORARY TABLE days_bw_first_second_sessions
SELECT
user_id,
DATEDIFF(second_session_date,first_visit_date) AS days_between_sessions
FROM
(SELECT
user_id,
new_sessions,
first_visit_date,
MIN(repeat_sessions) AS second_session_id,
MIN(repeat_visit_date) AS second_session_date
FROM new_and_repeat_web_sessions
WHERE repeat_sessions IS NOT NULL
GROUP BY 1,2,3
) AS first_second_session
;
SELECT
MIN(days_between_sessions) AS min_days_bw_sessions,
MAX(days_between_sessions) AS max_days_bw_sessions,
AVG(days_between_sessions) AS avg_days_bw_sessions
FROM days_bw_first_second_sessions








