Web analysts are mostly men, and men generally like curves and women, so now you partially understand the title; for the other part, simply read the rest of this blog post. The job of a web analyst lies at the intersection of three fields: (1) information systems, (2) marketing, and (3) statistics. My own strengths lie mostly in the last two, since I have strong expertise in both marketing (doctoral-level + consulting) and statistics (doctoral-level + consulting). Above all, though, I have strong training in thinking, or, if you prefer, in “logic” as founded by Aristotle. For most of my life I have been paid to think, and more importantly, to “think” and then “act”, and I have acted too. So, based on the principle of “logic”, I have decided to write this post about one simple concept that is useful for web analytics: the statistical principle of “smoothing”. I apply it here to a web analytics context, but it could just as easily be applied to many other contexts.
A Hypothetical (but Realistic) Scenario
Let’s take the evolution of a time series representing the weekly number of conversions on a website. For the purpose of this example, I will treat a conversion as a purchase, where the number of purchases is bounded between 0 and a finite number “S”. I decided to focus on the number of conversions instead of the conversion rate because, for most companies, what matters most from a financial perspective is absolute values rather than percentages. Let’s imagine you’re in charge of reporting this metric as a KPI and interpreting it. Figure 1 shows the graph and its associated table that you have in front of you.
So what would you do? Let’s say that you know there is nothing wrong with observation 12, and you don’t want to report a false alert (or a “false alarm”, as they say in “control engineering”), so you don’t want to alert anyone in the organization. You know that the only reason observation 11 in Figure 1 is high is a short, small offline advertising campaign that is quite hard to track (nobody listens to you when it comes to offline campaigns, arghhh!). One last point: the crazy HiPPO in your organization definitely wants “purchases” reported as a KPI, so you can’t escape this hazardous task. So what would you do, fellow?
Which metric is the best?
1) Report the growth rate as a KPI? This gives (58-90)/90 = -35.56%, which would certainly create an alert, and a false one, as you already know.
2) Draw a trend between the first period and the period before the last one, and compare? (178-90)/12 = 7.33, then 58/(90-7.33) - 1 = -29.84%. This would do the same thing as 1) and would statistically suck anyway.
3) Do some eyeball regression? This would be excellent, since it would clearly identify that observation 11 was a special case and that the first 4 observations were high because of the holidays. The problem is that you’re paid to report numbers, so where are the numbers, fellow?
4) Do some basic smoothing?
Case 1: Compare to a moving average of order 3: (58 / ((90+68+54)/3)) - 1 = -17.92%, still a drop but a smaller one.
Case 2: Compare to a moving average of order 4 (or 5): (58 / ((90+68+54+58)/4)) - 1 = -14.07%, still a drop but a better result; you can stop sweating a little.
Case 3: Compare to a weighted moving average of order 4 with the following weights (t-1 = 50%, t-2 = 25%, t-3 = 12.5%, t-4 = 12.5%): (58 / (90*.5+68*.25+54*.125+58*.125)) - 1 = 58/76 - 1 = -23.68%. That’s quite a stupid approach, since your objective is not to create a false alert, buddy.
Case 4: Compare to a moving average of order 3 after eliminating observation 11: 58/((58+54+68)/3) - 1 = -3.33%. That’s the kind of result you want to show, but you eliminated one observation, and it may look a little too optimistic.
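As a minimal sketch, the KPI options above can be recomputed in a few lines of Python. Every option has the same form, actual / baseline - 1, and only the baseline changes. The weekly values for weeks 8 to 12 are an assumption reconstructed from the arithmetic in this post (Figure 1 itself is not reproduced), the trend of option 2) is left out, and the helper names `pct_change` and `weighted_average` are hypothetical, not from any library. With the stated weights, case 3 evaluates to about -23.68%.

```python
from typing import Sequence

# Assumption: weekly purchases for weeks 8-12, reconstructed from the
# calculations in this post (the full 12-week series is not available here).
weeks = {8: 58, 9: 54, 10: 68, 11: 90, 12: 58}


def pct_change(actual: float, baseline: float) -> float:
    """Relative deviation of the actual value from a baseline, in percent."""
    return (actual / baseline - 1.0) * 100.0


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean; dividing by sum(weights) means they need not sum to 1."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


current = weeks[12]
# Lagged values, most recent first: t-1, t-2, t-3, t-4.
lags = [weeks[11], weeks[10], weeks[9], weeks[8]]

kpis = {
    # Option 1: plain week-over-week growth rate.
    "growth rate (option 1)": pct_change(current, lags[0]),
    # Cases 1 and 2: simple moving averages of the last 3 and 4 weeks.
    "MA(3) (case 1)": pct_change(current, sum(lags[:3]) / 3),
    "MA(4) (case 2)": pct_change(current, sum(lags[:4]) / 4),
    # Case 3: weighted moving average with weights 50/25/12.5/12.5.
    "WMA(4) 50/25/12.5/12.5 (case 3)": pct_change(
        current, weighted_average(lags, [0.5, 0.25, 0.125, 0.125])
    ),
    # Case 4: drop week 11 (the offline campaign) and average weeks 8-10.
    "MA(3) without week 11 (case 4)": pct_change(current, sum(lags[1:4]) / 3),
}

for name, value in kpis.items():
    print(f"{name:34s} {value:7.2f}%")
```

Because every option shares the same actual / baseline - 1 form, the real decision is how the baseline is built, so trying another window length or weighting scheme only means changing one line.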
In conclusion, Case 2 of the smoothing option 4) would be my humble choice and should be favored over the growth rate reported in 1). Furthermore, don’t forget to present this KPI alongside other KPIs in your analysis.
Historical Background on Smoothing and Moving Averages
No, as you have noticed, smoothing is not the verb for the art of making “smoothies”. Smoothing is an exploratory tool used to approximate a function that captures the important patterns in the data (chapter 7 of the famous American statistician John W. Tukey’s book “Exploratory Data Analysis”, published in 1977, gives a great overview of the topic). Moving averages, introduced in the previous section and certainly the best-known and simplest smoothing method, date back to the seminal work of Scottish statistician Udny Yule at the beginning of the twentieth century (Yule 1909), who cited English statistician Reginald Hooker (Hooker 1901). As shown in the calculations, a moving average smooths a curve by assigning weights to neighboring previous observations; here it is used to predict a new observation, which is then compared to the actual observation to create a marvelous KPI.

When I was teaching “Strategic Marketing” at HEC Montréal, I must admit I always wondered why we were teaching moving averages: most of the time series we used were monthly or quarterly, and smoothing them mostly gave bad results. I thought I was teaching these methods only because they were in the curriculum, not because they were actually useful. In web analytics, however, data are available at the day level (and, if we want, we can dig deeper to the hour and minute levels). Even though there is a lot to learn from these variations, sometimes we also need to smooth them using methods such as moving averages, or even more advanced methods that I may write about if there is interest.
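For day-level data, smoothing is usually applied across the whole series rather than to a single KPI. Below is a minimal sketch of a trailing simple moving average; the function name `trailing_moving_average` is hypothetical, and the sketch assumes a plain sequence of numbers ordered from oldest to newest with no missing periods.

```python
from collections import deque
from typing import Iterable, Iterator, Optional


def trailing_moving_average(
    values: Iterable[float], window: int
) -> Iterator[Optional[float]]:
    """Yield the mean of the last `window` values at each step.

    Yields None until `window` values have been seen, so the output has the
    same length as the input and stays aligned with it.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque()  # the values currently inside the window
    total = 0.0    # running sum of the values in `buf`
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Slide the window: drop the oldest value from the running sum.
            total -= buf.popleft()
        yield total / window if len(buf) == window else None


# Usage: the weeks 8-12 purchase counts reconstructed from this post, window of 3.
weekly = [58, 54, 68, 90, 58]
print(list(trailing_moving_average(weekly, 3)))
```

For daily web data, a 7-day window is a common choice because each average then covers every day of the week exactly once, which cancels out day-of-week patterns. The running sum keeps each step constant-time; for very long float series, recomputing the sum from `buf` now and then would limit rounding drift.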
KPIs based on growth rates sometimes give misleading results, and smoothing methods such as moving averages can be useful. Enjoy logical web analytics, but also enjoy the sexy smoothing curves (as discussed above) while drinking smoothies!
Some References I Think May Be Useful
Hooker, Reginald H. (1901). “Correlation of the Marriage-Rate with Trade,” Journal of the Royal Statistical Society, 64 (3), 485-492.
Tukey, John W. (1977). Exploratory Data Analysis, Reading, MA: Addison-Wesley.
Yule, G. Udny (1909). “The Applications of the Method of Correlation to Social and Economic Statistics,” Journal of the Royal Statistical Society, 72 (4), 721-730.
If you like this post, be sure to subscribe to Jean-Francois Belisle’s RSS feed!