… Of Sexy Smoothing Curves, Useless False Alerts and Web Analytics

Web analysts are mostly men, and men generally like curves and women, so now you understand part of the purpose of the title; for the rest, simply read on. The job of a web analyst lies between three fields: (1) information systems, (2) marketing, and (3) statistics. My strengths lie mostly in the last two, since I have strong expertise in both marketing (doctoral-level + consulting) and statistics (doctoral-level + consulting). But mostly, I have strong training in thinking or, if you prefer, in “logic”, as founded by Aristotle; for most of my life I have been paid to think and, more importantly, paid to “think” in order to “act” afterwards, and I have acted too. Thus, based on the principle of “logic”, I have decided to write this post about one simple concept that is useful for web analytics: the statistical principle of “smoothing”. I apply it here to a web analytics context, but I could just as easily have applied it to many other contexts.

A Hypothetical (but Realistic) Scenario

Let’s take a time series representing the weekly number of conversions on a website. For the purpose of this example, a conversion is a purchase, and the number of purchases is bounded between 0 and a finite number “S”. I decided to focus on the number of conversions instead of the conversion rate since, for most companies, what matters most from a financial perspective is “absolute values” rather than “percentage values”. Let’s imagine you’re in charge of reporting this metric as a KPI and of interpreting it. Below is the graph, with its associated table, that you have in front of you.

Figure 1 - Evolution of Purchase Over Time

So what would you do? Let’s say you know that nothing is going wrong with observation 12, and you don’t want to raise a false alert (or a false alarm, as they would say in “control engineering”) with anyone in the organization. You know that the only reason observation 11 in Figure 1 is high is a short, small offline advertising campaign that is quite hard to track (nobody listens to you when it comes to offline campaigns, arghhh!). One last point: the crazy HiPPO in your organization definitely wants “purchase” reported as a KPI, so you can’t escape this hazardous task. So what exactly would you do, fellow?

Which metric is the best?

1) Report the growth rate as a KPI? This gives (58-90)/90 = -35.56%, which would certainly trigger an alert, and you already know it is a false one.
2) Draw a trend between the first period and the period before the last one, and compare? The slope is (178-90)/12 = 7.33 per period, so 58/(90-7.33) - 1 = -29.84%. This would do the same thing as 1) and would statistically suck anyway.
3) Do some eyeball regression? This would be excellent, since it would clearly show that observation 11 was a special case and that the first 4 observations were high because of the holidays. The problem is that you’re paid to report numbers, so where are the numbers, fellow?
4) Do some basic smoothing?
Case 1 – Compare to a moving average of order 3: 58/((90+68+54)/3) - 1 = -17.92%, still a drop but a smaller one.
Case 2 – Compare to a moving average of order 4 or 5; with order 4: 58/((90+68+54+58)/4) - 1 = -14.07%, still a drop but that’s better, you can stop sweating a little.
Case 3 – Compare to a weighted moving average of order 4 with the following weights (t-1 = 50%, t-2 = 25%, t-3 = 12.5%, t-4 = 12.5%): 58/(90*.5+68*.25+54*.125+58*.125) - 1 = 58/76 - 1 = -23.68%. That’s a sharper drop than in Cases 1 and 2, so it’s quite a stupid approach when your objective is not to create a false alert, buddy.
Case 4 – Compare to a moving average of order 3 after eliminating observation 11: 58/((58+54+68)/3) - 1 = -3.33%. That’s the kind of result you want to show, but you eliminate one observation and it may look a little too optimistic.
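
For readers who want to check the arithmetic, here is a minimal Python sketch of the options above. It uses only the values quoted in the formulas (178 for the first week, 58, 54, 68 and 90 for observations 8 to 11, and 58 for observation 12); the helper names are hypothetical and chosen for illustration. With the weights as listed, Case 3 evaluates to about -23.68%.

```python
# Values quoted in the formulas above (assumption: no other weeks are needed).
first_week = 178
history = [58, 54, 68, 90]   # observations 8 to 11, oldest first
current = 58                 # observation 12


def pct_change(actual, baseline):
    """Relative gap between the actual value and a baseline, in percent."""
    return (actual / baseline - 1) * 100


def moving_average(values, order):
    """Plain average of the last `order` values."""
    window = values[-order:]
    return sum(window) / len(window)


def weighted_moving_average(values, weights):
    """Weighted average; weights[0] applies to the most recent value."""
    recent_first = values[::-1][:len(weights)]
    return sum(w * v for w, v in zip(weights, recent_first)) / sum(weights)


# 1) Plain growth rate against the previous week.
growth = pct_change(current, history[-1])

# 2) Trend line; the divisor 12 follows the post's own calculation.
slope = (first_week - history[-1]) / 12
trend = pct_change(current, history[-1] - slope)

# 4) Smoothing, Cases 1 to 4.
case1 = pct_change(current, moving_average(history, 3))
case2 = pct_change(current, moving_average(history, 4))
case3 = pct_change(
    current, weighted_moving_average(history, [0.5, 0.25, 0.125, 0.125])
)
case4 = pct_change(current, moving_average(history[:-1], 3))  # drop observation 11

for label, value in [
    ("Growth rate", growth),
    ("Trend", trend),
    ("Case 1", case1),
    ("Case 2", case2),
    ("Case 3", case3),
    ("Case 4", case4),
]:
    print(f"{label}: {value:.2f}%")
```

The same baseline functions can be reused week after week, which keeps the KPI definition fixed instead of picking a different method each time.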

In conclusion, Case 2 of the smoothing options in 4) would be my humble choice and should be favored over the growth rate reported in 1). Furthermore, don’t forget that you need to present this KPI along with other KPIs in your analysis.

Historical Background on Smoothing and Moving Averages

No, smoothing is not the verb associated with the art of making “smoothies”, as you have noticed. Smoothing is an exploratory tool used to approximate a function that captures the important patterns in the data (chapter 7 of “Exploratory Data Analysis”, published in 1977 by the famous American statistician John W. Tukey, gives a great overview of the topic). Moving averages, introduced in the previous section, are certainly the best-known and simplest kind of smoothing method; they date back to the seminal work of Scottish statistician Udny Yule at the beginning of the twentieth century (Yule 1909), citing English statistician Reginald Hooker (Hooker 1901). As shown in the calculations, a moving average smooths a curve by assigning weights to neighboring previous observations, in this case to predict a new observation and compare it to the actual observation to create a marvelous KPI.

When I was teaching “Strategic Marketing” at HEC Montréal, I must admit I always wondered why we were teaching moving averages: most of the time series we were using at the time were monthly or quarterly, and smoothing mostly gave bad results. I thought I was only teaching these methods because they were on the curriculum, not because I thought they were actually useful. However, in web analytics, data is available at the day level (and, if we want, we can dig even deeper to the hour level and minute level). Even though there is a lot to learn from variations, sometimes we also need to smooth them using methods such as moving averages, or even more advanced methods, which I may write about if there is interest.
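
As a rough illustration of this "predict, then compare" use of a moving average, here is a minimal Python sketch. It assumes a simple unweighted moving average over the previous observations, and the function name `moving_average_gaps` is hypothetical. The series contains only observations 8 to 12 from the scenario, since those are the values quoted in the formulas.

```python
def moving_average_gaps(series, order):
    """For each point with enough history, forecast it as the mean of the
    previous `order` observations and report the relative gap between the
    actual value and that forecast.

    Returns a list of (index, forecast, gap) tuples, where gap is
    actual / forecast - 1.
    """
    results = []
    for t in range(order, len(series)):
        forecast = sum(series[t - order:t]) / order
        gap = series[t] / forecast - 1
        results.append((t, forecast, gap))
    return results


# Observations 8 to 12 of the scenario (assumption: earlier weeks omitted).
purchases = [58, 54, 68, 90, 58]

for index, forecast, gap in moving_average_gaps(purchases, order=3):
    print(f"point {index}: forecast {forecast:.2f}, gap {gap:+.2%}")
```

With order 3, the campaign week stands well above its forecast and the following week falls below it, which is the same comparison as Case 1 earlier in the post.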

Exploratory Data Analysis, Smoothing & Smoothies


KPIs related to growth rate sometimes give erroneous results and smoothing methods such as moving averages may be useful. Enjoy logical web analytics, but also, enjoy the sexy smoothing curves (as previously discussed) while drinking smoothies!



Some References Which I Think May Be Useful

Hooker, Reginald H. (1901) “Correlation of the Marriage-Rate with Trade,” Journal of the Royal Statistical Society, 64 (3), 485-492.

Tukey, John W. (1977) Exploratory Data Analysis, Addison-Wesley, Reading.

Yule, G. Udny (1909) “The Applications of the Method of Correlation to Social and Economic Statistics,” Journal of the Royal Statistical Society, 72 (4), 721-730.

If you like this post, be sure to subscribe to Jean-Francois Belisle RSS feed!

8 thoughts on “… Of Sexy Smoothing Curves, Useless False Alerts and Web Analytics”

  1. I think it would have been useful to learn how to «smooth» data in class. Where I have worked, the web analysis team’s job was essentially to present the monthly data in graphics and tables and to «smooth» it in the notes section with different explanations for the results. These graphics and tables were then shown to the marketing team, who actually «smoothed» it a second time before showing it to the clients.
    It was actually the most difficult part of my job.

  2. Thanks for the comment, Vicki. I am actually happily surprised that smoothing was used that much in your work. I guess it could be taught in, say, class 8 of a 13-class course on web analytics. I am actually going to discuss the topic with some people teaching web analytics courses to dig further into this issue.

  3. Hi – I appreciate what you were trying to do here but by showing so many options, it feels like you had a desired answer in your head and used statistics to manipulate the data until you got an answer you were comfortable with. Isn’t that kind of losing the point of performance reporting, let alone using web analytics to improve business performance? Especially if a different method is chosen each week depending on the available data?

    Wouldn’t it have been better to show that performance for one week should not necessarily be compared against performance in previous weeks? Alternatives could be:
    – compare performance against a forecast which took into account the uplift from the offline marketing campaign and therefore the % difference for this week indicates a good or bad week.
    – segment the data to isolate traffic from the offline campaign showing how that spiked but that underlying conversions have remained consistent week on week.

  4. Thanks for your comments Peter, that’s quite appreciated. Here is my take on this one.

    1) This exercise has two objectives: (1) to show how results can differ depending on the approach taken, and (2) to get an answer I am more comfortable with and believe in. Thus, this is pure data tweaking, and it should not be done in firms with a strong culture of performance reporting. Also, I agree that in terms of data consistency, it completely misses the point.

    2) Concerning your comparison approach, it’s a far more mature approach, and this is clearly where I am going in my next posts. However, I would prefer to present less mature methods first and then move on to more mature ones to show their advantages. Forecasting with regressions or more advanced time-series approaches such as Kalman filtering or Sequential Monte Carlo is next in line. Cool metrics related to the two alternatives you are proposing will then be brought to the table.

  5. Thanks for the clarification, Jean-Francois. I have subscribed and am looking forward to reading your ideas on other approaches. I know I don’t use enough statistics when analysing data myself, and I don’t feel it is necessary in a lot of cases, but I want to have the tools ready for when I need them. I’m hoping your future posts can help with that.

  6. Thanks Peter. I totally agree with you that many tasks give clear insights using only “exact” measurement using web analytics tools such as Google Analytics. I’m pretty sure some of my future posts may be useful to tackle some of these tasks from a statistical perspective.


