5 Common A/B & Multivariate Testing Mistakes

Though site testing can have many benefits, these common A/B and multivariate testing “gotchas” can lead you down the wrong path. Don’t make these mistakes!

1. Leaving Web Analytics Un-Optimized

Before you get started testing, you want to make sure your web analytics are set up properly, not only so you can do proper analysis before designing your test, but so you are tracking the right things during and after the test.

Make sure your “goal paths” (the common navigational paths your customers take to reach your conversion goal, e.g. checkout) are configured so you can view your funnel abandonment. Also ensure revenue is properly tracked. If possible, include your COGS (cost of goods sold) to determine profit per visitor.
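To make the profit-per-visitor idea concrete, here is a minimal sketch in Python. The function name and the figures are hypothetical, and it assumes you can export revenue, COGS and visitor counts for the same period from your analytics or ecommerce platform.

```python
def profit_per_visitor(revenue, cogs, visitors):
    """Return (revenue - COGS) / visitors, or 0.0 when there are no visitors."""
    if visitors == 0:
        return 0.0
    return (revenue - cogs) / visitors


# Hypothetical figures, for illustration only.
print(profit_per_visitor(revenue=12000.0, cogs=7500.0, visitors=3000))  # 1.5
```

Tracking this number alongside revenue lets you notice when a test variation sells more but earns less.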

In Google Analytics, “Profiles” allow you to slice and dice your data by applying permanent segmentation rules, such as filtering out international traffic or restricting data to a sub-domain or store section of your website. Profiles do not work with historical data, so they must be applied before you start testing.

2. Not Understanding Customer Segments

Your site testing exists to improve your website performance, but averages may be hiding the real issues on your site. You may have an average bounce rate on your home page of 59%, but if you segmented by visitor type, you would discover new visitors bounce at 75% and returning visitors at 34%. So instead of setting a goal to reduce overall bounce rate, your goal might change to reduce new visitor bounce rate. Likewise, there are differences between domestic and international visitors, email subscribers, affiliate referrals, paid and natural search and comparison engine referrals. Your site may have a mix of B2B and B2C offerings. If you only analyze your data in aggregate, you’ll design the wrong tests and apply them to the wrong visitors.
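The bounce rate example above can be sketched in a few lines of Python. The visit and bounce counts are made up, chosen only so that the segment rates match the 75% and 34% figures in the example; the function names are illustrative, not part of any analytics tool.

```python
def bounce_rate(bounces, visits):
    """Share of visits that bounced; 0.0 when there were no visits."""
    return bounces / visits if visits else 0.0


def bounce_rates_by_segment(segments):
    """Return the bounce rate for each segment plus the blended 'overall' rate."""
    rates = {
        name: bounce_rate(data["bounces"], data["visits"])
        for name, data in segments.items()
    }
    total_visits = sum(data["visits"] for data in segments.values())
    total_bounces = sum(data["bounces"] for data in segments.values())
    rates["overall"] = bounce_rate(total_bounces, total_visits)
    return rates


# Hypothetical home page traffic, split by visitor type.
home_page = {
    "new": {"visits": 6100, "bounces": 4575},
    "returning": {"visits": 3900, "bounces": 1326},
}

for name, rate in bounce_rates_by_segment(home_page).items():
    print(f"{name}: {rate:.0%}")
```

With this traffic mix the blended rate comes out at roughly 59%, which says nothing about the gap between the two segments.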

Sadly, not all testing tools allow you to segment visitors – including Google Website Optimizer. There are workarounds – if your ecommerce platform uses targeted selling, you may be able to create custom pages that are only served to certain visitor segments, and then split-tested by your testing tool.

3. Applying the “Radical Redesign” Concept to Individual Variables

If you caught our recent article Choosing Between A/B or Multivariate Test Design, you’ll recall that there are two approaches to A/B tests: univariate and radical redesign. A univariate test compares variations of a single variable, while a radical redesign changes many things at once. Radical redesigns allow you to identify which overall design achieves the best results, so you don’t end up micro-tweaking your current design with multivariate tests when better-performing layouts have been left on the table.

The downside to radical redesigns is that when you change more than one thing at a time, you can never be sure which element is responsible for any observed improvement. Sometimes a univariate split test is run where a multivariate test is actually needed, because the single variable being tested changes in more than one way across versions.

For example, you may wish to test a large thumbnail image with a real baby modeling a sleeper vs. a small thumbnail with the garment lying flat. The thumbnail image looks like one variable, but each version actually changes two: image size and how the garment is presented. To be truly valid, the test should include:

Large thumbnail, baby model
Large thumbnail, flat garment
Small thumbnail, baby model
Small thumbnail, flat garment

Therefore, it is actually a multivariate test. A true univariate test might be a small, medium and large image size of the garment modeled by a baby.
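If you want to check whether a planned test hides extra variables, list each thing that changes and enumerate every combination. A minimal sketch in Python, using the thumbnail example (the factor names are just labels chosen for illustration):

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


# The two things that really change in the thumbnail example.
thumbnail_factors = {
    "thumbnail_size": ["large", "small"],
    "presentation": ["baby model", "flat garment"],
}

for cell in full_factorial(thumbnail_factors):
    print(cell)
```

Two factors with two levels each give four cells. If your test only covers two of them, size and presentation are confounded and you cannot tell which one drove the result.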

4. Measuring the Wrong KPIs

It’s easy to become myopic about conversion rate, but improved conversion rate is not the goal for every page. Few customers convert directly from the home page; instead, they click through to a deeper page. The goals for the home page might be reduced bounce rates, increased click-throughs, and repeat visits. Pay attention to the relationship between key performance indicators, or KPIs. Sometimes when one metric goes up, another goes down – you may celebrate a higher average order value while overall revenue tanks. Track everything that is important.

You also want to measure revenue per visitor for most tests, especially when testing offers and prices. Conversion rate might go up, while profitability goes down!
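Here is a small Python sketch of how conversion rate and revenue per visitor can point in opposite directions. All figures are invented: variation B stands in for a hypothetical discount offer that wins more orders at a lower price.

```python
def summarize(visitors, orders, revenue):
    """Return conversion rate, average order value and revenue per visitor."""
    return {
        "conversion_rate": orders / visitors,
        "average_order_value": revenue / orders,
        "revenue_per_visitor": revenue / visitors,
    }


# Hypothetical results for a price test.
control = summarize(visitors=10000, orders=200, revenue=20000.0)
discount = summarize(visitors=10000, orders=250, revenue=18750.0)

for name, stats in (("control", control), ("discount", discount)):
    print(
        f"{name}: conversion {stats['conversion_rate']:.2%}, "
        f"AOV ${stats['average_order_value']:.2f}, "
        f"RPV ${stats['revenue_per_visitor']:.3f}"
    )
```

In this made-up case the discount lifts conversion from 2.0% to 2.5%, yet each visitor is worth less. Subtracting COGS from revenue before dividing would give the profit-per-visitor view.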

5. Running a Test Too Short

Declaring a winner too soon (before enough data is collected and the test reaches statistical significance) increases your chance of a false positive or negative. Evan Miller shares a great example of the danger of this on his blog.
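For a rough feel of what “statistical significance” means here, the sketch below runs a standard two-proportion z-test in plain Python. This is a generic textbook test, not necessarily what your testing tool uses, and the visitor and conversion counts are hypothetical. It also assumes the sample size was fixed in advance: checking repeatedly and stopping at the first significant result inflates the false-positive rate.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z, p_value). Uses the pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# The same 4% vs. 6% conversion rates, early in the test and later on.
z_early, p_early = two_proportion_z_test(20, 500, 30, 500)
z_late, p_late = two_proportion_z_test(200, 5000, 300, 5000)

print(f"early: z = {z_early:.2f}, significant at 5%: {p_early < 0.05}")
print(f"late:  z = {z_late:.2f}, significant at 5%: {p_late < 0.05}")
```

With 500 visitors per version the difference does not clear the 5% threshold; with 5,000 per version the same rates do.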

7 Responses to “5 Common A/B & Multivariate Testing Mistakes”

  1. Linda, a most excellent overview of what might take the average person a lifetime to implement. Okay, maybe not that long, but definitely we are talking about some late evenings!

    Who says there is no need for an IT/IS department in today’s age of internet savvy grandmothers?

  2. Dennis says:

    I think it is good to educate people a bit more on A/B testing. I like the example of changing size and image at the same time and not seeing the four combinations in there.

    Our tool at http://www.reedge.com is also not ready to pick up these four combinations and sees them as two. But I think the tools will become more sophisticated in the future and adjust the tests and reports to show real results.

    I think this is something not everyone should need to know. The tool needs to help you, and when it says “95% chance to beat original” it should be true.

    So it’s good to educate now, but I think it’s better when we makers of these tools go back and build it into the tool.

  3. Alvin Tan says:

    I think one thing that is difficult to determine is causation; just because everything is the same except that one feature doesn’t mean that all variables are constant. There are many variables (e.g. time of the day, visitor type, sample size) that are essentially out of control, no matter how their fluctuations are mitigated. Of course, some would think that this issue is merely of “academic interest” but to completely ignore this and place 100% faith in optimization testing is negligent at best. A/A testing is a perfect example to prove this point.

  4. Like #4, it’s all too common that people hunt for “increased conversion rate”. A good example is testing a cross-sell page that appears when people click “buy”. It will often lower conversion rate because there is more friction, and a few visitors will leave as a result, but if it increases average order value or profit per order for the remaining converted visitors, the overall business impact might be positive.

    @Alvin, great point, which is why A/B/A testing is always a good idea if you have the time or you are testing small tweaks. Or just check up on your previous tests by running them again a month or two later, to see if the result is still the same.

  5. re #1 is a clanger that all too often goes unnoticed, with internal corporate traffic skewing any true end-user behaviour. (I’ve seen it recur when the IT dept. changes internal company IP addresses without notice and marketing analytics teams are not informed!)

    re #2 is important to note. The devil really is in the detail. However it is possible with some intermediate level tweaking and plugins to segment and target a/b tests even with GWOptimizer. See the excellent: http://community.btbuckets.com/page/google-website-optimizer

  6. Linda
    The point about running the test too short -
    - the way most A/B test tools run tests, they use all the data that was collected during the testing period for computing statistical significance. That is simply wrong; statistical tests are designed to work with random sampling, with samples of size no more than 400. (I give a reference to an academic article in my post.)
    When everything is held constant, increasing sample size increases statistical significance – even minor variances will look statistically significant while they may have no economic significance.
    You can see the methodological errors with A/B testing in my article here: http://iterativepath.wordpress.com/2010/06/26/8-flaws-in-ab-split-testing/

    -rags

  7. Interesting article. Has anyone ever tried experimenting with alternative need-building questions? We found these had a dramatic impact on how long visitors viewed a page, e.g. ‘Discover why visitors don’t want to do business because of your website’ compared to ‘Increase your website lead value by 20%’. Sometimes the simple things deliver the best results!

© 2014 Get Elastic Ecommerce Blog. All rights reserved.