Introduction to A/B and MVT: Optimization Testing 101

This is the first in a series of posts based on the material covered in our latest webinar: Taking Your Site Performance to The Next Level With Optimization Testing.

Part 1 provides a brief overview of what site testing is and describes the differences between usability testing, A/B split testing and multivariate methods.

What is optimization testing?

First let’s talk about what it’s not.

Optimization testing is not the same as “user testing” or “usability testing.” With usability testing, you ask a small sample of Web users from your target market to perform various tasks under observation in a “lab” environment. The goal is to use qualitative research to uncover usability issues on your site, and to understand how people navigate your website.

The insight you can glean from user tests is very valuable. Test subjects often talk out loud as they perform tasks, so you can discover not just what people get stuck on but why they get stuck. For example, a comment like “I hate when sites ask for my email address in the checkout. I don’t want to receive spam” may prompt you to add a brief explanation of why you ask for an email address (to send a confirmation email and receipt, rather than promotional messages).

The shortcomings of usability testing, however, are many. User testing requires time to select test participants, write questionnaires, conduct tests, analyze them and compile meaningful data. You may also need to compensate test subjects monetarily. Users are also under observation performing prescribed tasks, which may not be how your users really behave in the wild. And working with a small sample, you cannot collect statistically significant data. Quantifying the impact of a design on conversion rate, and more importantly, revenue, is impossible with usability testing.

Enter A/B and MVT

A/B split and MVT (multivariate testing) allow you to quantitatively test the impact of changes to your website on your key performance metrics. Rather than implement changes based on your gut-feel, the design bias of an agency or the opinions of a HiPPO (highest paid person in the organization), you can test your ideas on your real customers and use hard numbers to prove or disprove your hypotheses.
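To make “hard numbers” a little more concrete, here is a minimal sketch of one common way to check whether a difference in conversion rate is likely to be real: a two-proportion z-test. The function name and the visitor and conversion counts are made up for illustration, and your testing tool may well use a different statistical method.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both versions convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed with the error function
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical results: control converts 200 of 10,000, treatment 250 of 10,000
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a common, if arbitrary, cut-off) suggests the difference is unlikely to be random noise.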

A/B testing

A classic A/B test sends 50% of traffic to a “control” version (existing page, element or process on your site), and 50% to a “treatment” or test version.
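How visitors actually get assigned to a version depends on your testing tool. As a rough sketch, assuming each visitor has a stable ID (from a cookie, for example), one common approach is to hash that ID so the same visitor always sees the same version. The function name, experiment name and `control_share` parameter below are hypothetical.

```python
import hashlib


def assign_variant(visitor_id, experiment="homepage_test", control_share=0.5):
    """Deterministically place a visitor in 'control' or 'treatment'."""
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Turn the first 8 hex characters into a number in [0, 1)
    bucket = int(digest[:8], 16) / 2**32
    return "control" if bucket < control_share else "treatment"


# The same visitor gets the same answer on every call
print(assign_variant("visitor-12345"))
print(assign_variant("visitor-12345"))
```

Hashing on the experiment name as well as the visitor ID means a visitor who lands in the control for one test is not automatically in the control for every other test.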

If you want to be conservative, you might show 80% of visitors your control and 20% your treatment to limit any negative impact on your success metrics. If the experiment is a disaster, you’ve only exposed 20% of your visitors to it. However, when you veer away from a 50/50 split, the smaller group collects data more slowly, so for the same total traffic your comparison is less precise and your test must run longer to achieve statistical significance.
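To get a feel for how much longer, here is a back-of-the-envelope sketch. It assumes both versions convert at roughly the same baseline rate (2% here, a made-up figure) and compares the standard error of the measured difference under a 50/50 and an 80/20 split of the same total traffic. The function name is hypothetical.

```python
from math import sqrt


def diff_std_error(rate, total_visitors, treatment_share):
    """Standard error of (treatment rate - control rate) for a given split."""
    n_treatment = total_visitors * treatment_share
    n_control = total_visitors - n_treatment
    return sqrt(rate * (1 - rate) * (1 / n_control + 1 / n_treatment))


even = diff_std_error(rate=0.02, total_visitors=20_000, treatment_share=0.5)
skewed = diff_std_error(rate=0.02, total_visitors=20_000, treatment_share=0.2)

# Precision scales with the square root of traffic, so the squared ratio
# approximates how much more traffic the 80/20 test needs to catch up.
extra_traffic = (skewed / even) ** 2
print(f"80/20 needs about {extra_traffic:.2f}x the traffic of 50/50")
```

Under these assumptions the 80/20 test needs roughly 1.56 times the traffic of the 50/50 test to measure the difference with the same precision.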

You may choose to include only a percentage of visitors in your test, or restrict your test to a certain user segment (e.g. only new visitors, who are less likely to have seen your existing version).

A/B testing is not limited to version A and version B. You can test a control against up to 9 different versions. So, an A/B/C/D/E/F/G/H/I/J test is still an A/B test. The expert mathematicians have determined this maximum (I just choose to take their word for it)!

Univariate testing

A univariate test involves multiple versions of one variable, such as a headline or shopping cart button. A univariate test is still an A/B split test, even though you are testing more than two versions of that variable.

Multivariate testing

When multiple variables are tested at once (e.g. the combination of thumbnail image, size, and color of cart button), it’s called a multivariate test.

The number of test versions depends on A) the number of variables and B) the number of versions of each variable. For example:

Variables = thumbnail image, cart button size, cart button color

Thumbnail images = 2 (one on model, one flat image)
Button size = 2 (small, large)
Button color = 4 (red, orange, green, blue)

Total test versions = 2 x 2 x 4 = 16

Or,

V1: Model image, small, red button
V2: Model image, small, orange button
V3: Model image, small, green button
V4: Model image, small, blue button
V5: Model image, large, red button
V6: Model image, large, orange button
V7: Model image, large, green button
V8: Model image, large, blue button
V9: Flat image, small, red button
V10: Flat image, small, orange button
V11: Flat image, small, green button
V12: Flat image, small, blue button
V13: Flat image, large, red button
V14: Flat image, large, orange button
V15: Flat image, large, green button
V16: Flat image, large, blue button
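If you have more variables than you care to write out by hand, the same list can be generated programmatically. This is a small sketch using Python’s `itertools.product`; the dictionary keys are just labels chosen for this example.

```python
from itertools import product

variables = {
    "thumbnail": ["Model image", "Flat image"],
    "button_size": ["small", "large"],
    "button_color": ["red", "orange", "green", "blue"],
}

# Every combination of one value per variable: 2 x 2 x 4 = 16 versions
versions = list(product(*variables.values()))

for number, (thumbnail, size, color) in enumerate(versions, start=1):
    print(f"V{number}: {thumbnail}, {size}, {color} button")
```

Adding a fourth variable with three versions would multiply the total to 48, which is why multivariate tests can demand a lot of traffic.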

Using every possible combination in your test is called “full-factorial” testing. Otherwise, you are using a “fractional factorial” design. This includes the Taguchi method and others. Fractional factorial tests may save time, as it’s quicker to reach statistical significance – but they also are not as reliable. Any version you exclude from your test is possibly the best performing. Fractional factorial testing methods were developed for the manufacturing industry, where prototypes were expensive to develop. That’s not the case with websites.
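To illustrate what gets left out, here is a simplified sketch of one textbook fractional design: a half-fraction of three two-level variables. It assumes only two button colors rather than the four in the example above, and it is not necessarily the design that any particular testing tool or the Taguchi method would choose.

```python
from itertools import product

# Simplified to two levels per variable for illustration
factors = {
    "thumbnail": ("model image", "flat image"),
    "button_size": ("small", "large"),
    "button_color": ("red", "green"),
}

# Code each level as -1 or +1 and list all 2 x 2 x 2 = 8 combinations
full = list(product((-1, 1), repeat=len(factors)))

# Half-fraction: keep only runs where the three codes multiply to +1
half_fraction = [run for run in full if run[0] * run[1] * run[2] == 1]


def label(run):
    """Translate -1/+1 codes back into readable level names."""
    return tuple(levels[(code + 1) // 2]
                 for levels, code in zip(factors.values(), run))


tested = [label(run) for run in half_fraction]
skipped = [label(run) for run in full if run not in half_fraction]

print("Tested: ", tested)
print("Skipped:", skipped)
```

Each pair of variables still appears in all four of its combinations among the tested versions, which is what makes the design balanced. The four skipped versions are never shown to anyone, so if one of them would have been the winner, this test cannot tell you.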

Radical redesigns

A “radical redesign” tests multiple variables at once, but is not to be confused with multivariate testing. A radical redesign tests one look-and-feel against another (or another and another and another…) and is therefore an A/B test.

For example, you might test your existing site design against a proposed new design with a completely different navigation menu, search box, home page merchandising, calls to action and home page content.

It’s recommended that you begin with a radical redesign test before you start tweaking individual elements with univariate or multivariate testing. Your goal is to further optimize a top-performing design, rather than invest time in a sub-optimal one. It’s also a good idea to split test redesigns over time rather than just flip the switch on your customers, like Amazon did with its latest major makeover.

Now you know the differences between user testing, A/B split and multivariate testing. Join us next time for Part 2: Why Test? Discover how site testing could be the single most profitable marketing activity you could invest in.

7 Responses to “Introduction to A/B and MVT: Optimization Testing 101”

  1. Dennis says:

    Hi Linda, nice article can I republish the first part (including Enter A/B and MVT) on our blog with a link to this article?

    • Hi Dennis, we’re fine with quotation of a paragraph with a link back to our post. In this case, since it’s just the introduction, sure go ahead up to Enter A/B and MVT.

  2. Alvin Tan says:

    Great information as usual.

    Also, despite the obvious expository, as opposed to argumentative, nature of this article, what would make this article more complete is to link to some of your older articles on the pitfalls of split testing. Anyone who wants to adopt A/B testing should be familiar with A/A testing and the concept of statistical significance before treating the results as useful and actionable.

  3. Amadesa says:

    Great intro for A/B and MVT. Regarding fractional factorial tests not being reliable – what would be the point of running an unreliable test? In order to believe the results once the test concludes, reliability is a requirement up front.

    Regarding your comment “fractional factorial testing methods were developed for the manufacturing industry, where prototypes were expensive to develop. That’s not the case with websites.” Exactly! Traffic is [relatively] cheap and making website changes is nothing like changing a physical part in an automobile.

    Ultimately, the best scenario for MVT is to run the test and automatically remove poorly performing variations as they become statistically significant. Enter “automated MVT”. See here for more: http://amadesa.com/blog/lala

    Thanks,
    Matt

  4. Amit says:

    Hi Linda,
    How would you define testing a completely new interface, like twitter did with a headline on top saying “Welcome to #NewTwitter! Read up on what’s new. You can still access old Twitter for a limited time.” This way i guess the user has a choice rather than a compulsion to be on one or the other interface.

    Would this qualify as a Radical Redesign?

    Thanks.
    Amit

  5. Hammer Bamhare says:

    I have just applied for a job asking about A/B and MVT testing. This is a brilliant article.

© 2014 Get Elastic Ecommerce Blog. All rights reserved.