Mar 3, 2010 | 1 minute read

A/B Test Case Study: Can Split Test Results Be Trusted?

written by Janis Lanka

In the spirit of the Winter Olympic Games that ended last week, I would like to talk about the "winner" of the final A/B experiment we ran for the Official Vancouver 2010 Olympic Store. We had already achieved significant improvements with the help of previous tests on the checkout process, the product details page, and the home page. So, shortly before Games time, in collaboration with WiderFunnel, we ran our fourth A/B experiment, on the product list page template, to take a last stab at conversion optimization.

Testing the Product List Page

The image below is the control version:

We looked at our previous tests, and after several hypotheses and investigations, we produced two alternative variations with the following changes:

A) Introduced a vertical menu that shows all subcategories, for easier access to other products
B) Added color thumbnails to products that are available in alternative colors
C) With the help of a recommendation engine (cross-sell), showed the most popular products from that specific category to increase revenue from top-selling items (a rough sketch of the idea follows this list)
D) Gave the filtered navigation a minor face-lift to improve usability
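
For change C, a rough sketch of the idea (our illustration, not the store's actual recommendation engine): count recent orders per product within the shopper's current category and surface the top sellers. All product names and order data here are hypothetical.

```python
# A rough sketch of the idea behind change C (an illustration, not the
# store's actual recommendation engine): count recent orders per product
# within the shopper's current category and surface the top sellers.
# All product names and order data below are hypothetical.
from collections import Counter

# Hypothetical (category, product) pairs taken from recent orders.
recent_orders = [
    ("mittens", "red-mittens"), ("outerwear", "team-jacket"),
    ("mittens", "red-mittens"), ("headwear", "toque"),
    ("mittens", "kids-mittens"), ("headwear", "toque"),
]

def top_sellers(orders, category, n=3):
    """Return the most-ordered products within a single category."""
    counts = Counter(product for cat, product in orders if cat == category)
    return [product for product, _ in counts.most_common(n)]

print(top_sellers(recent_orders, "mittens"))
# -> ['red-mittens', 'kids-mittens']
```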

Treatment A:

Treatment B:

What We Learned

During the experiment, 100% of traffic was included and split evenly across all variations. This was another tough experiment: even 2,272 transactions over 10 days did not produce a statistically significant winner. But we gathered just enough visitor and ecommerce data to make a decision.

Variation B was chosen because, according to Google Website Optimizer (GWO), it was converting 7.74% better than the control.
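
To make that concrete, here is a minimal sketch of the kind of significance check involved, a two-sided two-proportion z-test on invented numbers. Only the overall transaction volume (~2,272) and the ~7.7% relative lift echo the figures above; the function name, visitor counts, and per-variation conversions are assumptions for illustration.

```python
# A hedged sketch of a two-sided two-proportion z-test. The visitor and
# conversion counts are invented; only the overall scale (~2,272
# transactions) and the ~7.7% relative lift echo the figures in the post.
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (relative_lift, z, two_sided_p) comparing B against A."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return rate_b / rate_a - 1, z, p

# Control: 560 orders from 40,000 visitors; Variation B: 603 from 40,000.
lift, z, p = two_proportion_z_test(560, 40_000, 603, 40_000)
print(f"lift = {lift:.2%}, z = {z:.2f}, p = {p:.3f}")
# -> lift = 7.68%, z = 1.27, p = 0.204: a healthy-looking lift that still
#    falls short of significance at the usual 95% confidence level.
```

At this scale, a 7-8% lift sits comfortably inside the noise, which is consistent with the test not producing a clear statistical winner.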

What Surprised Us: Control vs. Control

Additionally, we wanted to do a little test on GWO itself. We created another variation which was an exact copy of the control variation. Maybe it was a "statistical coincidence," but this alternative "exact" variation performed 4.97% better! We didn't do this for any other tests, and, thus, can't confirm if there is a pattern for such behavior. So this is up for discussion. Have you tried a similar A/A test and found similar results?
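
As one way to explore the question, here is a small A/A simulation (a sketch under assumed numbers, not our actual test): give two identical pages the same true conversion rate and count how often sampling noise alone produces an apparent lift at least as large as the 4.97% above. The 1.5% baseline conversion rate and 40,000 visitors per arm are assumptions.

```python
# Hypothetical A/A simulation: two identical pages share one true
# conversion rate; how often does sampling noise alone produce an
# apparent lift of 4.97% or more? The 1.5% baseline and 40,000 visitors
# per arm are assumptions; only the 4.97% figure comes from the test.
import numpy as np

rng = np.random.default_rng(2010)
BASE_RATE, VISITORS, RUNS = 0.015, 40_000, 10_000

# Conversion counts for two "variations" drawn from the same true rate.
a = rng.binomial(VISITORS, BASE_RATE, size=RUNS)
b = rng.binomial(VISITORS, BASE_RATE, size=RUNS)
apparent_lift = b / a - 1

share = np.mean(np.abs(apparent_lift) >= 0.0497)
print(f"{share:.0%} of A/A runs showed an apparent lift of 4.97% or more")
# Expect a sizeable share (on the order of a third with these numbers):
# with only ~600 conversions per arm, a ~5% "improvement" between
# identical pages is well within ordinary sampling noise.
```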

This post is contributed by Janis Lanka (@janislanka), who manages front-end development for Elastic Path Software.