5 A/B Split Testing Mistakes Product Managers Are Responsible For

A/B Split Testing is the most powerful tool a product manager has for moving from gut feeling and HiPPO-driven product decisions to actual progress based on measurable outcomes.
But while it’s comparatively easy to assess a tool and implement your first testing setup, there are a couple of meta mistakes I see product managers make from time to time (and which I’ve made myself).

Here’s my personal selection and how to avoid them:

Sharing results too early with stakeholders

While ‘too soon’ is measured in two dimensions (more on that below), let’s just say there’s a certain window during which you simply should not talk about the preliminary results of a running A/B test. Especially when you’re testing around a critical business KPI, too many dreams and visions start emerging from (mostly non-technical) stakeholders, even though the test result doesn’t yet stand on solid ground.

In the past, I shared the ongoing progress of a test with the engineering team that implemented it, in order to make their efforts as transparent as possible. But when it came to upper management and marketing, I learned to wait a lot longer before communicating a result. In the best case, you get sign-off from two expert perspectives: a mathematical one (e.g. a Digital Analyst or BI Manager) and a practical one (e.g. a Director of Product or a product management buddy).

A/B Testing for the sake of it

While A/B testing is something to aim for because of the reasons mentioned above, you shouldn’t run every testing idea that floats through your regular stakeholder alignment meeting. The costs of implementing and cleaning up an A/B test matter, but it’s more about the environment you want to test in.

Do you actually have enough traffic flowing through your area of the product to generate significant results within four weeks? Do you have the freedom to test variants that are differentiated enough from each other? Do you have the commitment from upper management to also ship the ‘unpopular’ version, or are they only letting you A/B test in the first place because it’s a ‘modern approach’?
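The traffic question, at least, can be answered with a rough calculation before you commit. Here’s a minimal sketch (not tied to any particular tool, with made-up baseline and uplift numbers) of how you might estimate the required sample size per variant and compare it to your weekly traffic:

```python
# Back-of-the-envelope sample size estimate for a two-proportion test.
# All inputs below are hypothetical examples; plug in your own numbers.
from scipy.stats import norm

def visitors_per_variant(baseline_rate, relative_uplift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect the given uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)   # expected rate of the challenger
    z_alpha = norm.ppf(1 - alpha / 2)            # two-sided significance threshold
    z_beta = norm.ppf(power)                     # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

# Example: 3% baseline conversion, hoping to detect a 10% relative uplift.
needed = visitors_per_variant(0.03, 0.10)
weekly_traffic_per_variant = 5_000               # hypothetical weekly traffic
print(f"~{needed:,.0f} visitors per variant needed")
print(f"~{needed / weekly_traffic_per_variant:.1f} weeks at current traffic")
```

If the answer comes back as twelve weeks rather than four, that’s a conversation to have before the test is built, not after.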

Solely relying on Significance Calculators

As I wrote earlier, the mathematical expert view on your test setup and results is a crucial one. But don’t let the numbers alone decide when a test is finished. You’ll run into scenarios in which a significance calculator spits out a significant result after two days or so. But seriously, two days? Common sense should tell you to run a test through every day of the week at least twice, in order to balance out ‘seasonal’ effects.
Also, significance calculators don’t know every twist and turn of your product and its dependencies the way you do. So only you will be able to spot outliers and potential bugs influencing your testing setup.
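For context, here’s a minimal sketch of what most significance calculators do under the hood (a pooled two-proportion z-test), combined with the ‘every weekday at least twice’ rule of thumb from above. All counts are hypothetical, and this is no substitute for your analyst’s review:

```python
# Pooled two-proportion z-test plus a minimum-runtime sanity check.
# Conversion counts and days_running below are invented example values.
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - norm.cdf(abs(z)))

days_running = 2                                  # the calculator 'fires' after 2 days
p_value = two_proportion_z_test(conv_a=120, n_a=4_000, conv_b=165, n_b=4_000)

# Significance alone is not a stopping rule: cover every weekday at least twice.
if p_value < 0.05 and days_running < 14:
    print(f"p = {p_value:.3f} looks significant, but keep the test running "
          f"until every day of the week has been covered at least twice.")
```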

Confusing A/B Test results with user feedback

‘Version A won, so our users like this one more!’ While this assumption doesn’t seem far-fetched, it’s quite misleading. It’s the perfect example of why quantitative and qualitative testing (and research) need to go hand in hand for a complete picture.
Even though A/B testing tells you what users are doing (like being five times more likely to click the yellow button than the red one), it doesn’t tell you why they are doing it. It’s the unfortunate discrepancy between what people say they do and what they actually do.

So, be happy about the measurable uplift your latest winning version has caused, but make sure to check back on its mid- to long-term impact on your product by conducting some 1:1 user interviews, digging into what actually made users click more. In the worst case, you just locked users tighter into a checkout process and they had no other choice than to proceed. Again: the right action to impact your target KPI, but it’s helpful to know the context around it.

Focussing on the wrong KPI to influence

Speaking of KPIs: when setting up your test, it’s important to only work towards an increase or decrease (e.g. when working on a churn topic) of the very metric you can directly influence with the test. Especially when implementing a test within a multi-step funnel, you can never be sure you’ll impact the very last checkout metric when you’re only testing improvements within the address data step.
It’s tempting to assume that all conditions before and after your area of testing stay the same, but then again, you perform an A/B test to stop assuming, right?
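To make that concrete, here’s a hypothetical sketch (invented counts, not real data) of why the step you directly changed makes a better primary KPI than the end of the funnel:

```python
# Hypothetical funnel counts for variants A and B of an address-step test.
funnel = {
    "cart":         {"A": 10_000, "B": 10_000},
    "address":      {"A": 6_000,  "B": 6_000},   # the step the test changes
    "payment":      {"A": 4_200,  "B": 4_650},
    "order_placed": {"A": 3_900,  "B": 4_100},
}

def step_conversion(step_from, step_to, variant):
    """Conversion between two funnel steps for one variant."""
    return funnel[step_to][variant] / funnel[step_from][variant]

# Primary KPI: the step your variant directly touches (address -> payment).
# The downstream cart -> order number is worth watching as a guardrail, but it
# is diluted by everything else that happens after your change.
for v in ("A", "B"):
    print(f"{v}: address->payment {step_conversion('address', 'payment', v):.1%}, "
          f"cart->order {funnel['order_placed'][v] / funnel['cart'][v]:.1%}")
```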

What were your most impactful learnings when designing, implementing and analysing A/B tests? Personally, my biggest learnings originated not only in practical doing but also in tons of support from my former colleagues Patrick, Timo and Gudrun.