A/B Testing and Statistical Significance

Asmita Pradhan
Mar 12, 2024

As a prelude to this article, please read my earlier blog post, "A Deep Dive into Our Latest Marketing Channel Campaign with Python": https://wix.to/EMUwnYJ

A/B testing for marketing: Another experiment in this analysis was an A/B test, a randomized experiment that evaluates which variant performs better. An A/B test needs a desired outcome and a clear control, and each variant should differ from the control by only one major change. In this experiment, half the emails were generic upsells (the control) and the other half used personalized messaging.

Allocation to the variants was relatively even and randomized. To ensure that each user had only one subscription outcome, I used the groupby() and max() methods and then unstacked the DataFrame. A Series of outcomes was created for each variant, equal to True if the user subscribed and False if not, and the mean of each Series gave that variant's conversion rate. Let us look at whether the difference was significant.
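The deduplication step described above can be sketched roughly as follows. This is a minimal illustration and not the original notebook code: the DataFrame, the column names (user_id, variant, converted) and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: one row per email sent, so a user can appear more than once.
marketing = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u4", "u5", "u6"],
    "variant": ["control", "control", "control", "control",
                "personalization", "personalization", "personalization"],
    "converted": [False, True, False, False, True, True, False],
})

# max() over booleans is True if the user converted at least once,
# so each (user, variant) pair keeps a single outcome.
subscribers = marketing.groupby(["user_id", "variant"])["converted"].max()

# Move the variant level into columns: one row per user, one column per variant.
subscribers_df = subscribers.unstack(level="variant")

# Users who never saw a variant show up as NaN in that column, so drop them
# to get one clean Series of outcomes per variant.
control = subscribers_df["control"].dropna().astype(bool)
personalization = subscribers_df["personalization"].dropna().astype(bool)

# The mean of a boolean Series is the share of True values, i.e. the conversion rate.
control_rate = control.mean()
personalization_rate = personalization.mean()
print(f"control: {control_rate:.2%}, personalization: {personalization_rate:.2%}")
```

Taking max() rather than, say, the first row means a user counts as converted if any of their emails led to a subscription.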

When performing the test, it is important to consider the lift: was the conversion rate higher for the personalized variant, and by how much? Lift is the difference between the personalized conversion rate and the control conversion rate, divided by the control conversion rate. The result is the relative percent difference of personalization compared to control.
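As a small sketch of that formula, the function below computes lift from two conversion rates. The function name and the example rates are illustrative assumptions, not values from the campaign.

```python
def lift(control_rate, treatment_rate):
    """Relative difference of the treatment conversion rate versus the control."""
    if control_rate == 0:
        # Dividing by a zero control rate would be meaningless.
        raise ValueError("lift is undefined when the control rate is zero")
    return (treatment_rate - control_rate) / control_rate


# Made-up rates: 20% control conversion versus 25% personalized conversion.
example_lift = lift(0.20, 0.25)
print(f"lift: {example_lift:.2%}")
```

A positive value means the personalized variant converted better than the control, and a negative value means it converted worse.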

In data analysis and especially in the context of A/B testing or marketing experiments, "lift" often refers to the relative increase in a certain metric when comparing two different groups. Here, the lift of 38.85% with respect to personalization compared to the control group suggests that the personalized intervention or treatment resulted in a 38.85% increase in the specified metric compared to the baseline (control group).

The metric being measured is the sales conversion rate, so a lift of 38.85% means that the group receiving the personalized treatment had a 38.85% higher conversion rate than the group that didn't receive personalization.

This indicates that personalization has had a positive impact on the metric being analyzed, suggesting that it's an effective strategy or intervention in this context.

Statistical significance was then assessed by conducting a two-sample t-test. This test uses the mean and sample variance of each sample to determine whether the variation between the two samples could have occurred by chance. The t-test gives us a t-statistic and a p-value. For large samples, a t-statistic with a magnitude of 1.96 corresponds to a two-sided p-value of 0.05, which translates to a 95% confidence level. To run the t-test I imported the ttest_ind() function from the scipy package, which takes a list of outcomes for each variant. The outcomes in this test were whether or not each user converted to a subscriber. As we see below, different segments were tested, and a p-value less than 0.05 is considered statistically significant at the 95% confidence level, which shows which factors are associated with the outcomes.
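A rough sketch of such a test with scipy is shown below. The outcome lists are made-up numbers (100 of 1,000 control users and 150 of 1,000 personalized users converting), chosen only to make the example deterministic.

```python
from scipy import stats

# Hypothetical outcomes: 1.0 = converted to subscriber, 0.0 = did not convert.
control = [1.0] * 100 + [0.0] * 900
personalization = [1.0] * 150 + [0.0] * 850

# Two-sample t-test on the per-user outcomes of each variant.
t_stat, p_value = stats.ttest_ind(control, personalization)
print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.5f}")

# Sanity check on the rule of thumb: under a normal approximation,
# |t| = 1.96 corresponds to a two-sided p-value of about 0.05.
p_at_196 = 2 * stats.norm.sf(1.96)
print(f"two-sided p-value at 1.96: {p_at_196:.4f}")
```

With the control passed first, a negative t-statistic here means the control mean is lower than the personalized mean.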

The age groups in the 0-30 years range show positive lift values, indicating that personalization outperformed the control. Specifically, the conversion rate increased by 121.4%, 106.24% and 161.19% respectively, compared to the control group. This positive lift suggests that the personalization offered had a substantial positive impact within these age groups.

Similarly, the t-statistic measures the difference between the two groups in units of standard error. Its sign shows only the direction of the difference and depends on the order in which the groups are passed to the test; here a negative value corresponds to the personalized group converting at a higher rate than the control. It is the magnitude that matters for significance: a t-statistic of -3.0 is well beyond 1.96 in absolute value, which suggests that the difference observed between the personalized group and the control group in the 0-30 years range is statistically significant.
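To illustrate how the sign of the t-statistic relates to argument order, here is a short sketch with made-up outcome lists. Swapping the two samples flips the sign of the statistic but leaves the p-value unchanged.

```python
from scipy import stats

# Hypothetical outcomes for one age group: 1.0 = converted, 0.0 = not converted.
control = [1.0] * 10 + [0.0] * 190
personalization = [1.0] * 30 + [0.0] * 170

# Control first: negative statistic because the control mean is lower.
control_first = stats.ttest_ind(control, personalization)

# Personalization first: same magnitude, opposite sign.
personalization_first = stats.ttest_ind(personalization, control)

print(control_first.statistic, control_first.pvalue)
print(personalization_first.statistic, personalization_first.pvalue)
```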

The p-value is a measure of the probability of observing the results (or more extreme results) given that the null hypothesis is true. A p-value of 0.00306 indicates that there is only a 0.306% probability of observing results at least this extreme if there were actually no difference between the groups (i.e., if the null hypothesis were true). Since this p-value is less than 0.05, we can conclude that the observed difference between personalization and control for the 19-24 years age group is statistically significant.

In summary, within the 0-30 years age range, personalization has led to a significant increase in conversion rate compared to the control group. This difference is statistically significant, as indicated by the large magnitude of the t-statistic and the low p-value.

Likewise, in the 30-55+ years age range, the lift is negative and the t-statistic is positive, indicating that personalization performed worse than the control. Specifically, the conversion rate decreased by 100%, 72.22% and 85.23% compared to the control group. This negative lift implies that the intervention, i.e. email personalization, did not have a positive impact and actually resulted in a lower conversion rate for these age groups.

The positive t-statistic values of 2.431, 3.81, 2.06 and 3.32 all exceed 1.96, indicating that the difference observed between the personalized group and the control group in the 30-55+ years range is statistically significant. This means that the observed difference is unlikely to have occurred by random chance alone.

The p-values tell the same story. A p-value of 0.0179, for example, indicates that there is a 1.79% probability of observing results at least this extreme if there were actually no difference between the groups. Since this p-value is less than 0.05, we can conclude that the observed difference between personalization and control in the 30-55+ years range is statistically significant.

In summary, these observations suggest that within the age group 30-55+ years, the personalization has resulted in a significant decrease in conversion rate compared to the control group, and this difference is statistically significant.
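A segment-level analysis like the one summarized above could be sketched as follows. This is an assumption-laden illustration rather than the code behind the reported figures: the helper names (make_rows, segment_results), the column names and the per-group counts are all hypothetical.

```python
import pandas as pd
from scipy import stats


def make_rows(age_group, variant, users, conversions):
    """Build hypothetical per-user outcomes for one age group and variant."""
    outcomes = [1.0] * conversions + [0.0] * (users - conversions)
    return pd.DataFrame({"age_group": age_group, "variant": variant, "converted": outcomes})


def segment_results(df, segment_col):
    """Compute lift and a two-sample t-test for each value of segment_col."""
    rows = []
    for segment, group in df.groupby(segment_col):
        control = group.loc[group["variant"] == "control", "converted"]
        personalization = group.loc[group["variant"] == "personalization", "converted"]
        control_rate = control.mean()
        treatment_rate = personalization.mean()
        # Lift is undefined when the control never converts.
        lift = (treatment_rate - control_rate) / control_rate if control_rate > 0 else float("nan")
        # Control is passed first, so a negative t means personalization did better.
        t_stat, p_value = stats.ttest_ind(control, personalization)
        rows.append({segment_col: segment, "lift": lift, "t_stat": t_stat, "p_value": p_value})
    return pd.DataFrame(rows).set_index(segment_col)


# Made-up counts: personalization helps the younger group and hurts the older one.
data = pd.concat([
    make_rows("19-24 years", "control", 200, 10),
    make_rows("19-24 years", "personalization", 200, 30),
    make_rows("45-55 years", "control", 200, 30),
    make_rows("45-55 years", "personalization", 200, 10),
], ignore_index=True)

results = segment_results(data, "age_group")
print(results)
```

Running many segment-level tests increases the chance of a false positive somewhere, so borderline p-values in small segments deserve extra caution.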

Thus, the recommendation based on this data analysis is that future personalized email campaigns should target the under-30 demographic, while for users over 30, personalization should be delivered through other marketing channels or alternate promotions should be planned.

Thank you for following my journey. I appreciate your participation in this thorough scientific investigation of our marketing campaign. Through meticulous analysis, we've extracted essential data-driven insights crucial for informing strategic business decisions. These insights are instrumental in facilitating growth, maximizing revenue, and optimizing future campaign endeavors.

Comments and Feedback appreciated