The continuous evaluation of an e-commerce company’s web applications is crucial for ensuring customer satisfaction and loyalty. Such evaluations are usually performed as split tests, i.e., the comparison of two slightly different versions of the same webpage with respect to a target metric. Usually, metrics that stakeholders are interested in include completed checkout processes, submitted registration forms or visited landing pages. To give just one example, a dating website could present 50% of their users with a blonde woman on the cover page while the other half see a dark-haired one. It is then possible to choose the “better” front page based on the number of registrations it generated—if you pay attention to the underlying statistics1.
While metrics of this type very well reflect how much money you make, you can’t make well-founded statements about usability based on such numbers (“you don’t know why you get the measured results”)2. Thus, in the long-term a way better solution is to provide your customers with a site they love to use instead of confusing them in such a way that they accidentally buy your products, isn’t it? This calls for the introduction of usability as a target metric in split tests.
We have developed WaPPU, the prototype of a usability-based split testing service. The underlying principle is to track interactions (mouse, scrolling etc.) in both versions of the tested interface. The one version additionally asks for an explicit rating of its usability by using a previously developed questionnaire3. WaPPU then takes all of these data and automatically trains models (based on existing machine learning techniques4) that are instantly used to predict the usability of the other interface from user interactions alone. This makes it possible to compare the interfaces based on their usability as perceived by users, e.g., “interface A has a usability of 85%, interface B of only 57%”.
The feasibility of our approach has been evaluated in a split test involving a real-world search engine results page. We were able to train the above mentioned models, from which we also derived general heuristics for search results pages, such as “better readability is indicated by a lower page dwell time” or “less confusion is indicated by less scrolling”.
We have described our novel approach and the corresponding evaluation in a full research paper5 and an accompanying demo paper6. Both will be presented at the 2014 International Conference on Web Engineering (ICWE). The conference proceedings will be published by Springer and the final versions of our papers will be available at link.springer.com. (Update October 4, 2020: The papers are available here and here.)
3 Maximilian Speicher, Andreas Both and Martin Gaedke (2013). “Towards Metric-based Usability Evaluation of Online Web Interfaces”. In Mensch & Computer Workshopband.
5 Maximilian Speicher, Andreas Both and Martin Gaedke (2014). “Ensuring Web Interface Quality through Usability-based Split Testing”. In Proc. ICWE.
6 Maximilian Speicher, Andreas Both and Martin Gaedke (2014). “WaPPU: Usability-based A/B Testing”. In Proc. ICWE (Demos).