What is ›Usability‹?

Earlier this year, I submitted a research paper about a concept called usability-based split testing[1] to a web engineering conference (Speicher et al., 2014). My evaluation involved a questionnaire asking for ratings of different usability aspects of web interfaces, such as informativeness and readability. Obviously, I use the word “usability” a lot in that paper; however, I had not thought about its exact connotation in the context of my research before. Of course I was aware of how it differs from User eXperience, but I simply assumed that the questionnaire and the description of my analyses would make clear what my paper understands as usability.

Then came the reviews and one reviewer noted:

“There is a weak characterization of what Usability is in the context of Web Interface Quality, quality models and views. Usability in this paper is a key word. However, it is weakly defined and modeled w.r.t. quality.”

This confused me at first. I thought it was pretty clear what usability is, and that my paper was well understandable in this respect. In particular, usability had already been defined and characterized before, so why did this reviewer ask me to characterize it again? Figuratively, they asked me: “When you talk about usability, what is that ›usability‹?”

A definition of usability

As I could not just ignore the review, I did some more research on definitions of usability. I remembered that Nielsen defined usability to comprise five quality components—Learnability, Efficiency, Memorability, Errors, and Satisfaction. Moreover, I had already made use of the definition given in ISO 9241–11 for developing the usability questionnaire used in my evaluation: 

“The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.”

When designing the questionnaire, I had focused only on reflecting the high-level factors of usability mentioned there (effectiveness, efficiency, and satisfaction) in its items. However, the rest of the definition is no less interesting. In particular, it contains the phrases

  1. “a product”;
  2. “specified users”;
  3. “specified goals”; and
  4. “specified context of use”.

As can be seen, the word “specified” is used three times, and “a product” is a rather vague description as well.

This makes it clear that usability is a difficult-to-grasp concept, and even the ISO definition leaves ample scope for different interpretations. In his paper on the System Usability Scale, Brooke (1996) also refers to ISO 9241–11 and notes that “Usability does not exist in any absolute sense; it can only be defined with reference to particular contexts.” Thus, one has to explicitly specify the four vague phrases mentioned above to characterize the exact manifestation of usability one is referring to. Despite my initial skepticism, that reviewer was absolutely right!

Levels of usability

As the reviewer explicitly referred to “Web Interface Quality”, we also have to take ISO/IEC 9126 into account. That standard is concerned with software engineering and product quality and defines three different levels of quality metrics: 

  • Internal metrics: Metrics that do not rely on software execution (i.e., they are static measures)
  • External metrics: Metrics that are applicable to running software
  • Quality in use metrics: Metrics that are only available when the final product is used in real conditions

As usability clearly is one aspect of product quality, these metrics can be transferred to the context of usability evaluation. By analogy, this gives us three levels of usability: internal usability, external usability, and usability in use.

This means that if we want to evaluate usability, we first have to state which of the above levels we are investigating. The first might be assessed with a static code analysis, as carried out, for example, by accessibility tools. The second might be assessed by an expert going through a rendered interface without actually using the product. Finally, usability in use is commonly assessed with user studies, either on a live website or in a more controlled setting.

Bringing it all together

Once we have decided on one of the above levels of usability, we have to give further detail on the four vague phrases contained in ISO 9241–11. Mathematically speaking, we have to find values for the variables product, users, goals, and context of use, which are sets of characteristics. Together with the level of usability, this gives us a quintuple, i.e., an element of the following Cartesian product:

level of usability × product × users × goals × context of use.

We already know the possible values for level of usability:

level of usability ∈ { internal usability, external usability, usability in use },

so what are the possible values for the remaining variables contained in the “quintuple of usability”?

Product

The first one is rather straightforward. Product is the actual product you are evaluating, or at least the type thereof. In particular, web interface usability is different from desktop software or mobile app usability. Also, it is important to state whether one evaluates only a part of an application (e.g., a single webpage contained in a larger web app), or the application as a whole. Therefore:

product ⊆ { desktop application, mobile application, web application, online shop, WordPress blog, individual web page, … }. 

Since product is a subset of the potential values, any number of them can be combined for a precise characterization of the variable, for instance, product = { mobile application, WordPress blog } if you are evaluating the mobile version of your blog. This should not be thought of as a strict formalism; rather, it is intended as a convenient way to express the combined attributes of the variable. However, not all values can be meaningfully combined (e.g., desktop application and WordPress blog). The same holds for the remaining variables explained in the following.
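To make this more tangible, here is a minimal sketch in Python of how such a quintuple could be expressed as a data structure. This is purely illustrative: the names Level and UsabilityType are my own (not part of any standard), and all values beyond product in the example are invented.

    from dataclasses import dataclass
    from enum import Enum

    class Level(Enum):
        """The three levels of usability derived from ISO/IEC 9126."""
        INTERNAL = "internal usability"
        EXTERNAL = "external usability"
        IN_USE = "usability in use"

    @dataclass(frozen=True)
    class UsabilityType:
        """One element of the Cartesian product:
        level × product × users × goals × context of use."""
        level: Level
        product: frozenset
        users: frozenset
        goals: frozenset
        context_of_use: frozenset

    # The example from the text: evaluating the mobile version of a blog.
    blog_mobile = UsabilityType(
        level=Level.IN_USE,
        product=frozenset({"mobile application", "WordPress blog"}),
        users=frozenset({"frequent users"}),
        goals=frozenset({"writing a blog post"}),
        context_of_use=frozenset({"mobile phone", "real world"}),
    )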

Users

Next comes the variable users, which relates to the target group of your product (if evaluating in a real-world setting) or the participants involved in a controlled usability evaluation (such as a lab study). Distinguishing between these is highly important, as different kinds of users might perceive a product completely differently. Also, real users are more likely to be unbiased than participants in a usability study.

users ⊆ { visually impaired users, female users, users aged 19–49, test participants, inexperienced users, experienced users, novice users, frequent users, … }.

In particular, when evaluating usability in a study with participants, this variable should contain all demographic characteristics of that group. Yet, when using methods such as expert inspections, users should not contain “usability experts,” as your interface is most probably not designed exclusively for that very specific group. Rather, the variable contains the characteristics of the target group the expert has in mind when performing, for instance, a cognitive walkthrough. This is because usability experts are usually well trained in simulating users with specific attributes.

Goals

The next one is a bit tricky, as goals are not simply the tasks a specified user shall accomplish (such as completing a checkout process). Rather, there are two types of goals according to Hassenzahl (2008): do-goals and be-goals. 

Do-goals refer to pragmatic usability, which means “the product’s perceived ability to support the achievement of [tasks]” (Hassenzahl, 2008), as for example the aforementioned completion of a checkout process.

In contrast, be-goals refer to hedonic usability, which “calls for a focus on the Self” (Hassenzahl, 2008). To give just one example, the ISO 9241–11 definition names “satisfaction” as one component of usability; “feeling satisfied” is therefore a be-goal that can be achieved by users. The achievement of be-goals need not be connected to the achievement of corresponding do-goals (Hassenzahl, 2008). In particular, a user can be satisfied even if they failed to accomplish certain tasks, and vice versa.

Thus, it is necessary to take these differences into account when defining the specific goals to be achieved by a user. The variable goals can be specified either by the concrete tasks the user shall accomplish or, if no specific tasks are defined, by Hassenzahl’s more general notions:

goals ⊆ { do-goals, be-goals, completed checkout process, writing a blog post, feeling satisfied, having fun, … }.

Context of use

Last comes the variable context of use, which describes the setting in which you want to evaluate the usability of your product. It can be something rather general, such as “real world” or “lab study” (to indicate a potential bias of the users involved), something device-related (desktop PC vs. touch device), or some other, more specific piece of information about the context. In general, your setting should be described as precisely as possible.

context of use ⊆ { real world, lab study, expert inspection, desktop PC, mobile phone, tablet PC, at day, at night, at home, at work, user is walking, user is sitting, … }.

Case study

For testing a research prototype in the context of my industrial PhD thesis, we evaluated a novel search engine results page (SERP) designed for use with desktop PCs (Speicher et al., 2014). The test was carried out as a remote asynchronous user study, with participants recruited via internal mailing lists of the cooperating company. They were asked to find a birthday present for a good friend that costs no more than €50, which is a semi-open task (i.e., a do-goal). According to our above formalization of usability, the precise type of usability assessed in that evaluation is therefore given by the following (for the sake of readability, the quintuple is given in list form):

  • level of usability = usability in use
  • product = {web application, SERP}
  • users = {company employees, novice users, experienced searchers (several times a day), average age ≈ 31, 62% male, 38% female}
  • goals = {formulate search query, comprehend presented information, identify relevant piece(s) of information}
  • context of use = {desktop PC, HD screen, at work, remote asynchronous user study}

If the same SERP is instead inspected by a team of usability experts based on screenshots, the assessed type of usability changes accordingly. In particular, users changes to the actual target group of the web application, as defined by the cooperating company and explained to the experts beforehand. Also, goals must be reformulated in terms of what the experts pay attention to (only certain aspects of a system can be assessed through screenshots). Overall, the assessed type of usability is then expressed by the following:

  • level of usability = external usability
  • product = {web application, SERP}
  • users = {German-speaking Internet users, any level of searching experience, age 14–69}
  • goals = {identify relevant piece(s) of information, be satisfied with presentation of results, feel pleased by visual aesthetics}
  • context of use = {desktop PC, screen width ≥ 1225 px, expert inspection}
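Using the illustrative data structure sketched earlier (again just a sketch; it assumes the hypothetical Level and UsabilityType definitions from above), the two assessed types of usability could be written down as follows:

    serp_user_study = UsabilityType(
        level=Level.IN_USE,
        product=frozenset({"web application", "SERP"}),
        users=frozenset({"company employees", "novice users",
                         "experienced searchers (several times a day)",
                         "average age ≈ 31", "62% male", "38% female"}),
        goals=frozenset({"formulate search query",
                         "comprehend presented information",
                         "identify relevant piece(s) of information"}),
        context_of_use=frozenset({"desktop PC", "HD screen", "at work",
                                  "remote asynchronous user study"}),
    )

    serp_expert_inspection = UsabilityType(
        level=Level.EXTERNAL,
        product=frozenset({"web application", "SERP"}),
        users=frozenset({"German-speaking Internet users",
                         "any level of searching experience", "age 14–69"}),
        goals=frozenset({"identify relevant piece(s) of information",
                         "be satisfied with presentation of results",
                         "feel pleased by visual aesthetics"}),
        context_of_use=frozenset({"desktop PC", "screen width ≥ 1225 px",
                                  "expert inspection"}),
    )

    # The two evaluations assess two different types of usability.
    assert serp_user_study != serp_expert_inspection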

Conclusion

Usability is a term that spans a wide variety of potential manifestations. For example, usability evaluated in a real-world setting with real users might be a totally different kind of usability than usability evaluated in a controlled lab study—even with the same product. Therefore, the given set of characteristics must be specified; otherwise, the notion of “usability” is meaningless due to its high degree of ambiguity. It is necessary to provide specific information on five variables that have been identified based on ISO 9241–11 and ISO/IEC 9126: level of usability, product, users, goals, and context of use. Although I have introduced a mathematical-looking formalism for characterizing the precise type of usability one is assessing, it is not necessary to provide that information in the form of a quintuple. Rather, my primary objective is to raise awareness of the need to carefully specify usability, as many reports on usability evaluations—including the original version of my research paper (Speicher et al., 2014)—lack a complete description of what they understand as ›usability‹.

(This article has also been published on Medium and as a technical report.)

[1] “Usability-based split testing” means comparing two variations of the same web interface based on a quantitative usability score (e.g., usability of interface A = 97%, usability of interface B = 42%). The split test can be carried out as a user study or under real-world conditions.


References

John Brooke. SUS: A “quick and dirty” usability scale. In Usability Evaluation in Industry. Taylor and Francis, 1996. 

Marc Hassenzahl. User Experience (UX): Towards an experiential perspective on product quality. In Proc. IHM, 2008.

Maximilian Speicher, Andreas Both, and Martin Gaedke. Ensuring Web Interface Quality through Usability-based Split Testing. In Proc. ICWE, 2014.

Acknowledgments

Special thanks go to Jürgen Cito, Sebastian Nuck, Sascha Nitsch & Tim Church, who provided feedback on drafts of this article 🙂


[Open Master’s Thesis] Was that Page Pleasant to Use? Usability Metrics in a Real Search Engine

There are far too many bad websites! Ever tried to find the daily allowance rates for travel expenses abroad on www.finanzen.sachsen.de? If not, just give it a try and have fun! Or have you ever tried to find out on the University of Würzburg’s website how exactly an application for the Bachelor’s program in economics works? No? That is probably for the best, because the attempt can easily drive you mad.

Motivation: Usability? No, thanks!

Many websites (including those of large companies) show no sense whatsoever of basic usability principles, which are neither particularly new nor particularly complicated. For example, information relevant to a large share of users is often not reachable directly via the primary navigation, but only via convoluted paths and countless clicks. And this despite an abundance of frameworks and content management systems that support state-of-the-art web design and usability principles. Probably the most common reason for a website’s lack of usability is that corresponding tests are performed insufficiently or not at all, often for cost or time reasons.

(Figure: The WaPPU dashboard)

To counteract this, I have developed a prototypical tool called WaPPU as part of my doctoral thesis, which makes it possible to run considerably cheaper A/B tests based on a novel metric for usability. That is, the usability of two slightly different versions of the same web page is captured in real time, in the form of metrics, while real users interact with them, and is visualized in a dashboard (see figure).

Goal of the thesis

My dissertation project is embedded in the research and development department of Unister GmbH in Leipzig, which is currently developing a novel travel search engine. Within the scope of a Master’s thesis, the developed prototype is to be integrated into this real search engine in order to evaluate different interface variations in production based on their usability. Further information can be found in the official announcement. If you are interested, please get in touch via the e-mail address given in the PDF or via my contact form.

Demo

A demo video of the WaPPU tool is available here.

How to Infer Usability from User Interactions. My Poster Presented at #ICWE2014

(Figure: WaPPU poster presented @ ICWE 2014)

The corresponding publications are:

  • Maximilian Speicher, Andreas Both and Martin Gaedke (2014). “Ensuring Web Interface Quality through Usability-based Split Testing”. In Proc. ICWE.
  • Maximilian Speicher, Andreas Both and Martin Gaedke (2014). “WaPPU: Usability-based A/B Testing”. In Proc. ICWE (Demos).

For more information about WaPPU, please see this previous post. Special thanks go to Fred Funke, who helped with designing the poster!

First Screencast Published in VSR Media Center

The demo video about usability-based A/B testing I created for the 2014 International Conference on Web Engineering is now featured in the media center of the VSR research group at Chemnitz University of Technology. The chair of VSR is Prof. Dr.-Ing. Martin Gaedke, who is the primary advisor of my PhD thesis.

The video above demonstrates the use of the WaPPU* service, which implements the novel principle of usability-based A/B testing. The underlying concept is that on one variation of an interface (A), we train a model from collected user interactions and an automatically presented usability questionnaire. Then, the other variation (B) involved in the A/B test uses this model to infer its usability from interactions alone.

Say, on interface A we perform a click within a particular element (#content) and then rate the site’s usability as good using the questionnaire. We reload the page, click outside that particular element and give a bad usability rating. The WaPPU service automatically trains a model that—simply speaking—knows the following:

                 click    --- usability = good
                /
element #content
                \
                 no click --- usability = bad

This model is instantly available to interface B. So if we now visit B and click outside of #content, WaPPU automatically infers a bad usability rating. The ratings of both variations of the investigated interface are available in real time in a dashboard provided by our tool. This dashboard also features a traffic light that indicates whether one interface is significantly better or worse than the other, based on a Mann–Whitney U test.
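To illustrate the traffic-light logic, here is a minimal sketch in Python of how such a significance check between the two interfaces’ ratings could look. This is not WaPPU’s actual implementation; it assumes SciPy is available, and the rating values are made up for illustration.

    from scipy.stats import mannwhitneyu

    # Hypothetical usability ratings per session for each interface
    # (higher = better); these values are invented for illustration.
    ratings_a = [4, 5, 4, 3, 5, 4, 4, 5]
    ratings_b = [2, 3, 1, 2, 3, 2, 1, 2]

    # Two-sided Mann-Whitney U test: do the two rating samples stem
    # from distributions with a different central tendency?
    statistic, p_value = mannwhitneyu(ratings_a, ratings_b, alternative="two-sided")

    # A simple traffic light derived from the test result.
    if p_value >= 0.05:
        light = "yellow: no significant difference"
    elif sum(ratings_a) > sum(ratings_b):
        light = "green for A, red for B"
    else:
        light = "green for B, red for A"
    print(f"U = {statistic}, p = {p_value:.3f} -> {light}")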

* “Was that Page Pleasant to Use?”

Usability-based Split Testing or How to infer web interface usability from user interactions

The continuous evaluation of an e-commerce company’s web applications is crucial for ensuring customer satisfaction and loyalty. Such evaluations are usually performed as split tests, i.e., comparisons of two slightly different versions of the same webpage with respect to a target metric. Metrics that stakeholders are typically interested in include completed checkout processes, submitted registration forms, or visited landing pages. To give just one example, a dating website could present 50% of its users with a blonde woman on the cover page while the other half see a dark-haired one. It is then possible to choose the “better” front page based on the number of registrations it generated—if you pay attention to the underlying statistics[1].

While metrics of this type reflect very well how much money you make, they do not allow well-founded statements about usability (“you don’t know why you get the measured results”)[2]. In the long term, a far better solution is to provide your customers with a site they love to use, instead of confusing them in such a way that they accidentally buy your products. This calls for the introduction of usability as a target metric in split tests.

(Figure: The WaPPU dashboard)

We have developed WaPPU, a prototype of a usability-based split testing service. The underlying principle is to track interactions (mouse movements, scrolling, etc.) in both versions of the tested interface. One version additionally asks for an explicit rating of its usability using a previously developed questionnaire[3]. WaPPU then takes all of these data and automatically trains models (based on existing machine learning techniques[4]) that are instantly used to predict the usability of the other interface from user interactions alone. This makes it possible to compare the interfaces based on their usability as perceived by users, e.g., “interface A has a usability of 85%, interface B of only 57%”.
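To give a rough idea of this train-on-A, predict-on-B principle, here is a minimal sketch. It is not WaPPU’s actual code; it uses scikit-learn rather than the Weka toolkit referenced in footnote [4], and all feature names and values are invented.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical interaction features per session on interface A:
    # [clicks in #content, scroll distance (px), dwell time (s)]
    X_a = np.array([[3, 400, 35], [0, 2100, 80], [2, 600, 40], [0, 1800, 95]])
    # Usability labels from the questionnaire shown on A (1 = good, 0 = bad).
    y_a = np.array([1, 0, 1, 0])

    # Train a model on interface A's labeled sessions ...
    model = DecisionTreeClassifier(max_depth=2).fit(X_a, y_a)

    # ... and predict the usability of interface B from interactions alone.
    X_b = np.array([[1, 500, 30], [0, 2000, 90], [2, 450, 25]])
    predicted = model.predict(X_b)

    # A simple quantitative score: the share of sessions predicted as "good".
    print(f"Inferred usability of interface B: {predicted.mean() * 100:.0f}%")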

The feasibility of our approach was evaluated in a split test involving a real-world search engine results page. We were able to train the above-mentioned models, from which we also derived general heuristics for search results pages, such as “better readability is indicated by a lower page dwell time” or “less confusion is indicated by less scrolling”.

(Figure: Usability-based Split Testing paper @ ICWE 2014)

We have described our novel approach and the corresponding evaluation in a full research paper[5] and an accompanying demo paper[6]. Both will be presented at the 2014 International Conference on Web Engineering (ICWE). The conference proceedings will be published by Springer, and the final versions of our papers will be available at link.springer.com.

[1] http://www.sitepoint.com/winning-ab-test-results-misleading/
[2] http://www.nngroup.com/articles/putting-ab-testing-in-its-place/
[3] Maximilian Speicher, Andreas Both and Martin Gaedke (2013). “Towards Metric-based Usability Evaluation of Online Web Interfaces”. In Mensch & Computer Workshopband.
[4] http://www.cs.waikato.ac.nz/ml/weka/
[5] Maximilian Speicher, Andreas Both and Martin Gaedke (2014). “Ensuring Web Interface Quality through Usability-based Split Testing”. In Proc. ICWE.
[6] Maximilian Speicher, Andreas Both and Martin Gaedke (2014). “WaPPU: Usability-based A/B Testing”. In Proc. ICWE (Demos).

4 Submissions accepted at International Conference on Web Engineering (ICWE)

At the end of February, I submitted four contributions to the 14th International Conference on Web Engineering: two full papers, one demo, and one poster. All four were accepted and will be presented at the conference, which will be held in Toulouse from July 1 to July 4. In the following, I’ll give a quick overview of the accepted papers. A more detailed explanation of my current research will be the subject of one or two separate articles.

  • Maximilian Speicher, Sebastian Nuck, Andreas Both, Martin Gaedke: “StreamMyRelevance! Prediction of Result Relevance from Real-Time Interactions and its Application to Hotel Search” — This full paper is based on Sebastian Nuck’s Master thesis. He developed a system for processing user interactions collected on search results pages in real-time and predicting the relevance of individual search results from these.
  • Maximilian Speicher, Andreas Both, Martin Gaedke: “Ensuring Web Interface Quality through Usability-based Split Testing” — This full paper proposes a new approach to split testing that is based on the actual usability of the investigated web interface rather than pure conversion maximization. We have trained models for predicting usability from user interactions and from these have also derived additional interaction-based heuristics for comparing search results pages.
  • Maximilian Speicher, Andreas Both, Martin Gaedke: “WaPPU: Usability-based A/B Testing” — This demo accompanies our paper about Usability-based Split Testing. The WaPPU tool builds upon this new concept and demonstrates how usability can be predicted from user interactions using automatically learned models.
  • Maximilian Speicher: “Paving the Path to Content-centric and Device-agnostic Web Design” — This poster is based on one of my previous posts. It provides a review of motherfuckingwebsite.com, which satirically claims to be a perfect website. Based on current research, we suggest improvements to the site that follow a strictly content-centric and device-agnostic approach.

My PhD research is supervised by Prof. Dr.-Ing. Martin Gaedke (VSR Research Group, Chemnitz U of Technology) and Dr. Andreas Both (R&D, Unister GmbH) and funded by the ESF and the Free State of Saxony.

Open Topics for Master’s Theses: Usability of Web Interfaces, Evaluation, and Metrics

To all Master’s students in Leipzig and at TU Chemnitz who are (or would like to be) working on human-computer interaction, web interfaces, and usability: I currently have two exciting Master’s thesis topics to assign in these areas!

Comparison of a Metric- and Interaction-based Approach with Established Methods for Determining the Usability of Web Interfaces

Within the scope of this topic, a novel A/B testing tool based on user interactions and usability metrics is to be compared empirically with established methods for evaluating web interfaces. Such established methods include, e.g., expert inspections, heuristics, and checklists. The thesis shall examine to what extent the novel approach, called WaPPU, differs from existing approaches in terms of effectiveness and efficiency. A finished prototype of the A/B testing tool WaPPU will be provided.

Link to the official thesis announcement

Usability as a Service: Development of a WordPress Plug-in for the Quantitative Determination of Usability

The goal of this thesis is to turn an existing A/B testing prototype into a WordPress plug-in that makes it possible to predict the usability of a blog based on client-side user interactions. Based on training data from existing blogs, template-specific usability models shall be learned in a central repository; i.e., several blogs using the same WordPress template contribute data to a shared model. A newly set-up blog based on the same template should then be able to immediately receive predictions about the usability of its interface via the plug-in.

Link to the official thesis announcement

Both theses will be carried out in cooperation with the research and development department of Unister GmbH (Leipzig) as part of a real industry project. If you are at TU Chemnitz, you of course do not have to move to Leipzig for this ;). If you are interested in one of the theses, please contact the persons named in the announcements, i.e., myself or Dr. Andreas Both. Students of TU Chemnitz can also contact Prof. Martin Gaedke of the Chair of Distributed and Self-organizing Computer Systems (VSR).

2013 in Review: Search Interaction Optimization

On January 1, 2013 I started my PhD studies at Chemnitz University of Technology in cooperation with Unister GmbH, Leipzig. My project is about automatic methods for optimizing search engine results pages (e.g., http://www.google.com/#q=Hello%2C+World!) with respect to result quality and interface usability. The official working title is Search Interaction Optimization: A Design Thinking Approach.

During my first year as a PhD student, I published three papers (as first author). A full paper about my first milestone was presented at the International Conference on Information and Knowledge Management (CIKM), which was held in San Francisco in October/November.[1] This milestone was about deducing the relevance of search results from user interactions on the search engine results page. Using large amounts of anonymous interaction data from two real-world hotel booking portals, we were able to show that it is possible to learn such relevance models of reasonable quality.

A second (short) paper was presented at the PhD Symposium of the International Conference on Web Engineering (ICWE) in July.[2] It addressed the attempt to learn a common model that predicts usability based on training data from a group of similar webpages (e.g., online news articles). However, we concluded that this is not easily possible, because differences in low-level page structure and user intention counter model precision; additional preprocessing steps are necessary to minimize these influences. The usability evaluation described in this paper was based on a novel instrument for measuring usability whose items have been specifically designed to correlate with client-side interaction features. This Interface Usability Instrument (Inuit) was presented at the workshop “Methodological Approaches to HCI” in September.[3] The work described above is part of the second milestone of my PhD project, which is about automatic methods for optimizing interface usability. This milestone is my current work in progress and will be finished in 2014.

Alright, so much for my research in 2013. I’ll keep you updated with more fine-grained results during the new year.

[1] Speicher, Both, Gaedke: “TellMyRelevance! Predicting the Relevance of Web Search Results from Cursor Interactions” (http://doi.acm.org/10.1145/2505515.2505703).
[2] Speicher, Both, Gaedke: “Was that Webpage Pleasant to Use? Predicting Usability Quantitatively from Interactions” (http://dx.doi.org/10.1007/978-3-319-04244-2_33).
[3] Speicher, Both, Gaedke: “Towards Metric-based Usability Evaluation of Online Web Interfaces” (http://dl.mensch-und-computer.de/handle/123456789/3399).