Product Discovery ‒ A Systematic Approach for Customer Research Methods

TL;DR: There is a plethora of customer research methods out there, and it can be difficult to choose the right one for a given research question. In product discovery specifically, where you need to decide whether an idea generates business value or not, certain methods lack validity or are not properly applied. In this article, André Morys provides an overview of when to use which (combination of) method(s) for product discovery.

Max: I know it’s been a while since I announced it, but now I’m extremely glad to introduce the first guest author I could win for my blog: André Morys!

André is an author, university lecturer, keynote speaker, and entrepreneur. He founded konversionsKRAFT in 1996, one of Germany’s leading consultancies for digital growth and experimentation.

Maybe you know that feeling. Hustling from sprint to sprint, feeding the release train with story points. It’s the daily madness in agile teams.

Roadmaps are planned tightly. Jumping from feature to feature doesn’t allow you to focus on incremental improvements. No time for proper product discovery. No time for proper validation. “I’d better just ask 5 people if they like the new feature instead” is your first thought when you try to save some time.

And yet a systematic deep dive into the reality of the customers would be such a great time investment. Knowing the thoughts and motivations, the fears and objections of users is so crucial for the success of your product. Your gut feeling tells you, you need more time for these things.

Additionally, you might be irritated seeing different colleagues having different opinions about how to get a deep understanding of your customers’ reality. Some like mouse tracking. Others are big believers in moderated user research. Others prefer user surveys. 

Thus, different methods make things even more complicated. Complexity is the enemy of a proper operationalization of success factors.

You’re right ‒ because I have made similar observations, I’d like to encourage you to implement a systematic approach in researching your customers’ reality. To avoid different understandings of the different methods, I’d like to present to you my personal summary of methods based on 20 years of experience in user research and customer experience management.

Preface: It’s about impact or effect, not always efficiency

“Agile” for many teams means following predefined routines and habits. Especially in bigger organizations, these routines and the given culture emphasize a focus on efficiency (“doing things the right way”) while there is a lack of focus on effectiveness (“doing the right things”). Product people have discussed this topic for several years, talking about the “Feature factory” (John Cutler) or the “Build Trap” (Melissa Perri). 

Long story short, you could visualize the challenge like this:

Diagram of efficiency vs. effectiveness in product development processes, highlighting the "build trap" (not efficient, not effective).

If you are not sure whether this problem is part of your working culture, please check if you can observe one of the following symptoms:

  • The real business decisions are made elsewhere ‒ your team is executing roadmaps.
  • Your backlogs are full of smaller features and changes.
  • Many meetings don’t leave any headroom for better research or discovery for your product.
  • As soon as a feature is shipped, you move on to the next one; there is no room for improving existing features.
  • Real economic outcomes of your sprints are rarely measured directly. You can’t really control business value; instead, you hope it will improve in the long term as a result of your work.
  • You do some A/B tests if you have enough time and there are different opinions or solutions ‒ but this comes only at the end of your workbench.

Why am I telling you this? Because your desire to be more effective, to have more time for proper product discovery, to produce outcome instead of output, is closely tied to the state of the whole system.

There needs to be a starting point to change all this.

Starting with a proper understanding of research methods and implementation of a systematic customer insights process is the foundation for changing the impact of your work. You will get better results out of your product discovery, you will learn where to optimize your product, you will unveil new levers ‒ a fundamental step towards generating more outcomes and becoming more effective.

Bad News: There are many, many methods

And even worse: they are all different. There is not that one method that is the best. Together with a continuous lack of time, this is where all the challenges start. So let’s start sorting it all out.

First of all: How should you segment the different methods?

In 2014, the Nielsen Norman Group (NN/g) published an article pointing out that some methods focus on observing behavior while others try to find out the underlying attitude. Additionally, they suggest that another helpful and differentiating dimension is qualitative versus quantitative methods. The overview looks like this:

The landscape of user research methods, classified by behavioral vs. attitudinal, qualitative vs. quantitative, and context of product use. (c) 2014 Christian Rohrer.
Source: https://www.nngroup.com/articles/which-ux-research-methods/.

But that’s not enough ‒ the chart also shows you that some methods usually take place during the natural use of the product while others follow defined test setups or scripts in a less natural environment or even completely out of context.

I find this overwhelming. As extensive as this collection is, it still doesn’t give me enough guidance on when to use which method.

So what shall we do? 

My personal experience is that as soon as you understand the underlying challenge, things clarify automatically.

Your primary task during product discovery is to understand the reality of your customers, their desires and motivations, and to compare it with the current customer experience to find levers for improving your product.

You need to find levers that create impact. Business impact. Your challenge is to understand customer behavior, to change behavior.

That is easy to say ‒ but the implications of this logic are a little more complex.

That’s why I’d like to add another dimension to evaluate the different methods. Finally we are going to focus on creating outcomes ‒ but what does this actually mean? In most cases, we are talking about business value, e.g., revenue per visitor, profitability, customer lifetime value. 

This is a big challenge, so I need to explain my background a little bit. My roots are in UX design and user research. About 15 years ago, the new possibilities of validating the economic impact of changes on a website by quantitatively A/B-testing the variations gave me a completely new perspective: 

By running an experiment, I was able to close the gap between design and business value. 

In the following years I learned a lot about the different methods ‒ especially about their ability to qualify my testing hypotheses. Once, participants in a user interview told me that they were annoyed by a video on the site. We tested removing the video in an A/B test and found that without the video the website produced fewer conversions than before. So the method was not able to prequalify the hypothesis; maybe the participants answered based on social conformity bias.

In another case a client of mine ran a customer survey. People demanded lower prices and the company followed that desire. Two years later they filed for bankruptcy ‒ maybe a coincidence or maybe a result of customer research gone wrong. 

So what’s the case here?

The point is: We all want to make users happy, improve their lives, learn about their motivations, fears and objections. But the reality is that this does not necessarily correlate 100% with business value.

That’s why we have to validate changes quantitatively with online experiments. A/B testing is an easy way of conducting an experiment; it is a scientific method for checking whether you are right or wrong. This is why we realized early that basically two different processes are needed to create better outcomes [editor’s note: Also see Don Norman’s The Design of Everyday Things, chapter 6]:

  • A) Analyse: Learn about the customer journey, customer reality, motivations, fears and objections and compare this reality continuously with the experience you deliver (if you like, you could call it “systematic product discovery”).
  • B) Validate: Continuously validate the changes by measuring the delivered outcomes (business value) of your changes by using easy to apply methods like A/B testing that deliver the desired amount of statistical validity.
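
To make the “Validate” step a bit more tangible, here is a minimal sketch of how the result of a simple A/B test could be checked for statistical significance. It uses a pooled two-proportion z-test, which is one common choice and not necessarily what your testing tool does; the function name and the visitor and conversion numbers are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test for conversion rates (two-sided).

    conv_a / conv_b: conversions in control / variation
    n_a / n_b:       visitors in control / variation
    Returns (z, p_value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one visitor")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Conversion rate under the null hypothesis "no difference"
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # nobody (or everybody) converted: nothing to detect
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical example: 2.0% vs. 2.6% conversion rate, 10,000 visitors each
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only tells you that the difference is unlikely to be pure chance. Whether the uplift is big enough to matter for the business is still your call.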

A couple of years ago, we started to call this approach the “Agile Growth Process” and it is applicable to different layers of business decisions. This is what it looks like:

Diagram of the "agile growth process" by konversionsKRAFT.

Additionally, somewhere in the interface between these two parts of the process you need to prioritize the findings of part A) to ensure you are focused on the biggest growth potential to stay effective.

Why do I tell you this? 

Because it’s important to understand the context I am referring to when I later talk about the “validity” of the different methods: business value. The term validity might be wrong from a scientific perspective here ‒ but I basically want to give you my perspective on “how valid / how true” the output of the method is.

There is a simple equation: if you want to create outcomes with your product team, you basically change things in your product to change customer behavior. The question is: are the methods you are using for customer research able to tell you what’s really important and what’s not?

You’re in the business of behavior change.

So: What methods help you to deduce product improvements based on the insights you get from different methods? What methods tell you more about “why do people use my product” and which don’t?

I’d like to shift the perspective of the NN/g collection a little bit by changing the dimensions slightly:

  1. Is the method qualitative or quantitative? Does it tell you “Why?” or is the result data that needs to be interpreted correctly?
  2. Is the method capable of delivering you insights about the underlying unconscious decision making processes of your customers or does it focus on the rationalised, cognitive levels of decision making?
  3. How big is the validity of gained insights in terms of business value? E.g. “Voice of customer” methods only show a small part of the user’s reality.
  4. How big is the effort? Time is still scarce, and some methods are quick and cheap while others are more complex and need skills and training.

People often lack the skills to apply methods properly

As mentioned above, I saw people asking their clients “How important are cheap prices to you?” and of course the results were obvious. The deduced actions might have been the most expensive thing the company has ever done.

I also see many product managers validating their ideas by asking five people if they like a new feature. They even refer to Jakob Nielsen and his research on why five participants in 1:1 user testing will find 80% of the existing usability issues.

This is just another example of customer research gone wrong because the research question and/or method does not fit the desired outcome ‒ product success is not solely based on usability, as we should all know. And yet, design sprints refer to this method and thousands of innovations are approved like this each year.
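
For context, the five-participant rule goes back to a simple model by Jakob Nielsen and Tom Landauer: the share of usability problems found with n participants is 1 - (1 - L)^n, where L is the share a single participant uncovers. The sketch below assumes L = 0.31, the average Nielsen reports; the value for your own product may differ, and the function name is made up for illustration.

```python
def share_of_problems_found(participants, single_user_rate=0.31):
    """Nielsen/Landauer model: share of usability problems found.

    single_user_rate (L) is an assumption; 0.31 is the average reported
    by Nielsen, not a property of your product.
    """
    if participants < 0:
        raise ValueError("participants must not be negative")
    return 1 - (1 - single_user_rate) ** participants


for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} participants: {share_of_problems_found(n):.0%}")
```

Note what the model covers: usability problems. It says nothing about whether people like a feature or would pay for it, which is exactly why it cannot be used to validate an idea.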

So: Please differentiate between user research that leads to ideas (left part of the process) and the validation of ideas (right part of the process) measured mainly by business value. In customer research, many different methods deliver many different perspectives to enhance your picture of the customer reality while validation most of the time only is possible by conducting an experiment under natural conditions to measure mainly economic outcomes with a high statistical validity.

The examples I gave should anecdotally show you the possible misunderstandings and why a lack of skills when it comes to choosing the right method could be very expensive.

Here is my overview of methods that are the most common ones when it comes to customer research to create proper ideas that have a certain validity (“qualified hypothesis”, as Eric Ries calls the insights in his Lean Startup methodology).

Having a bucket of prequalified (“valid”) ideas (“hypotheses”) is all about risk mitigation while gaining the business value at the same time. 

Let’s go through the methods:

The most common methods and their flaws

Please remember ‒ this list has been created out of my experience as an optimizer over the last two decades. Based on my personal goals to improve customer experience and drive economic growth at the same time, my conclusions might differ from your experiences. Please leave a comment on the blog if you would like to add your perspective; this is always very welcome!

An overview of the user research methods, as discussed in this article, along the dimensions qualitative/quantitative, conscious/subconscious, validity, and effort.

1:1 Moderated User Testing

Type: Qualitative

Level: The main focus is mostly on conscious decision making. Participants usually give their rationalised opinion about things, as this method is rarely able to dig deeper.

Validity: Medium ‒ if you take all levels of decision making into account. Moderated user testing mostly focuses on finding usability issues, but there are many, many more issues that block your customers from buying.

Effort: Medium to high, especially if you really want to dig deeper and find the right participants.

Most important flaw: Most studies don’t deliver (fully) valid results because the participants already know what product/website is tested ‒ so participants are biased a lot. Additionally the context is missing and decision making processes are distorted by the script of the researcher and the test design.

Tip: Always run blind studies that compare different websites and ensure that participants don’t know which website is being tested. Don’t interfere with the behavior and focus on observing. Ask questions later (retrospective think-aloud).

Remote User Testing

Type: Qualitative

Level: As this is the younger and more agile sibling of 1:1 user testing, this version also focuses on the conscious decision making process of customers, mainly it is also about observing behavior as a result.

Validity: During a real-life session, the moderator can gather a lot of additional information from the situation, which is much harder in a remote lab. The remote lab is even more focused on finding functional errors and usability issues, so the validity is only medium ‒ depending on the sample size and on what your questions are. For finding usability mistakes it might be high; for understanding the whole customer reality and the underlying decision processes, it reveals only a small and rationalised part of the customer’s reality.

Effort: Low ‒ that’s why it’s so popular.

Most important flaw: Low validity ‒ if you really want to understand the customers’ reality and the deeper decision making mechanisms, this is definitely not the best method. But because it’s cheap it also won’t create big harm.

Tip: Same as for 1:1 user research ‒ always make blind studies. Try to invest a little more time and effort and make synchronous tests so you can at least dig a little bit deeper into the customers’ reality.

Online Surveys

Type: Qualitative + quantitative, based on your study design and the type of questions.

Level: What users typically type into an online survey are rationalised thoughts so this method clearly focuses on conscious level and attitude. Even if you try digging deeper into emotional areas, there is a high risk that answers are based on social desirability.

Validity: Although regarded as a method to primarily gain insights about users’ attitudes, the socially biased answers sometimes lead to a medium or even low validity ‒ depending on the questions you ask. For example, no user will ever tell you the reason they bought a luxury item was to impress other people or to compensate for their sense of inferiority.

Effort: Low to medium, especially the effort for designing the survey and implementing it so you don’t experience any biases is often underestimated.

Most important flaw: Low validity and biased answers, depending on the type of questions and the point in the customer journey where the survey is done. You will get completely different answers to the question “How do you like our service?” depending on whether you ask it at the top of the funnel or after users have made it through the checkout process.

Tip: Focus on elements of the customer journey where rational factors are important, and don’t try to gather insights about deeper emotional motivators or fears. For example, a quick survey for new customers about the frictions they experienced has always been helpful for increasing usability. The most helpful question so far has always been “If you had not bought from us, what would your alternative solution have been? Why?”. This is something that customers easily tell you without any biases.

Web Analytics Data

Type: Purely quantitative

Level: Web Analytics data always reflects the real behavior of users.

Validity: 100% as long as you don’t blend it with your own reality and interpretation. Many experts are overconfident in their interpretation of the data ‒ even worse, most of the time they are not aware that they are interpreting at all. For example, a high exit rate is often interpreted as “there is something wrong on that page”, which is not necessarily correct. There may be something wrong on the page with the high exit rate, but the main reason may just as well be situated a couple of pages earlier in the customer journey. Additionally, the data does not tell you what is wrong on the page ‒ quantitative data never tells you “why?”. As soon as web analytics data is interpreted without a framework or without support from additional qualitative data, the validity is often very low.

Effort: To gather real insights from analytics data, the effort is much higher than most people anticipate. As soon as you have a hypothesis, you can work on the question “can I prequalify the hypothesis with analytics data?”. But if you need to understand why things happen in order to come up with a hypothesis in the first place, it is nearly impossible. So the effort in our context is rather high.

Most important flaw: Overinterpretation of analytics data might lead to wrong assumptions based on the reality of the expert who is assessing the data (and not the customer).

Tip: Better use analytics data to prequalify existing hypotheses instead of assessing the data to create hypotheses. 

Eye Tracking

Type: Quantitative data

Level: Unconscious ‒ users can’t fully control their eye movements on a conscious level.

Validity: Same situation as with analytics data. The validity is high as long as your sample is representative and the results are not interpreted through the biased lens of the expert’s own understanding and blended with wrong assumptions. Unfortunately, most studies suffer from a sample that is too small, a biased study setup and wrong interpretation, which leads to a low validity of the gained insights.

Effort: High ‒ creating a study design in which participants are not biased by the questions asked requires an experienced researcher, and an unbiased analysis of the data takes a lot of time. The “5 people” rule of usability tests does not apply to eye tracking: the collected data usually shows a high variance, so a study needs a large number of participants.

Most important flaw: Low validity of results from over-interpretation of data, wrong sample and biased research.

Tip: As eye-tracking data never tells you “why?” and takes a lot of effort, you should probably consider mouse tracking as an alternative, unless you are able to run eye-tracking tests on your own and are experienced in that field of research. Mouse tracking has similar flaws but is cheaper to collect and is automatically based on a much bigger sample size.

Mouse Tracking

Type: Quantitative

Level: Mostly unconscious ‒ the movement of the mouse is usually a result of unconscious decision processes. (Just imagine how awkward it would be to have a permanent and conscious voice in your mind telling you where you should click. Sometimes this happens, which is why it is rated as “mostly” unconscious.)

Validity: Again ‒ high, as long as the data is not interpreted wrongly and the sample is big enough. The problem here is again a bias from interpretation of the data. This effect can lower the validity of the insights, as they are blended with the expert’s reality.

Effort: Low ‒ tools are cheap, and interpreting big samples is easy through click map visualizations.

Most important flaw: The mouse movement is a result of user behavior so the conclusion back to the stimulus or trigger that is responsible for the behavior is difficult to draw. As a quantitative method, mouse tracking does not tell you “why” so the weak point of the method is a wrong interpretation.

Tip: Always combine mouse tracking with qualitative methods to understand the user’s reality. What do they perceive? What’s the user’s interpretation? This is the source that leads to the mouse movement. 

Expert Evaluation

Type: Qualitative

Level: Depending on the expert’s capabilities, knowledge and empathy, an expert evaluation is able to evaluate primarily conscious parts of the user’s decision and also unconscious parts.

Validity: Strongly depends on the expertise of the rater that is conducting the evaluation. Most evaluations are similar to cognitive walkthroughs conducted in a group of experts that are biased from their own knowledge about the product. Rarely is it possible to eliminate the existing knowledge about the product, the strategy, the tech constraints, etc. which leads to very superficial results with low validity. Validity can be increased by following certain rules, e.g., conducting the review independently with several raters or applying a set of predefined criteria (see heuristic evaluation).

Effort: Moderate effort ‒ based on the amount of raters and the scope of the evaluation.

Most important flaw: Low validity of insights from biased raters and a lack of methodological knowledge.

Tip: Only use neutral raters that are able to take the user’s perspective. Do evaluations independently ‒ not in a group. Apply frameworks or criteria sets to increase validity (see heuristic evaluation).

Heuristic Evaluation

Type: Qualitative

Level: Depending on the applied framework or heuristics, the evaluation can cover both conscious and unconscious parts of the user’s decision making.

Validity: Moderate to high ‒ depending on the applied heuristics. Most heuristic evaluations are based on usability heuristics as this part of user behavior has already gone through a lot of research in the past 25 years and became common knowledge. The challenge is that usability is only one part of the overall user experience, comparable with an important foundation. Without usability, there will be no business value. But even an interface with great usability could suffer from low economic results based on additional factors that influence the decision making and behavior of users. This is the reason why I started to develop our own framework that includes all potential levels that affect the likelihood of a conversion on a website based on the user’s perspective. It is called “The 7 Levels of Conversion” and you can download it here as a PDF for free.

Effort: Moderate effort ‒ depending on the scope of the evaluation and on whether you include competitors’ websites or not.

Most important flaw: Applying the wrong framework leads to biased results. Also the experience and empathy of the rater might affect the validity of the results even if a framework is used. 

Tip: Always conduct independent evaluations and connect the results later to avoid group effects during the evaluation. If you apply a metric rating scale, you can calculate the inter-rater reliability to see whether you get consistent results and raters are not biased. It is recommended to add competitors to the rating and to evaluate their websites first to reduce biases.
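
As a rough illustration of what calculating inter-rater reliability can look like, the following sketch computes Cohen’s kappa for two raters. This is only one of several possible statistics, and an assumption on my side: it treats the ratings as categories (e.g., a 1 to 5 score per criterion), whereas for a truly metric scale an intraclass correlation would usually be the better fit. The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement of two raters, corrected for chance.

    Both lists hold one categorical rating per evaluated item,
    in the same item order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Share of items on which both raters gave the same rating
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if both raters rated independently at random,
    # each keeping their own rating frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical rating throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten criteria on a 1-5 scale
rater_1 = [4, 3, 5, 2, 4, 4, 1, 3, 5, 2]
rater_2 = [4, 3, 4, 2, 4, 5, 1, 3, 5, 3]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")
```

A kappa of 1 means perfect agreement, and values around 0 mean the raters agree no more often than chance would suggest. The latter is a hint that the criteria are unclear or that the raters bring their own biases.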

Personae

Type: Qualitative + quantitative (if done properly)

Level: Conscious + subconscious level

Validity: low to high ‒ based on the used method. Due to a wrong approach, most personae suffer from a low validity and are often only superficial reflections of the organization’s lack of empathy when it comes to developing insightful segments or types of their customers. As soon as insights dig deeper into the motivational and mostly subconscious preferences of customers (qualitative approach) and are additionally validated through quantitative methods (comparing qualitative hypothesis with data clusters of real customer behavior), the validity will become high.

Effort: High, if the right combination of methods is used to create personae.

Most important flaw: Wrong approach leads to superficial customer understanding and reduces the value of the gained insights.

Tip: Always connect insights from qualitative research with data to validate personae.

Field Research / Customer Interviews

Type: Qualitative + quantitative

Level: Conscious ‒ what users say is always rationally processed which makes it hard to uncover subconscious factors.

Validity: Moderate to high. Depending on the sample, the design of the interview and the skills of the interviewer, the validity can be high. Unfortunately, most studies are conducted with manipulative questions, a lack of context and small sample sizes, which can even lead to low validity of the results.

Effort: Moderate ‒ high ‒ depending on the sample size and the amount of work that is put into creating standardized interview guidelines and making sure the interviewers are well trained to gather the right insights.

Most important flaw: The most common error is to confuse the good old usability rule, where 5 participants of a moderated user test uncover 80% of the functional defects of an interface, with the sample requirements of an interview. As soon as an interview is about opinions, motivations and attitudes of users, much bigger sample sizes are required than what is usually applied. Hence, the effort that is needed to create valid insights through proper study design might be the biggest flaw.

Tip: Apply frameworks like the “Jobs to Be Done” methodology to lower the effort and raise validity, especially if you are researching the motivational triggers of users to actually buy your product and want to understand the real customer journey.
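
To illustrate why five interviews are not enough once you want to generalize opinions or attitudes: if you treat an answer as a simple yes/no share, the margin of error shrinks only with the square root of the sample size. The sketch below is a simplification under stated assumptions: a simple random sample, a 95% confidence level (z = 1.96), the worst case of a 50% share, and the normal approximation, which is itself rough for very small samples. Read the numbers as orders of magnitude.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error of a share under the normal approximation.

    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (5, 30, 100, 400):
    points = margin_of_error(n) * 100
    print(f"n = {n:>3}: +/- {points:.1f} percentage points")
```

With a handful of participants, a statement like “most users want X” is compatible with almost any true share in your customer base.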

Conclusion and Recommendation

As you see, there is not a one-size-fits-all solution. I am a big fan of combining different methods to get many different perspectives on all levels of the customers’ reality.

Only if you understand the customer journey as a whole, the key tasks of the customer during the stages of that journey, the personal goals and fears of the customer, etc. will you be able to deduce valid and effective solutions from your insights.

Effective product discovery is based on a systematic approach ‒ not the personal opinion of different people in different teams.

I know, this usually doesn’t fit into the day-to-day operations of an agile release train ‒ but you have to escape this trap at one point and have a fact-based discovery process ready to fight the HiPPO. My personal top 3 methods for people with little time are:

Step 1) Customer interviews based on the Jobs-to-be-done and/or Persona methods to understand the customer journey, customers’ pain points and desires, and the ideal customer experience.

Step 2) 1:1 user research in a blind study to compare that ideal scenario with the real customer experience. Use retrospective think-aloud so you don’t bias the decision making process.

Step 3) Expert evaluation based on heuristics and behavior patterns to find out more about the subconscious levels of the decision making process in each stage of the customer journey.

Additionally, if you are able to invest more time and effort, do this:

Bonus method 1) Face reading ‒ use advanced face reading technology to open a window into the subconscious mind of your customers. You can measure how people really feel with terrifyingly high validity. As soon as you start doing this, you will learn how low the validity of user research without this “gimmick” really is …

Bonus method 2) Data-driven personas ‒ as mentioned earlier, I’m not a big fan of artificially created personas from creative design thinking workshops. Having a rough idea about your target customers is better than having none ‒ but you should always use real and unbiased data to validate your personas. That will also raise the acceptance of this tool a lot in your organization.

Bonus method 3) Continuous online surveys. First, qualitatively ‒ you can continuously learn a lot from new clients by asking them “Why did you choose us? What nearly blocked you from buying from us? Where would you buy if not here?”. Second, you should start measuring the customer experience quantitatively with frameworks like CSAT, CES, NPS, etc. and add a new dimension to your A/B tests …
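
As a small example of the quantitative part, here is how a Net Promoter Score can be computed from raw answers to the 0 to 10 “How likely are you to recommend us?” question: the share of promoters (9 or 10) minus the share of detractors (0 to 6), expressed in points. The function name and the survey answers are made up for illustration.

```python
def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("need at least one answer")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("answers must be on the 0-10 scale")
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers from ten new customers
answers = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
print(f"NPS = {net_promoter_score(answers):.0f}")
```

If you record such a score per variation of an A/B test, you can compare the experience of both groups in addition to their conversion rates, keeping in mind that small samples make the score very noisy.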

I hope sharing my experience with the different methods in a business context helps you to move your product discovery efforts forward. Please feel free to send me your questions, your comments or any kind of feedback! I am really open to feedback, also controversial discussions are highly welcome!
