Believe me, believe me not: Analyzing survey data quality

“I usually just answer ‘average’ for every question,” admits a focus group participant whom we invited to discuss surveys from the perspective of a consumer. “Yes,” agrees another discussant, “I lose my focus after… four questions, and then… I don’t really care about the answer.” This is bad news for marketers who rely on survey research to uncover consumers’ thoughts and motivations.

Whether your marketing department is using consumer surveys to assess brand perceptions, brand usage, or the characteristics of your target consumer group, it is essential that the obtained survey scores reflect your respondents’ true opinions. Consumers might be motivated to partake in a survey for any number of reasons, such as interest in the topic, enjoyment, curiosity, motivation to help, a sense of obligation, or a need for recognition. Despite this, marketing researchers are regularly faced with a phenomenon called “careless responding”: a response style that is random with regard to the question’s content – in other words, providing a “random” score. Careless responding seems to be especially common in online and mobile surveys. This could be because consumers experience a higher level of anonymity and decreased accountability in such digital environments. In addition, the process of completing a questionnaire on a mobile device is so effortless that it can encourage rushing through the questions. By some estimates, inattentive responses may constitute 10% to 42% of online survey data.

Typically, people’s justifications for providing inaccurate survey responses fall into one of two categories: (1) reasons related to the question content (e.g., questions judged overly personal), and (2) reasons unrelated to the question content (e.g., impatience, disengagement, inattentiveness, or mischievousness). Whether attributable to questionnaire design flaws (e.g., lengthy questionnaires that make participants lose interest and experience fatigue) or to a breakdown of the social contract between researchers and survey participants, inaccurate responses threaten the trustworthiness of marketing research. Poor quality data resulting from such self-reports can mislead managers and encourage business actions that irreparably harm companies. It is therefore paramount that, after a data collection campaign is completed, analysts can identify and then purge problematic data. Various data quality control techniques can be employed for this purpose. These techniques differ in their implementation difficulty, and you can select a suitable approach depending on the size of your marketing department and the skill set of your analysts.

Implementation difficulty: Easy

1. Response time assessment

For this method, your team would need to determine the minimum time required to complete the survey. You can test your questionnaire on a small group that is similar to your intended survey respondents. In some cases, an internal test on a group of employees can suffice. This will allow you to establish a “cut-off” and eliminate cases with impossibly fast times that indicate careless responding. For instance, if a survey pretest on 10 consumers showed that, on average, it takes 10 minutes to thoughtfully answer all questions, an analyst might want to exclude from analysis the respondents who took less than 3 minutes to answer the questions.

- Pros: Only a basic analytical skill set is required. The method is easy to explain to management and other stakeholders.
- Cons: Time “cut-offs” are often arbitrary. The method does not identify respondents who provided careless answers yet remained logged into the survey for a longer period of time.
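
As a minimal sketch, the Python snippet below assumes the survey platform exports one row per respondent into a pandas DataFrame with a hypothetical completion_seconds column; the file name and the 3-minute cut-off from the example above are placeholders your team would adjust.

```python
import pandas as pd

# Hypothetical export: one row per respondent, with the completion time
# in seconds recorded by the survey platform.
df = pd.read_csv("survey_responses.csv")

# Pretest suggested ~10 minutes for a thoughtful completion; anything
# under 3 minutes (180 seconds) is treated as careless in this example.
MIN_SECONDS = 180

too_fast = df["completion_seconds"] < MIN_SECONDS
print(f"Flagged {too_fast.sum()} of {len(df)} respondents as too fast")

# Keep only the respondents above the time cut-off.
clean_df = df[~too_fast]
```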

2. Self-reported “use me” prompt

This method requires you to include a single questionnaire item in which the respondent tells you whether they think they were sufficiently attentive. Typically, it is presented at the end of the survey and could be worded as follows: “In your honest opinion, should we use your data in our analyses in this study?”. If participation in the survey is rewarded via monetary or other tangible incentives (e.g., promotional coupons or a prize draw), it is important to let the consumer know that they will still receive their survey compensation regardless of how they answer this question. The answer can be recorded as a simple “Yes or No” choice, in which case only the respondents who answered “Yes” should be used for further analyses; or as a continuum (e.g., 1 to 7, with 1 being “Definitely not” and 7 being “Definitely yes”), in which case the analysts need to agree on a cut-off value (e.g., 4).

- Pros: The measure can be highly accurate, because the participant is the one who knows best if they were careless in their responding.
- Cons: This technique depends on the honesty of the participants and on their attentiveness at the time the question is asked. Selected “cut-offs” can be arbitrary.
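
A filtering step for this prompt might look like the sketch below; the column names use_my_data and use_my_data_score are hypothetical, and the cut-off of 4 follows the example above.

```python
import pandas as pd

df = pd.read_csv("survey_responses.csv")

# Binary version: keep only respondents who answered "Yes".
clean_binary = df[df["use_my_data"] == "Yes"]

# Continuum version (1 = "Definitely not", 7 = "Definitely yes"):
# keep respondents at or above the agreed cut-off.
USE_ME_CUTOFF = 4
clean_scaled = df[df["use_my_data_score"] >= USE_ME_CUTOFF]
```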

3. Self-reported effort

Similar to the previous method, this technique suggests directly asking consumers to disclose how much effort they put into completing the survey. This can be done with one or more questions, for instance: “I worked to the best of my abilities on answering this questionnaire”, “I put forth my best effort in responding to this survey”, “I would be interested in reading about the results of this study”. Further, consumers could be asked to indicate the degree to which the statement “I’m in a hurry right now” applies to them; higher scores on this item indicate that the respondent put less effort into the survey. If multiple questions are used, the self-reported effort score can be averaged across these questions. Once again, this method assumes that the analytical team sets a cut-off value for eliminating “bad” responses.

Pros and Cons: Same as above.
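
A possible scoring sketch is shown below, assuming hypothetical 1–7 items named best_abilities, best_effort, interested_in_results, and in_a_hurry; the last item is reverse-scored because higher scores signal less effort, and the cut-off is an assumption.

```python
import pandas as pd

df = pd.read_csv("survey_responses.csv")

# Hypothetical 1-7 agreement items measuring self-reported effort.
effort_items = df[["best_abilities", "best_effort", "interested_in_results"]]

# "I'm in a hurry right now" indicates *less* effort, so reverse-score it
# on the 1-7 scale before averaging.
reversed_hurry = 8 - df["in_a_hurry"]

df["effort_score"] = pd.concat([effort_items, reversed_hurry], axis=1).mean(axis=1)

# Keep respondents whose average self-reported effort meets the agreed cut-off.
EFFORT_CUTOFF = 4
clean_df = df[df["effort_score"] >= EFFORT_CUTOFF]
```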

4. Attention probes

This technique calls for incorporating “trap questions” into your survey. These “hidden questions” are meant to detect whether a respondent is actually reading the text of the question, instead of mindlessly clicking through the survey. There are two types of such questions. First, you can use a simple instructed-response item that reads “Please select strongly agree if you are reading this question”. Second, you can utilize so-called “bogus questions”, to which only one answer is plausible. For instance, when answering a question that states “I sleep less than one hour per night” or “I see aliens every day”, an attentive respondent is bound to select “Strongly disagree”. After the data is collected, the total number of incorrectly answered attention probes is calculated for each respondent, and this count is treated as a data quality indicator (e.g., missing one probe is worse than missing zero probes, missing two probes is worse than missing one, and so on). The analytical team then decides what number of missed attention probes is judged excessive.

- Pros: Only a basic analytical skill set is required. The method is easy to explain to management and other stakeholders.
- Cons: Some survey researchers suggest that attention probes can influence respondents’ mood. In other words, participants might feel distrusted and annoyed by the attempts to test their attention.
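
Counting missed probes can be as simple as the sketch below; the probe column names, the correct answers on a 1–7 scale, and the one-missed-probe tolerance are all assumptions to adapt to your survey.

```python
import pandas as pd

df = pd.read_csv("survey_responses.csv")

# Hypothetical probes and their only acceptable answers on a 1-7 scale:
# an instructed-response item ("select strongly agree" = 7) and two
# bogus items that attentive respondents answer with "strongly disagree" = 1.
correct_answers = {"probe_instructed": 7, "probe_sleep": 1, "probe_aliens": 1}

# Count how many probes each respondent answered incorrectly.
df["missed_probes"] = sum(
    (df[item] != answer).astype(int) for item, answer in correct_answers.items()
)

# Team-agreed cut-off: more than one missed probe is treated as careless.
MAX_MISSED = 1
clean_df = df[df["missed_probes"] <= MAX_MISSED]
```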

Implementation difficulty: Moderate

Long-string analysis

This technique revolves around measuring the number of times a respondent answered with an unbroken sequence of identical scores. For instance, imagine that your survey contains 10 questions, each asking the respondent to indicate the extent to which they agree with a certain statement. If the scale runs from 1 (strongly disagree) to 7 (strongly agree), and a respondent answered “2, 3, 3, 3, 3, 3, 3, 3, 4, 4”, then the longest string is 7 identical scores (all “3”). This can be calculated for each respondent as the longest string in the whole survey, on each page of the survey, or as the average of the longest strings across multiple survey pages. Once you have calculated this for each participant, you need to search for outliers – in other words, participants who provided unusually long strings of the same answer. The assumption is that a consumer excessively using unbroken sequences is not reading the survey questions attentively, but rather providing the same score to each question. One way to search for the outliers is to create a histogram of the longest string for each respondent and see if any respondents really “stand out”. If you see clear outliers, you can conclude that those respondents were careless, and drop these cases from further analysis.

- Pros: The technique is reliable, as it compares each respondent to the others who completed the same survey.
- Cons: More advanced analytical and software skills are required for the implementation. The structure of the survey and the type of questions might prompt even attentive respondents to provide an unbroken sequence of scores.
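
One way to compute the longest string per respondent is sketched below, assuming ten hypothetical item columns q1 through q10 recorded in the order they were presented.

```python
import pandas as pd

df = pd.read_csv("survey_responses.csv")

# Hypothetical block of ten 1-7 agreement items, in presentation order.
item_cols = [f"q{i}" for i in range(1, 11)]

def longest_string(row: pd.Series) -> int:
    """Length of the longest run of identical consecutive answers."""
    answers = row.tolist()
    longest = current = 1
    for prev, nxt in zip(answers, answers[1:]):
        current = current + 1 if nxt == prev else 1
        longest = max(longest, current)
    return longest

df["long_string"] = df[item_cols].apply(longest_string, axis=1)

# Inspect the distribution and look for respondents who "stand out".
print(df["long_string"].describe())
df["long_string"].hist(bins=10)
```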

Implementation difficulty: Difficult

1. Individual consistency for synonyms

It is safe to assume that attentive consumers respond to surveys in a consistent way. For instance, if they have an overall positive attitude towards the brand, they are likely to provide fairly high scores when indicating their agreement with statements such as “I like this brand”, “I have a favorable opinion of this brand”, and “I would recommend this brand to my friends”. These similar questions are called “psychometric synonyms”, and they tend to show a high within-person correlation. In other words, if a participant is attentive when taking the survey, their scores on questions that are alike will correlate. A correlation of +1 indicates a perfect positive relationship; if the questions are as similar as the ones listed above, an analyst should expect a correlation of at least +0.6. Typically, survey researchers select as many pairs of highly related questions as they can find and, for each respondent, calculate the correlation between the paired items. If this correlation is low for a particular respondent (perhaps closer to 0, showing no association between similar items), this is a red flag, and the data provided by that respondent are probably best discarded.

- Pros: This technique is highly accurate, and provides insights into each individual respondent’s behavior.
- Cons: Your analysts need to demonstrate their statistical prowess when computing the correlation coefficients. Further, this method is not suitable for shorter questionnaires, because it requires a lot of “question pairs” for the calculations to work. It is recommended to analyze a minimum of 30 pairs.
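
A sketch of this within-person correlation is shown below; the item names and the small number of pairs are purely illustrative (in practice you would use far more pairs), and the +0.6 threshold follows the example above.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("survey_responses.csv")

# Hypothetical synonym pairs: each tuple holds two highly related items.
synonym_pairs = [
    ("like_brand", "favorable_opinion"),
    ("recommend_brand", "speak_positively"),
    ("trust_brand", "brand_is_reliable"),
    # ...in practice, as many strongly related pairs as the survey allows
]

def within_person_correlation(row: pd.Series, pairs) -> float:
    """Correlate the first and second items of each pair within one respondent."""
    first = np.array([row[a] for a, _ in pairs], dtype=float)
    second = np.array([row[b] for _, b in pairs], dtype=float)
    if first.std() == 0 or second.std() == 0:  # correlation undefined
        return np.nan
    return np.corrcoef(first, second)[0, 1]

df["synonym_r"] = df.apply(within_person_correlation, axis=1, pairs=synonym_pairs)

# Respondents whose within-person correlation falls below the threshold are flagged.
SYNONYM_CUTOFF = 0.6
flagged = df[df["synonym_r"] < SYNONYM_CUTOFF]
```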

2. Individual consistency for antonyms

Similarly, some analysts look at “psychometric antonyms” – questions that indicate opposite opinions. For instance, a consumer cannot agree with both of the following statements: “I have never purchased this brand” and “I regularly purchase this brand”. For an attentive respondent, scores on such questions should be strongly negatively correlated, indicated by a correlation coefficient between -0.6 and -1. If this strong negative correlation is absent, it is likely that the respondent did not give much thought to the questions.

Pros and Cons: Same as above.
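
The antonym check can reuse the within_person_correlation helper from the previous sketch, this time expecting a strong negative correlation; again, the item names are hypothetical.

```python
# Hypothetical antonym pairs: items expressing opposite opinions.
antonym_pairs = [
    ("never_purchased", "purchase_regularly"),
    ("dislike_brand", "like_brand"),
]

df["antonym_r"] = df.apply(within_person_correlation, axis=1, pairs=antonym_pairs)

# Attentive respondents should show a correlation between -0.6 and -1 here.
ANTONYM_CUTOFF = -0.6
flagged_antonyms = df[df["antonym_r"] > ANTONYM_CUTOFF]
```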

3. Even-odd consistency analysis

This method will require you to compute correlation coefficients once again. For it to work, each multi-item scale in the survey is split into two halves: even-numbered and odd-numbered questions. For instance, imagine that perception of brand luxuriousness was investigated using six questions: “This brand is high-end”, “This brand is associated with luxury”, “This brand is selective”, etc. If the respondent filled out the questionnaire seriously and attentively, then their average score on the odd-numbered questions (the first, third, and fifth) should closely match their average score on the even-numbered questions (the second, fourth, and sixth). When these half-scale averages are computed for several scales, a correlation between the odd and even halves can be calculated for each individual respondent; a low correlation signals inconsistent response behavior. It is advisable to discard the data from respondents who score below 0.30 on this correlation index. Some careless respondents know that providing an unbroken sequence of scores (e.g., “3, 3, 3, 3”) can look suspicious, so they attempt to alternate their answers instead (e.g., “2, 4, 2, 4”). These types of patterns are effectively detected by the even-odd consistency technique.

- Pros: This technique is highly accurate, and provides insights into each individual respondent’s behavior.
- Cons: Your analysts need to demonstrate their statistical prowess when computing the correlation coefficients. Further, this method only works if you use several items to assess the same concept (e.g., brand attitudes, specific brand perceptions, and so on).
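
A per-respondent even-odd index might be computed as in the sketch below, assuming three hypothetical multi-item scales; the 0.30 cut-off follows the text above.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("survey_responses.csv")

# Hypothetical multi-item scales; each list holds that scale's items in order.
scales = {
    "luxury": ["lux_1", "lux_2", "lux_3", "lux_4", "lux_5", "lux_6"],
    "attitude": ["att_1", "att_2", "att_3", "att_4"],
    "loyalty": ["loy_1", "loy_2", "loy_3", "loy_4"],
}

def even_odd_consistency(row: pd.Series) -> float:
    """Correlate odd-half and even-half scale means within one respondent."""
    odd_means = [row[items[0::2]].mean() for items in scales.values()]
    even_means = [row[items[1::2]].mean() for items in scales.values()]
    odd, even = np.array(odd_means), np.array(even_means)
    if odd.std() == 0 or even.std() == 0:  # correlation undefined
        return np.nan
    return np.corrcoef(odd, even)[0, 1]

df["even_odd_r"] = df.apply(even_odd_consistency, axis=1)

# Discard respondents whose even-odd consistency falls below 0.30.
clean_df = df[df["even_odd_r"] >= 0.30]
```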

Conclusion

This list of data quality assessment techniques is not exhaustive. Depending on the training of your analysts and the statistical software available to them, they might want to employ even more sophisticated methods. For instance, the multivariate technique of Mahalanobis distances identifies respondents whose pattern of answers is unusual relative to the other respondents who completed the same survey.
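
For illustration, a Mahalanobis-distance screen could look like the sketch below, assuming ten hypothetical numeric items and a conventional chi-square flagging rule; the column names and the 0.001 threshold are assumptions, not a prescribed standard.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

df = pd.read_csv("survey_responses.csv")

# Hypothetical numeric survey items used for the multivariate check.
item_cols = [f"q{i}" for i in range(1, 11)]
X = df[item_cols].to_numpy(dtype=float)

# Squared Mahalanobis distance of each respondent from the sample centroid.
centered = X - X.mean(axis=0)
inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
d_squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

# Compare against a chi-square distribution with one degree of freedom per
# item; very small p-values mark unusual response patterns.
p_values = chi2.sf(d_squared, df=len(item_cols))
df["mahalanobis_flag"] = p_values < 0.001
```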

It might also be a good idea to create a Data Quality Index that your department will use internally to purge survey data of subpar quality. Such an index can include multiple metrics, for instance, response time, missed attention probes, and self-reported effort. Combining several metrics into one index allows you to capitalize on the advantages of each technique while mitigating their shortcomings.
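
As a rough sketch, such an index could simply count the number of failed checks, reusing the hypothetical columns computed in the earlier sketches (completion_seconds, missed_probes, effort_score); the cut-offs and the two-strikes rule are assumptions your team would agree on.

```python
# Each failed quality check adds one point to the Data Quality Index.
df["dqi"] = (
    (df["completion_seconds"] < 180).astype(int)   # too fast
    + (df["missed_probes"] > 1).astype(int)        # missed attention probes
    + (df["effort_score"] < 4).astype(int)         # low self-reported effort
)

# Respondents failing two or more checks are dropped before further analysis.
clean_df = df[df["dqi"] < 2]
```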

Regardless of the approach you deem appropriate for your project, make sure you always assess data quality before conducting further analyses. After all, bad data can only produce bad managerial strategies.