Paul A Mayes
The paper considers important aspects of designing questions for children. It commences with a discussion of the constraints and limitations of conducting questionnaire-based research with children, covering such items as the difficulty of recording non-verbal reactions and the limited attention span of younger children. From each of the points covered, a criterion by which a question may be judged is derived.
The paper briefly considers some problems associated with collecting different types of frequently-required data from children. This is followed by a discussion of the benefits of collecting comparative rather than absolute data, and some comments on the analysis of scaled questions.
Finally, various types of question, including different scales and open-ended questions, are assessed according to the criteria derived earlier.
To put this paper into its correct context, and perhaps myself out of a job, my first point must be the question ‘should we really be conducting quantitative research with children?’.
Perhaps I should qualify that somewhat: we should not be carrying out research where what the child says forms our only input of data.
At its simplest, any experienced qualitative researcher can tell you the difference between a child saying something is good while obviously itching to get his hands on it, and a child saying the same thing while gazing out of the window. Even given the unlikely and expensive option of a field-force composed purely of experienced child psychologists, the problem remains: how do we construct a valid, quantifiable and, above all, consistent measure of non-verbal information, covering eye and body movements, verbal inflexions and whatever else may occur?
The answer is that for all practical purposes we cannot. We have to accept and recognise the limitations of the data we can collect, and interpret it in that light.
However, to be able to interpret it we need to conduct a preliminary qualitative pilot study, or at least know the product field well.
Constraints and Limitations
A problem which arises in many types of research, but which is at its most acute when researching children, is the length of time for which the respondent’s attention can be maintained. This is affected by a number of factors, the most crucial being, of course, the developmental stage of the child. It also varies substantially according to:
- The subject of the interview: bubble gum, for example, tends to be a high interest product field for children, and so they can talk about it at much greater length than they can about, say, fruit squashes and cordials, which for children are essentially a commodity market.
- The interviewer: an interviewer trained in, and used to, interviewing only adults cannot expect to take the same completely detached approach with children and maintain their interest. She needs to be more flexible in the speed at which she asks questions and in her overall tone of voice, without, of course, biassing the answers in any way.
- Interview structure: quite simply, a questionnaire that proceeds logically from the child’s point of view will be more successful than one which repeatedly drags him off at a tangent. The careful use of stimulus materials can be of great help in boosting flagging interest.
- Interview location: distractions should obviously be kept to a minimum. We have found the interviewer’s home to be a good location: it is a familiar type of environment, better than the child’s own home in that the tendency to give answers acceptable to mum is minimised – very important in fields where parental disapproval may be an influence.
If an experienced qualitative interviewer, who is controlling the flow and direction of the interview according to feedback from the child, and is allowing him to respond in a natural way, can maintain the child’s full attention for only 20 minutes, then how long will that child last with a housewife interviewer and a pre-structured, pre-coded questionnaire, especially if he is also getting his first glimpse of what the inside of a pub is like (as in the conventional British hall test)?
To maximise this time-period we must carefully select and thoroughly train interviewers, conduct the interview in a suitable location, and make the interview as short and ‘natural’ as we can. This gives us our first three criteria for a good question:
- It should not take long to ask and answer.
- It should not distract the child from the subject matter.
- The interviewer should be able to record the data without interrupting the flow of the interview.
Bill Belson has carried out research on word usage and understanding (1). Among other things, he found that only about half the UK population understood the word ‘incentive’, and only about a fifth understood ‘chronological’. If we cannot even assume a shared understanding of these words among our adult peers, how sure can we be of the understanding and word usage of children?
For example, our qualitative work in several product fields has indicated a discrepancy (another Belson 50% word) in usage of the words ‘flavour’ and ‘taste’, with ‘taste’ tending to refer to taste perceptions and ‘flavour’ referring to the descriptive label, e.g. strawberry flavour.
This gives us our next criterion: researcher and respondent should share the same understanding of what the questions, and the answers, mean. The researcher must also be aware of how word usage changes across the developmental range of his sample universe, adjusting vocabulary where necessary to maintain the same meaning for the question.
Accuracy and Relevance
A client of ours, who must remain confidential, asked us, as an adjunct to a product test, to assess the effect a change in the product would have on sales. We pointed out the difficulties inherent in this, but agreed to carry it out purely as an experiment. We asked a purchase frequency question, and weighted the (overclaimed) result to reflect the previous year’s sales. At the end of the product test we used the same question format to ask intended purchase frequency, and applied the same weighting factor. The results indicated that a product which had been selling at a rate of 20 million units a year would, with a minor alteration, sell 660 million units a year. (The numbers have been changed to protect the innocent, but the proportions remain the same).
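The weighting procedure described above can be sketched as follows, to make explicit where it breaks down. This is a hypothetical reconstruction, assuming a single weighting factor (actual annual sales divided by the annual volume implied by claimed purchase frequency) applied unchanged to claimed intended purchase frequency. The function names are ours, and the claimed volumes in the usage lines are invented purely so that the arithmetic reproduces the proportions quoted in the text.

```python
def weighting_factor(actual_units: float, claimed_units: float) -> float:
    """Factor that scales claimed (overclaimed) volume down to known sales."""
    if claimed_units <= 0:
        raise ValueError("claimed_units must be positive")
    return actual_units / claimed_units


def project_sales(intended_units: float, factor: float) -> float:
    """Apply the same factor to claimed *intended* purchase volume.

    This silently assumes that children overclaim future intentions to
    exactly the same degree as past behaviour, which is the assumption
    the experiment exposed as unsafe.
    """
    return intended_units * factor


# Hypothetical figures (millions of units per year), for illustration only.
factor = weighting_factor(actual_units=20.0, claimed_units=80.0)
projection = project_sales(intended_units=2640.0, factor=factor)
print(factor, projection)
```

The arithmetic itself is trivial; the point is that nothing in it can correct for intentions being overclaimed far more heavily than past purchases.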
Thus our next criterion is that our questions should provide accurate, or at least accurately interpretable, results, which should also be of some use.
One school of thought I have heard propounded is that researching children is easy because ‘kids tell the truth’. This seems to have a degree of validity – in the earlier example of a question on purchase intention the answer would probably have been close to the mark were the children always to have the correct amount of money available, the product to have perfect distribution, and all competitors to have zero distribution. Certainly children seem, qualitatively, to be less subject to some of the influences affecting adult response. ‘We’re doing some research for Yummychox. What do you think of their products?’ is likely to get a more honest answer from children than from adults. But, and this is a big but, children are not naive. If they have any reason to suspect that the lady working for Yummychox has a pocket full of sweets, this will inevitably colour their responses.
Thus, a further criterion for our questions is this: The questions should, if at all possible, avoid sources of respondent bias.
Types of Question
Before going on to consider a few question types in detail, I would like to briefly consider a few problems arising in broad question areas.
Behavioural And Awareness Questions
Defining a product field for purposes of, say, spontaneous brand awareness can pose problems both in adult and child research. An example is the product field referred to by UK marketing men as ‘savoury snacks’ (i.e. bagged snacks derived from potato crisps). The consumer, child or adult, seems to have no collective term for these, defining them simply as being related to potato crisps. The question ‘What names of savoury snacks can you think of?’, would inevitably lead to mentions of such items as ‘cheese on toast’ or ‘soup’. The only solution to this we have yet found workable is inventing fake brands and pack designs, obviously within the product field but having no characteristics suggestive of individual real brands. This admittedly difficult task allows us to define the product field for the respondent.
Our qualitative work, again, has shown that the brand name is not necessarily the prime identifier of the brand: indeed, we know of several successful products with on-pack logos that are illegible even to literate children, and with unpronounceable names.
So, the situation can arise, and has arisen, where a brand can have very low brand awareness figures because what is being measured is brand name awareness. The solution for this in prompted awareness situations is simple and obvious: use of photographic show cards illustrating the brands. If a measure of spontaneous brand awareness really is required we must go into the laborious process of getting children to describe the packs and then identify the brands from these.
Recall of purchase and consumption is even more of a problem with children than with adults. It is often said that children ‘live in the world of today’, with very little consideration of the past or future – in some product fields we have found that they live in the world of the current half hour! For example, we have carried out various sessions at point-of-sale in a number of product fields, followed by individual interviews with the observed children. It is not uncommon in these situations to find children unaware of what they had bought a few minutes before. So, when we ask a usage-frequency question, whether it is ‘When was the last time you got … ?’, ‘How often do you get … ?’ or ‘How many times in the last week have you had … ?’, we are working with very shaky estimates. Massive overclaiming is the norm, but this varies according to product field, brand within product field (overclaiming the favourite brand and underclaiming the others), and whether or not it was a self-purchase. Moreover, we suspect that the extent of over- or underclaiming also varies according to demographic and developmental characteristics, so profile data is also unreliable. We have attempted various ways of combatting this problem: for purchase frequency, observation at point-of-sale; for usage frequency, wrapper collections; and for both, diaries. However, although each of these solves some of the existing problems, each also poses new ones. I hope to report further on these complexities in a future paper.
Recall of how many units of a product were bought is rarely a problem, as children normally buy only one or two at a time. However, ownership of, say, toy cars, can be difficult to elicit. Firstly, because children will tend to include only their newer and ‘favourite’ ones, and secondly because of the difficulty young children have in handling the concept of large numbers: as in certain primitive societies, practical (as opposed to rote) counting goes ‘one, two, three, many’. The only solution to this (as the child’s mother will be unlikely to know the answers) is to get the child to bring out all his toys, and to physically count them. This, as some of our interviewers will tell you from bitter experience, can take a very long time indeed.
Attitudinal Questions
The first point to make here is that of saliency of attitude, and the problem of ‘putting words in children’s mouths’. To take an extreme example, we would not dream of asking a child: ‘Is Yummychox a self-indulgence product bought on impulse, or is it a carefully planned purchase bought for its practical benefits?’ However, we might well ask ‘Do you think that Yummychox is more for children than for grown-ups?’, when this may involve a concept the child has never entertained, the only relevant factor to him being whether he likes the product or not.
The problem is that, while adult thought processes tend to be verbal in nature, at earlier developmental stages they tend not to have such a structure. So, the very asking of the question can wrench the child into a completely new and alien mode of thinking.
The above examples both refer to attitudes towards abstract concepts. This is clearly a big field, and deserves several papers to itself. We have neither the time nor, as yet, the knowledge to deal with it fully. In the meantime, we would suggest that such concepts are dealt with purely qualitatively or, at least, only after very thorough qualitative piloting. The assessment of attitudes towards more concrete concepts is discussed in more detail throughout subsequent sections of the paper.
Absolute Versus Comparative Data
Physiological sensations can best be described comparatively, rather than absolutely. While the average human cannot, for example, tell you how hot a glass of water is by touch, he can detect that one glass is hotter than another (though not by how much) when the difference in temperature between them is as little as 4 degrees. This is known in the psychological literature as the j.n.d., the ‘just noticeable difference’. With children this can be extended further: put three different bars of chocolate in front of a child and ask him what he thinks of them, and the answer is likely to be that they are all good, because a bar of chocolate is, almost by definition, good. Ask him to rank them best, second best and third best, however, and he will do it for you clearly, confidently and consistently. By the by, it is sometimes advisable to avoid asking children which one is worst, as some children will baulk at the negative connotations of the word and say that none of them is ‘worst’.
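When every child ranks the same set of products, the rankings can be summarised by rank sums and tested with a non-parametric method such as the Friedman test. The sketch below is our illustration rather than a procedure described in the paper; it assumes complete rankings with no ties (rank 1 = best), and the bar names are hypothetical.

```python
from typing import Dict, List


def friedman_statistic(rankings: List[Dict[str, int]]) -> float:
    """Friedman chi-square statistic for n complete, untied rankings of k products.

    Each dict maps a product name to the rank one child gave it (1 = best).
    Under the null hypothesis of no systematic preference, the statistic is
    approximately chi-square distributed with k - 1 degrees of freedom.
    """
    if not rankings:
        raise ValueError("no rankings supplied")
    products = sorted(rankings[0])
    n, k = len(rankings), len(products)
    rank_sums = [sum(r[p] for r in rankings) for p in products]
    sum_sq = sum(s * s for s in rank_sums)
    return 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Three hypothetical children, each ranking three hypothetical bars identically.
rankings = [
    {"bar_a": 1, "bar_b": 2, "bar_c": 3},
    {"bar_a": 1, "bar_b": 2, "bar_c": 3},
    {"bar_a": 1, "bar_b": 2, "bar_c": 3},
]
print(friedman_statistic(rankings))  # → 6.0
```

Because only the order of the products enters the calculation, the test makes no assumption about how far apart "best" and "second best" are.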
Our point is that, in every product field in which we have done such work, comparative rather than monadic testing is to be preferred.
Let us give a quantitative example from a test of two confectionery-type products, M and T, on 100 children aged six to nine. Of these, 48 tried T first and 52 tried M first, and the ratings they gave the products (on a scale we will discuss in more detail later on) were as follows:
The results illustrate the similar ratings given to whichever product is asked about first, with discrimination only in evidence when the second test product is evaluated. Perhaps the rating given first is really a measure of attitude towards the product field.
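The analysis above rests on splitting each product's ratings by whether it was evaluated first or second. A minimal sketch of that split follows, assuming each child's answers are stored as one record of (first product, its rating, second product, its rating) with integer ratings; the record layout is our assumption, and medians are used in keeping with the paper's preference for non-parametric summaries.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# One record per child: (first product, its rating, second product, its rating).
Record = Tuple[str, int, str, int]


def medians_by_position(records: List[Record]) -> Dict[Tuple[str, str], float]:
    """Median rating for each (product, 'first' or 'second') cell."""
    cells = defaultdict(list)
    for first, r_first, second, r_second in records:
        cells[(first, "first")].append(r_first)
        cells[(second, "second")].append(r_second)
    return {cell: median(values) for cell, values in cells.items()}


# Hypothetical records, for illustration only.
records = [("T", 4, "M", 3), ("T", 4, "M", 2), ("M", 4, "T", 5)]
print(medians_by_position(records))
```

If ratings in the "first" cells are near-identical across products while the "second" cells differ, that is the pattern the text describes.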
A further interesting point for discussion indicated by the results – though our sample size here is rather small – is that the above seems to hold whether the second test product is not tried until the first has been evaluated, or whether both products are tried before any questions are asked.
The above is obviously only an illustration, but it does reflect the experience we have gained over a number of product fields of the extreme insensitivity of such monadic testing among children.
Methods of Analysis – Scaled Questions
Before looking more closely at various types of scaled question, we should consider the type of analysis to be used. Our opinion is that, with children at least, the scale points are not equidistant on any scale we have seen – for example, the difference in attitude between ‘quite good’ and ‘very good’ is not the same as the difference between ‘quite bad’ and ‘very bad’ – and so non-parametric tests should be used where possible.
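One of the simplest non-parametric tests suited to comparative product tests is the sign test, which uses only the direction of each child's preference and so is unaffected by unequal spacing between scale points. The sketch below is our illustration, not the authors' stated method; it assumes paired ordinal ratings per child (higher = better) and discards tied pairs.

```python
from math import comb
from typing import List, Tuple


def sign_test(pairs: List[Tuple[int, int]]) -> float:
    """Exact two-sided sign test p-value for paired ordinal ratings.

    Each pair is (rating of product A, rating of product B) from one child.
    Only the direction of each difference is used, so the spacing between
    scale points does not matter. Tied pairs are discarded.
    """
    plus = sum(1 for a, b in pairs if a > b)
    minus = sum(1 for a, b in pairs if a < b)
    n = plus + minus
    if n == 0:
        return 1.0
    k = min(plus, minus)
    # Probability of a split at least this extreme in one tail, under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if ten children all rate A above B, the two-sided p-value is 2/1024, roughly 0.002.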
To give a simple example:
In our test of the two products T against M, 49 respondents were asked to compare the shapes of the two products using the following scale:
- a much better shape
- a better shape
- both just as good
- not as good a shape
- not at all as good a shape
Respondents were each asked to compare the products twice: once comparing T with M (“does T have a much better shape than M, a better shape than M …” etc.) and once comparing M with T (“does M have a much better shape than T …” etc). Not surprisingly, this dual question confused respondents (and interviewers), and we only ended up with 30 useable pairs of results. However, it is interesting to look at the cross-analysis of these:
The relationship is clear: both “M much better shape than T” and “M better shape than T” are equivalent to “T not as good a shape as M”: the points on the scale are not equally spaced along the continuum of attitudes. This simple example, of course, explicitly rules out parametric analysis for only one, rather unusual, scale. It does show, however, the need to pause and think before applying such analysis to other types of scale.
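The cross-analysis above can be sketched as a simple count of answer pairs, plus a check of whether each pair is what an equally spaced, symmetric scale would predict. This is our illustration: it assumes the comparison scale has five points, with "not at all as good a shape" as the bottom point, and stores each child's two answers as (T-vs-M answer, M-vs-T answer).

```python
from collections import Counter
from typing import Iterable, Tuple

# Five scale points, most favourable first (assumed from the list above).
SCALE = (
    "a much better shape",
    "a better shape",
    "both just as good",
    "not as good a shape",
    "not at all as good a shape",
)


def cross_tab(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count each combination of (T-vs-M answer, M-vs-T answer)."""
    table = Counter()
    for t_vs_m, m_vs_t in pairs:
        if t_vs_m not in SCALE or m_vs_t not in SCALE:
            raise ValueError(f"unknown scale point: {t_vs_m!r} / {m_vs_t!r}")
        table[(t_vs_m, m_vs_t)] += 1
    return table


def mirror_consistent(t_vs_m: str, m_vs_t: str) -> bool:
    """True if the two answers mirror each other on an equally spaced scale.

    With equal spacing, 'T not as good as M' (index 3) should pair only
    with 'M better than T' (index 1): the two indices sum to len(SCALE) - 1.
    """
    return SCALE.index(t_vs_m) + SCALE.index(m_vs_t) == len(SCALE) - 1
```

Many pairs failing `mirror_consistent`, as with "M much better" alongside "T not as good", are the kind of evidence the text cites against treating scale points as equally spaced.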
Assessment of Question Types
Having defined some, at least, of the criteria which constitute a “good” question, let us now consider the performance according to these of a number of types of question, each designed to assess attitudes, mainly with regard to concrete concepts.
In this we are regarding the main “target group” for our questions as 6 – 9 year-olds involved in a product test.
Diagrammatic – Smiley Scale
Our qualitative experience with Smiley (and other diagrammatic) scales has shown that a careful introduction is required: it must be explained that these are the reactions of the same child to different things, as respondents are quite likely to pick the face they like most – detailed sex-specific drawings are useful here (Figure 1). It is also helpful if a carefully-chosen example can be given to check the child’s understanding of the scale. Unfortunately, all this introduction takes time, prolonging the length of the interview and drawing the child’s attention away from the product field under discussion.
A positive feature of diagrammatic scales is, of course, that the introduction and explanation apart, there are no problems with word usage – there is no danger of using a problem word in one of the scale positions.
With specific regard to the Smiley scale, it is popular amongst interviewers: they report that respondents have very little difficulty in using the scale and seem to enjoy doing so. The main problem associated with it is also mentioned frequently by interviewers: “they don’t use the bottom end”. This is illustrated in the following example, with 50 children assessing the products M and T.
The use of the IVE Flachenscale was reported by Eberhard Lebender (2) in his paper at the last ESOMAR seminar on researching children, so I will not go into it in detail, except to mention that, in our experiments with it, interviewers commented that intensive explanation was required to get children to understand how to use it.
Numerical – Mark out of ten
It is, perhaps, unfortunate that the practice of giving children marks out of ten for school-work is no longer common in English schools, for it is clear to us that a large proportion of children are no longer automatically familiar with this concept. Thus the advantage that this scale may once have had over other non-verbal scales – that of requiring little explanation – has been somewhat dissipated. Indeed, some children have difficulty in grasping the direction of the scale: to them number one is naturally the best.
As with diagrammatic scales, numerical scales pose no problems with word usage other than the introduction and explanation, and interviewers find it easy to use and understand.
The eleven (including 0) points on the scale might lead us to hope for a more subtle and sensitive shading of opinion in the response patterns. Unfortunately, as the earlier section on absolute versus comparative data suggests, this is not the case. Observation of the scale in use shows that most children automatically, without stopping to consider, give their preferred product a high score, dropping just one or two points for their non-preferred product. In the words of one interviewer: “They just follow their previous idea, giving their best one 10 and the other one 9”. Indeed, in that case just over 50% (of a sample of 100) did actually give their preferred product 10, and over two thirds had two or fewer points separating the test products. It would not be productive to labour the point unduly: as mentioned earlier, response patterns can differ from product field to product field, but this does illustrate the type of response gained.
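The two figures quoted above (the share giving their preferred product 10, and the share with two or fewer points between the products) can be computed as in this minimal sketch. It assumes each child's marks are stored as a (preferred, non-preferred) pair, with preference taken from a separate overall preference question; the function name and the two-point threshold parameter are ours.

```python
from typing import List, Tuple


def mark_summary(marks: List[Tuple[int, int]], max_gap: int = 2) -> Tuple[float, float]:
    """Summarise marks out of ten given to (preferred, non-preferred) products.

    Returns the share of children giving their preferred product 10, and the
    share whose two marks differ by max_gap points or fewer.
    """
    if not marks:
        raise ValueError("no marks supplied")
    n = len(marks)
    top = sum(1 for pref, _ in marks if pref == 10) / n
    close = sum(1 for pref, other in marks if abs(pref - other) <= max_gap) / n
    return top, close


# Hypothetical marks from four children, for illustration only.
print(mark_summary([(10, 9), (10, 6), (8, 7), (9, 9)]))
```

A high value for both shares signals the ceiling-and-small-gap pattern that makes the extra scale points add little beyond a simple preference question.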
Perhaps I should add here that I have never learnt anything from the results of a mark-out-of-ten question that I had not previously gained from a simple overall preference question.
Verbal Scales
The first problem with verbal scales is that of illiteracy among respondents. Interviewers have to read out the positions on the scales (which in itself can cause bias) while pointing to them. Children then have to remember what each scale position is in order to answer the question, thus diverting their attention away from the products under discussion to the workings of the scale itself.
Another problem is the choice of words to use. I once came across an eleven-point verbal scale which ran from “fabulous” to “absolutely diabolical”. In such a case, children would be much more likely to take each description at its face value, whereas adults would tend to realise that the descriptions form a continuum. So, in the example of the eleven-point scale, adults would recognise from its scale position that “brilliant” was better than “marvellous”, whereas children would not necessarily do so. Current “buzz-words” are also likely to crop up – we have encountered “not too bad” being used as the highest praise possible to bestow!
One verbal scale we have experimented with stems from a suggestion made by David Cocks of CWA. Two examples of it are shown in Figure 2. It has a number of features intended to combat some of the problems commonly arising with verbal scales:
- It uses the same core adjective throughout, defining the scale points with qualifiers. This avoids to a large extent the problem of one of the scale points including a “buzz-word” or an unfamiliar word.
- It includes a diagrammatic element in the hour-glass shape. This is designed as an aid to the child in remembering, interpreting and identifying the points on the scale. Quite simply, the stronger the feeling, the longer the line.
- Another feature is that the mid-point is not neutral. This is an attempt to get respondents to use more of the scale than the top two points.
A positive feature that this scale has in common with other purely verbal scales is that it requires little or no explanation for its use.
The results obtained from the scale are illustrated in Table 3 which is from our product test on products M and T.
It is clear that the scale was used only down to the mid-point, and this corresponds with results obtained by the scale in other product fields. Obviously this scale is not, as yet at least, the panacea for all our problems.
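A quick check of how much of a scale children actually use (the Smiley scale's unused "bottom end", or this scale stopping at its mid-point) is the share of responses at each point. A minimal sketch, assuming responses are coded as integers from 1 (least favourable) to the number of scale points; that coding is our assumption.

```python
from collections import Counter
from typing import Dict, Sequence


def scale_usage(responses: Sequence[int], n_points: int) -> Dict[int, float]:
    """Share of responses at each scale point (1 = lowest ... n_points = highest)."""
    if not responses:
        raise ValueError("no responses")
    counts = Counter(responses)
    total = len(responses)
    return {point: counts[point] / total for point in range(1, n_points + 1)}


def lowest_point_used(responses: Sequence[int]) -> int:
    """Lowest scale point any respondent chose."""
    return min(responses)


# Hypothetical responses on a five-point scale, for illustration only.
answers = [5, 5, 4, 3]
print(scale_usage(answers, 5), lowest_point_used(answers))
```

Zero shares for all points below the mid-point would reproduce, in numbers, the pattern described above.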
Scales in general
Let us just summarise a few problems that the scales we have described have in common.
The main problem is that of saliency of attitude. While we can be fairly sure that overall liking is a relevant question to ask, the example showing misunderstanding of a question on sweetness shows that we cannot go much further than that without carefully feeling our way first.
The whole purpose of scales is to measure degree of feeling. As discussed earlier, children do not necessarily have a “degree of feeling” that they can consciously express. What they can express is a simple comparison: better, worse; more, less.
Batteries of scales seem to lead very quickly to boredom and non-discrimination from the child, plus frustration, because he wants to tell us what he thinks and we are not really giving him the opportunity to do so.
Ranking
A simple alternative to scales is ranking. This is obviously relevant only in the context of a comparative test but, as mentioned earlier, in most cases we regard a comparative test as preferable to a monadic test anyway.
The procedure for asking the question is quick and simple: the respondent is asked which product had the nicest taste, the strongest smell etc. This method does retain the disadvantage that the attributes to be discussed have to be pre-determined. An example of the type of results gained is shown in Figure 3.
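Answers to a series of "which product had the nicest taste, the strongest smell" questions reduce to a count per attribute of how many children chose each product. A minimal sketch, assuming each answer is stored as an (attribute, chosen product) pair; the attribute labels in the test are hypothetical. Each attribute's counts could then be assessed with a sign or binomial test, treating each choice as one preference.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tally_choices(answers: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count, for each pre-determined attribute, how many children chose each product."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for attribute, product in answers:
        table[attribute][product] += 1
    return dict(table)
```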
Open ended questions
We tend to make extensive use of open-ended questions. Their advantages are that:
- They require no special instructions for the child.
- They do not distract the child from the subject of the interview.
- The words the child uses in his answer are his own.
The drawbacks include:
- The data takes time to record, especially as it is essential to record comments verbatim, and children will respond at great length in some product fields.
- There are occasionally problems with confabulation: the child has a concept he wishes to express, but cannot translate this non-verbal concept into words. For example: he may wish to say that a product has a stronger flavour, but ends up saying that it is “more green”. Results obviously require careful interpretation.
We generally find, at least in a product field where children have a high level of interest and thus a good vocabulary, that several open-ended questions are desirable to gain full discrimination between the products: questions such as ‘What were the best things about …?’, ‘What were the worst things about …?’, ‘How could it be made better?’ and, of course, ‘Why was that one the best?’. The reason for this is that generic attributes are often mentioned as a reason for preference, and the difference in actual performance of the tested products on that attribute is only elicited by asking ‘How could each one be made better?’
Given the above, analysis and interpretation require special care and attention. A full manual analysis is not practicable on larger-scale surveys, but close executive involvement is required in the listing of answers and the construction of a comprehensive code frame. A knowledgeable lister and coder is also necessary.
As a brief addendum to this review of question types, we would like to mention some recent experience we have had in conducting mapping exercises with children. The methodology for this (carried out by qualitative executives) was for children to define for themselves the end-points of a five-point scale by means of ranking. So, for example, the most expensive brand was put in scale position 5, the cheapest in scale position 1, and the remaining brands positioned accordingly. Many stimulus materials were used (products, packs and photographs), and the interviewers were able to vary the order of approaching scales, fit the description of each scale to the respondent’s developmental stage, and check on understanding and relevance. Even with all this, it was only possible to maintain involvement with 6 – 7 year olds for about 20 minutes. They were all perfectly willing, and indeed eager, to talk about the product field in an unstructured way, but found the mapping process very demanding. In the limited period for which respondents were able to concentrate on the task, they could position the brands clearly and consistently and, because the possible problems had been anticipated in the design of the project, the exercise provided very useful information for the client.
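Individual maps from such an exercise might be checked and combined as sketched below. This is our assumption about one reasonable analysis, not the procedure the authors used: each child's map is stored as a dict from (hypothetical) brand name to scale position 1 to 5, a map is accepted only if it uses both end-points as the method requires, and brands are summarised by their median position, in keeping with the non-parametric approach argued for earlier.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def check_anchors(positions: Dict[str, int], low: int = 1, high: int = 5) -> bool:
    """True if one child's map places some brand at each end-point."""
    values = positions.values()
    return min(values) == low and max(values) == high


def median_positions(maps: List[Dict[str, int]]) -> Dict[str, float]:
    """Median scale position of each brand across children's individual maps."""
    collected = defaultdict(list)
    for positions in maps:
        for brand, pos in positions.items():
            collected[brand].append(pos)
    return {brand: median(vals) for brand, vals in collected.items()}


# Hypothetical maps from three children, for illustration only.
maps = [
    {"brand_a": 5, "brand_b": 3, "brand_c": 1},
    {"brand_a": 5, "brand_b": 2, "brand_c": 1},
    {"brand_a": 4, "brand_b": 5, "brand_c": 1},
]
print([check_anchors(m) for m in maps], median_positions(maps))
```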
It will be clear by now that the approach we have taken in assessing question designs is fundamentally qualitative, in that we have concentrated on the process of asking and answering a question somewhat more than on the details of the results gained from it. In researching children, where we are all talking in a foreign language, it is clear to us that this qualitative approach to quantitative research must be fundamental.
Paul A Mayes of CWA/Kiddycounter
1. Belson, William A. The Impact of Television. London: Crosby Lockwood and Sons Ltd., 1967.
2. Lebender, Eberhard. “Experiences with two types of scales for measuring attitudes in children’s market research”, in ESOMAR Seminar: Researching Children, Aarhus, 1978.