I've just finished Tim Harford's excellent Messy (paperback just out) alongside Giuseppe Tomasi di Lampedusa's The Leopard and both have thrown up examples of phrases that brought a smile to my face and, along with an example from a conference earlier this year, highlight one of my concerns about automatic forms of text analysis. To be clear, I'm not anti-text analysis as part of a broader piece of work, but I am concerned about the limits of automated analysis and the place of it in research practices.
Language clearly carries meaning, so when we're looking to understand beliefs, narratives, cultural norms, it would be strange to ignore a fundamental element of communication. However, from experience both in my own practice and in working with clients, I've seen that there is a strong temptation to focus on language/text analysis* as the primary source of meaning - and that is something I'm against.
Example 1. To return to Tim Harford, in a chapter on Life, he talks about the limitations of categorisation when filing things:
Making three copies of correspondence and filing once by date, once by topic and once by correspondent is a logical solution for a world in which we cannot predict whether we might need to look up all the letters sent and received late in October 2015, or all the letters about the faulty rumbleflange, or all the letters from a Mrs Trellis.
For some of us, those last two words are a clear signpost that Tim is a well-educated connoisseur of radio comedy programmes.
Example 2. In Lampedusa's Leopard, much of the early part of the novel takes place against the backdrop of Garibaldi's campaign to unify Italy in 1860 - specifically in Sicily. Young men in Garibaldi's army were clearly identifiable by their clothing, as shown by this passage later in the novel:
Don Fabrizio did not quite understand; he remembered both the young men in lobster red and very carelessly turned out. "Shouldn't you Garibaldini be wearing a red shirt, though?"
For any Star Trek fans or readers of this old post, there are two important words in there...
Example 3. At the UN Data Innovation Lab event earlier this year, one piece of research into Google searches highlighted that in one country (from memory, it was Colombia - apologies if I'm mis-remembering) there were two interesting insights - when the economy is dipping, people search more often for "savings" and when the economy is rising, people search more often for "shoes". Interesting as a possible indicator (presuming that the searches precede the economic move), but in terms of understanding what people need, "savings" isn't as helpful as it might be.
For the examples, there are deeper layers of meaning that require contextual understanding:
Human interpretation by culturally and contextually appropriate people will help elicit those multiple layers of meaning. And algorithmic interpretation may help to group and theme material if it repeats in text-based data.
When we are looking to understand the meaning of large volumes of qualitative material and micro-narratives, we are better off relying first on meaning that has been signified by the contributors - the meta-data added by respondents in SenseMaker® work. That meta-data is a better initial indication of meaning - and then we can identify clusters of stories and text that can shed further light. At that secondary level of analysis, I think text analysis can significantly help - but not before.
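To make that ordering concrete, here is a minimal Python sketch of "meta-data first, text second". Everything in it is an assumption for illustration: the sample stories, the `signified` field (standing in for whatever meaning a respondent attaches to their own story), the stopword list and the function names are all hypothetical, and none of it reflects how SenseMaker® itself stores or processes data.

```python
import re
from collections import Counter, defaultdict

# Hypothetical micro-narratives. "signified" is the label the respondent
# chose for their own story; it is not derived from the text.
stories = [
    {"text": "The red shirt always goes first on the away mission", "signified": "risk"},
    {"text": "Garibaldi's men wore a red shirt into Sicily", "signified": "pride"},
    {"text": "We put money into savings when work dried up", "signified": "risk"},
    {"text": "New shoes for the children after the harvest", "signified": "pride"},
]

# A deliberately tiny stopword list, just enough for this example.
STOPWORDS = {"the", "a", "on", "into", "for", "we", "when", "up", "after"}


def tokens(text):
    """Lower-case the text and split it into words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def cluster_by_signifier(items):
    """Step 1: group stories by the respondent's own label."""
    clusters = defaultdict(list)
    for item in items:
        clusters[item["signified"]].append(item["text"])
    return dict(clusters)


def top_words(texts, n=3):
    """Step 2: count words, but only within an already-meaningful cluster."""
    counts = Counter()
    for text in texts:
        counts.update(tokens(text))
    return counts.most_common(n)


clusters = cluster_by_signifier(stories)
for label, texts in clusters.items():
    print(label, top_words(texts))
```

Counted across all four stories, "red" and "shirt" come out as the most frequent words, which says nothing about whether the teller meant pride or peril. Counted within each respondent-defined cluster, the same two words appear once under "risk" and once under "pride", so the word counts illustrate a meaning the contributor has already supplied.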
There are, however, three other significant issues when using text analysis (or even over-simplistic tools like Wordle, as I did in the early days of working with narrative a decade ago).
Ten years ago, before I used SenseMaker®, I would happily generate wordclouds from material I'd gathered with clients. Once I'd realised they were misleading but appealing, I stopped - and we haven't used wordclouds since. These days I'm prepared to use them, but only as a secondary data visualisation to cast light on what has emerged from clusters of meta-data.
But I'm operating on less-than-perfect information - does anyone have any experience or deep knowledge that might help me put some of my concerns above to rest?
*I'm using language/text analysis generically here - I haven't yet done the research into the various analysis tools available. I have no doubt that, like any tool, there will be some that are better than others. My concerns stand in the face of any automated tool that claims to make meaning from fragmented, natural language.