There is a disturbing trend in our industry, and among many client companies, to accept without question the mining of big data (including web-based transactions of any kind) by automated processes as valid “research”. It can be, but is not unconditionally so.


It is data collection and, like ALL data collection, what is collected requires validation. That validation seems to be sorely missing from many processes today.


I can understand this; much of it is due to necessity. There is so much data that it cannot be managed by people, only by machines. We require filters to sift through the morass of minutiae that is growing by the second online. But filters are crude tools. The underlying assumption in a lot of data mining (the use of filters notwithstanding) is that all data mined is equally valid because it exists.


As someone trained as a historian, I cringe at this. Everyone knows (or should know) the web contains a huge amount of erroneous data, lies, and misinformation. There is no filter for bullshit and crap.


These data sources are indeed literally “facts” (in that they exist and their existence is verifiable), but more properly I would call them digital artifacts, and these artifacts are being gathered and analyzed in dashboards everywhere. Existence equals neither truth nor even valid opinion.


We in the research & analysis business need to be at the forefront of discerning which digital artifacts are valid source materials for market and social research projects. It's not an easy job. I think there will always be a need for some human interaction with data in order to make sense of it for other humans (and clients still fall into that category).


I love the use of the latest technologies in my job, but as a means to an end, not as an end in itself.


And do not believe for a moment in the death of survey research just because we can “listen” instead of “ask”. It is easier to filter respondents than it is to filter terabytes of data. And we don’t have to sift through a morass of data that has nothing to do with the research process just to get to that which does.


A questionnaire is focused, whereas listening to generic “conversations” online is like gathering opinion data on the service in a restaurant by eavesdropping on conversations among diners. Many people are not talking about the food.