The Higher The Chocolate Consumption, The More ... Serial Killers?

A recent article published in PLoS ONE (an international, peer-reviewed, online journal) discusses the dangers of using Big Data inappropriately.

As large-scale databases and new statistical techniques have become more widely available, a number of nomothetic studies have appeared. These are studies that use large, cross-cultural datasets to seek relationships between demographic variables and cultural traits. They have uncovered some seemingly surprising links: between chocolate consumption and the number of Nobel laureates a country produces, for example; between a language's tense system and the propensity to save money; between the quality of the sounds of a language and the amount of extra-marital sex; or between acacia trees and traffic accidents. Not surprisingly, findings such as these often receive much media attention and have the potential to influence public policy or to be used to support particular agendas.



But Sean Roberts and James Winters, the authors of the article, argue that these studies tend to suffer from three important problems that increase the likelihood of finding statistically significant but spurious correlations between cultural traits. These problems are: Galton's Problem ("similarities between cultures are the product of borrowing and common descent, therefore researchers must control for diffusional and historical associations so as not to inflate degrees of freedom"); researcher bias; and the inverse sample size problem ("the noise-to-signal ratio increases exponentially with an increase in the size of the dataset").

The authors explain that nomothetic studies can be valuable, but they must be grounded in, and motivated by, prior theoretical and empirical work, and they must be statistically sound and rigorous, with the right statistical controls included: "studies that are not grounded in theory and are also poorly controlled can be misleading. It is difficult to distinguish these studies from 'fishing' for correlations from a large set of variables, then fitting a post hoc hypothesis to the strongest outcomes. [...] Since cultural phenomena are subject to non-intuitive constraints, such as Galton's problem, it is relatively easy to produce evidence for a link between almost any two cultural variables that has the appearance of rigour."
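
To make the "fishing" concern concrete, here is a minimal Python sketch. It is not the authors' analysis: the 30 "countries", the 40 variables, the random seed, and the function names are all assumptions chosen for illustration. Every variable is pure random noise, so any "significant" correlation the search finds is spurious by construction.

```python
import itertools
import math
import random

# Illustrative assumptions, not taken from the article.
N_COUNTRIES = 30
# Two-tailed critical t value for p < 0.05 with df = 28 (i.e. n = 30).
# It is only valid for this sample size.
T_CRIT = 2.048


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fish_for_correlations(n_variables=40, seed=0):
    """Correlate every pair of unrelated noise variables and keep the 'hits'."""
    rng = random.Random(seed)
    # Each variable is independent Gaussian noise: no real link exists.
    data = [
        [rng.gauss(0, 1) for _ in range(N_COUNTRIES)]
        for _ in range(n_variables)
    ]
    # Convert the critical t value into a critical |r| for this sample size.
    df = N_COUNTRIES - 2
    r_crit = T_CRIT / math.sqrt(df + T_CRIT ** 2)

    hits = []
    for i, j in itertools.combinations(range(n_variables), 2):
        r = pearson_r(data[i], data[j])
        if abs(r) > r_crit:
            hits.append((i, j, r))
    n_pairs = n_variables * (n_variables - 1) // 2
    return hits, n_pairs


if __name__ == "__main__":
    hits, n_pairs = fish_for_correlations()
    print(f"{len(hits)} of {n_pairs} pairs look 'significant' at p < 0.05")
    # Sort by strength, as a researcher fishing for a headline might.
    for i, j, r in sorted(hits, key=lambda h: -abs(h[2]))[:5]:
        print(f"variable {i} vs variable {j}: r = {r:+.2f}")
```

With no correction for multiple comparisons, about 5% of the pairs would be expected to cross the threshold by chance alone. The sketch only illustrates testing many variable pairs without a prior hypothesis. It does not model Galton's problem, which concerns non-independence between cultures and would inflate the count further.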



Read more:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0070902
Frederique Laubepin
