In search of critical thinking. The new Data Scientist between correlation and causality

By Davide Fabrizio, Partner, Chief Analytics, IT & Consulting Officer at Conento

In this short post I analyze one of the main headaches of HR departments: the search for talent. It is intended as a reflection on the Data Scientist, currently one of the most highly valued profiles on the market. This professional must be able to combine technical skills with interpretive capacity and critical thinking, but the balance we find in practice is not always the desired one.

At Conento, where we focus on Analytics and Big Data projects, we are always running selection processes for new profiles. Our priority is finding talented Data Scientists. While it is true that there are more and more Data Scientists on the market every day, it is also true that finding profiles that fit our needs is becoming harder. Out of every 100 Data Scientists who enter our selection process after a first CV screening, only 2 are hired. That is a rate of 2%, and it has been decreasing over the last few years.

What’s going on? On the one hand, a more competitive market compels us to use stricter selection criteria. On the other hand, there is the feeling that universities and institutions, with their various master's programs in Big Data, Data Science and Machine Learning, are “generating” many Data Scientists who suffer from what I call “the correlation syndrome”.

This means that the new prototype of Data Scientist seems to have correlation as a priority, not causality. It seems that it is no longer interesting to analyze data by asking why things happen or whether our results make sense. What matters is to get a result as quickly as possible with the most sophisticated Machine Learning algorithm: “hit the button” and see what comes out, without looking back. We see this more and more often in the practical tests we give candidates during selection, to our growing amazement and disbelief. The lover of correlation has a blind faith in algorithmic logic (which, after all, is a nihilistic and totalitarian vision) and gives up on the “narration” of data and numbers.
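The gap between correlation and causality can be made concrete with a small simulation. The sketch below is only an illustration under assumed, synthetic data: a hidden variable `z` drives both `x` and `y`, which have no direct effect on each other. The names, the noise levels and the choice of partial correlation as the check are all assumptions made for the example, not a method taken from our selection tests.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (an assumption for illustration): z causes both x and y.
rng = random.Random(0)
z = [rng.gauss(0, 2) for _ in range(5000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)                # strong, although x does not cause y
r_xy_given_z = partial_corr(x, y, z)  # close to zero once z is accounted for

print(f"corr(x, y)     = {r_xy:.2f}")
print(f"corr(x, y | z) = {r_xy_given_z:.2f}")
```

An analyst who stops at the first number would conclude that `x` and `y` are closely linked. Asking why they move together leads to `z`, and the apparent relationship largely disappears.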

It is curious that there is more and more talk about an artificial intelligence with increasingly efficient and precise algorithms, which nonetheless needs to be coupled with human intelligence, still the only one able to analyze in depth the why of things, that is, cause-effect relationships. But in the case we are analyzing something different happens: the Data Scientist seems to want to walk hand in hand with artificial intelligence and become a clone of it, focusing on mechanical and repetitive tasks and giving up on contributing real added value: critical thinking.

This deficiency actually reflects a new dynamic of modern society, which mixes new living and consumption habits, technology and educational models. The difficulty of seeing things in a way that is not superficial is obvious in a world of speed and continuous acceleration that leaves no time to look back, reflect and contemplate. Technology reduces distance and time, and this should, in theory, free up time to think. Instead, we prefer to fill this new space with “empty” activities, endlessly replicating, like machines, mechanical processes with no real value: adding strangers to our social networks, reading and discussing content that contributes nothing, checking our email compulsively…

Acceleration makes us lose the ability to follow a standard data analysis process, as traditional statistics has always done. Before launching the modelling stage, we need a thorough evaluation of the quality of the available data, a sound construction of metrics and a descriptive analysis to capture the first associations between variables. After the modelling, we need a careful evaluation of the results and calibrations in order to strike a balance between mathematics and logic. The model is built progressively, through stages that each increase our knowledge and understanding of the problem we are analyzing. It seems we are losing all this, and turning back is very difficult.
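As a rough sketch of what the first of these stages can look like in practice, the snippet below checks data quality and produces a basic descriptive summary before any model is fitted. The records, the column names `age` and `spend`, and the helper functions are hypothetical and exist only for illustration.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "spend": 120.0},
    {"age": None, "spend": 80.0},
    {"age": 51, "spend": 200.0},
    {"age": 29, "spend": None},
    {"age": 45, "spend": 150.0},
]


def missing_share(rows, column):
    """Fraction of rows in which the column has no value."""
    return sum(r[column] is None for r in rows) / len(rows)


def describe(rows, column):
    """Basic descriptive statistics over the non-missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


# One entry per column: data quality first, then the descriptive summary.
report = {
    col: {"missing": missing_share(rows, col), **describe(rows, col)}
    for col in ("age", "spend")
}

for col, stats in report.items():
    print(col, stats)
```

None of this is sophisticated, and that is the point: these are the questions to ask of the data before “hitting the button”.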

We are concerned because we believe that, beyond the technological revolution and social changes, something in the educational and training processes is failing. It is not easy to identify potential solutions (and that would be another debate), but the love of causality must once again be the guide on our journey: the Data Scientist who combines it with technical knowledge will succeed in the labor market of the future.