Data science, AI, and Big Data have been the biggest buzzwords of the technological world over the past few years. But even though there’s a lot of marketing fluff involved, these technologies do make a real difference in highly complex areas like healthcare, financial trading, travel, energy management, social media, fraud detection, and image and speech recognition. With the digitalization of the world economy and pretty much every aspect of life, data became the new oil (kudos to Clive Humby for coining the metaphor). And data scientist, in turn, became the sexiest job of the 21st century.
But that’s really cutting a long story too short. Let’s look at what happened in more detail.
Ancient History of Data Science
It is hard to pin down exactly when data science took shape, as there are different schools of thought on the matter. Some even say it all started with the emergence of writing, since philosophers and scholars interpreted data passed down through the generations.
Others believe it was born in 1946 with the first digital computer. But most people associate the term with two works: “The Future of Data Analysis” (1962) by John Tukey and “Concise Survey of Computer Methods” (1974) by Peter Naur. Here’s how the latter introduces the term: “Data science is the science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” So, for the historical record, Tukey and Naur were among the first to articulate the idea of processing data with statistical methods.
In the years that followed, as computers became cheaper and storage capacity grew, the amount of data grew exponentially too. And so, as data became big, the era of data science began.
The Rise of Truly Big Data
According to a corollary of Parkinson’s law, data expands to fill the storage space available. The turn of the century offered a perfect illustration of this, as companies and governments entered a period of all-consuming computerization. Previously unimaginable amounts of data were being collected and processed.
But simply collecting data isn’t enough. Like any resource, immense data sets need to be put to work to deliver value. So over time, instead of just storing data, organizations began to view it as a commodity they could capitalize on.
Around this time, the academic world began to recognize data science as an emerging discipline. William S. Cleveland was the first to place it in the context of computer science and data mining, in 2001. Along with the definition, the approach to analysis changed too: a new type, predictive analysis, was introduced to complement the commonly used descriptive and diagnostic types.
During this period, the banking, finance, insurance, and pharma industries began to use data analysis software such as SAS. This confronted the market with a problem: building new, complex mathematical tools and analyzing large data sets required highly specialized people. Thus, to quote Kenneth Cukier:
… a new kind of professional has emerged, the data scientist, who combines the skills of a software engineer, statistician, and storyteller/artist to extract nuggets of gold hidden under mountains of data.
Data Science Now
Since 2010, the popularity of data science has exploded. The number of data scientist jobs increased by 15,000% between 2010 and 2012 and has kept growing since. Universities now offer degree programs that train data science specialists, you can even earn a Ph.D. in the field, and dozens of conferences on data science, big data, and AI are held every year.
The main reasons for this level of interest are:
- The need to analyze a growing volume of data collected by corporations and governments.
- Price reduction of computational hardware, improvement of computational software, and the emergence of new data science methods.
- The popularity of social networks and online services and products, which uncovered enormous potential for monetization through analysis and personalization.
As a result, more companies invest heavily in data science every day, and many even build dedicated teams to analyze the data they collect.
Today, data science is the field that combines specific domain expertise, programming skills, and knowledge of math and statistics to extract meaningful insights from data. This makes the data scientist a specialist with a multitude of skills, including programming (R, Python), statistics, machine learning (e.g. scikit-learn), and visualization (with tools such as Tableau, QlikView, Kibana, and Matplotlib). But most importantly, data scientists need to be creative and open-minded.
Data is a new kind of currency. But this resource is only valuable in the hands of skilled professionals. In fact, in some ways it beats any natural resource: we’ll never run out of data, acquiring it does not require huge amounts of labor, it has more applications, and it gives more power to those who control it. Thus, the growing importance of data science is a natural result of entering the digital era.