A Brief History of Data Science
Data science, AI, and Big Data have been the biggest buzzwords of the technology world in recent years. But even though there’s a lot of marketing fluff involved, these technologies make a real difference in highly complex industries and applications such as healthcare, financial trading, travel, energy management, social media, fraud detection, and image and speech recognition. With the digitalization of the world economy and of virtually every aspect of life, data has become “the new oil” (a phrase coined by Clive Humby). Consequently, data scientist has been called the sexiest job of the 21st century.
But that’s really cutting a long story too short. Let’s look at the development of data science in more detail.
Ancient History of Data Science
It is hard to determine exactly when data science took shape, as there are different schools of thought on the topic. Some even say that it all started with the emergence of writing, since individuals (philosophers and scholars) interpreted data passed down through the generations.
Others believe that it was born in 1946 with the first digital computer. However, most thinkers associate its origins with two works: “The Future of Data Analysis” (1962) by John Tukey and “Concise Survey of Computer Methods” (1974) by Peter Naur. Here’s how the latter introduced the term: “Data science is the science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” So, historically, Tukey and Naur were among the first to express the idea of processing data with statistical methods.
Over the following years, as computers became cheaper and storage capacity grew, the amount of stored data also grew exponentially. And so, as data grew in volume, the era of data science began.
The Rise of Truly Big Data
According to a corollary of Parkinson’s law, data expands to fill the storage space available. The turn of the century illustrates this point well: it was the period when the all-encompassing computerization of companies and governments began, and previously unimaginable amounts of data were being collected and processed.
But simply collecting data isn’t enough. Like any resource, immense data sets have to be put to work to bring value. So, over time, organizations began to view data as a commodity they could capitalize on, instead of just storing it.
At this time, the academic world began to recognize data science as an emerging discipline. It was first put into the context of computer science and data mining by William S. Cleveland in 2001. Along with changes in the definition, there were also changes in how analysis was done. A new type of analysis, predictive analysis, was introduced to complement the commonly used descriptive and diagnostic types.
During this time, the banking, finance, insurance, and pharmaceutical industries began to use data analysis software such as SAS. Consequently, the market faced a problem: creating new, complex mathematical tools and analyzing large data sets requires people with specialized skills. Thus, to quote Kenneth Cukier:
… a new kind of professional has emerged, the data scientist, who combines the skills of a software engineer, statistician, and storyteller/artist to extract nuggets of gold hidden under mountains of data.
Data Science Now
Since 2010, the popularity of data science has exploded. The number of data scientist job postings increased by 15,000% between 2010 and 2012, and has continued to grow since. There are now academic programs dedicated to training specialists in data science. You can even get a Ph.D. in this field, and dozens of conferences are held annually on the topics of data science, big data, and AI.
The main reasons for such a high level of interest in this field are:
- The need to analyze a growing volume of data collected by corporations and governments,
- Price reductions in computational hardware,
- The improvement of computational software, and
- The emergence of new data science methods.
The popularity of social networks and online services and products has uncovered vast potential for monetization through analysis and personalization. Hence, more and more companies are investing heavily in data science, often forming dedicated teams to analyze the data they collect.
Today, data science is a field that combines domain expertise, programming skills, and knowledge of math and statistics to extract meaningful insights from data. This makes the data scientist a specialist with a multitude of skills, including computer programming (R, Python), statistics, machine learning (e.g. scikit-learn), and visualization (Tableau, QlikView, Kibana, Matplotlib, etc.). But, most importantly, a data scientist must be creative and open-minded.
Conclusions
Data is a new kind of currency. But, unlike oil, this resource is only valuable in the right hands, specifically those of skilled professionals. In fact, it’s much better than any natural resource, for multiple reasons. Firstly, we’ll never run out of data. Furthermore, acquiring it does not require huge amounts of labor, it has a wider range of applications, and it gives more power to those who control it. Thus, the growing importance of data science is a natural result of entering the digital era.