Data scientists address a range of business needs by extracting insights from existing data. No single technology encompasses data science: different tasks call for different technologies, and very often for several at once. Below we consider the main tasks that data scientists face when solving business problems.
Data visualization. Data visualization is the representation of abstract statistical information in an understandable and informative visual format (charts, graphs, heat maps, etc.). Data scientists often use visualization at all stages of analysis, because the visualized output of a complex algorithm is usually easier to monitor and interpret than its numerical output. Insights and results are often worthless if they cannot be presented to decision makers, workers, or users in an understandable, comprehensive way.
Statistical analysis. A few decades ago, data science had basically the same meaning as statistical analysis, namely the process of generating statistics from stored data and analyzing the results. By including statistical analysis in a data science pipeline, you can gain deeper insight into the structure of the data and find the best techniques for extracting more information from it.
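As a minimal sketch of this step, the snippet below summarizes a small sample with Python's standard `statistics` module. The order values and the `summarize` helper are made up for illustration and do not come from a real dataset.

```python
import statistics

# Hypothetical daily order values, for illustration only.
order_values = [12.5, 15.0, 14.2, 80.0, 13.8, 16.1, 14.9]

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = summarize(order_values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A mean well above the median, as here, hints at an outlier (the 80.0 order) that deserves a closer look before any modeling.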
Unsupervised learning. Unsupervised learning is a type of machine learning that infers patterns from a dataset without reference to labeled outcomes. It is used to discover the underlying structure of the data. One of the most common applications of unsupervised learning is clustering. Clustering is the identification of groups (clusters) of similar data points in a dataset. It is used in various domains, including image analysis, bioinformatics, anomaly detection, and data compression.
In marketing, clustering is widely used for market research. Market researchers use surveys, test panels, and cluster analysis to partition consumers into market segments. The analysis makes it possible to target the customers in each segment more effectively.
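To make the idea of segmentation concrete, here is a minimal sketch of k-means, one common clustering algorithm, written in plain Python. The customer data, the feature choice, and the deterministic initialization (the first k points) are assumptions made for the example. Production code would normally rely on an established library and a better seeding strategy.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: returns (centroids, labels) for a list of numeric tuples."""
    # Deterministic start for reproducibility: the first k points serve as
    # initial centroids (real implementations usually use random or k-means++ seeds).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (annual spend in thousands, store visits per month).
customers = [(2, 1), (40, 12), (3, 2), (42, 10), (2.5, 1.5), (38, 11)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Each label identifies a segment. In this toy data the low-spend and high-spend customers end up in separate clusters.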
Supervised learning. Unlike unsupervised learning, supervised learning is a type of machine learning that makes predictions based on data that has already been labeled. Many companies use models based on supervised learning to provide better customer service, sell more products and services, manage the risk of fraudulent activity, plan their human resources better, etc.
Two examples of supervised learning in use are given below.
- Banks use big data methodologies to build predictive fraud propensity models and use them to create alerts that help ensure a timely response when unusual data is detected.
- Customer behavior can be used for better management. For example, you can determine how many staff members you need at any given time to improve customer service. Some public hospitals in Paris use data to predict the daily and hourly number of patients at each hospital.
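The core idea of learning from labeled examples can be sketched with a k-nearest-neighbors classifier, one of the simplest supervised methods. The transactions, features, and labels below are invented for illustration. A real fraud model would use far more data, properly scaled features, and careful evaluation.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest labeled neighbors."""
    # `train` is a list of (features, label) pairs.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled transactions: (amount in hundreds, hour of day / 24).
labeled = [
    ((0.2, 0.50), "ok"), ((0.5, 0.60), "ok"),
    ((0.3, 0.40), "ok"), ((0.4, 0.55), "ok"),
    ((9.0, 0.10), "fraud"), ((8.5, 0.05), "fraud"), ((9.5, 0.12), "fraud"),
]

print(knn_predict(labeled, (8.8, 0.08)))   # large night-time transaction
print(knn_predict(labeled, (0.35, 0.50)))  # small daytime transaction
```

The model never sees an explicit rule. It labels a new transaction according to the labeled examples it most resembles.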
Time series forecasting. Time series forecasting is a machine learning technique used to predict future values from previously observed values. It is widely used in finance, supply chain management, and production and inventory planning.
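As an illustration, the sketch below produces a one-step-ahead forecast with simple exponential smoothing, a classical statistical method rather than a learned model. The demand figures and the smoothing factor `alpha` are assumptions chosen for the example.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing."""
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]  # initialize the smoothed level with the first observation
    for value in series[1:]:
        # New level is a weighted blend of the latest observation and the old level.
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly demand for a product.
monthly_demand = [100, 110, 105, 120, 125]
forecast = exponential_smoothing_forecast(monthly_demand, alpha=0.5)
print(f"Forecast for next month: {forecast:.2f}")
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller one smooths out short-term noise.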
Optimization. Companies use optimization to reduce decision-making risk when allocating budgets, setting prices, managing a financial portfolio, etc. Optimization, as a prescriptive analytics technique, combines historical data, business rules, constraints, and desired outcomes to find the best decisions.
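A toy version of budget allocation shows how constraints and desired outcomes combine. The sketch below exhaustively searches for the set of campaigns with the highest expected return within a fixed budget. The campaign names, costs, and returns are hypothetical, and brute force only works for a handful of options. Real problems typically need a dedicated solver.

```python
from itertools import combinations

def best_portfolio(projects, budget):
    """Exhaustively pick the subset of projects with the highest total return within budget."""
    best_names, best_return = (), 0
    for size in range(1, len(projects) + 1):
        for subset in combinations(projects, size):
            cost = sum(p["cost"] for p in subset)
            total_return = sum(p["expected_return"] for p in subset)
            # Constraint: stay within budget. Objective: maximize expected return.
            if cost <= budget and total_return > best_return:
                best_names = tuple(p["name"] for p in subset)
                best_return = total_return
    return best_names, best_return

# Hypothetical campaigns; expected returns would come from historical data.
projects = [
    {"name": "email", "cost": 20, "expected_return": 30},
    {"name": "search_ads", "cost": 50, "expected_return": 80},
    {"name": "tv", "cost": 70, "expected_return": 95},
    {"name": "social", "cost": 30, "expected_return": 45},
]

selection, value = best_portfolio(projects, budget=100)
print(selection, value)
```

Here the budget constraint rules out the single most profitable campaign (`tv`) in favor of three cheaper ones that together return more.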
Natural Language Processing (NLP). Any computation on or manipulation of natural language that yields insight into the meaning of words and the construction of sentences is called natural language processing. NLP focuses on enabling computers to understand and process human (natural) languages. Today’s challenges in NLP include creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. Various personal virtual AI assistants (Nina, Siri, Alexa, etc.) answer basic questions, search for information, and execute some commands. Although such systems still have many imperfections, virtual assistants can already reduce calls to contact centers and the need for other human assistance by up to 50%.
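Sentiment identification can be illustrated with a deliberately naive lexicon-based scorer. The word lists and reviews below are made up for the example. Practical systems use trained models that handle negation, context, and far larger vocabularies.

```python
import re

# Tiny hand-made sentiment lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "hate", "poor", "broken"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Return (positive word count - negative word count); above 0 suggests positive sentiment."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

# Hypothetical product reviews.
reviews = [
    "Great phone, I love the fast delivery!",
    "The screen arrived broken and support was slow.",
]
for review in reviews:
    print(sentiment_score(review), review)
```

Even this crude score separates the two reviews, but a phrase such as "not good" would fool it, which is one reason sentiment analysis remains a hard problem.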
Image and speech recognition. Image analysis is the extraction of meaningful information from images (not only digital images) by means of AI-based digital image processing techniques. Image analysis has plenty of applications, from reading bar-coded tags to identifying a person by their face. It is in high demand in many fields; here we mention only a few of them.
Google's search engine lets you search by image: you upload a picture, and image recognition is used to return related search results. In some airports, self-service bag-check machines are being tested. They use face recognition technology to confirm travelers’ identities by matching their faces with their passport photos for luggage drop-off.
Another example of image analysis is the use of AI and deep learning in self-driving cars. Such software detects whether the driver is in the vehicle and who exactly is in the car (husband, wife, young adult child) and can automatically adjust the seat, mirrors, and temperature to suit the individual. It can also help the driver watch the road while keeping an eye on the driver.
In medicine and healthcare, machine learning methods, content-based medical image indexing, and wavelet analysis for solid texture classification are used to detect tumors and artery stenosis, delineate organs, etc. Deep learning-based algorithms increase diagnostic accuracy by learning from previous examples and can then suggest better treatment solutions.
Speech recognition technologies enable computers to recognize spoken language and convert it into text. Using speech recognition, you can dictate a message instead of typing it, for example. Some banking institutions use customers' voice data to authorize access to their financial information.