Computer Vision

Computer Vision (CV) is one of Artificial Intelligence’s cutting-edge fields. Its goal is to extract information from digital images or videos: this information may concern camera position, object detection and recognition, or grouping and searching image content. In practice, extracting this information is a major challenge that requires a combination of programming, modeling, and mathematics. Scholarly interest in Computer Vision emerged in the 1960s, when researchers worked on extracting 3D information from 2D images. Although some progress was made, limited computing capacity and small, isolated research groups slowed the field’s development. The first commercial application of Computer Vision was an optical character recognition program that appeared in 1974; it interpreted typed or handwritten text with the goal of helping blind and visually impaired people. Thanks to growing computing power and the massively parallel GPUs pioneered by NVIDIA, deep learning and convolutional neural networks (CNNs) later brought significant progress to the field.

How does Computer Vision work?

A machine interprets an image or video frame as a grid of pixels. Each pixel is characterized by its position in the image and a value for its color (typically 8 bits per channel), so to a computer an image is simply an array of numbers. The machine first identifies groups of similar colors and, using color gradients, finds the edges of different objects. Next, algorithms search for lines that meet at an angle and regions covered by a single color. Finally, an algorithm determines the textures in the image and matches the detected patterns against a database of known objects. The process is similar to solving a puzzle: you search for similar colors, identify edges, lines, and textures, and finally piece all the parts of the image together.
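The edge-finding step described above can be sketched in a few lines of code. This is a minimal, illustrative example (not a production algorithm): it treats a grayscale image as a NumPy array, computes horizontal and vertical intensity differences as a crude approximation of the color gradient, and marks pixels where the gradient magnitude is large as edges.

```python
import numpy as np

def gradient_edges(image, threshold=1.0):
    """Mark pixels whose intensity gradient magnitude exceeds a threshold."""
    gx = np.zeros_like(image, dtype=float)
    gy = np.zeros_like(image, dtype=float)
    # Simple finite differences between neighboring pixels.
    gx[:, 1:] = image[:, 1:] - image[:, :-1]
    gy[1:, :] = image[1:, :] - image[:-1, :]
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return magnitude > threshold

# A tiny 5x5 "image": a bright square on a dark background.
img = np.zeros((5, 5))
img[1:4, 1:4] = 255.0
edges = gradient_edges(img, threshold=10.0)
```

Real systems use smoothed gradient operators (such as Sobel filters) and non-maximum suppression, but the principle is the same: edges are where pixel values change abruptly.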

Tasks in Computer Vision

Computer vision forms the basis for a wide array of topics and applications. We have listed a few of these below:

Object classification (or object recognition). In this task, a model is trained to recognize one or several objects or object classes, and is then able to classify objects into categories in new images. For example, using convolutional neural networks, a model can be trained to automatically annotate diseases from chest X-rays. Another example mentioned above is the classification of handwritten digits. The success rate of such classification tasks depends directly on how well the training images are selected. Popular benchmark datasets include MNIST (handwritten digits), SVHN (photos of house numbers), and CIFAR-10 and CIFAR-100, with 10 and 100 classes respectively.
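As a toy illustration of classification in pixel space (an assumption for this sketch, far simpler than the CNNs used on the datasets above), a k-nearest-neighbor classifier assigns a label to a new image by majority vote among its closest training images:

```python
import numpy as np

def knn_classify(train_images, train_labels, test_image, k=3):
    """Classify a flattened image by majority vote among its k nearest
    training images (Euclidean distance in pixel space)."""
    dists = np.linalg.norm(train_images - test_image, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = train_labels[nearest]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

# Toy flattened 2x2 "images": class 0 is dark, class 1 is bright.
train = np.array([[0, 0, 0, 0],
                  [10, 0, 0, 10],
                  [250, 255, 255, 250],
                  [255, 255, 240, 255]], dtype=float)
labels = np.array([0, 0, 1, 1])
pred = knn_classify(train, labels,
                    np.array([245, 250, 255, 245], dtype=float), k=3)
```

Raw pixel distance breaks down on real photos, which is exactly why learned features from CNNs dominate modern classification benchmarks.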

Object Identification. Object Identification is a subtask of object recognition. While classification recognizes an object’s basic-level category, such as ‘dog’ or ‘human’, identification recognizes an object’s subordinate category, such as ‘German Shepherd’ or ‘Elvis Presley’. Fingerprint identification in cybersecurity and fraud detection, identification of a specific vehicle, and bacteria identification in 3D microscopy data are three typical examples of object identification.

Object Detection. This is also a subtask of object recognition. In an object detection task, the model is trained to recognize one or several objects in a single image; it draws bounding boxes around the objects it finds and labels them. For example, in face detection the model must detect all faces in a given photo and return their bounding boxes. Driverless cars must also solve detection tasks constantly and in real time, in order to react if a car or person suddenly moves into their path. From vehicle detection to pedestrian counting and security systems, the list of object detection applications seems endless.
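Detection models are usually evaluated by how well their predicted bounding boxes overlap the true ones, measured as intersection-over-union (IoU). A minimal sketch, with boxes represented as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping region, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

score = iou((0, 0, 2, 2), (1, 1, 3, 3))  # two 2x2 boxes overlapping in one cell
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.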

Semantic segmentation (or object segmentation). In segmentation, a kind of object detection task, the image is divided into groups of pixels that are labeled and classified. Simply put, the model answers which pixels belong to which object in the image. One of the best-known examples of segmentation is the ‘portrait mode’ available in cell phone cameras: the model recognizes the pixels that belong to the foreground object and blurs the background. Semantic segmentation is also used in biomedical image diagnosis, precision agriculture, and other fields.
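Once a segmentation model has produced a per-pixel foreground mask, the ‘portrait mode’ effect itself is simple. The sketch below assumes the mask is already given (in practice it comes from a neural network) and uses a crude box blur for the background:

```python
import numpy as np

def portrait_mode(image, mask, kernel=3):
    """Keep pixels where mask is True; replace the rest with a box-blurred
    version of the image (a crude 'portrait mode')."""
    padded = np.pad(image.astype(float), kernel // 2, mode='edge')
    blurred = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            # Average over the kernel x kernel neighborhood.
            blurred[y, x] = padded[y:y + kernel, x:x + kernel].mean()
    return np.where(mask, image, blurred)

img = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True          # "foreground" pixels found by segmentation
out = portrait_mode(img, mask)
```

The hard part is producing the mask; compositing the sharp foreground over the blurred background is just a per-pixel selection.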

Motion analysis. In motion analysis we deal with a sequence of images or video, and it covers several tasks. Object tracking is the process of following the movements of a specific object (or objects) in a video scene; the main challenge is to associate the target object across consecutive video frames.
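The frame-to-frame association challenge can be illustrated with a toy centroid tracker, a simplified stand-in for real tracking algorithms: each detected object is reduced to its centroid, and each centroid in the previous frame is greedily matched to the nearest unmatched centroid in the current frame.

```python
def associate(prev_centroids, curr_centroids):
    """Greedy nearest-neighbour association of object centroids between two
    consecutive frames. Returns a dict {prev_index: curr_index}."""
    matches = {}
    used = set()
    for i, (px, py) in enumerate(prev_centroids):
        best, best_d = None, float('inf')
        for j, (cx, cy) in enumerate(curr_centroids):
            if j in used:
                continue
            d = (px - cx) ** 2 + (py - cy) ** 2  # squared distance suffices
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches[i] = best
            used.add(best)
    return matches

frame1 = [(10, 10), (50, 50)]
frame2 = [(52, 48), (12, 11)]  # objects moved slightly, listed in a different order
matches = associate(frame1, frame2)
```

Production trackers add appearance features, motion models such as Kalman filters, and globally optimal assignment, but the core idea of linking detections across frames is the same.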

Object tracking has a variety of uses in autonomous driving systems, video compression, augmented reality, and traffic control, to mention a few examples. Egomotion is the task of determining the 3D rigid motion of a camera within its environment. This task arises in autonomous robot navigation, where robots need accurate, real-time estimates of their position to support their perception and control tasks.
