Computer Vision
Computer Vision (CV) is one of the cutting-edge fields of Artificial Intelligence. The goal of CV is to extract information from digital images or videos. This information may relate to camera position, object detection and recognition, or grouping and searching image content. In practice, extracting this information is a major challenge, requiring a combination of programming, modeling, and mathematics to be completed successfully.
Interest in Computer Vision began to emerge among scholars in the 1960s. In those days, researchers worked on extracting 3D information from 2D images. While some progress was made, limited computing power and small, isolated research groups kept the field developing slowly. The first commercial application of Computer Vision, an optical character recognition program, emerged in 1974. It interpreted typed or handwritten text, with the goal of helping blind or visually impaired people. Later, thanks to growing computing power and highly parallel GPUs such as NVIDIA's, significant progress was achieved in deep learning and convolutional neural networks (CNNs).
How does Computer Vision work?
A machine interprets an image or a video frame as an array of pixels. Each pixel is characterized by its position in the image and a value for its color (typically 8 bits per channel). So, to a computer, an image is simply an array of numbers. From this array, the machine identifies groups of similar colors and, using color gradients, finds the edges of different objects. Next, algorithms search for lines that meet at an angle and bound regions of a single color. An algorithm must then determine the texture of each region and, using its database of known objects, guess what the image contains. The process is similar to solving a puzzle: you search for similar colors, identify edges, lines, and textures, and finally piece all the parts of the image together.
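The gradient-based edge-finding step above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production edge detector: the image, the finite-difference gradients, and the threshold of 100 are all hypothetical stand-ins for a real photo and a Sobel-style filter.

```python
import numpy as np

# Hypothetical 8-bit grayscale "image": a dark square on a light background.
image = np.full((8, 8), 200, dtype=np.uint8)
image[2:6, 2:6] = 50  # darker object in the middle

# Finite-difference gradients (the idea behind Sobel-style edge detectors).
img = image.astype(float)
gx = np.zeros_like(img)
gy = np.zeros_like(img)
gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal intensity change
gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical intensity change

# The gradient magnitude is large exactly where color changes sharply,
# i.e. along the edges of objects.
magnitude = np.hypot(gx, gy)
edges = magnitude > 100  # hypothetical threshold

print(edges.astype(int))
```

Running this prints a ring of 1s around the square's border and 0s everywhere else: the uniform interior and background have zero gradient, while the boundary pixels do not.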
Tasks in Computer Vision
Computer vision forms the basis for a wide array of topics and applications. We have listed a few of these below:
Object classification (or object recognition). In this task, a model is trained to recognize one or several objects or object classes, and can then assign objects in new images to those categories. For example, using convolutional neural networks, a model can be trained to automatically annotate diseases in chest X-rays. Another example, mentioned above, is the classification of handwritten digits. Success on such classification tasks depends directly on how well the training images are selected. Popular benchmark image datasets include MNIST (handwritten digits), SVHN (photos of house-number digits), and CIFAR-10 and CIFAR-100, with 10 and 100 classes respectively.
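To make the classification task concrete without a deep-learning framework, here is a minimal sketch using a nearest-centroid classifier on synthetic "images". This is deliberately much simpler than the CNNs mentioned above: the two classes, the 8x8 images, and the bright-left/bright-right pattern are all invented for illustration; a real MNIST or CIFAR model would learn features instead of comparing raw pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an image dataset: two classes of 8x8 "images",
# class 0 bright on the left half, class 1 bright on the right half.
def make_sample(label):
    img = rng.normal(0.1, 0.05, size=(8, 8))
    if label == 0:
        img[:, :4] += 0.8
    else:
        img[:, 4:] += 0.8
    return img.ravel()

X_train = np.array([make_sample(i % 2) for i in range(100)])
y_train = np.array([i % 2 for i in range(100)])

# Nearest-centroid classifier: represent each class by its mean image.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def classify(img_vector):
    # Predict the class whose mean image is closest in pixel space.
    distances = np.linalg.norm(centroids - img_vector, axis=1)
    return int(np.argmin(distances))

print(classify(make_sample(1)))  # classifies a new class-1 image
```

The design point survives the simplification: training produces a compact representation per class, and prediction assigns a new image to the closest one.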
Object identification. Object identification is a subtask of object recognition. While classification recognizes an object's basic-level category - like 'dog' or 'human' - identification recognizes an object's subordinate category - like 'German Shepherd' or 'Elvis Presley'. Fingerprint identification used in cybersecurity and fraud detection, identification of a specific vehicle, and bacteria identification in 3D microscopy data are three typical examples of object identification.
Object detection. This is also a subtask of object recognition. In an object detection task, the model is trained to locate one or several objects in a single image. Such models draw bounding boxes around the objects and label what is inside them. For example, in face detection, the model must detect all faces in a given photo together with their bounding boxes. Driverless cars also need to solve detection tasks constantly and instantly, in order to react if a car or person suddenly moves into their path. From vehicle detection to pedestrian counting and security systems, the list of object detection applications seems endless.
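Detection models are commonly scored by how well a predicted bounding box overlaps a ground-truth box, measured as Intersection-over-Union (IoU). Below is a small sketch of that standard metric; the `(x1, y1, x2, y2)` corner convention and the example boxes are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (empty if the boxes are disjoint).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A hypothetical predicted box partially overlapping a ground-truth box.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # → 25 / 175 ≈ 0.143
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap; detection benchmarks typically count a prediction as correct above some IoU threshold such as 0.5.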
Semantic segmentation (or object segmentation). In segmentation, a finer-grained form of object detection, the image is divided into groups of pixels that are labeled and classified. Simply put, the model answers which pixels belong to which object in the image. One of the best-known examples of object segmentation is 'portrait mode', available in cell phone cameras: the model recognizes the pixels that belong to the foreground object and blurs the background. Semantic segmentation is also used in biomedical image diagnosis, precision agriculture, and other fields.
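The 'portrait mode' pipeline above can be sketched end to end: build a per-pixel foreground mask, blur the frame, and recombine. The brightness threshold standing in for a learned segmentation model, the tiny synthetic frame, and the 3x3 box blur are all simplifying assumptions; a real system would predict the mask with a CNN.

```python
import numpy as np

# Hypothetical grayscale frame: a bright "subject" on a dark background.
frame = np.full((6, 6), 30.0)
frame[1:5, 2:4] = 220.0  # foreground object

# Segmentation reduced to its simplest form: label each pixel foreground
# or background. Here a brightness threshold stands in for a trained model.
mask = frame > 128

def box_blur(img):
    # 3x3 mean filter with edge padding — a stand-in for a proper blur.
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

# "Portrait mode": blur the whole frame, then keep the original pixels
# wherever the mask says foreground.
blurred = box_blur(frame)
portrait = np.where(mask, frame, blurred)
```

The result keeps the subject's pixels sharp while background pixels near the subject are smoothed, which is exactly the per-pixel decision that distinguishes segmentation from a coarse bounding box.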
Motion analysis. In motion analysis we deal with a sequence of images or video. Motion analysis covers several tasks. Object tracking is the process of following the movements of a specific object (or objects) in a video scene. The main challenge in this task is to associate the target object across consecutive video frames.
Object tracking has a variety of uses in autonomous driving systems, video compression, augmented reality, and traffic control, to mention a few examples. Egomotion is the task of determining the 3D rigid motion of a camera within its environment. This task arises in autonomous robot navigation: autonomous robots need accurate, real-time estimates of their position to support the robot's perception and control tasks.
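The frame-to-frame association at the heart of motion analysis can be sketched with simple frame differencing. This is a minimal illustration under invented data: the two tiny synthetic frames, the single moving bright pixel, and the threshold of 50 are all assumptions, and a real tracker would add data association and a motion model on top of this step.

```python
import numpy as np

# Two consecutive hypothetical grayscale frames: a bright object moves
# one column to the right between them.
prev_frame = np.zeros((5, 5))
prev_frame[2, 1] = 255.0
next_frame = np.zeros((5, 5))
next_frame[2, 2] = 255.0

# Frame differencing: pixels whose intensity changed a lot are candidate
# motion regions; a tracker then associates these regions across frames.
diff = np.abs(next_frame - prev_frame)
motion_mask = diff > 50  # hypothetical threshold

moving_pixels = np.argwhere(motion_mask)
print(moving_pixels)  # the object's old position and its new position
```

The mask lights up at exactly two locations, the object's previous and current positions, which is the raw signal a tracking algorithm must link into one trajectory.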