In 1966, Seymour Papert and Marvin Minsky launched the Summer Vision Project, an attempt to get a computer to identify objects and patterns within an image.
To accomplish the task, the computer had to correlate pixels with the object in order to identify it. For us, it is easy to use our cognitive abilities to identify objects or patterns within a particular frame, and even to recall the smells, sounds, textures, and other sensations attached to that scene. For a computer, though, it is a far more mathematical task.
Computer Vision (CV) has evolved dramatically over the years, from simple pattern identification to face recognition and autonomous vehicles (AV). It is one of the key areas of expertise within the field of intelligent systems (AI, ML, DL) and, hence, an ever-growing arena that deserves your attention.
Here are 5 reasons why you should learn CV:
1. Images are everywhere
There is a lot of data available, and data is the new oil. For example, your cell phone’s camera is an inexhaustible source of content that you can exploit to develop applications ranging from the avant-garde and innovative to the absurd, entertaining, and fun.
Reviewing the huge volume of images that a platform like Snapchat generates every hour would take a human being 10 years of their life. And that is just one of the countless platforms, social networks, and applications that abound in the market.
Instagram is the first one that comes to mind, but there is also YouTube, the second largest search engine on the Internet, owned by Google.
Likewise, platforms whose initial focus was on text and other forms of static content, such as Facebook or Twitter, have adopted video without hesitation as their main format for transmitting information.
Now, with petabytes of visual data generated every day, computer vision is arguably the field of Artificial Intelligence with the greatest potential for growth in the short, medium, and long term. Processing such a volume of information is impossible without the support of neural networks and of the developments still to come.
2. It is the most influential area of AI
We are walking a path that will change the way we connect, interact, and develop on a daily basis. The impact of ongoing changes will only be comparable to the emergence of the Internet.
It is clear that the main actor in this play is Artificial Intelligence. However, AI is a very broad term that includes countless sub-fields, each dedicated to exploring and exploiting a specific problem or niche. Even so, the most recent advances in the field, especially in deep learning, have come from within computer vision.
Deep learning took the industry by surprise in 2012, largely because of AlexNet’s entry in that year’s ILSVRC (ImageNet Large Scale Visual Recognition Challenge), where, for the first time, a neural network dominated the competition, finishing roughly 10 percentage points ahead of its closest opponent in error rate.
From there on, each year the bar has been raised a little more. The teams participating in this challenge have shown greater audacity, vision, and courage, providing architectures, algorithms, and solutions that are not trivial, but that at the end of the day have nurtured, promoted, and empowered an entire community.
One of the great contributions of computer vision is transfer learning, a mechanism through which we can reuse the knowledge of networks trained on massive datasets such as ImageNet and get quite satisfactory results with relatively little data.
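The mechanism can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: `frozen_features` is a hand-written stand-in for a network pretrained on a large dataset such as ImageNet (it is not actually pretrained), the 2x2 "images" are made up, and `train_head` and `predict` are hypothetical helper names. The point is only the shape of the idea: the feature extractor stays fixed, and a small classifier is trained on top of it with just a handful of examples.

```python
import math

def frozen_features(image):
    """Stand-in for a pretrained backbone (assumption): it is never updated."""
    left = sum(row[0] for row in image) / len(image)    # mean of left column
    right = sum(row[-1] for row in image) / len(image)  # mean of right column
    return [left, right]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, epochs=500, lr=1.0):
    """Fit a logistic-regression head on the frozen features (gradient descent)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for k, xi in enumerate(x):
                grad_w[k] += error * xi / n
            grad_b += error / n
        # Only the head's parameters change; the backbone is untouched.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def predict(image, weights, bias):
    x = frozen_features(image)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) > 0.5 else 0

# Tiny made-up target task: 1 = bright on the left, 0 = bright on the right.
train_images = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.1, 0.9], [0.2, 0.7]],
    [[0.0, 1.0], [0.3, 0.9]],
]
train_labels = [1, 1, 0, 0]

# Features are computed once with the frozen extractor, then only the head is trained.
features = [frozen_features(img) for img in train_images]
weights, bias = train_head(features, train_labels)

print(predict([[0.9, 0.1], [0.9, 0.1]], weights, bias))  # unseen left-bright image
print(predict([[0.1, 0.9], [0.2, 0.9]], weights, bias))  # unseen right-bright image
```

In a real project the stand-in extractor would be replaced by an actual pretrained network, and the head would be trained on its output features in the same way.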
In short, computer vision has democratized deep learning.
3. Gigantic potential
How many patients with diabetic retinopathy can a doctor see on the same day? How does a blind individual move?
Humans, as a species, have reached unimaginable heights in terms of life expectancy, health, comfort, distribution of opportunities, and access to resources. It is undeniable that technology has played a leading role in our achievements and, although we have accomplished a great deal, we have not yet reached our full potential.
But how do we get there? We all start the day with the same 24 hours. Even the most dedicated and successful professional has a life, family, home, and responsibilities to attend to.
The answer does not lie in replacing human labor with AI or in working longer hours, but rather in multiplying the cognitive capacities, the expertise, and the years of hard learning of our experts through computer vision.
All of the open questions in the first paragraph of this section pose problems that are still unresolved. Let’s take a look at them:
-How many patients with diabetic retinopathy can a doctor see on the same day? Dozens, but supported by a neural network trained on thousands of images, that number could reach the thousands.
-How does a blind individual move? With a lot of external help, when it is available. But computer vision is a fundamental part of the development of autonomous vehicles: with just their voice, a blind person could tell their car where to go, and the car would be guided by a series of cameras and sensors, mapping, understanding, and making decisions based on the changing environment.
4. The ideal mix of art & science
Seldom do we come across a discipline that embraces art so closely without neglecting the scientific method. Computer vision belongs to this rare category.
Without going too far, convolutional neural networks are inspired by the biological mechanism that animals use to decompose and interpret visual information, in particular the way the visual cortex processes what the eyes capture.
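To make the "decompose and interpret" idea concrete, here is a minimal sketch of the operation at the heart of a convolutional layer, written in plain Python. The names `convolve2d`, `image`, and `vertical_edge` are illustrative, the tiny image is made up, and the kernel is picked by hand, whereas a real network would learn its kernel weights from data.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch under the kernel (strictly a
            # cross-correlation, which deep learning libraries call convolution).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Made-up 4x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(image, vertical_edge)
print(response)  # [[3, 3, 0], [3, 3, 0]]
```

The response is strongest where the window straddles the dark-to-bright boundary and drops to zero over the uniform bright region, which is how a single small filter picks out one kind of local structure in an image.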
Neural networks are a conceptualization of the tangle of cells that exist in our brain, where our identity lives, our emotions are generated, and our actions are planned. This biological inspiration is paired with a deep knowledge of the laws of probability and statistics, through which we can reason about an algorithm’s level of understanding.
However, not everything we do in computer vision comes down to categorizing and finding elements in photos. We can also generate art and even hallucinations.
Not only can we dream; neural networks can too. And, at the end of the day, all this is powered by math.
5. CV is a path to self-understanding
Isn’t it beautiful how all this expensive, bulky, and sophisticated technology is unable to compete with the most advanced neural network of all: our brain?
All the power of titans like Google, Amazon, or Microsoft is not enough to solve certain daily tasks that our grey matter, weighing just 1.5 kilograms and energized by a minuscule amount of electricity, can perform in the span of a breath.
The key is in our heads. It was not until the early 1990s, when Yann LeCun introduced his seminal LeNet network, inspired by the visual cortex of animals, that the field of neural networks awoke from its deep sleep. Until then, many had considered it a utopia because of the large amount of resources it required and its lack of practicality.
Thus, computer vision, as an area of research and application, is a path towards self-discovery: it forces us to better understand how we work, how we operate, what makes us special, and how our carbon-based design can be replicated to make the world a better place.