
Deep Learning for image recognition



Deep Learning is an extremely powerful technology that has matured over the past decade. It is a branch of Machine Learning and Artificial Intelligence in which data is analyzed hierarchically: a model starts from simple characteristics and patterns and combines them step by step into a complete analysis. Thanks to this property, Deep Learning models such as Artificial Neural Networks are particularly well suited to image processing.

Why Deep Learning is useful

Why do we want to use Artificial Intelligence to classify images? Besides making our work easier by automating its most tedious parts, AI has reached a point where, on some image classification benchmarks, it surpasses human performance.


The graph above was produced by an important industry magazine and shows that the human baseline of 94.9% has been comfortably exceeded for some years now. The analysis was carried out on ImageNet, a public research dataset containing several million images belonging to thousands of classes. This obviously does not mean that we can remove humans from the loop, but in the future their intervention will be needed less and less, and supervision will be enough to catch and correct the rare cases of machine error.

Deep Learning vs Machine Learning

But why do we talk about Deep Learning, and how does it differ from Machine Learning? The term "deep" comes from the structure of the models themselves, which are built from several layers in succession and are therefore larger, or “deeper”, than classic Machine Learning models.


These models are divided into two main sections. The first is called feature extraction: it tries to extract the important information from the input. The second uses that information to decide what the content is (classification). One of the main differences from classic Machine Learning is that the features do not have to be prepared by hand by a person, because the feature extraction section learns this processing itself; this reduces the problems related to human bias or error.

But how does a model process complex data like images? Let me introduce you to the secrets behind deep learning.

The mathematics of Deep Learning

Deep Learning for images mainly uses two techniques that are inspired by studies of the visual cortex and try to simulate what happens in the human brain. When we look at an image or at something in real life, we analyze, consciously or not, progressively finer details that allow us to understand what an object is or to recognize specific people.

In the same way, Convolution and Pooling try, respectively, to determine the importance of the various pixels of the image and to select only those considered most relevant.

Convolution applies an NxM matrix (called a filter or kernel) to a YxZ input matrix, where N, M, Y and Z are the respective sizes. The gif illustrates the operation: the output matrix has the same dimensions as the input matrix, and each of its values is the sum of the element-wise products between the kernel and the NxM area of the input matrix centered on the chosen origin point ("the center"). If the center lies on an edge or in a corner, the missing values of the input matrix are treated as 0.
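The operation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `convolve2d_same` and the sample values are made up for this example, the kernel is assumed to have odd dimensions so that it has a well-defined center, and, as is usual in deep learning, the kernel is applied without being flipped.

```python
def convolve2d_same(image, kernel):
    """Slide `kernel` over `image`, treating out-of-range pixels as 0.

    The output has the same dimensions as the input.
    Assumes the kernel has odd dimensions (hypothetical helper).
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Distance from the kernel's center cell to its top-left corner.
    r_off, c_off = k_rows // 2, k_cols // 2

    output = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    rr, cc = r + i - r_off, c + j - c_off
                    # Values outside the input matrix contribute 0.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc] * kernel[i][j]
            output[r][c] = total
    return output


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
result = convolve2d_same(image, kernel)
print(result)  # [[12, 21, 16], [27, 45, 33], [24, 39, 28]]
```

With a kernel of all ones, each output value is simply the sum of the neighbouring pixels; the corner values are smaller because the missing neighbours count as 0. In a real network the kernel values are not chosen by hand but learned during training.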

By selecting the most important pixels, Pooling also reduces the dimensionality of the data, which lightens the calculations to be performed and speeds up the analysis.
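As an illustration, here is a minimal sketch of max pooling, one common pooling variant, which keeps only the largest value in each small window. The article does not say which variant is meant, so the choice of max pooling, the 2x2 window and the function name `max_pool2d` are assumptions made for this example.

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value of each non-overlapping size x size window.

    Hypothetical helper; rows or columns that do not fill a whole
    window are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 input becomes a 2x2 output, so every later step has a quarter of the values to process.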

How does a Deep Learning model learn from data?

So far we’ve talked about the general architecture of a Convolutional Neural Network, but that is not the whole story. At the beginning the model does not yet have the correct parameters, so if we used it immediately it would make a rather large number of wrong predictions.

To improve performance you need to train the model by providing images together with their labels (the class assigned to each image) and correcting it when it is wrong. Training can take hours or more, depending on the model and the dataset, because the same images must be shown several times and performance improves a little at each iteration. The technique behind this process is called Gradient Descent: at each step the parameters are adjusted in the direction that reduces the error, so the error gradually decreases.
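The idea can be shown on a deliberately tiny problem. The sketch below is not a Convolutional Neural Network: it trains a model with a single parameter `w` to predict `y = w * x`, using mean squared error as the measure of error. The function name `train`, the data, the learning rate and the number of passes are all assumptions chosen for the example. The same loop of "predict, measure the error, adjust the parameters" is what happens, at a much larger scale, when an image model is trained.

```python
def train(xs, ys, learning_rate=0.05, epochs=50):
    """Fit y = w * x by gradient descent on the mean squared error.

    Hypothetical toy example: one parameter instead of the millions
    found in a real image model.
    """
    w = 0.0  # the model starts with a wrong parameter
    losses = []
    n = len(xs)
    for _ in range(epochs):
        # Error of the current model over all examples.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        # Slope of the error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        losses.append(loss)
        # Move w a small step in the direction that reduces the error.
        w -= learning_rate * grad
    return w, losses


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # the "labels": here y is always 2 * x
w, losses = train(xs, ys)
print(round(w, 4))            # 2.0
print(losses[0], losses[-1])  # the error at the start and at the end
```

The data is shown to the model 50 times, and the error recorded in `losses` shrinks from its initial value towards zero as `w` approaches 2.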

Training time can be reduced with hardware accelerators such as Graphics Processing Units (GPUs). These devices can run many calculations in parallel and so take just a fraction of the time a CPU (Central Processing Unit) needs, but they are also quite expensive.

In short, although deep learning is an extremely useful and powerful technology, it is not easy to use: it requires several specific skills, as well as amounts of data, time and resources that are not easy to obtain.

Blue BI image recognition solutions

In addition to great expertise in the field, Blue BI has developed a work pipeline that recognizes the content of images and can also search a database for products similar to the one analyzed, quickly and efficiently. If you are interested in this technology, you may also be interested in our Next Level Showroom solution!
