TU Wien CAIML

Do machines see like we do?

Researchers at TU Wien have investigated how artificial intelligence categorizes images. The results show astonishing similarities to visual systems in nature.

Peyman M. Kiasari, Zahra Babaiee and Radu Grosu (from left to right)

How do you teach a machine to recognize objects in images? Huge progress has been made in this area in recent years. With the help of neural networks, for example, images of animals can be assigned to the correct animal species with very high accuracy. This is achieved by training a neural network on many sample images: the network is adapted step by step until it ultimately delivers the correct answers as reliably as possible.

However, which structures form in the process, and which mechanisms develop in the neural network that ultimately lead to the goal, usually remains a mystery. A team from TU Wien led by Prof. Radu Grosu, together with a team from MIT (USA) led by Prof. Daniela Rus, has now investigated precisely this question, with astonishing results: structures form in the artificial neural network that bear a striking resemblance to structures found in the nervous systems of animals and humans.

Multiple Layers of Neurons

“We work with so-called convolutional neural networks - these are artificial neural networks that are often used to process image data,” says Zahra Babaiee from the Institute of Computer Engineering at TU Wien. She is the first author of the paper and carried out part of the research work together with Daniela Rus at MIT and the rest together with Peyman M. Kiasari and Radu Grosu at TU Wien.

The design of these networks was inspired by the nerve cell networks in our eyes and brain. There, visual impressions are processed by several layers of neurons. Certain neurons become active - for example because they are stimulated by light signals in the eye - and transmit signals to neurons in the next layer.

In artificial neural networks, this principle is imitated digitally on the computer: the input - for example a digital image - is passed pixel by pixel to the first layer of artificial neurons. The activity of the neurons in this first layer depends simply on whether they are presented with a lighter or darker pixel. These activity values are then used to determine the activity of the neurons in the next layer: each neuron in the subsequent layer combines the signals from the previous layer according to a very specific individual pattern (one could also say: according to a very specific formula), and the result determines that neuron's activity.
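The layer-to-layer principle described above can be sketched in a few lines. This is a minimal illustration with invented numbers, not code from the study: each second-layer neuron combines the first-layer activities with its own weight pattern.

```python
import numpy as np

# Toy sketch (hypothetical values): pixel brightness drives the first
# layer; each second-layer neuron combines first-layer activities
# according to its own individual weight pattern.

pixels = np.array([0.1, 0.9, 0.4, 0.7])   # toy grayscale input
first_layer = pixels                       # activity tracks brightness

# Each row is one second-layer neuron's "individual pattern" (weights).
weights = np.array([
    [0.5,  0.5, 0.0, 0.0],   # neuron A averages the first two inputs
    [0.0, -1.0, 1.0, 0.0],   # neuron B contrasts two neighbouring inputs
])

# A weighted sum of the previous layer determines each neuron's activity.
second_layer = weights @ first_layer
print(second_layer)   # activities of the two second-layer neurons
```

During training it is exactly these weight patterns that are adjusted until the network's final outputs match the correct image categories.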

Astonishing Similarity to Biological Neural Networks

“In convolutional neural networks, not all neurons in one layer play a role for every neuron in the subsequent layer,” explains Zahra Babaiee. “Even in the brain, not every neuron of a layer is connected to all neurons of the previous layer without exception, but only to the neighboring neurons in a very specific area.”

In convolutional neural networks, so-called “filters” are therefore used to decide which neurons have an influence on a specific subsequent neuron and which do not. These filters are not predetermined, but arise automatically during the training of the neural network. “While the network is being trained with many thousands of images, these filters and other parameters are constantly being adjusted. The algorithm tries out which weighting of the neurons from the previous layer leads to the best result until the images are assigned to the correct category with the highest possible reliability,” says Zahra Babaiee. “The algorithm does this automatically, we have no direct influence on it.”
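The local connectivity that filters provide can be illustrated with a small, hand-written 2D convolution. The filter below is an invented example for illustration, not a trained one; the point is only that each output neuron sees a small neighbourhood of the input rather than the whole layer.

```python
import numpy as np

# Illustrative sketch: a convolutional "filter" connects each output
# neuron only to a small patch of input neurons, not to all of them.
# The 3x3 kernel here is hypothetical, not a filter learned in training.

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
kernel = np.array([[0., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 0.]])                   # passes the centre through

def conv2d_valid(img, k):
    """Slide the filter over the image (only fully overlapping positions)."""
    kh, kw = k.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Each output neuron: weighted sum over one 3x3 neighbourhood.
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

result = conv2d_valid(image, kernel)
# With this kernel, each output neuron simply copies the input neuron
# at the centre of its neighbourhood.
print(result)
```

In a real convolutional network the kernel entries are exactly the parameters that the training algorithm keeps adjusting.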

However, at the end of the training, you can analyze which filters have developed in this way. And this reveals interesting patterns: the filters do not take on completely random forms, but fall into several simple categories. “Sometimes the filters develop in such a way that one neuron is particularly strongly influenced by the neuron directly in front of it and hardly at all by others,” says Zahra Babaiee. Other filters look cross-shaped, or they show two opposite areas - one whose neurons have a strong positive influence on the neuron in the next layer, and another whose neurons have a strong negative influence on the neuron in the next layer.
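One of the patterns described above, a filter with a positive region and an opposing negative region, can be sketched as a classic centre-surround profile built from a difference of Gaussians. The sizes and widths below are invented for illustration; this is not a filter taken from the paper.

```python
import numpy as np

# Hypothetical sketch of a centre-surround filter: a positive
# (excitatory) centre surrounded by a negative (inhibitory) ring,
# modelled as a difference of two Gaussians. Parameters are invented.

size = 7
ax = np.arange(size) - size // 2
xx, yy = np.meshgrid(ax, ax)
r2 = xx**2 + yy**2   # squared distance from the filter centre

def gaussian(r2, sigma):
    g = np.exp(-r2 / (2 * sigma**2))
    return g / g.sum()   # normalize so each Gaussian sums to 1

# Narrow positive Gaussian minus a broader one: centre-surround shape.
center_surround = gaussian(r2, 1.0) - gaussian(r2, 2.0)

print(center_surround[size // 2, size // 2] > 0)   # positive centre
print(center_surround[0, 0] < 0)                   # negative surround
```

Profiles of this kind have long been used to describe receptive fields of retinal ganglion cells, which is what makes the reappearance of such shapes in trained convolutional filters so striking.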

“The amazing thing is that exactly these patterns have already been observed in biological nervous systems, for example in monkeys or cats,” says Zahra Babaiee. In humans, the processing of visual data is likely to work in the same way. It is probably no coincidence that evolution has produced the same filter functions that arise in an automated machine learning process. “If we know that precisely these structures are formed again and again during visual learning, then we can already take this into account in the training process and develop machine learning algorithms that achieve the desired result much faster than before,” hopes Zahra Babaiee.

Original Publication

The research work was presented at ICLR 2024 in May 2024.

German Version of the Article

The German version of this article was published on the website of TU Wien.