Image recognition is a skill that computers are increasingly mastering thanks to artificial intelligence (AI). Be it faces, objects or symbols – today, machine vision is used almost everywhere optical information has to be captured. It is also worthwhile for end users to look into the technology; after all, it can be useful in so many areas. Think, for example, of Google Image Search, unlocking your smartphone with Face ID, managing photos on your PC, apps for identifying plants from photos, or future autonomous driving. As you can see, computer-aided image recognition already affects a wide variety of areas of life. But before we look at the status quo of this versatile technology, it’s worth taking a brief trip down memory lane. After all, it took many masterminds to achieve today’s performance.
The great benefits of machine character and pattern recognition were recognized over a hundred years ago. As early as the 1910s, two such machines were devised: the optophone, which could convert printed letters into sounds, and Hyman Eli Goldberg’s “controller”, which also read printed text and translated it into teletype code.
In 1931, Emanuel Goldberg introduced a machine in Dresden that could search for metadata on microfilm rolls using light measurement and pattern recognition. In 1949, there were first experiments around barcode technology (Bernard Silver together with Norman Joseph Woodland) – and in the 1970s, optical character recognition (OCR) was taken to a new level (mainly by Ray Kurzweil). Letters and numbers could now be reliably recognized even with changing fonts.
At that time, of course, it was “only” about recognizing characters and simple patterns. Even with OCR software, one cannot really speak of image recognition yet. But the basic principle is similar: scans of printed text are segmented (one segment per character), the pixel patterns within the segments are matched against known patterns in a database, and if there is a match with sufficiently high similarity, a value is set that corresponds to a specific letter or punctuation mark.
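The matching principle described here can be sketched in a few lines of Python. The tiny 3×3 pixel grids, the two templates and the similarity threshold are invented for illustration – real OCR engines work with far larger templates and much more robust matching:

```python
# Toy sketch of OCR-style template matching (illustrative only).
# Each "segment" is a small binary pixel grid; TEMPLATES maps known
# characters to reference grids. The best match wins if it is similar enough.

TEMPLATES = {
    "I": [
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
    ],
    "L": [
        [1, 0, 0],
        [1, 0, 0],
        [1, 1, 1],
    ],
}

def similarity(a, b):
    """Fraction of pixels that agree between two equally sized grids."""
    total = matches = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            matches += (pa == pb)
    return matches / total

def recognize(segment, threshold=0.8):
    """Return the best-matching character, or None below the threshold."""
    best_char, best_score = None, 0.0
    for char, template in TEMPLATES.items():
        score = similarity(segment, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None

scanned = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
print(recognize(scanned))  # "L" – the segment matches the template exactly
```

In real systems, the templates give way to statistical models that tolerate noise, skew and varying fonts, but the compare-and-threshold idea remains the same.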
A digital photo is technically also just a set of pixels, but the task of pattern matching is of course infinitely more complex. What are 26 letters compared to the diversity of an entire world? Nevertheless, the technology is already so far developed today that it has an impact on almost all areas in which the processing of optical information plays a role.
The best-known application is probably facial recognition. Here, it’s not just about an algorithm recognizing where human faces appear in images – that alone would be comparatively trivial. In most use cases, the individual biometric characteristics of the detected faces are also captured. If two faces match closely, the images probably show the same person. Matching against biometric databases also enables the precise identification of the persons depicted.
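The comparison step can be illustrated with a small sketch: faces are reduced to numeric feature vectors (so-called embeddings), and two faces count as a match when their vectors are sufficiently similar. The embedding values and the threshold below are purely hypothetical:

```python
# Sketch of biometric face matching via embedding similarity.
# In practice, a trained neural network produces the embeddings;
# here they are simply invented example vectors.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.9):
    """Treat two faces as the same person above a similarity threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Hypothetical embeddings produced by some face-recognition model:
alice_photo_1 = [0.12, 0.80, 0.35, 0.41]
alice_photo_2 = [0.10, 0.78, 0.37, 0.40]
bob_photo     = [0.90, 0.05, 0.60, 0.10]

print(same_person(alice_photo_1, alice_photo_2))  # True
print(same_person(alice_photo_1, bob_photo))      # False
```

Real embeddings typically have hundreds of dimensions, and the threshold is tuned to balance false matches against missed matches.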
Human faces have unique features, so it stands to reason that facial recognition technology is used wherever identification is required – for example, when unlocking a smartphone with Face ID.
The technology still reaches its limits with the faces of identical twins, but since these make up only about 0.3 percent of the world’s population, the issue is neglected in most applications.
In 2008, Google introduced a face recognition routine as a new feature of Picasa, a free image management program and the predecessor of Google Photos that many will still remember. The author of this text was deeply impressed by the new possibilities at the time. It took a while until a stock of several thousand photos was processed, but after that you could reliably search for pictures of specific people and create corresponding albums, collages or videos in no time at all.
Today, it is hard to imagine image management without face recognition. Professional solutions in the field of digital asset management* in particular have been relying on such functions for a long time. After all, they make it much easier to find the people depicted and significantly speed up workflows.
* Digital asset management (DAM) is the technical term for the professional management of images and other media files.
A normal reverse image search is familiar to many from Google, Bing and Co.: you upload an image file or paste a URL, and versions in different resolutions, similar images and relevant search terms are displayed. A face search goes one step further: here, you upload a frontal portrait photo in order to find other photos of the person depicted. If you use a photo of Kurt Cobain, for example, you get thousands of hits, because the fan community is large and the web is full of pictures of the musician.
But it also works for non-celebrities. Once a few photos of a person have spread across the Internet (e.g. via corporate websites or social media platforms), they are indexed by services such as PimEyes and made available for face search. For investigative purposes, these are of course powerful tools. However, because they invite illegitimate use (e.g. by stalkers), they are controversial.
Security authorities recognized the potential of this technology from the very beginning. The FBI, the U.S. federal investigative agency, for example, holds biometric facial data on some 117 million U.S. citizens. For this purpose, all available driver’s license photos were digitized and evaluated using intelligent algorithms. Many wanted persons can thus be identified quickly and reliably from photos and videos. This helps in both preventing and solving crimes and makes the world a bit safer, especially in terms of counterterrorism. Of course, the technology can also be misused for illegitimate surveillance. The next section is dedicated to this topic.
Back in 2010, Facebook began automatically tagging users of the platform in uploaded photos – a practice that was criticized from the start. In addition, it turned out that the practical added value was low from the user’s point of view. In 2021, the function was discontinued worldwide, certainly also due to legal pressure.
In the European Union, there are data protection laws that prohibit the unprovoked use of facial recognition technology in public spaces (e.g., the analysis of live images from public surveillance cameras). However, it is likely that not all players will comply. Intelligence services, as we know, do what is technically possible.
In China, by the way, surveillance using facial recognition software is already commonplace. So the people there know where they stand – although that may be little comfort to many.
Strictly speaking, faces are of course also objects – and face recognition is therefore a subarea of object recognition. Hopefully you will forgive us for making this division anyway.
For autonomous driving, the detection of objects is of course essential. After all, the driving system must not only recognize lanes, but also interpret light signals (traffic lights) and traffic signs and reliably detect which objects are in the vicinity of the vehicle. From a technical point of view, we are moving into the field of computer vision.
In autonomous driving, data processing must of course take place in the millisecond range, because even a slight delay in initiating braking or evasive maneuvers can have fatal consequences. For obvious reasons, one particularly important area of image recognition in such driving systems is pedestrian detection. However, objects do not only have to be classified correctly; it is also a matter of determining spatial positions quickly and accurately (in addition to cameras, ultrasound, radar and lidar sensors also help here). In addition, the system must be able to estimate well in which direction other road users will move.
The highly complex skills required here as a whole fall largely into the area of artificial intelligence (AI). Of course, you can’t get by with classical programming. Today, computer-aided image recognition relies largely on machine learning using artificial neural networks. One method that is frequently used in this process is called Deep Learning.
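To give at least a rough idea of what such a network does at its very last step: learned weights turn extracted image features into class scores, and the highest score wins. The following toy sketch shows only this final classification layer; the class labels, weights and feature values are invented, whereas real networks learn millions of parameters from labeled training data:

```python
# Minimal sketch of the final layer of an image classifier.
# Features would come from earlier network layers; here they are invented.
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, labels):
    """One linear layer: score per class = dot(features, class weights)."""
    scores = [sum(f * w for f, w in zip(features, ws)) for ws in weights]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

labels = ["pedestrian", "traffic sign", "vehicle"]
weights = [
    [2.0, -1.0, 0.5],   # pedestrian
    [-0.5, 1.5, 0.0],   # traffic sign
    [0.2, 0.1, 2.0],    # vehicle
]
features = [0.9, 0.1, 0.2]  # hypothetical features extracted from an image

print(classify(features, weights, labels))  # most likely class: "pedestrian"
```

Training means adjusting the weights until the network's outputs agree with the labels of millions of example images – that is the "learning" in deep learning.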
In Phoenix, Arizona, Waymo One, a fully autonomous cab service, is already approved and in daily use. Waymo is backed by Google’s parent company Alphabet. The next step is to launch the service in San Francisco. Once the technology has proven itself there, it is likely to expand to other major cities in the coming years.
While fully autonomous driving is still a dream of the future in Germany, many will already be familiar with the next application type from everyday life (especially Android users). We are talking about searching by photo to find out what the photo shows. The best-known all-round solution comes from the Alphabet group and is called Google Lens. Here, too, the reliability is gradually improved by machine learning. For special topics, however, it is better to use an app that has been developed and trained precisely for the desired topic area. When it comes to identifying plant photos, for example, apps such as Flora Incognita, PlantNet or PictureThis are a good choice. The technology is also used for article and product searches by photo after appropriate training.
Keywording images and other media is mainly about identifying which objects appear in them. Until a few years ago, that meant tedious typing and monotonous work. Fortunately, those days are over. Unless the image material covers highly specialized subjects, depicted objects are recognized with a high degree of reliability – everyday objects such as a chair, a drill or a bicycle, for example. The automatic generation of keywords thus already works very well in practice. In the stock photo industry in particular, the technology has proven itself for several years now.
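Conceptually, automatic keywording boils down to filtering the labels a recognition model returns by confidence and keeping only the convincing ones. The model output below is hypothetical:

```python
# Sketch of automatic keywording: an (assumed) recognition model returns
# label/confidence pairs per image; only confident labels become keywords.

def auto_keywords(predictions, min_confidence=0.7):
    """Keep labels the model is reasonably sure about, most confident first."""
    confident = [(label, conf) for label, conf in predictions
                 if conf >= min_confidence]
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in confident]

# Hypothetical model output for one photo:
predictions = [("bicycle", 0.96), ("chair", 0.12), ("street", 0.81)]
print(auto_keywords(predictions))  # ['bicycle', 'street']
```

The confidence threshold is the practical tuning knob: set it too low and images collect wrong keywords, set it too high and useful ones are dropped.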
If you need an exact determination of make, brand or type, individual training procedures will of course be necessary here as well.
When recognizing biological species (e.g., insect species), machine vision often reaches its limits because the distinguishing features can be very nuanced and subtle.
Also worth mentioning is the phenomenon that completely different objects can look strikingly similar. One example has become so well known in this context that it can be called a meme: chihuahua or muffin?
Similarities of this kind present challenges for object recognition routines, but they can certainly be overcome. Deep learning methods can also be used to recognize stylistic devices. It remains questionable, however, to what extent image concepts and other levels of meaning can be captured by machines. It will probably be a few years before computers can recognize and describe the statement of an image with a high degree of reliability.
In the medical field, object recognition methods can be used to improve the collection of diagnostic data. For this purpose, X-ray images and CT scans, for example, are analyzed automatically. This ensures that even very minor abnormalities are captured; details that might have escaped the physician’s eye. In addition, such systems work with millions of comparative data, so they can be an important diagnostic adjunct even for experienced physicians. AI-based image recognition is particularly promising for the early detection of various cancers. There are numerous approaches that are currently being clinically tested.
Amazon Go stores are particularly progressive. In the now 42 stores, new standards have been set in terms of image recognition. The technology there is so advanced that customers can shop without a visible checkout process at the end. As in any supermarket, goods with and without barcodes (fresh produce) can be taken from the shelves and displays at will and placed in shopping baskets or carts. Afterwards, customers can simply leave the store without any further action. This is realized with hundreds of cameras and state-of-the-art object and face recognition, so that all products can be assigned to the right person and all persons to the corresponding customer account. Welcome to the brave new world.
There are now so many areas in which object recognition is successfully applied that only a small selection could be presented here. It should at least be mentioned that the arms industry is often among the first to adopt new technologies – the same applies here. Consider, for example, the development of autonomous drones or robots, which of course rely on image recognition. Quality control in the manufacturing industry is another area where object recognition has been used for many years, for example in the automatic inspection of components or sensitive foodstuffs such as eggs.
A whole new area of application has emerged in the insurance industry. Here, intelligent object recognition routines are used for the automatic evaluation of damage patterns. As a result, claims can be processed more quickly and repair costs can be forecast more accurately.
And last but not least, AI image generators like DALL-E also use technology that comes from the field of machine vision.
If you want to manage larger volumes of images and apply image recognition to your own photo collections, you need a professional solution that is technically up to date and at the same time complies with local data protection regulations. Think of services like automatic face recognition: this is sensitive data that should not fall into the wrong hands. With our solution, the teamnext | Media Hub, all software modules were developed in-house in Germany. With us, biometric similarity vectors never leave their intended domain and are hosted exclusively on servers within the European Union.
In addition, our solution is flexibly trainable, both for face recognition and for object recognition. For recognizing specific persons, just two pictures are sufficient as training material. Detecting special products or individual logos requires a little more material, but once the training process is complete, the corresponding objects are detected with very high reliability.
If you would like to get to know our solution, then you can start a free 14-day trial here. Additionally, you can book an appointment for an online product demo with one of our experts at any time. Please use our contact form for this purpose.