Keywording images and other media is mainly about identifying the objects they show. Until a few years ago, that meant tedious typing and monotonous work. Fortunately, those days are over. Unless the material shows specialized subjects, the depicted objects are now recognized with a high degree of reliability. This applies, for example, to everyday objects such as a chair, a drill, or a bicycle. Automatic keyword generation therefore already works very well in practice. In the stock photo industry in particular, the technology has proven itself for several years.
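To illustrate the last step of such a pipeline, here is a minimal sketch of turning recognition results into keywords. It assumes a classifier that returns a confidence score per label. The function name `keywords_from_scores`, the example scores, and the 0.8 threshold are all hypothetical and not taken from any particular product. Only labels at or above the threshold become keywords, sorted by confidence.

```python
from typing import Dict, List


def keywords_from_scores(scores: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return labels whose confidence meets the threshold, highest confidence first.

    `scores` is assumed to map label -> confidence in [0, 1]; the default
    threshold of 0.8 is an arbitrary illustrative choice.
    """
    selected = [(label, conf) for label, conf in scores.items() if conf >= threshold]
    # Sort by confidence, highest first, so the most certain keywords lead.
    selected.sort(key=lambda item: item[1], reverse=True)
    return [label for label, _ in selected]


# Hypothetical classifier output for one photo (label -> confidence).
example_scores = {"bicycle": 0.97, "chair": 0.91, "drill": 0.42, "helmet": 0.80}
print(keywords_from_scores(example_scores))  # ['bicycle', 'chair', 'helmet']
```

Raising the threshold trades completeness for precision. Low-confidence labels such as "drill" above could instead be queued for manual review rather than discarded.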
Technical hurdles
However, if you need to determine the exact make, brand, or model, the recognition system will still have to be trained specifically for that task.
When recognizing biological species (e.g., insect species), machine vision often reaches its limits because the distinguishing features can be extremely subtle.
Also worth mentioning is the phenomenon that completely different objects can look strikingly similar. A well-known example, popular enough to count as a meme, is Karen Zack's collage "chihuahua or muffin". You can find it here: karenzack.com/work/recognition-series.
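One common heuristic for catching such look-alike cases is to compare the two highest label scores and flag the image when they are close. This is not a method described here. It is a hedged sketch assuming per-label confidence scores. The names `top_two_margin` and `is_ambiguous`, the example scores, and the 0.15 margin are hypothetical.

```python
from typing import Dict


def top_two_margin(scores: Dict[str, float]) -> float:
    """Difference between the highest and second-highest confidence.

    Missing entries are padded with 0.0, so a single label yields its own score.
    """
    ranked = sorted(scores.values(), reverse=True) + [0.0, 0.0]
    return ranked[0] - ranked[1]


def is_ambiguous(scores: Dict[str, float], min_margin: float = 0.15) -> bool:
    """Flag a prediction when the top two labels are too close to call.

    The 0.15 default is an illustrative assumption, not a recommended value.
    """
    return top_two_margin(scores) < min_margin


# Hypothetical scores for a look-alike image versus a clear-cut one.
print(is_ambiguous({"chihuahua": 0.48, "muffin": 0.44, "dog": 0.05}))  # True
print(is_ambiguous({"bicycle": 0.97, "chair": 0.02}))                  # False
```

Flagged images could then be routed to a human editor instead of being keyworded automatically.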
Similarities of this kind pose challenges for object recognition, but these can certainly be overcome. Deep learning methods can also recognize the stylistic devices used in an image. However, it remains questionable to what extent image concepts and other levels of meaning can be captured by machines. It will therefore probably be a few years before computers can reliably recognize and output what an image actually conveys.