Natural language is simply human language, in both spoken and written form; fully developed sign languages are included as well. For our purposes, however, only the written form matters. Words could of course also be spoken and gestures recorded, but it would amount to the same thing, because the information always has to be converted into binary-coded characters before a machine can process the language.
In practice, a visual search can use single words, word combinations, complete sentences or sentence fragments to find images. No special rules apply beyond everyday language use, so you are extremely flexible when formulating a search query. It can also be very specific and might look like this:
Photo of an elderly man with a sun hat sitting in a rowing boat and fishing
If the search returns no hits, gradually remove the less important criteria. Example:
An elderly man sits in a boat and fishes
And so on. Capitalization does not matter, nor does the position of sentence elements, as long as the meaning of the sentence is preserved. The queries A man fishing at the lake and At the lake a man is fishing should therefore lead to the same search result.
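To make this concrete, here is a minimal sketch of how such a text-to-image search can work under the hood, using a CLIP-style model that maps images and query text into the same vector space. The model names, image file names and queries are assumptions for illustration only; any joint text/image embedding model would serve the same purpose.

```python
# Minimal sketch of a natural-language image search with a CLIP-style model.
# Model names and image file names are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util
from PIL import Image

# One model embeds the images, a multilingual companion model embeds the
# query text into the same vector space.
img_model = SentenceTransformer("clip-ViT-B-32")
txt_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")

# Embed a small (hypothetical) image collection once.
image_paths = ["lake_fisherman.jpg", "city_street.jpg", "mountain_hike.jpg"]
img_embeddings = img_model.encode([Image.open(p) for p in image_paths])

# Everyday-language queries: a specific one, a simplified fallback, and two
# word-order variants that should behave identically.
queries = [
    "Photo of an elderly man with a sun hat sitting in a rowing boat and fishing",
    "An elderly man sits in a boat and fishes",
    "A man fishing at the lake",
    "At the lake a man is fishing",
]
txt_embeddings = txt_model.encode(queries)

# Cosine similarity ranks the images for each query; all four formulations
# should point to the same best-matching picture.
scores = util.cos_sim(txt_embeddings, img_embeddings)
for query, row in zip(queries, scores):
    best = int(row.argmax())
    print(f"{query!r} -> {image_paths[best]} (score {float(row[best]):.2f})")
```

The point of this design is that the query is never parsed against grammatical rules; it is mapped to a vector, which is why capitalization and word order have so little influence on the result.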
AI visual search also works with less common languages, although not always with the same precision. It can already be implemented for over a hundred languages, from Afrikaans to Zulu.
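The multilingual behaviour can be illustrated in the same way. In the following sketch the model name is again an assumption (and this particular model covers only a subset of the languages mentioned above): the same query in three languages is embedded and compared, and translations that land close together in the shared vector space will retrieve the same images.

```python
# Sketch: the same query in several languages should land close together in
# the shared embedding space, so each version retrieves the same images.
# The model name is an illustrative assumption.
from sentence_transformers import SentenceTransformer, util

txt_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")

queries = [
    "An elderly man sits in a boat and fishes",         # English
    "Ein älterer Mann sitzt in einem Boot und angelt",  # German
    "Un homme âgé est assis dans un bateau et pêche",   # French
]
embeddings = txt_model.encode(queries)

# Pairwise cosine similarities; values close to 1.0 mean the translations
# map to almost the same point in the vector space.
print(util.cos_sim(embeddings, embeddings))
```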