Header image: DALL-E and other AI image generators

DALL-E & Co – What are AI image generators and how do I use them?

Reading time: 12 minutes


DALL-E is a computer program based on artificial intelligence (AI) that generates digital images when prompted with suitable text input. Such programs are called AI image generators.

In this article, you will learn how DALL-E works, how you can use AI image generators for your work, what alternatives there are to DALL-E, and how copyright issues play a role in AI images.

What is DALL-E?

The name DALL-E (stylized spelling DALL·E) alludes to the Pixar robot WALL-E and the Spanish painter Salvador Dalí. A 2 after the name denotes the second version; in this article, DALL-E always refers to DALL-E 2.

We have already established that DALL-E is based on artificial intelligence. That is true in principle, but a deeper understanding requires a closer look. Let’s sharpen the terms:

The form of artificial intelligence that makes DALL-E work is based on machine learning. There are several approaches to machine learning; nowadays, artificial neural networks (ANNs) are mostly used. ANNs, in turn, can be applied in different ways. For this article, it is enough to note that DALL-E relies on deep learning methods, in which numerous hidden layers sit between the input and output layers.
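
To make the idea of hidden layers a little more concrete, here is a minimal sketch in Python using PyTorch. It is only an illustration of a network with several hidden layers between input and output, not DALL-E’s actual architecture.

    import torch
    import torch.nn as nn

    # A toy "deep" network: several hidden layers between the input and the output layer.
    model = nn.Sequential(
        nn.Linear(128, 256),  # input layer -> first hidden layer
        nn.ReLU(),
        nn.Linear(256, 256),  # hidden layer
        nn.ReLU(),
        nn.Linear(256, 256),  # another hidden layer
        nn.ReLU(),
        nn.Linear(256, 64),   # last hidden layer -> output layer
    )

    x = torch.randn(1, 128)  # a random input vector
    y = model(x)             # forward pass through all layers
    print(y.shape)           # torch.Size([1, 64])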

Who invented it?

DALL-E was developed by the US company OpenAI, whose backers include Microsoft and Elon Musk. OpenAI is also known for the text generator ChatGPT. Both services build on OpenAI’s GPT technology.

A short digression: What is GPT?

GPT stands for Generative Pre-trained Transformer (the 3 denotes the version). That sounds very abstract, but it is not as complicated as it seems. In the context of machine learning, a transformer is a neural network architecture with which a computer can transform one sequence of characters into another.

So, roughly speaking, we are dealing with a text-generation system that uses a pre-trained language model. Think of the pre-training as a large-scale project: the neural network was fed a huge pool of text from the Internet, comprising roughly 500 billion tokens (word fragments) in total.
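
To get a feeling for what a pre-trained transformer language model does, here is a minimal sketch in Python. It uses the freely available GPT-2 model via the Hugging Face transformers library rather than GPT-3 (which is only accessible through OpenAI’s paid API), but the principle is the same: the pre-trained model continues a given sequence of text.

    from transformers import pipeline

    # Load a small, freely available pre-trained transformer (GPT-2, not GPT-3).
    generator = pipeline("text-generation", model="gpt2")

    # The model continues the input sequence with the continuation it considers most likely.
    result = generator("AI image generators such as DALL-E", max_length=30)
    print(result[0]["generated_text"])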

How does DALL-E work?

In DALL-E, deep learning is used to convert plain text input into an output that consists of an array of pixels and thus represents a digital image. DALL-E can create completely new image compositions in every imaginable style.

You may now be wondering where DALL-E gets its “inspiration”. DALL-E not only “knows” most of the texts and image motifs that can be found on the Internet; it has also been trained on 650 million text-image pairs, that is, images that come with a caption or at least suitable tags.

What happens in detail when DALL-E processes a text input is beyond the scope of this article; interested readers can consult OpenAI’s research publications on DALL-E 2.

For our purposes, the key point is this: given text input that is semantically precise (unambiguous), DALL-E can produce results that are both accurate and unique.

How can I use DALL-E?

To use DALL-E, you must create an account on the openai.com website. You receive a free starting quota of 50 credits; each credit generates a set of four images. Every month you get 15 more credits. If these are not sufficient, additional credits can be purchased (minimum 115 credits for 15 USD).
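
If you prefer to work programmatically, images can also be requested via OpenAI’s API instead of the web interface. Below is a minimal sketch using the openai Python library as it was offered in the DALL-E 2 era (the 0.x interface; newer library versions use a different call syntax). You need your own API key, and API usage is billed separately from the credits described above.

    import openai

    openai.api_key = "sk-..."  # your personal API key from your OpenAI account

    # Request a set of four images for one text prompt, just like in the web interface.
    response = openai.Image.create(
        prompt="sliced air-dried sausage with bread and butter, photo",
        n=4,
        size="1024x1024",  # DALL-E 2 supports 256x256, 512x512 and 1024x1024
    )

    # The response contains temporary URLs from which the PNG files can be downloaded.
    for item in response["data"]:
        print(item["url"])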

In the next section, we move on to practice and generate our first DALL-E images.

Our first time with DALL-E

In our first example (admittedly not our first attempt) the input was: “sliced air-dried sausage with bread and butter, photo”.
And that is exactly the image that was created: one that looks like a photo of sliced air-dried sausage with bread and butter. In northern Hesse, of course, one would speak of ahle Wurst (literally: old sausage), but DALL-E is not there yet; for now, English is the language that leads to the best results.

Deceptively real, only the knife is a bit off.

As mentioned, DALL-E always generates four images for each text input. As a user, you can then select the best result and, if necessary, have further variants created on this basis.

How to awaken DALL-E’s superpower

Replicating an image motif that already exists in a similar form is impressive in itself. DALL-E’s real superpower, however, lies in creating motifs and compositions for which there is no exact template. To make the best use of this superpower, three conditions must be met:

    1. The text input should be as precise as possible (avoid vagueness and ambiguity).
    2. The text input should be creative or imaginative (e.g., combine things that do not usually occur together).
    3. Always specify the medium or technique in which the image should be created, e.g. photo, impressionist painting, 3D rendering. Ideally, this specification comes at the end of the text input, separated by a comma.

If these conditions are met, DALL-E produces images that are a pleasant surprise and frequently look as if a professional illustrator had been involved.
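
To illustrate these three rules, here is a small sketch of a helper function that assembles a prompt from a precise subject, an imaginative twist, and a medium appended after a comma. The function is purely our own example, not an official DALL-E convention.

    def build_prompt(subject: str, twist: str, medium: str) -> str:
        """Combine a precise subject, an imaginative twist and a medium into one prompt."""
        # Rule 1: precise subject; rule 2: imaginative combination;
        # rule 3: the medium goes at the end, separated by a comma.
        return f"{subject} {twist}, {medium}"

    print(build_prompt("copper statue of hercules", "drinking beer", "digital art"))
    # -> copper statue of hercules drinking beer, digital art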

Example 2

In the next example, the input was: “copper statue of hercules drinking beer, digital art”.

The colorful squares at the bottom right of the image act as DALL-E branding.

The generated image impressively demonstrates DALL-E’s “creative” skills. It does not show a naked Hercules (the landmark of the city of Kassel), but that was not specified. Looking at the result, you could almost think that DALL-E develops an idea or concept before the output image is generated. In reality, of course, it is pure computation without a spark of genuine understanding.

Each picture is unique

It is worth mentioning that DALL-E generates new images even when the text input remains unchanged. These are similar to the previous images, but not identical to them. The reason is that a new seed is used for each run, which serves as the starting value for generating further (pseudo-random) numbers. Only if this initial value were kept the same (and the AI model remained unchanged) could DALL-E generate identical images. This option is currently not available in either the end-user or the developer environment, but it could become useful when the same motif needs to be generated at a higher resolution. As of today, the resolution of DALL-E images is limited to 1024 × 1024 pixels.
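
The role of the seed can be illustrated with any pseudo-random number generator, for example Python’s built-in random module: the same seed always reproduces the same sequence of numbers, while a new seed produces a different one. This is exactly why a fixed seed, if it were exposed, would allow identical images to be regenerated.

    import random

    def pseudo_random_sequence(seed: int, length: int = 5) -> list:
        """Return a reproducible sequence of pseudo-random numbers for a given seed."""
        rng = random.Random(seed)
        return [round(rng.random(), 3) for _ in range(length)]

    print(pseudo_random_sequence(42))  # same seed -> identical sequence on every run
    print(pseudo_random_sequence(42))  # ... and therefore, in principle, identical images
    print(pseudo_random_sequence(7))   # different seed -> different sequence, different images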

DALL-E and copyright

You now know the possibilities offered by AI image generators like DALL-E. But how does this technology affect copyright? Are images created with DALL-E usable without restriction?

DALL-E itself cannot acquire copyrights, because copyright is reserved for human creators. Moreover, we have already seen that DALL-E always generates unique images. That also seems to speak for unrestricted use, because a completely new image can hardly violate copyrights or trademarks. Or can it?

You have probably guessed it already: unfortunately, rights can still be violated. If, for example, a Disney cartoon character, a Pepsi can, or the style of a living artist appears in the text prompt, DALL-E can generate a corresponding image; after all, the underlying AI model “knows” almost all image motifs, and thus also protected content. However, the AI does not yet “know” which content is protected and which is not, so it may generate images that are legally problematic. You should therefore always check whether the rights of third parties could be infringed. An AI-based checking method is conceivable, of course, but as of today it is still a long way off.

Problematic training data

Another legal dimension opens up when one asks whether it was even permissible to train the AI models underlying DALL-E and other AI image generators on copyrighted material. Because the issue is so new, there are no court rulings on it yet. But a movement against the unsolicited use of images available on the Internet has already formed. A dataset of 5.8 billion image links can be searched via the Have I Been Trained? page. If you find your own works there, you can register and have the image links removed from the dataset. It is questionable, however, whether the companies that conduct AI training actually take this into account. In any case, the aforementioned website indicates that, of the larger market participants, only Stable Diffusion has committed to doing so thus far.

The bottom line is that not every work created by DALL-E is unproblematic simply because it is unique, and some fundamental legal questions surrounding AI training data have yet to be resolved by the courts.

Alternative AI image generators to DALL-E

As mentioned earlier, there are other AI image generators on the market besides DALL-E. We list the three most important ones here.

Stable Diffusion

With Stable Diffusion, the best-known alternative has already been mentioned. Unlike DALL-E, Stable Diffusion is an open-source project. It is driven by scientists and experts from the Ludwig Maximilian University of Munich (LMU), the London-based start-up Stability.ai, and the German non-profit organization LAION, among others. The development of Stable Diffusion also takes an open approach to disclosing which training data is used to train the models.

Craiyon

Another popular generator is Craiyon. It was formerly called DALL-E mini but had to be renamed under pressure from OpenAI. Craiyon was originally based on the model behind DALL-E 1, but has since been trained further on unfiltered data from the Internet. Craiyon can only be used free of charge to a limited extent; if you do not want watermarks in the images, for example, you have to take out a paid subscription.

Midjourney

The third and last alternative to DALL-E is Midjourney. Midjourney is an AI image generator being developed by a research lab of the same name. Even though this sounds like science, it is a commercial project. It is believed that the underlying technology is more or less based on Stable Diffusion. Midjourney obtains new training data via the Discord chat platform, which is particularly popular with gamers and developers. Outside of Discord, Midjourney cannot be used yet. This image generator is therefore aimed specifically at a particularly IT-savvy community.

A look into the future: What are the consequences of using AI image generators?

AI image generators will greatly change the media world. Already today, many bloggers and operators of smaller news portals rely on images generated by DALL-E and Co. Why? Because they save on licensing fees for stock photos and still get customized content.

The misuse of image generators is as obvious as it is consequential. As output quality improves, the production of fake news becomes even easier. So in the future, you’ll always have to ask yourself “is this a real photo? Or has an AI been involved here?”

Of course, some professions will also change or become less in demand because of the new technology. Who would book an illustrator when DALL-E does the job almost as well, at lightning speed, and a hundred times cheaper? In the future, the illustrator’s job could therefore be to improve or finalize AI designs.

Incidentally, there are already job postings for a whole new profession: the “AI whisperer”, also known as prompt engineer. Because this, too, has to be learned: prompting the AI model in such a way that it delivers good results.

Conclusion

DALL-E and its siblings are still children, but they already give us an inkling that they will bring great changes. In the future, more images will probably be generated by AI image generators than by cameras. The Internet will be full of computer creations and the distinction between real and fake will be almost impossible. It is therefore important to get to grips with this technology at an early stage.

More DALL-E creations

In this gallery, we show more images that we created with DALL-E while researching this article.

Manage your DALL-Es in an image database

teamnext | Media Hub is cloud-based software for managing media files that is used via the browser. All common image formats are supported, including, of course, the PNG format in which DALL-E outputs its images. If you want to share your DALL-E creations securely or convert them to another image format, teamnext | Media Hub is the ideal solution.

In addition, teamnext places particularly high value on the privacy-compliant storage of all media files. We do not use any third-party tools and process your data exclusively on servers in European data centers: highly secured and GDPR-compliant.

If we have made you curious and you just want to try out the various functions of our image management, you can get started right away with a free 14-day trial for teamnext | Media Hub. Additionally, you are welcome to book an appointment for a free online product demo with one of our experts. Simply use our contact form for this purpose.

You might also be interested in

HEIF / HEIC format
Image recognition: face recognition and identification
Tagging and image management