AI Poisoning

# Published: 2022-12-13 by h4kor

The age of AI-generated “content” (I hate that word!) has come. Stable Diffusion, DALL-E and Midjourney are being used to create artworks that win contests. Stable Diffusion in particular, with its openly shared weights, has led to rapid adoption of AI image tools in the workflows and software artists use every day. ChatGPT can produce fluent texts that are often hard to distinguish from texts written by humans. Platforms like Stack Overflow have felt the need to temporarily ban such texts until they figure out how to deal with them.

AIs (still) rely on vast amounts of training data. To build improved models, larger and larger datasets are required. Models will also have to be retrained periodically to pick up new concepts. Image generators have to be adjusted to the ever-shifting tastes of the people using them. Text generators have to learn the latest memes to stay relevant.

The explosive popularity of AI-generated content will lead to new challenges when building new datasets. I see two ways in which progress on these generative models could be slowed down, or even stopped entirely, by their own popularity.

Self Poisoning

The datasets used for this are generally created by scraping the required data from the internet. This means everything posted to Reddit, Twitter and the rest of the internet is collected, cleaned and prepared for training AIs. The cleaning step will become harder and harder with each new iteration of generative models, because you want to keep AI-generated content out of your training set.

An AI system trained on data created by the system itself will not become better, as the information in the data is already known to the system. If the generated data are valid examples, this might just waste computation time while training the model. But any poor examples will degrade the performance of the resulting model.

If a considerable portion of the training data comprises examples created by a previous version of a generative model, the next model will learn to recreate the style and errors of the previous model.
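The feedback loop described above can be illustrated with a deliberately tiny toy. This is only a sketch under strong assumptions, not a claim about any real image or text generator: the “model” is a normal distribution fitted to its training data, and each new generation is trained exclusively on samples drawn from the previous one. The function names, the sample size and the number of generations are all made up for illustration.

```python
import random
import statistics


def fit(samples):
    """'Train' the toy model: estimate mean and spread of the data."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def generate(model, n, rng):
    """Sample n new data points from the fitted toy model."""
    mean, std = model
    return [rng.gauss(mean, std) for _ in range(n)]


def self_train(generations=200, n=10, seed=0):
    """Repeatedly refit the toy model on its own output (hypothetical setup)."""
    rng = random.Random(seed)
    # Generation 0 is trained on "real" data from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    history = []
    for _ in range(generations):
        model = fit(data)
        history.append(model)
        # The next training set consists only of the model's own output.
        data = generate(model, n, rng)
    return history


history = self_train()
first_std, last_std = history[0][1], history[-1][1]
print(f"spread after first fit: {first_std:.3f}, after last fit: {last_std:.3g}")
```

In this toy, every generation inherits the estimation errors of the one before it, and the spread of the data tends to shrink over time. No generation can learn anything that was not already in the first fit. Real training sets mix human and generated data, so the effect there would be weaker and harder to measure.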

Adversarial Poisoning

A lot of artists hate generative models, especially for their ability to mimic the style of individual artists. This has led some to hide their artworks, because they don’t want to provide even more data to these systems. If the concerns of artists are ignored, some might try to sabotage the systems which steal their art style.

A common technique for using Stable Diffusion is to give a prompt like “A cat sitting on a bench, by Artist X”. For artists with a large portfolio, this creates results which, at first glance, could have been created by these artists. As datasets are generated automatically, it might be possible to introduce adversarial examples into the training data that break such prompts.

Artists might publish decoy “artworks” on their feeds. These decoys would be easily recognisable by humans, but scraping systems would include them in the training sets. If an artist has more decoys than real artworks associated with their name, the AI system will mimic the style of the decoy.
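The majority argument can be made concrete with a very crude stand-in for training. The sketch below assumes a “model” that simply remembers the most frequent style label seen next to each artist name. Real generators blend what they see rather than picking a winner, so this only illustrates the direction of the effect. The labels `"real"` and `"decoy"`, the counts and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def learn_styles(scraped):
    """Toy 'training': map each artist name to its most frequent style label."""
    by_artist = defaultdict(Counter)
    for artist, style in scraped:
        by_artist[artist][style] += 1
    return {
        artist: counts.most_common(1)[0][0]
        for artist, counts in by_artist.items()
    }


# Hypothetical scrape: 40 real artworks and 60 decoys under the same name.
scraped = [("Artist X", "real")] * 40 + [("Artist X", "decoy")] * 60
styles = learn_styles(scraped)
print(styles["Artist X"])  # decoy
```

A scraper that cannot tell the two kinds of posts apart hands both to the training step with the same name attached, which is exactly what the decoys rely on.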

Artworks can be published with surrounding noise. Instead of just publishing an image, the image might be extended by random frames. The descriptions could be extended by additional nonsense descriptions.
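One possible reading of this idea, sketched with plain Python lists instead of an image library: the artwork is surrounded by a border of random pixels, and unrelated words are appended to its description. The function names, the grayscale list-of-rows image format and the word list are assumptions made for this example.

```python
import random


def add_noise_frame(pixels, border, rng):
    """Surround a grayscale image (list of rows, values 0-255) with random pixels."""
    width = len(pixels[0])
    new_width = width + 2 * border

    def noise_row(length):
        return [rng.randrange(256) for _ in range(length)]

    # Noise rows above, noise columns left and right, noise rows below.
    framed = [noise_row(new_width) for _ in range(border)]
    for row in pixels:
        framed.append(noise_row(border) + list(row) + noise_row(border))
    framed.extend(noise_row(new_width) for _ in range(border))
    return framed


def add_noise_caption(caption, vocabulary, extra_words, rng):
    """Append randomly chosen, unrelated words to a description."""
    noise = " ".join(rng.choice(vocabulary) for _ in range(extra_words))
    return caption + " " + noise


rng = random.Random(42)
image = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
framed = add_noise_frame(image, border=2, rng=rng)
caption = add_noise_caption(
    "A cat sitting on a bench", ["teapot", "glacier", "violin"], 3, rng
)
print(len(framed), len(framed[0]))  # 7 7
```

The original image survives unchanged in the centre, so a human can still crop it out by eye, while an automated pipeline would ingest the noisy frame and caption as they are.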

All these countermeasures can be circumvented, but this will be expensive. For general models, trained on massive datasets, such cleaning measures will most likely be too expensive. However, fine-tuning a model only requires a relatively small training set. Cleaning such a dataset for a single artist will be a simple task. The poisoning will only be a minor inconvenience for people creating such specialized models.

Libove Blog
