ChatGPT is the fastest-growing consumer internet app, amassing an estimated 100 million users just two months after its launch in November 2022.
While ChatGPT can do more than generate text, that is the feature we will focus on here. It allows users to effortlessly generate (mostly coherent) text based on a prompt. The prompt could be as simple as "Write me an article about the dangers of Generative AI."
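For the technically curious, the same kind of model that powers ChatGPT can also be prompted programmatically. Below is a minimal sketch using OpenAI's official Python SDK; it assumes you have the openai package (v1+) installed and an API key configured, and the model name is illustrative:

```python
# A minimal sketch of prompting an OpenAI model from code.
# Assumptions: the `openai` package (v1+) is installed and the
# OPENAI_API_KEY environment variable holds a valid key. The model
# name is illustrative; use whichever model you have access to.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Write me an article about the dangers of Generative AI.",
        }
    ],
)

print(response.choices[0].message.content)
```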
You can use ChatGPT for anything requiring writing, from social media posts, blogs, and eCommerce listings to eLearning content, whole books, and even code. Another widespread use of this technology is translating between languages.
Debate about generative AI is now in full swing, given that the technology is actively disrupting well-established industries. In pursuit of a deeper understanding, it is essential to hold off on rapid judgments.
At BlueOx, we focus on digital security, so let's talk about how this technology may impact you and what you need to know in that regard.
AI-generated Text Is Already Everywhere
A recent research study from an Amazon Web Services AI lab indicates that over half (57.1%) of web text may have been machine-translated into two or more languages. The idea is that AI is likely being used to mass-translate English content into clickbait in other languages. While the study hasn't been peer-reviewed, if we extrapolate a little, we can see where the phrase "Internet Slime" comes into play.
While GPT-3.5, GPT-4 (the latest at the time of writing), and other Large Language Models (LLMs) like them can produce extremely high-quality work, they can also confidently produce factually incorrect text. This problem has been dubbed "hallucination."
The problem here is twofold: if an LLM generates the base text (say, English articles or blog posts) that is never checked for accuracy, and that text is then poorly translated into other languages, we end up with a growing share of the internet (both English and non-English) that is simply inaccurate ("Internet Slime").
This problem isn't new, of course. Humans were 'generating' factually inaccurate text long before Generative AI came along. However, AI will likely make things worse in the short term.
Finding A Signal In The Noise
First and foremost, awareness is crucial. Understanding that generative AI models (LLMs among them) are producing massive amounts of the text on the web is a good start.
Knowing that social media comments may or may not have been written by real humans could save you a lot of time and anguish.
Next, it is vital to scrutinize your information sources more closely. While not all AI-generated text is bad, you should be skeptical of AI-generated content from sources, such as eLearning, that are generally expected to deliver factual information.
We're working on a lesson that digs deeper into identifying AI-generated text (see the BlueOx lesson linked under Further Reading below). In general, here are a couple of telltale signs:
- Lack of personal touch - OpenAI (the maker of ChatGPT) has intentionally designed its products NOT to be human-like. While we can't say this will be the case forever, AI-generated text lacks a personal, human touch. It won't express feelings.
- Lack of changing tones - Humans often shift tone as they write; AI will generally stick to a single tone throughout.
If you want to dig deeper, there are tools floating around that attempt to identify AI-generated text; I've linked a couple at the bottom. Ironically, some of these tools also offer, for a fee, to take the AI-generated text they identify and 'humanize' it to evade detection.
We can see the same game of cat and mouse that has been playing out in information security for decades. If you test out any of these tools, please don't share private or sensitive text with them!
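If you're curious how some of these detectors work under the hood, one common building block is perplexity: a measure of how "predictable" a passage is to a language model, with suspiciously low perplexity serving as a weak hint that the text was machine-generated. Below is a minimal, illustrative sketch of that idea using the open-source GPT-2 model via Hugging Face's transformers library. It runs entirely on your own machine (so no text is shared with a third party), and it is a toy heuristic, not a reliable detector:

```python
# A toy perplexity check, NOT a reliable AI-text detector.
# Assumptions: the `torch` and `transformers` packages are installed;
# the small open-source GPT-2 model is used purely for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Score the passage with the model: the loss is the average
    # negative log-likelihood per token, so exp(loss) is perplexity.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

print(f"{perplexity('The quick brown fox jumps over the lazy dog.'):.1f}")
```

Lower scores mean the text is more predictable to the model. Real detectors combine many signals like this, which is exactly why the 'humanizing' services mentioned above exist: they push those signals back toward human-looking ranges.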
Wrapping Up
Generative AI is responsible for a quickly growing amount of text on the web. The ease of generating text makes this an unprecedented scenario for the internet.
Because Generative AI (and LLMs) can produce factually incorrect information, understanding what is happening, and how, is essential.
If you haven't used an LLM yet, now is an excellent time to try one. It will help you get a feel for what AI-generated text looks and 'sounds' like. There are also tools that attempt to classify a passage of text and decide whether AI generated it.
Before long, we can expect search engines (if they aren't made obsolete by AI :) to try to determine whether a webpage's text was AI-generated. That could create positive incentives for content creators and relief for content consumers, but it is not an easy problem, even for massive companies like Google.
Finding trusted sources of information will be more critical than ever in the coming years as that first big wave of AI-generated content sweeps the internet.
Disclaimer: While a human wrote this article, I used an AI-powered grammar tool to assist me with writing. Beyond spelling and grammar, it also helps with tone, clarity, and other things.
What is and isn't AI-generated text? It is not black and white.
Further Reading
BlueOx Lesson On Detecting Generative AI
9 Problems With Generative AI in One Chart
How To Detect AI-Generated Content
The Dead Internet Theory Explained
AI Content Detectors: