(I never thought I would write a title like that)

Everyone is going crazy over the OpenAI thing.

You can go crazy if you like; that’s fine with me, and it seems to be what all the cool kids are doing these days. I’d point you to Zachary Lipton’s post for how I generally feel about the research aspect of this. The short version is: this is a better version of very standard natural language processing. That’s it. It’s also not going to destroy us all, so rest easy.

But all of that being said, this raises some interesting questions about AI policy. Specifically, it feels like we’re currently in the middle of a moment where a whole lot of people want AI to take over. That sounds very dramatic, so let me unpack it.

Let’s start at the very beginning (a very good place to start). Recall, dear reader, the ELIZA program (click for an online version). ELIZA was an early chatbot built to mimic a very specific sort of psychotherapist. There is a lot written about the history of ELIZA, and it is fascinating. Some of the subjects who interacted with ELIZA formed an emotional attachment to the program. If you interact with ELIZA now you can see very quickly that not only is it a program, but it is one with pretty severe limitations.

Which brings us to the new announcement. This week, OpenAI announced that they had built a language model that was very good. The language model is essentially a better version of stuff that already exists and has existed for a very long time (again, see Zachary’s post). But there is huge concern right now that if this particular bit of research were to get into the wrong hands, bad things would happen. This is silly. But let’s indulge ourselves for a minute.

Imagine that you are a person (shouldn’t be too hard). You are consuming **news** via your favorite app/dead tree/whatever. (The fact that I bolded news is important.) As you read through a story, you may or may not check the byline to determine who wrote the piece. Knowing the author can be a signal of quality or trustworthiness, assuming the author is good or trustworthy. If you don’t know the author, you can at least trust the news source. So, if you finished your article and thought to yourself, “that was really well-written,” and then I jumped out from behind you and yelled “Aha! But it was written by an AI!” what would you do? Other than kick me out of your house (justifiable)? You would probably go back and consider the article more closely. You may even say “Hmm, I don’t know if I trust an AI to write news, I should fire up Google dot com and see if this is being reported elsewhere.” But it’s unlikely that you would say “OK!” and be done with it. Why? Because you are a human! You are not a news-ingesting machine that simply takes in news uncritically.

Your new skeptical approach to news written by an AI should really be your default method for reading the news, but that is a post for another time.

Now, new scenario. Imagine that you are a person (I know, I’m really making you work here). You are reading some collection of words that are not news. This could be a novel, a poem, or (importantly) a commentary written about something in the news (I hope we’re all on the same page that these commentaries aren’t news). If I jumped out after you finished reading this and told you that it was written by an AI, what would you do? My argument here is that your answer should be “I don’t know and I don’t really care. I was entertained.” AI-generated entertainment shouldn’t be cause for concern.

If some form of artificial intelligence (in this case, a language model) can create entertainment via text to the point where you can read it and enjoy it, that is a remarkable accomplishment that should be celebrated.

If some form of artificial intelligence can create textual descriptions of news events, that is an even more impressive feat.

If the AI can write what looks like news but has factual errors (even if the structure is good), then the text isn’t news. We already have defenses against this kind of not-news: the ability to do research, find facts, and make informed decisions about things.

What OpenAI is saying with their lack of transparency over this whole thing is that this new language model is dangerous. But I would ask the following: if this model is so good that we don’t trust humans to read its output, has OpenAI just passed the Turing test? Doubtful. What they have is simply a model that does a better job of matching the patterns of writing found in a very large corpus of text.
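To make the “matching patterns” point concrete, here is a minimal sketch of what sampling from a language model looks like, assuming the smaller, publicly released GPT-2 checkpoint and the Hugging Face transformers library (my choice of tooling for illustration, not anything OpenAI prescribes). The model just predicts a likely next token, over and over, based on its training corpus; nothing in the process checks facts.

```python
# A minimal sketch of sampling from a language model, assuming the small,
# publicly released "gpt2" checkpoint and the Hugging Face `transformers`
# library (this is not the withheld model from the announcement).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# A news-ish prompt; the model continues it one token at a time, picking
# tokens that were statistically likely in its training corpus.
prompt = "Officials confirmed on Tuesday that"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_length=60,                        # stop after ~60 tokens total
    do_sample=True,                       # sample instead of always taking the top token
    top_k=50,                             # restrict sampling to the 50 most likely tokens
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The output reads fluently because fluency is exactly what the training corpus rewards; whether any of it is true is a separate question entirely.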

So what should we do moving forward?

  • OpenAI should release the model, or change their name
  • Humans should regain some confidence
  • You should change the locks in your house so I can’t jump out from behind you anymore
  • You should spend less time reading stuff on the internet

The last point is the most important. Internet companies want you to spend time on the internet so that they can show you ads. They have an incentive to provide content in order to keep you on their pages. They unfortunately do not currently have an incentive to provide you with facts. The more time you spend on the internet, the more content you see, the more ads you see, and the more money these companies make. So if they need more and more content, and fancy AI systems can now write top-notch content, then they will use those systems to drive down the cost of producing it.

It’s not clear whether there will be any added cost associated with the potentially less-than-totally-true nature of AI-generated content. But looking at fake news today, there isn’t much cost to putting fake news online, so I don’t see there being big costs to putting AI-generated fake news online either.

So if you want to prevent AI from taking over the world of online publishing, then don’t read as much entertainment online. Or if you are reading something online and you have your doubts as to whether it is true or not, do some digging! Find out for yourself.