Only a few months ago ChatGPT was launched and it changed many people’s perception of what AI can do. That was based on GPT-3.5 from OpenAI, which was also integrated into Microsoft’s Bing and Skype, Edge too. Now the company has confirmed that it has switched over to the new and more powerful GPT-4 model.
In fact, it did so a while ago – if you’re part of the Bing Preview then you have been using GPT-4 for the last five weeks (you can sign up for the preview here). This isn’t the plain GPT-4, by the way, but a version that has been customized by Microsoft for search.
So, what’s new in GPT-4? For starters, it is a “multimodal” model, which is fancy way of saying that you can attach images to your query, not just text. Here is an example of GPT-4 explaining a joke found on Reddit. Note that the output is text only (i.e. you can’t generate images like with Stable Diffusion, MidJourney, etc.).
The new model is smarter too, the OpenAI team tested it with practice exam books from 2022 and 2023. Note: the model doesn’t know anything after September 2021, so these exams (and their answers) weren’t part of the training data.
GPT-3.5 took the bar exam (which lawyers need to pass) and it scored in the bottom 10%. GPT-4 scored in the top 10%. The justice system isn’t ready for robo-lawyers yet, but they are on the horizon. GPT-4 also scored in the 88th percentile on the LSAT exam, v3.5 was in the 40th. For SAT Math, GPT-4 was in the 89th percentile, GPT-3.5 in the 70th. You can check out OpenAI’s announcement for more exam results.
The most important new feature in version 4 is “steerability”. Previously, ChatGPT was coerced into acting like a digital assistant by prepending some rules. It was possible to trick the AI into revealing those rules, e.g. here’s what Microsoft told “Sydney” to do as Bing (including not revealing its Sydney code name):
Microsoft and OpenAI have worked to hide such rules (to prevent so-called “jailbreaking”), but now there is a better way to do it – companies can control the AI’s style and task with a system message. Here’s an example:
It’s important to note that GPT-4 still has limitations, especially when it comes to facts. Like its predecessor, the model can make things up, these are called “hallucinations”. The new version is significantly better (scoring 40% higher on internal testing) than GPT-3.5 at sticking to the facts and not making logical mistakes, but it is still not perfect. Still, GPT-3 was released in mid-2020, GPT-3.5 arrived in early 2022 (a later enhancement was used for ChatGPT), so the pace of improvement is nothing short of incredible.
Now all we want to know is this – can we have a GPT-4 powered Cortana?