Sunday, 24 Nov 2024

Why does ChatGPT keep lying? (Clue: It doesn't – it's hallucinating)

ChatGPT has been caught lying again – this time in an actual court.

It all started when a passenger tried to sue the airline Avianca, claiming a metal trolley hit and injured his knee on a flight to New York. When the airline asked for the case to be thrown out, the passenger’s lawyers submitted several supposedly relevant rulings in a bid to show legal precedent and persuade the judge to let the claim proceed.

The problem? The rulings didn’t exist.

Having been asked by a lawyer to provide research into similar cases, ChatGPT generated a number of realistic-looking citations, including courts, judges, docket numbers, dates and quotes.

The lawyer even asked ChatGPT for confirmation, in one case writing ‘Is Varghese a real case?’, to which the chatbot replied ‘Yes, [it] is a real case’.

He continued, asking: ‘Are the other cases you provided fake?’

ChatGPT replied: ‘No, the other cases I provided are real and can be found in reputable legal databases.’

Except they couldn’t, landing the passenger’s legal team in quite the predicament. Judge P Kevin Castel, overseeing the original case, has set a hearing for June 8 to consider possible sanctions.

The incident throws up a number of questions – the first being: why didn’t anyone on the legal team check the citations themselves? On the one hand, proponents of artificial intelligence argue that Large Language Models (LLMs) such as OpenAI’s ChatGPT, Google’s Bard and others will improve productivity and creativity by taking on such humdrum tasks.

On the other, as workers fear for their jobs, the fact that chatbots can’t yet be fully trusted and still require a human to verify their work should be seen as a positive.

That leads to the bigger question: why does ChatGPT lie in the first place?

First of all, many argue that calling it a lie is incorrect, and pushes us further towards anthropomorphising the technology. A lie is a deliberately false statement, and therefore requires an intent to deceive.

Remember, ChatGPT doesn’t understand anything it says.

At the most basic level, LLMs work by learning which words or phrases are likely to follow one another. A model learns that the word ‘dog’ is more likely to be followed by ‘walking’ than by ‘antidisestablishmentarianism’, and ‘surf’ by ‘board’ rather than by ‘3.1415926535’.

This is all part of the natural language processing upon which functions such as autocomplete or autosuggest are built, in which the system is predicting what will most likely come next.
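To make that idea concrete, here is a deliberately tiny sketch in Python of next-word prediction: it counts which word most often follows another in a small made-up corpus and predicts accordingly. This is a toy bigram counter, not how ChatGPT actually works – modern LLMs use neural networks trained on vastly more data – and the corpus and function names below are invented purely for illustration.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real LLMs are trained on hundreds of billions of words.
corpus = (
    "i took the dog walking on the beach "
    "then i took my surf board out on the waves "
    "the dog walking route runs along the beach"
).split()

# Count how often each word follows another (a simple bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("dog"))   # -> 'walking'
print(predict_next("surf"))  # -> 'board'
```

Run it and predict_next('dog') returns 'walking'. The same kind of statistical association, scaled up enormously, is what lets ChatGPT produce fluent text without understanding any of it.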

There’s no doubt whatsoever that ChatGPT excels at this function – just ask it to compose a poem in the style of WB Yeats for proof: 

‘So let us pause, and in this tranquil scene,
Find solace and a moment so serene,
For Galway’s bay, at sunrise’s embrace,
Invites us to partake in nature’s grace.’

That’s just the final verse; the rest was equally impressive – likewise for Shakespeare and other renowned writers.

It also highlights the second core principle behind LLMs. ChatGPT was able to create a highly realistic Yeats-style poem because it was trained on all of his work. 

LLMs are fed huge amounts of data. Staggering amounts. Hundreds of billions of words. Essentially, the entire internet.

This means chatbots can answer millions of questions – or generate plausible-sounding answers when the information isn’t directly to hand.

It is this that continues to catch out users – but ChatGPT itself will tell you, when asked, that it cannot be trusted 100%.

‘As an AI language model, I don’t intentionally make up answers,’ ChatGPT responded when asked why it makes up answers. 

‘However, it’s important to note that I generate responses based on patterns and examples I’ve learned from a vast amount of text data. Sometimes, when faced with queries or topics that fall outside of my training data or knowledge cutoff, I may not have access to the most up-to-date information or accurate facts. In such cases, I might provide speculative or incomplete answers.

‘My purpose is to assist and provide helpful information, but I can still make mistakes. It’s always a good idea to verify the information I provide from reliable sources.’

Again, that sounds remarkably self-aware, but don’t be fooled. 

However, it does offer a valuable reminder that GPT-4, the most recent version of OpenAI’s flagship model, stopped learning in September 2021 and has not been trained on any data created since that point. When answering questions about events after that cutoff – such as who won the 2023 Super Bowl or who is the current UK home secretary – ChatGPT may highlight the knowledge gap, but not all prompts will elicit such a reminder.

Of course, if it did make up a Super Bowl winner the consequences would likely be minimal – perhaps looking foolish in a pub quiz. Other ChatGPT hallucinations have had far greater impacts on those involved, such as the Australian mayor falsely accused of bribery when he was in fact the whistleblower in the case, or the US law professor accused of sexual misconduct – an allegation ‘supported’ by a fabricated Washington Post article.

Also of concern is the varying frequency with which ChatGPT hallucinates in different languages, again a result of the datasets on which it is trained. Answers in a particular language are more likely to be drawn from data already written in that language than from the best available answer translated from another.

For example, answers in Chinese dialects may state that the illegal mass detention of Uighur Muslims in the country is ‘a vocational and educational effort’, parroting the official government line.

‘When you have kids, you can understand what kind of education they get, what books they read, to try to give them a base set of fundamental values and morals,’ said Tony Fadell, inventor of the iPod and co-creator of the iPhone. ‘Then over time they use those as a base to learn and grow and adapt to society.

‘With AIs, what we do is we give them the entirety of the internet, then say “don’t do this, don’t do that”.’

Speaking at the launch of Starmus Earth, the science and arts festival co-founded by Queen guitarist Brian May, Fadell continued: ‘We need to look at the data sets, make them more robust and really understand what we’re training these AIs with, who’s training them, and how they’re being used.’

As with all technology, LLMs are only as good as the people using them. They can be used well or badly depending on how well they are understood (for an example of poor understanding, see the university professor who asked ChatGPT itself whether it had written his students’ assignments and took its answer at face value), and in future they may be used on a larger scale to better humanity or with malicious intent.

Regulation will be key, as Mr Fadell added, but difficult to achieve on a global scale.

Until then, it will be left to individuals to make chatbots work for them. 

And to do that, always remember:

Check. Everything.
