
ChatGPT – A Large Language Model

by Sanjenbam Jugeshwor Singh

OpenAI introduced a long-form question-answering AI called ChatGPT that answers complex questions conversationally. It is a revolutionary technology because it is trained to learn what humans mean when they ask a question. Many users are awed at its ability to provide human-quality responses, inspiring the feeling that it may eventually have the power to disrupt how humans interact with computers and change how information is retrieved.

ChatGPT is a large language model chatbot developed by OpenAI based on GPT-3.5. It has a remarkable ability to interact in conversational dialogue form and provide responses that can appear surprisingly human. Large language models perform the task of predicting the next word in a series of words. Reinforcement Learning from Human Feedback (RLHF) is an additional layer of training that uses human feedback to help ChatGPT learn to follow directions and generate responses that are satisfactory to humans.

ChatGPT was created by the San Francisco-based artificial intelligence company OpenAI. OpenAI Inc. is the non-profit parent company of the for-profit OpenAI LP. OpenAI is also famous for DALL·E, a deep-learning model that generates images from text instructions called prompts. The CEO is Sam Altman, who was previously president of Y Combinator. Microsoft is a partner and has invested $1 billion; the two companies jointly developed the Azure AI Platform.

ChatGPT is a large language model (LLM). LLMs are trained on massive amounts of data to accurately predict what word comes next in a sentence. Researchers discovered that increasing the amount of training data increased the ability of language models to do more.
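To make "predicting the next word" concrete, here is a minimal sketch using the openly available GPT-2 model via the Hugging Face transformers library. This is an illustration, not ChatGPT's own code: ChatGPT's weights are not public, so GPT-2 stands in for the general technique.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load a small, publicly available language model (GPT-2).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocabulary_size):
    # a score for every possible next token at every position.
    logits = model(**inputs).logits

# Scores for the token that would come after the prompt.
next_token_logits = logits[0, -1]

# Show the five most likely continuations the model predicts.
top = torch.topk(next_token_logits, k=5)
for score, token_id in zip(top.values, top.indices):
    print(repr(tokenizer.decode([int(token_id)])), float(score))

Run on the prompt above, a model like this typically ranks words such as " Paris" near the top, which is exactly the autocomplete-at-scale behavior described here.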
According to Stanford University: “GPT-3 has 175 billion parameters and was trained on 570 gigabytes of text. For comparison, its predecessor, GPT-2, was over 100 times smaller at 1.5 billion parameters. This increase in scale drastically changes the behavior of the model — GPT-3 is able to perform tasks it was not explicitly trained on, like translating sentences from English to French, with few to no training examples. This behavior was mostly absent in GPT-2. Furthermore, for some tasks, GPT-3 outperforms models that were explicitly trained to solve those tasks, although in other tasks it falls short.” LLMs predict the next word in a series of words in a sentence, and the next sentences – kind of like autocomplete, but at a mind-bending scale. This ability allows them to write paragraphs and entire pages of content.

But LLMs are limited in that they don't always understand exactly what a human wants, and that is where ChatGPT improves on the state of the art, with the aforementioned Reinforcement Learning from Human Feedback (RLHF) training. GPT-3.5 was trained on massive amounts of code and information from the internet, including sources like Reddit discussions, to help ChatGPT learn dialogue and attain a human style of responding. ChatGPT was also trained using human feedback so that the AI learned what humans expected when they asked a question. Training the LLM this way is revolutionary because it goes beyond simply training it to predict the next word.
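The few-shot ability Stanford describes can be illustrated with prompting alone: the task is demonstrated inside the prompt itself, with no retraining. A hedged sketch, assuming the openai Python SDK of the period; the model name is illustrative and the client interface has changed across SDK versions:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# A few-shot prompt: two worked examples, then the real query.
few_shot_prompt = (
    "Translate English to French.\n"
    "English: cheese\nFrench: fromage\n"
    "English: good morning\nFrench: bonjour\n"
    "English: where is the library?\nFrench:"
)

response = openai.Completion.create(
    model="text-davinci-003",  # illustrative model name
    prompt=few_shot_prompt,
    max_tokens=20,
    temperature=0,
)
print(response["choices"][0]["text"].strip())

The model was never fine-tuned on translation here; the examples in the prompt alone tell it what task to perform.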
A March 2022 research paper titled Training Language Models to Follow Instructions with Human Feedback explains why this RLHF approach is a breakthrough: “This work is motivated by our aim to increase the positive impact of large language models by training them to do what a given set of humans want them to do. By default, language models optimize the next word prediction objective, which is only a proxy for what we want these models to do. Our results indicate that our techniques hold promise for making language models more helpful, truthful, and harmless. Making language models bigger does not inherently make them better at following a user’s intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users.”

The engineers who built ChatGPT hired contractors (called labelers) to rate the outputs of the two systems, GPT-3 and the new InstructGPT (a “sibling model” of ChatGPT). Based on the ratings, the researchers came to the following conclusions: “Labelers significantly prefer InstructGPT outputs over outputs from GPT-3. InstructGPT models show improvements in truthfulness over GPT-3. InstructGPT shows small improvements in toxicity over GPT-3, but not bias.”

The paper concludes that the results for InstructGPT were positive but notes that there is room for improvement: “Overall, our results indicate that fine-tuning large language models using human preferences significantly improves their behavior on a wide range of tasks, though much work remains to be done to improve their safety and reliability.” What sets ChatGPT apart from a simple chatbot is that it was specifically trained to understand the human intent in a question and provide helpful, truthful, and harmless answers.
Because of that training, ChatGPT may challenge certain questions and discard parts of a question that don’t make sense. Another research paper related to ChatGPT shows how the researchers trained the AI to predict what humans preferred. They noticed that the metrics used to rate the outputs of natural language processing AI resulted in machines that scored well on the metrics but didn’t align with what humans expected. This is how the researchers explained the problem: “Many machine learning applications optimize simple metrics which are only rough proxies for what the designer intends. This can lead to problems, such as YouTube recommendations promoting clickbait.” So the solution they designed was an AI that could output answers optimized for what humans preferred. To do that, they trained the AI using datasets of human comparisons between different answers, so that the machine became better at predicting what humans judged to be satisfactory answers. The paper shares that training was done by summarizing Reddit posts and was also tested on summarizing news. The research paper, dated February 2022, is called Learning to Summarize from Human Feedback.
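A minimal sketch of the pairwise comparison training that paper describes, assuming a PyTorch-style toy setup. The RewardModel class, the random embeddings, and all names here are illustrative stand-ins, not OpenAI's actual code: the idea is only to show the shape of the objective, in which a scalar "reward" for the human-preferred answer is pushed above the reward for the rejected one.

import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a text embedding to a scalar score."""
    def __init__(self, dim=768):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, embedding):
        return self.score(embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy batch: embeddings of the human-preferred and rejected answers.
# In the real pipeline these come from encoding actual summary pairs.
preferred = torch.randn(8, 768)
rejected = torch.randn(8, 768)

# Pairwise loss: -log sigmoid(r_preferred - r_rejected), so the model
# learns to score the human-preferred answer higher.
optimizer.zero_grad()
loss = -torch.nn.functional.logsigmoid(
    model(preferred) - model(rejected)
).mean()
loss.backward()
optimizer.step()
print(float(loss))

# The trained reward model is then used as the reward signal to
# fine-tune the generation policy with reinforcement learning, as the
# quoted passage below describes.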
The researchers write: “In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning.”

ChatGPT is specifically programmed not to provide toxic or harmful responses, so it will avoid answering those kinds of questions. An important limitation of ChatGPT is that the quality of the output depends on the quality of the input; in other words, expert directions (prompts) generate better answers. Another limitation is that because it is trained to provide answers that feel right to humans, the answers can trick humans into believing the output is correct.

Many users discovered that ChatGPT can provide incorrect answers, including some that are wildly incorrect. The moderators at the coding Q&A website Stack Overflow may have discovered an unintended consequence of answers that feel right to humans. Stack Overflow was flooded with user responses generated by ChatGPT that appeared to be correct, but a great many were wrong answers. The thousands of answers overwhelmed the volunteer moderator team, prompting the administrators to enact a ban against any users who post answers generated with ChatGPT.
The flood of ChatGPT answers resulted in a post entitled “Temporary policy: ChatGPT is banned”: “This is a temporary policy intended to slow down the influx of answers and other content created with ChatGPT. The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically ‘look like’ they ‘might’ be good…” The experience of Stack Overflow moderators with wrong ChatGPT answers that look right is something that OpenAI, the maker of ChatGPT, is aware of and warned about in its announcement of the new technology. The use of ChatGPT is currently free during the “research preview” period. The chatbot is open for users to try out and provide feedback on the responses, so that the AI can become better at answering questions and can learn from its mistakes.
The official announcement states that OpenAI is eager to receive feedback about the mistakes: “While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We’re using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We’re eager to collect user feedback to aid our ongoing work to improve this system.”

There is currently a contest with a prize of $500 in ChatGPT credits to encourage the public to rate the responses: “Users are encouraged to provide feedback on problematic model outputs through the UI, as well as on false positives/negatives from the external content filter which is also part of the interface. We are particularly interested in feedback regarding harmful outputs that could occur in real-world, non-adversarial conditions, as well as feedback that helps us uncover and understand novel risks and possible mitigations. You can choose to enter the ChatGPT Feedback Contest for a chance to win up to $500 in API credits. Entries can be submitted via the feedback form that is linked in the ChatGPT interface.”
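For readers curious how the Moderation API mentioned in the announcement is used, here is a brief sketch based on the openai Python SDK as documented around ChatGPT's launch. The API key is a placeholder, and field names may differ in later SDK versions:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Ask the Moderation endpoint to classify a piece of text.
response = openai.Moderation.create(input="Some user-submitted text")
result = response["results"][0]

# "flagged" is True if any unsafe-content category triggered.
print("flagged:", result["flagged"])
for category, triggered in result["categories"].items():
    if triggered:
        print("category:", category)

This is the kind of check OpenAI describes layering on top of the model itself, with the stated caveat that it still produces some false negatives and false positives.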
(Writer can be reached at: [email protected])
