The AI Powering ChatGPT

A clever combination of the InstructGPT architecture with reinforcement learning models.

Image Credit: https://lifearchitect.ai/chatgpt/

I recently started an AI-focused educational newsletter that already has over 150,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing.

ChatGPT has been one of the most popular artificial intelligence (AI) agents ever created. The model has taken the data science community and the internet by storm, pushing the boundaries of creativity across all industries. Despite the immense popularity of ChatGPT, there has been very little discussion about the AI techniques behind its magic. Many of the techniques behind ChatGPT are likely to be the foundation of the upcoming GPT-4, which promises to be one of the most impressive models in AI history.

The main ideas behind ChatGPT were pioneered by another OpenAI model, InstructGPT, which was released earlier this year. InstructGPT fine-tunes GPT-3 to follow instructions, which opens the door to a wider set of human interactions. ChatGPT takes some of the ideas pioneered by InstructGPT to a whole new level with a new data collection setup and training process.
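To make "fine-tuning to follow instructions" concrete: in the supervised step, the model is trained on human-written demonstrations, and each token of the demonstrated response is scored with a next-token cross-entropy (negative log-likelihood) loss. The following is a minimal sketch, not OpenAI's code. It assumes the model's per-token log-probabilities have already been computed. The function name `masked_nll` is hypothetical, and masking out the instruction (prompt) tokens is an illustrative assumption.

```python
import math
from typing import Sequence


def masked_nll(token_logprobs: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Average negative log-likelihood over positions where loss_mask == 1.

    token_logprobs[i] is the model's log-probability of the i-th target token.
    loss_mask[i] is 1 for tokens that should be learned (the demonstrated
    response) and 0 for tokens that are only context (the instruction).
    """
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have the same length")
    # Keep only the response tokens and negate their log-probabilities.
    selected = [-lp for lp, m in zip(token_logprobs, loss_mask) if m]
    if not selected:
        raise ValueError("loss_mask selects no tokens")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # Two instruction tokens followed by two response tokens (toy numbers).
    logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
    mask = [0, 0, 1, 1]  # only the response tokens contribute to the loss
    print(masked_nll(logprobs, mask))
```

Masking the instruction tokens means training only rewards the model for producing the demonstrated response, not for reproducing the prompt. This is a common convention, not a documented detail of InstructGPT.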

Inside ChatGPT

Like InstructGPT, ChatGPT is trained with reinforcement learning from human feedback (RLHF), a method that combines human-annotated data with reinforcement learning. The main idea of RLHF is to continuously fine-tune the underlying language model so that it better understands the meaning of human commands. However, ChatGPT differs in its data collection setup: its supervised fine-tuning data includes conversations in which human AI trainers played both sides, the user and the AI assistant. The core ChatGPT training process is divided into three main phases:

Phase 1: Supervised Policy Model
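In this phase, the initial model is fine-tuned with supervised learning on conversations written by human AI trainers who played both the user and the AI assistant. The following is a minimal sketch of how one such conversation might be turned into (context, target) training pairs. The `User:`/`Assistant:` role tags, the `Turn` type, and the helper names are hypothetical. Treating every assistant turn as a separate target is an assumption, not a documented detail of ChatGPT's pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    role: str  # "user" or "assistant"; both sides are written by a human trainer
    text: str


def render(turns: List[Turn]) -> str:
    """Join turns into one context string using hypothetical role tags."""
    return "".join(f"{t.role.capitalize()}: {t.text}\n" for t in turns)


def dialogue_to_sft_examples(turns: List[Turn]) -> List[Tuple[str, str]]:
    """Turn one trainer-written conversation into (context, target) pairs.

    Every assistant turn becomes a supervised target; everything said
    before it becomes the context the model conditions on.
    """
    examples = []
    for i, turn in enumerate(turns):
        if turn.role == "assistant":
            context = render(turns[:i]) + "Assistant:"
            examples.append((context, " " + turn.text))
    return examples


if __name__ == "__main__":
    conversation = [
        Turn("user", "What is RLHF?"),
        Turn("assistant", "Reinforcement learning from human feedback."),
        Turn("user", "Why use it?"),
        Turn("assistant", "To align model outputs with human preferences."),
    ]
    for context, target in dialogue_to_sft_examples(conversation):
        print(repr(context), "->", repr(target))
```

Each pair can then be scored with an ordinary next-token loss on the target text, so a single four-turn conversation yields two supervised examples.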

Jesus Rodriguez

Written by Jesus Rodriguez

CEO of IntoTheBlock, President of Faktory, President of NeuralFabric and founder of The Sequence, Lecturer at Columbia University, Wharton, Angel Investor...
