Inside Sparrow: The Foundation of DeepMind’s ChatGPT Alternative

Sparrow uses a combination of large language models and reinforcement learning to enable a safer conversational experience.

Jesus Rodriguez
3 min read · Feb 8



In the middle of the ChatGPT frenzy, DeepMind’s CEO Demis Hassabis gave an interview to Time magazine in which he mentioned the company’s intention to launch a similar model this year. The rumored ChatGPT competitor is based on Sparrow, a model outlined in a research paper DeepMind published in late 2022. The original goal of the Sparrow research was to enable safer conversational agents, but the model now seems to be positioned as the core component of DeepMind’s ChatGPT alternative.

Large language models (LLMs) have achieved success in tasks such as question answering, summarization, and dialogue. However, dialogue agents powered by LLMs can sometimes express inaccurate or invented information, use discriminatory language, or encourage unsafe behavior. To tackle this challenge, DeepMind has explored new methods of training dialogue agents using reinforcement learning based on human feedback. The latest development in this field is Sparrow, a dialogue agent designed to be useful while reducing the risk of unsafe or inappropriate answers. Sparrow is designed to talk with a user, answer questions, and search the internet using Google to inform its responses.

Sparrow uses reinforcement learning based on people’s feedback to train a model of how useful an answer is. Participants are shown multiple model answers to the same question and asked which answer they prefer. This data is used to train Sparrow on what makes a dialogue successful. To ensure…
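To make the preference step described above more concrete, here is a minimal sketch of how pairwise comparisons can be turned into a usefulness score. It assumes a Bradley-Terry style objective, which is a common choice for learning from "which answer do you prefer" data; it is not taken from DeepMind's implementation. The two toy features (whether an answer cites evidence, whether it stays on topic), the linear scoring function, and every name in the code are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, used to turn a score gap into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def reward(weights, features):
    """Hypothetical linear reward: a weighted sum of answer features."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit weights so that preferred answers score higher than rejected ones.

    Each comparison is a (preferred_features, rejected_features) pair.
    The loss per pair is -log(sigmoid(r_preferred - r_rejected)).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            # Modeled probability that the rater picks the preferred answer.
            p = sigmoid(reward(weights, preferred) - reward(weights, rejected))
            # Gradient descent step on the negative log-likelihood.
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return weights


# Toy data (assumed): features are [cites_evidence, stays_on_topic].
comparisons = [
    ([1.0, 1.0], [0.0, 1.0]),  # rater preferred the answer with evidence
    ([1.0, 0.0], [0.0, 0.0]),  # same preference, both answers off topic
    ([1.0, 1.0], [1.0, 0.0]),  # rater preferred the on-topic answer
]
weights = train_reward_model(comparisons, n_features=2)

# The learned reward should rank a preferred answer above a rejected one.
better = reward(weights, [1.0, 1.0])
worse = reward(weights, [0.0, 1.0])
```

In a real system the scoring function would be a neural network over the dialogue text rather than a hand-built feature vector, and the learned reward would then serve as the training signal for the reinforcement learning stage.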


Jesus Rodriguez

CEO of IntoTheBlock, President of Faktory, I write The Sequence Newsletter, Guest lecturer at Columbia University and Wharton, Angel Investor, Author, Speaker.