OpenAI Helps Us Understand How Deep Learning Training Scales
Understanding the optimal batch size for training remains one of the most interesting challenges in supervised learning models.
I recently started an AI-focused educational newsletter that already has over 150,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
In the last few years, there has been increasing interest in training parallelization methods applicable to large deep learning models. These training parallelism efforts have focused on both model-based and data-based approaches, with the latter being more popular given their simplicity. Conceptually, data parallelism involves splitting a training dataset into batches of data, distributing those batches across multiple computing devices, and aggregating the resulting gradients. One of the…
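The mechanics of data parallelism described above can be sketched in a few lines. The snippet below is a minimal, single-process simulation, assuming a toy PyTorch linear model and a synthetic batch (none of which come from the OpenAI work): it splits one global batch into shards, computes per-shard gradients, and averages them, which is the step a real multi-GPU setup performs via an all-reduce.

```python
# Minimal, single-process sketch of data parallelism.
# The model, data shapes, and worker count are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(10, 1)                 # toy model
loss_fn = nn.MSELoss()

x = torch.randn(64, 10)                  # one "global" batch
y = torch.randn(64, 1)

num_workers = 4                          # simulated devices
x_shards = torch.chunk(x, num_workers)   # split the batch across workers
y_shards = torch.chunk(y, num_workers)

# Compute per-worker gradients and average them (the all-reduce step).
grads = [torch.zeros_like(p) for p in model.parameters()]
for xs, ys in zip(x_shards, y_shards):
    model.zero_grad()
    loss = loss_fn(model(xs), ys)
    loss.backward()
    for g, p in zip(grads, model.parameters()):
        g += p.grad / num_workers

# Apply the averaged gradient with a plain SGD step.
lr = 0.1
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p -= lr * g
```

In practice a framework such as PyTorch's DistributedDataParallel handles the sharding and gradient averaging across actual devices, but the arithmetic is the same as in this sketch.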