Technology Fridays: Driverless AI Wants to Bring You a Data Scientist in a Box
Welcome to Technology Fridays! Each week we try to deep dive into products that are revolutionizing emerging technology markets but still remain under the radar. Today, I would like to focus on the self-service data science space through the lens of the H2O.ai Driverless AI platform.
Implementing machine learning or deep learning models remains a task constrained to advanced technologists or artificial intelligence (AI) researchers. Even more painful is the fact that, in real world scenarios, data scientists spend only a fraction of their time implementing a model while a large percentage of the effort goes into complementary areas such as data exploration, parameter tuning, model testing and other, let’s say, more mechanical activities. Combine this with the general shortage of data science talent in the market and we get a glimpse of the challenges faced by most organizations. Not only is data science talent scarce, but it is often underutilized on tasks that could be automated.
Driverless AI is the latest addition to H2O.ai’s popular machine learning suite. While the core of the H2O.ai platform is targeted at machine learning developers, Driverless AI focuses on non-AI experts. The platform democratizes the process of building machine learning models by providing sophisticated visual interfaces that automate many of the mechanical tasks in that type of solution.
From my own analysis, I believe the biggest contribution of Driverless AI is that, while it abstracts the implementation of machine learning models, it does so in a way that “treats AI researchers with respect”. Here is what I mean: there are several platforms in the market that attempt to provide self-service solutions for creating machine learning models. However, despite their unquestionable simplicity, many of the models produced by those platforms are just too simplistic to be applied in real world scenarios. Driverless AI does provide point-and-click interfaces to implement machine learning models, but it also allows developers to deep dive into the underlying model and make the necessary optimizations.
Data exploration is one of the key capabilities of Driverless AI. The platform includes a component known as Machine Learning Interpretability (MLI) that generates visualizations to identify patterns in datasets and explain machine learning models.
The balance achieved by the Driverless AI platform is partially the result of the friction between two of its components: AutoDL and AutoML. While the AutoDL stack is responsible for generating new features for a model, AutoML focuses on recommending machine learning algorithms and combining them into an ensemble of models that provides the best answer to the target problem (see my article about ensemble learning).
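To make the idea of ranking candidate models and combining them into an ensemble more concrete, here is a minimal sketch in plain Python. It is not Driverless AI code and does not use any H2O.ai API; the candidate models, the validation data and the helper names (`rank_models`, `ensemble_predict`) are all assumptions made up for illustration, and the ensemble is a simple average of the two best-scoring models.

```python
from statistics import mean


def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return mean((p - t) ** 2 for p, t in zip(preds, targets))


# Hypothetical candidate models: plain functions mapping x to a prediction.
candidates = {
    "mean_baseline": lambda x: 5.0,
    "linear": lambda x: 2.0 * x + 1.0,
    "biased_linear": lambda x: 2.0 * x + 2.0,
}

# Toy validation data drawn from y = 2x + 1.5 (an assumption for this sketch).
valid_x = [0.0, 1.0, 2.0, 3.0, 4.0]
valid_y = [2.0 * x + 1.5 for x in valid_x]


def rank_models(models, xs, ys):
    """Score every model on the validation set, lowest error first."""
    scores = {name: mse([m(x) for x in xs], ys) for name, m in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


def ensemble_predict(models, names, x):
    """Average the predictions of the selected models."""
    return mean(models[name](x) for name in names)


ranking = rank_models(candidates, valid_x, valid_y)
top_two = [name for name, _ in ranking[:2]]

ensemble_preds = [ensemble_predict(candidates, top_two, x) for x in valid_x]
ensemble_error = mse(ensemble_preds, valid_y)

print("ranking:", ranking)
print("ensemble of", top_two, "has error", ensemble_error)
```

In this toy setup the two linear models err in opposite directions, so their average beats either one individually. Real ensembles rarely cancel out that neatly, but the intuition is the same.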
Typically, a data scientist or developer will first use AutoDL to generate new features based on the attributes of a training dataset and perform other tasks such as optimizing the encoding of the attributes. After that, the new dataset can be processed by AutoML to discover and rank different machine learning models. Finally, the user can leverage MLI to interpret the models using different visualizations.
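The first two stages of that workflow can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy dataset, the pairwise-product feature generator and the one-feature-at-a-time linear models are hypothetical stand-ins, not the techniques or APIs that Driverless AI actually uses, and the interpretation stage is left out.

```python
from itertools import combinations
from statistics import mean

# Toy training data (hypothetical): the target is the area of a rectangle.
rows = [
    {"width": 1.0, "height": 2.0},
    {"width": 2.0, "height": 1.0},
    {"width": 3.0, "height": 4.0},
    {"width": 4.0, "height": 3.0},
    {"width": 5.0, "height": 5.0},
]
target = [r["width"] * r["height"] for r in rows]


def generate_features(rows):
    """Stage 1: add a product feature for every pair of attributes."""
    engineered = []
    for r in rows:
        new_row = dict(r)
        for a, b in combinations(sorted(r), 2):
            new_row[f"{a}*{b}"] = r[a] * r[b]
        engineered.append(new_row)
    return engineered


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


def rank_single_feature_models(rows, ys):
    """Stage 2: fit one linear model per feature and rank by squared error."""
    results = []
    for name in rows[0]:
        xs = [r[name] for r in rows]
        slope, intercept = fit_line(xs, ys)
        err = mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
        results.append((name, err))
    return sorted(results, key=lambda item: item[1])


engineered = generate_features(rows)
ranking = rank_single_feature_models(engineered, target)
best_feature, best_error = ranking[0]

print("ranking:", ranking)
print("best feature:", best_feature)
```

The generated `height*width` feature captures the target almost perfectly, while neither raw attribute does on its own. That is the kind of gain automated feature generation is after.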
Driverless AI provides first-class support for GPU runtimes, which helps its models scale. Additionally, the platform includes a large portfolio of machine learning algorithms, which facilitates the implementation of highly sophisticated ensemble solutions. Finally, Driverless AI integrates seamlessly with the H2O.ai platform, enabling the implementation of end-to-end machine learning experiences.
Driverless AI is a welcome addition to the nascent space of self-service machine learning platforms. Among its competitors, we can include technologies such as RapidMiner, BigML or Dataiku, which provide similar capabilities. Technologies like the recently acquired DataRobot take a more developer-centric approach to enable a low-touch experience for implementing machine learning models.