Technology Fridays: Rapid Miner Wants to be the Tableau of Data Science
Welcome to Technology Fridays! Today we are going to explore one of the emerging areas in the machine intelligence (MI) market, self-service data science, and one of the platforms leading the charge in this new field: Rapid Miner.
When we talk about self-service business intelligence, names such as Tableau, Qlik or Microsoft’s PowerBI immediately come to mind. With the emergence of MI technologies, a new group of platforms is looking to extend that value proposition into the MI space with tools that enable non-data scientists to create models and algorithms against their data sources. Rapid Miner is one of the pioneers in that market, with a highly sophisticated platform that allows the creation of self-service data science experiences. In that context, you can think of Rapid Miner as a Tableau-like experience for the creation and operationalization of data science applications. Another way to look at this emerging market is as a successor to data intelligence products such as SAS that, although incredibly successful, require a lot of resources to implement in enterprise environments.
The Rapid Miner platform is based on four fundamental components: Studio, Server, Cloud and Radoop. While the Rapid Miner Studio enables the creation of self-service data science models, the other three components of the platform represent runtimes to run and scale those models.
Rapid Miner Studio is the main entry point to the data science platform. The tool provides an interactive environment for the creation of data science applications. Upon launching the Studio, users have access to hundreds of sample data science processes that illustrate the different capabilities of the MI engine. Those samples are powered by the Rapid Miner Repository, which serves as a catalog of data science models. Users can explore the models visually and even execute them locally, setting breakpoints and exploring intermediate results. Data science programs can also be executed in background mode in a local environment. The local runtime allows users to test models without the need for an expensive runtime, which is incredibly useful in real-world data science applications.
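To make the idea of breakpoints and intermediate results more concrete, here is a minimal Python sketch of a process modeled as a chain of named steps. This is not the Rapid Miner API; the function `run_process`, the two operators and the sample data are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, object]
Operator = Callable[[List[Row]], List[Row]]


def run_process(
    steps: List[Tuple[str, Operator]],
    data: List[Row],
    breakpoints: Optional[Set[str]] = None,
) -> Tuple[List[Row], Dict[str, List[Row]]]:
    """Run each step in order and keep a copy of the output of any
    step whose name appears in `breakpoints` (hypothetical helper)."""
    captured: Dict[str, List[Row]] = {}
    for name, operator in steps:
        data = operator(data)
        if breakpoints and name in breakpoints:
            # Snapshot the intermediate result so it can be inspected later.
            captured[name] = list(data)
    return data, captured


def drop_missing(rows: List[Row]) -> List[Row]:
    # Hypothetical operator: discard rows without an age value.
    return [r for r in rows if r.get("age") is not None]


def add_senior_flag(rows: List[Row]) -> List[Row]:
    # Hypothetical operator: derive a boolean attribute from age.
    return [{**r, "senior": r["age"] >= 65} for r in rows]


steps = [("drop_missing", drop_missing), ("add_senior_flag", add_senior_flag)]
sample = [{"age": 70}, {"age": None}, {"age": 30}]

result, captured = run_process(steps, sample, breakpoints={"drop_missing"})
```

After the run, `captured["drop_missing"]` holds the data as it looked right after the first step, which is roughly the kind of inspection a visual breakpoint gives you without writing any code.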
When a data science model is ready for primetime, it can be deployed to the Rapid Miner Server. This component provides a scalable environment for the persistence and execution of data science programs. The Server runtime also provides tools that enable the evaluation, tracking and monitoring of data science processes.
Rapid Miner Cloud extends the capabilities of Rapid Miner Server to an on-demand cloud runtime. The Cloud environment is fairly symmetric with Rapid Miner Server, which streamlines its adoption in hybrid topologies. One of the main advantages of Rapid Miner Cloud is the ability to configure the runtime requirements (memory, GPUs…) for each individual data science process.
Integration with third-party technologies is another area of strength for Rapid Miner. The platform provides many connectors that integrate with mainstream data and line-of-business systems. Among those, Rapid Miner excels at integration with Hadoop environments, which is powered by Rapid Miner Radoop. Conceptually, Radoop provides a self-service graphical interface for analyzing data in a Hadoop cluster. Radoop can be executed directly from the Studio or Server environments. The Radoop Nest is a key component of the platform. Nest enables the connection to a Hadoop cluster as well as the execution of data science processes. Under the covers, Radoop and Nest leverage Hive to power their data analysis capabilities.
Together, the four components of the Rapid Miner platform (Studio, Server, Cloud and Radoop) provide one of the most comprehensive stacks for the implementation of data science solutions. Rapid Miner is definitely one of the leaders in the self-service data science space, but it does face stiff competition in this rapidly growing market.
Rapid Miner competitors can be segmented into two main groups. On one side, we can place end-to-end machine learning platforms that, even though they are not in the exact same market, overlap with the Rapid Miner feature set. H2O.ai is the prototypical example of a platform in that market segment. On the other side, we have a growing group of self-service data science platforms that, despite differences in capabilities, are fundamentally going after the same customer base as Rapid Miner. Platforms such as Dataiku, DataRobot, BigML or H2O.ai’s own Driverless AI are relevant in this market group.