Apache Apex is an Intriguing Addition to the Stream Data Processing Market

The stream analytics and data processing market has become incredibly crowded in the last few years. Everywhere we look, there are new and popular technologies for implementing stream data computation applications. Just within the Apache Software Foundation, the number of stream data processing platforms seems to have grown out of control: Storm, Spark Streaming, Flume, Samza and Flink Streaming are some of the most popular stream computation frameworks that have been well received by the customer and developer communities.

Recently, I started learning more about Apache Apex, yet another Apache-based stream data computation platform. Based on Hadoop YARN, Apache Apex might, on the surface, seem like just another addition to an already crowded space. However, Apex seems to have made the necessary investments and taken the required steps to catalyze its mainstream adoption in complex environments.

One of the main challenges developers and businesses face when evaluating stream data processing platforms is the complexity that many of those stacks introduce when implemented at scale. While most stream analytics technologies have done a remarkable job providing innovative stream data processing capabilities, many still lack the programming model simplicity and the toolset needed to be adopted by mainstream developers. This is where Apache Apex excels. The Apex platform seems to have taken these challenges into account by providing a very simple programming model and infrastructure for implementing stream analytics solutions.

At the core of Apache Apex we have the Malhar framework, an open source library that includes connectors, parsers, and stream manipulation and computation routines that can be assembled into sophisticated stream data processing applications. Apex Malhar includes connectors to common databases, messaging platforms and line-of-business systems while also supporting multiple transport and messaging protocols. In addition to its integration capabilities, Malhar includes stream data computation primitives such as JOIN, GROUP BY, LIMIT, ORDER BY and dozens of others that can be combined to enable the real-time processing of data streams. Malhar also provides data manipulation, statistics, filtering and pattern matching routines that are relevant in sophisticated stream analytics solutions.
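To make the idea of combinable stream primitives more concrete, here is a minimal sketch in plain Python. It does not use the Malhar API; the function names (`filter_events`, `group_by_count`, `run_example`) and the count-based window are assumptions made purely for illustration of how a filter step and a GROUP BY style step might be chained over a stream of events.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]


def filter_events(
    events: Iterable[Event], predicate: Callable[[Event], bool]
) -> Iterator[Event]:
    """Pass through only the events that satisfy the predicate."""
    for event in events:
        if predicate(event):
            yield event


def group_by_count(
    events: Iterable[Event], key: str, window_size: int
) -> Iterator[Dict[object, int]]:
    """Emit per-key counts for each consecutive window of `window_size` events."""
    counts: Counter = Counter()
    seen = 0
    for event in events:
        counts[event[key]] += 1
        seen += 1
        if seen == window_size:
            yield dict(counts)
            counts = Counter()
            seen = 0
    if seen:
        # Flush the final, partially filled window.
        yield dict(counts)


def run_example() -> List[Dict[object, int]]:
    # Hypothetical click events; the field names are illustrative only.
    clicks = [
        {"user": "ana", "status": 200},
        {"user": "bob", "status": 500},
        {"user": "ana", "status": 200},
        {"user": "cy", "status": 200},
        {"user": "bob", "status": 200},
    ]
    successful = filter_events(clicks, lambda e: e["status"] == 200)
    return list(group_by_count(successful, key="user", window_size=2))
```

Because each step consumes and produces an iterator, the steps can be rearranged or extended without touching one another, which is the property that makes this style of primitive easy to assemble into larger pipelines.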

Apache Apex developers can leverage the straightforward programming model across several programming languages such as JavaScript, Python, Ruby or R. Ultimately, Apex models applications using a directed acyclic graph (DAG) representation in which nodes represent individual routines and edges represent the streams of data flowing between them. The flexibility of this data structure enables the use of Apex in arbitrarily complex stream data processing scenarios.
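The DAG model described above can be sketched in a few lines of plain Python. This is not the Apex API: the `Dag` class and its `add_operator`, `add_stream` and `run` methods are hypothetical names chosen for this example, and the sketch processes a single batch rather than a continuous stream. It only shows how routines become nodes, how streams become edges, and why the graph must be acyclic for an execution order to exist.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List


class Dag:
    """A toy directed acyclic graph of named routines (illustrative only)."""

    def __init__(self) -> None:
        self.operators: Dict[str, Callable[[list], object]] = {}
        self.upstream: Dict[str, List[str]] = defaultdict(list)
        self.downstream: Dict[str, List[str]] = defaultdict(list)

    def add_operator(self, name: str, fn: Callable[[list], object]) -> None:
        self.operators[name] = fn

    def add_stream(self, source: str, sink: str) -> None:
        for name in (source, sink):
            if name not in self.operators:
                raise KeyError(f"unknown operator: {name}")
        self.downstream[source].append(sink)
        self.upstream[sink].append(source)

    def topological_order(self) -> List[str]:
        # Kahn's algorithm: repeatedly take nodes whose inputs are all resolved.
        in_degree = {name: len(self.upstream[name]) for name in self.operators}
        ready = deque(name for name, degree in in_degree.items() if degree == 0)
        order: List[str] = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for sink in self.downstream[name]:
                in_degree[sink] -= 1
                if in_degree[sink] == 0:
                    ready.append(sink)
        if len(order) != len(self.operators):
            raise ValueError("graph contains a cycle")
        return order

    def run(self) -> Dict[str, object]:
        # Each routine receives the outputs of its upstream routines, in the
        # order the streams were added.
        results: Dict[str, object] = {}
        for name in self.topological_order():
            inputs = [results[source] for source in self.upstream[name]]
            results[name] = self.operators[name](inputs)
        return results


def build_example() -> Dag:
    dag = Dag()
    dag.add_operator("readings", lambda _: [3, 8, 12, 5])
    dag.add_operator("high", lambda ins: [x for x in ins[0] if x > 4])
    dag.add_operator("total", lambda ins: sum(ins[0]))
    dag.add_operator("report", lambda ins: {"high": ins[0], "total": ins[1]})
    dag.add_stream("readings", "high")
    dag.add_stream("readings", "total")
    dag.add_stream("high", "report")
    dag.add_stream("total", "report")
    return dag
```

Calling `build_example().run()["report"]` yields the combined result of the two branches. The same source feeds two independent routines whose outputs are merged downstream, a fan-out and fan-in shape that a linear pipeline cannot express but a DAG handles naturally.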

Another aspect that contributes to the growing popularity of Apache Apex is its architecture based on Hadoop YARN. The YARN model allows Apex applications to scale linearly across tens of thousands of nodes while enjoying capabilities such as fault tolerance, data partitioning and processing guarantees without having to invest in a proprietary infrastructure.
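As a rough illustration of what data partitioning means for a stream, the sketch below routes keyed events to a fixed number of partitions using a stable hash. It is a generic technique written in plain Python, not a description of how Apex or YARN implement partitioning; the names `partition_for` and `partition_stream` are assumptions introduced for this example.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

KeyedEvent = Tuple[str, object]


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_stream(
    events: Iterable[KeyedEvent], num_partitions: int
) -> Dict[int, List[KeyedEvent]]:
    """Distribute keyed events so that each key always lands in one partition."""
    partitions: Dict[int, List[KeyedEvent]] = {i: [] for i in range(num_partitions)}
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Keeping every event for a given key in the same partition is what lets per-key computations, such as counts or joins, run on separate nodes without coordinating on each event.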

A robust stream data computation framework, a simple programming model, integration with mainstream systems and a widely adopted infrastructure are some of the strategic elements contributing to the rapid adoption of Apache Apex. As mentioned before, stream analytics is an incredibly crowded space, but Apex is certainly an intriguing addition to it.

Written by

CEO of IntoTheBlock, Chief Scientist at Invector Labs, Guest lecturer at Columbia University, Angel Investor, Author, Speaker.
