Technology Fridays: Tamr Catalog Simplifies Data Discovery in the Enterprise
Welcome to another Technology Friday, in which I try to cover innovative technologies in hot markets that are flying somewhat under the radar. Today, I would like to focus on the challenging market of data discovery and discuss a platform that is redefining the space: Tamr.
Data discovery is very often the forgotten child of enterprise data pipelines. In large enterprise environments with thousands of back-office systems, sophisticated analytic solutions often fail because of the simple fact that people don’t know what data sources are available and how to interact with them. This problem has gotten drastically worse with the proliferation of unstructured and semi-structured data sources. For decades, traditional data quality management vendors attempted to provide enterprise data discovery solutions, but none of those solutions achieved mainstream adoption.
Tamr tackles the enterprise data discovery challenges using a metaphor from the consumer market: data catalogs. Data marketplaces have become a popular concept in the consumer internet as a way to discover and explore public data sources. Tamr Catalog combines some of the user experience concepts of consumer data marketplaces with advanced machine learning techniques that provide a novel model to enable data discovery and exploration in the enterprise.
Tamr Catalog is distributed as an open source solution for data discovery and exploration. At the core of the platform, there is a catalog-like user experience that enables the browsing of registered data sources. Tamr provides connectors to mainstream data systems such as SQL Server, Oracle, MongoDB and Teradata. Similarly, Tamr Catalog can connect to popular SaaS and line-of-business applications. Developers can extend Tamr Catalog by providing connectors to new databases or back-office systems.
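To make the connector idea more concrete, here is a minimal Python sketch of what a connector contract could look like. This is not Tamr Catalog's actual extension API: the names SourceConnector, CsvConnector, list_items and list_attributes are hypothetical and exist only for illustration.

```python
import csv
import io
from abc import ABC, abstractmethod


class SourceConnector(ABC):
    """Hypothetical contract a catalog could require from each connector."""

    @abstractmethod
    def list_items(self):
        """Return the names of the tables or files the source exposes."""

    @abstractmethod
    def list_attributes(self, item):
        """Return the column names of one item."""


class CsvConnector(SourceConnector):
    """Illustrative connector over in-memory CSV text, keyed by item name."""

    def __init__(self, files):
        self.files = files

    def list_items(self):
        return sorted(self.files)

    def list_attributes(self, item):
        reader = csv.reader(io.StringIO(self.files[item]))
        return next(reader)  # the header row holds the column names


connector = CsvConnector({"customers.csv": "id,name,region\n1,Acme,EMEA\n"})
print(connector.list_items(), connector.list_attributes("customers.csv"))
```

The point of the abstract base class is that the catalog only needs to know the contract; supporting a new database or back-office system then means writing one more subclass.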
For each registered source, Tamr Catalog leverages static algorithms to highlight relevant metrics such as Attributes (fields or columns), Records (number of rows), Items (tables…) and several others. Tamr Catalog users can add new metadata constructs to Sources in the form of comments, tasks and other artifacts.
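As a rough illustration of these metrics, the sketch below profiles a small in-memory SQLite database and counts its items, attributes and records. It is my own approximation under stated assumptions, not the algorithm Tamr Catalog runs: the profile_source function and the sample tables are made up for the example.

```python
import sqlite3


def profile_source(conn):
    """Return catalog-style metrics for every table in a SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    profile = {"items": len(tables), "attributes": 0, "records": 0, "tables": {}}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        (row_count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        profile["tables"][table] = {"attributes": columns, "records": row_count}
        profile["attributes"] += len(columns)
        profile["records"] += row_count
    return profile


# Sample data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APAC');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")
summary = profile_source(conn)
print(summary["items"], summary["attributes"], summary["records"])
```

Here the two tables count as Items, their five columns as Attributes and their five rows as Records. A real catalog would gather the same numbers through each source's own metadata interface.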
Tamr Catalog leverages advanced machine learning and data visualization techniques to streamline the discovery and exploration of data sources. Treemaps are a prime example of this capability. Tamr Catalog’s Treemaps provide an intuitive visualization for large volumes of data. Treemaps cleverly use color scales and shapes to help users explore data sources and metrics.
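For readers unfamiliar with treemaps, the core idea is that each rectangle's area is proportional to a metric. The sketch below uses the simple slice-and-dice layout to show that idea. I don't know which layout algorithm Tamr Catalog uses, so treat the slice_and_dice function and the sample record counts as assumptions made purely for illustration.

```python
def slice_and_dice(sizes, x, y, width, height, vertical=True):
    """Split a rectangle into strips whose areas are proportional to sizes."""
    total = sum(sizes.values())
    rects = {}
    offset = 0.0
    for name, size in sizes.items():
        share = size / total
        if vertical:
            # Side-by-side strips: each takes a share of the width.
            rects[name] = (x + offset, y, width * share, height)
            offset += width * share
        else:
            # Stacked strips: each takes a share of the height.
            rects[name] = (x, y + offset, width, height * share)
            offset += height * share
    return rects


# Hypothetical record counts for three sources.
record_counts = {"customers": 600, "orders": 300, "returns": 100}
layout = slice_and_dice(record_counts, 0, 0, 100, 50)
for name, (rx, ry, rw, rh) in layout.items():
    print(f"{name}: x={rx:.0f} y={ry:.0f} w={rw:.0f} h={rh:.0f}")
```

A full treemap would apply the same split recursively to nested groups, alternating direction at each level, and map a second metric to a color scale.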
The capabilities of the Tamr Catalog platform expand beyond its sophisticated user interface. The platform includes APIs and SDKs that enable its integration with third-party applications. Additionally, the Tamr CLI provides a command-line interface to automate data discovery and exploration tasks.
Data discovery is experiencing a renaissance in the enterprise. Consequently, Tamr faces active competition from new platforms such as Alation as well as data quality management innovators such as Trifacta or Paxata. Cloud data discovery services such as Azure Data Catalog are also relevant competitors of Tamr in some contexts.