Databricks Launches Delta to Combine the Best of Data Lakes, Data Warehouses and Streaming Systems

  • Infrastructure
  • 25.10.2017 10:11 am

Databricks, provider of the leading Unified Analytics Platform and founded by the team that created Apache Spark(TM), today announced Databricks Delta, the first unified data management system that provides the scale and cost-efficiency of a data lake, the query performance of a data warehouse (an increase of up to 100x), and the low latency of a streaming ingest system. Databricks Delta, a key component of the Databricks Unified Analytics Platform that runs in the cloud, eliminates the architectural complexity and operational overhead of maintaining three disparate systems: data lakes, data warehouses and streaming systems. With Delta, enterprise organizations no longer need complex, brittle extract, transform, and load (ETL) processes that run across a variety of systems and add high latency just to reach relevant, business-critical data.

"At Edmunds, obtaining real-time customer and revenue insights is critical to our business. But we've always been challenged with complex ETL processing that slows down our access to data," said Greg Rokita, executive director of technology at Edmunds.com. "Databricks Delta allows us to overcome this roadblock by blending the performance of a data warehouse with the scale and cost-efficiency of a data lake. We now have a simplified data architecture that enables immediate access to business-critical data."

"Many enterprise organizations are struggling with the limitations of data lakes and data warehouses as well as the complexity of managing both and moving data between them," said Ali Ghodsi, cofounder and chief executive officer at Databricks. "Delta combines the performance of data warehouses with the scale of data lakes and low latency of streaming systems. With this unified management system, enterprises now benefit from a simplified data architecture and faster access to relevant data - increasing their ability to make decisions that drive results. We have solved a massive struggle facing organizations that are on a mission to run their business in real-time."

Databricks Delta delivers the following capabilities to simplify enterprise data management:

  • Manage Continuously Changing Data Reliably: The industry's first unified data management system simplifies pipelines by allowing Delta tables to be used as both a data source and a sink. Delta tables provide transactional guarantees for multiple concurrent writers, whether batch or streaming jobs. Delta natively supports the real-time needs of the business by enabling a streaming data warehouse that returns the most recent, consistent view of the writes. Upserts in Delta provide a clean way to change data after it has been written, instead of running the entire job again.
  • Perform Fast Queries Without Manual Tuning: Delta automates performance management, removing the need for tedious manual performance tuning. A self-optimizing data layout ensures that data queried together is stored together. Delta automates compaction of small files for efficient reads. Intelligent data skipping and indexing lead to massive speedups by not reading unneeded data. Automated caching makes subsequent reads an order of magnitude faster.
  • Provide Cost Efficiency and Scale of Data Lakes: Delta stores all its data in Amazon S3 for cost-efficiency and massive scale. The data in Delta is stored in a non-proprietary, open file format to ensure data portability and prevent vendor lock-in.
  • Integrate with Unified Analytics Platform: Databricks Delta data can be accessed from any Spark application running on the Databricks platform through the standard Spark APIs. Delta also integrates into the Databricks Enterprise Security model, including cell-level access control, auditing, and HIPAA-compliant processing. Data is stored inside the customer's own cloud storage account for maximum control.
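
The announcement does not say how Delta implements data skipping. As a rough illustration of the general idea only, the sketch below assumes a common approach: record the minimum and maximum of a column for each file at write time, then skip any file whose range cannot overlap the query's range. The names DataFile and query_range and the sample files are hypothetical and are not part of any Databricks API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataFile:
    """Hypothetical data file holding one integer column."""
    name: str
    values: List[int]
    # Per-file statistics, computed once when the file is "written".
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def query_range(files: List[DataFile], low: int, high: int) -> Tuple[List[int], List[str]]:
    """Return the values in [low, high] and the names of the files actually read."""
    matches: List[int] = []
    files_read: List[str] = []
    for f in files:
        # Skip the file if its [min, max] range cannot overlap [low, high].
        if f.max_value < low or f.min_value > high:
            continue
        files_read.append(f.name)
        matches.extend(v for v in f.values if low <= v <= high)
    return matches, files_read


# Assumed sample data: three small files with non-overlapping value ranges.
files = [
    DataFile("part-0", [1, 5, 9]),
    DataFile("part-1", [10, 14, 19]),
    DataFile("part-2", [20, 27, 29]),
]

matches, files_read = query_range(files, 12, 21)
print(matches)      # [14, 19, 20]
print(files_read)   # ['part-1', 'part-2']
```

Here part-0 is never opened because its maximum (9) is below the lower bound of the query. Skipping pays off most when data that is queried together is also stored together, which keeps each file's value range narrow.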
