Spark 2.0 Takes an All-In-One Approach to Big Data

July 28, 2016

Apache Spark, the in-memory processing system that has fast become a centerpiece of modern big data frameworks, has officially released its long-awaited version 2.0. Aside from some major usability and performance improvements, Spark 2.0’s mission is to become a total solution for streaming and real-time data. This comes as a number of other projects, including others from the Apache Software Foundation, provide their own ways to boost real-time and in-memory processing.

Most of Spark 2.0’s big changes have been known well in advance, which has made them even more hotly anticipated. One of the largest and most technologically ambitious additions is Project Tungsten, a reworking of how Spark handles memory and code generation.
