May 6, 2025


Highlights from the Iceberg Summit 2025


by Przemek Delewski

Over the past few years, Apache Iceberg™ has emerged as the leading open table format for modern data lakehouses. It lets multiple engines (e.g., Spark, Snowflake, Trino) work on the same data without duplication, delivering true compute-storage separation. Although Tabular, the company founded by Iceberg’s original creators, was acquired by Databricks in a deal reported at up to $2 billion, Iceberg remains a thriving open ecosystem driven by a vibrant community.

In this article, I will share key insights and my perspectives on Iceberg’s future, including the upcoming Table Spec V3 and beyond.

While the Iceberg community previously hosted a virtual conference in 2024, and Iceberg had featured in notable presentations elsewhere (such as the renowned “Why You Shouldn’t Care About Iceberg” at 2022’s Data Council), 2025 marked its first-ever in-person summit. This inaugural one-day Apache Iceberg Summit gathered around 500 attendees in San Francisco and featured prominent speakers from AWS, Snowflake, Databricks, Microsoft, Cloudera, Apple, Airbnb, and Starburst, among others.

Keynote: “v3 and Beyond: Iceberg's Ongoing Evolution”

Ryan Blue's keynote highlighted advancements across Iceberg implementations in Java, Rust, Python, and Go, as well as the newer C++ effort. He shared a vision in which Iceberg could converge with Delta Lake, broadening its scope further. Although Iceberg was originally designed at Netflix for large, distributed datasets, he also discussed an intriguing area for future development: support for small-table scenarios.

Currently, the Iceberg community is heads-down on shipping Table Spec V3, with features such as semi-structured data (the VARIANT type), views, materialized views, binary deletion vectors, and row lineage tracking.

Ryan’s vision already extends beyond V3 to V4. He is bullish on single-file commits and more adaptive metadata. Today, Iceberg can incur significant overhead when small amounts of data are committed frequently, because every commit writes new data and metadata files (the file explosion, or small-file, problem). This makes it a poor fit for real-time use cases in which specialized databases such as Hydrolix or ClickHouse shine. The goal is to close that gap with help from the broader community.
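To make the small-file problem concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the keynote: the `CommitPattern` class and `files_written` function are hypothetical names, and the per-commit file counts (one data file, plus a metadata JSON file, a manifest list, and one manifest) are simplifying assumptions about a plain append workload with no compaction or snapshot expiry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitPattern:
    """Hypothetical description of how often a writer commits to a table."""

    interval_seconds: int           # time between two commits
    data_files_per_commit: int = 1  # assumption: one small data file per commit
    # Assumption: each append commit also writes a metadata JSON file,
    # a manifest list, and one manifest.
    metadata_files_per_commit: int = 3


def files_written(pattern: CommitPattern, hours: int) -> dict:
    """Count files written over `hours`, ignoring compaction and snapshot expiry."""
    commits = (hours * 3600) // pattern.interval_seconds
    return {
        "commits": commits,
        "data_files": commits * pattern.data_files_per_commit,
        "metadata_files": commits * pattern.metadata_files_per_commit,
    }


streaming = CommitPattern(interval_seconds=10)  # near-real-time ingestion
hourly = CommitPattern(interval_seconds=3600)   # classic batch load

for name, pattern in [("every 10 s", streaming), ("hourly", hourly)]:
    print(name, files_written(pattern, hours=24))
```

Under these assumptions, a writer committing every 10 seconds produces 360 times as many files per day as an hourly batch job, which is the kind of overhead that single-file commits and more adaptive metadata aim to reduce.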

Iceberg Beyond Java: A Layered Approach

I attended sessions exploring Iceberg implementations beyond the core Java version, including the Python implementation (PyIceberg) and the Go implementation. The PyIceberg talk, presented by Fokko Driesprong, highlighted a layered approach: PyIceberg closely mirrors Java's features, while Go builds upon the PyIceberg implementation. Notably, PyIceberg leverages Rust bindings for compute-intensive operations, a combination that blends Python’s ease of use with Rust’s performance. Each implementation serves a distinct purpose: PyIceberg excels at rapid prototyping thanks to its simplicity and flexibility, whereas Go offers performance benefits by avoiding common Python pitfalls.

Matt Topol, in his talk “Gophers Continuing Up the Iceberg”, highlighted two significant strengths of Go: ease of development, thanks to the language's simplicity, and ease of deployment, thanks to native compilation. Matt further identified valuable use cases for non-JVM Iceberg integrations:

  • Lightweight command-line tools for exploring Iceberg catalogs and metadata

  • Direct integration with non-Java languages without JNI or Spark

  • Embedding low-level Iceberg capabilities into existing systems without extensive rewrites or additional bridges
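As an illustration of the first use case, the following minimal sketch shows how little is needed to inspect table metadata without a JVM. It uses only the Python standard library rather than PyIceberg or the Go implementation. The field names (`format-version`, `current-snapshot-id`, `snapshots`, `snapshot-id`, `timestamp-ms`, `summary`, `operation`) follow the Iceberg table metadata format, but the sample document, the bucket path, and the helper functions `list_snapshots` and `snapshot_as_of` are hypothetical. Picking a snapshot purely by timestamp is also a simplification of how engines resolve time travel.

```python
import json
from datetime import datetime, timezone

# Hypothetical, heavily trimmed table metadata; a real file has many more fields.
SAMPLE_METADATA = json.loads("""
{
  "format-version": 2,
  "location": "s3://example-bucket/warehouse/events",
  "current-snapshot-id": 3,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1746489600000, "summary": {"operation": "append"}},
    {"snapshot-id": 2, "timestamp-ms": 1746489660000, "summary": {"operation": "append"}},
    {"snapshot-id": 3, "timestamp-ms": 1746489720000, "summary": {"operation": "delete"}}
  ]
}
""")


def list_snapshots(metadata: dict) -> list:
    """Return (snapshot id, UTC commit time, operation) tuples, oldest first."""
    rows = []
    for snap in sorted(metadata.get("snapshots", []), key=lambda s: s["timestamp-ms"]):
        committed = datetime.fromtimestamp(snap["timestamp-ms"] / 1000, tz=timezone.utc)
        rows.append((snap["snapshot-id"], committed.isoformat(), snap["summary"]["operation"]))
    return rows


def snapshot_as_of(metadata: dict, timestamp_ms: int):
    """Return the id of the latest snapshot committed at or before timestamp_ms, or None."""
    candidates = [s for s in metadata.get("snapshots", []) if s["timestamp-ms"] <= timestamp_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s["timestamp-ms"])["snapshot-id"]


for row in list_snapshots(SAMPLE_METADATA):
    print(row)
```

A real tool would first ask a catalog for the location of the current metadata file and then read it from object storage; both steps are left out here.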

While independent implementations may initially lag—as seen with Apache Parquet (discussed in "Query Engines: Gatekeepers of the Parquet File Format")—they are critical for Iceberg’s long-term success.

The Deconstructed Database

Another insightful talk, “Apache Iceberg and the Deconstructed Database” by Julien Le Dem, offered a retrospective on how data systems evolved from monolithic structures to modular, composable components, improving flexibility and scalability.

Notable examples include:

  • Apache Iceberg – Separating storage management from storage engines, providing schema evolution, transactional guarantees, and time-travel queries.

  • Substrait – Standardizing query representations to decouple query logic from execution, enhancing interoperability across analytics tools and engines.

  • Apache Parquet – Optimizing on-disk storage with columnar formats, enhancing scan-heavy analytical workloads.

  • Apache Arrow – Facilitating efficient in-memory data interchange, enabling zero-copy data sharing across languages and components.

Together, these innovations establish Iceberg as a robust foundation for modern analytics, embodying the modular "deconstructed database" architecture shaping future data infrastructures.

Summary

Apache Iceberg has decisively established itself as the go-to storage standard for analytical databases, offering compelling advantages for vendors and end users alike. Particularly relevant in the AI era, Iceberg provides an ideal foundation for data-driven workflows requiring isolated environments for agents.

Despite existing challenges—such as limited support for small-table scenarios, real-time use cases, and non-Java implementations—the thriving community ensures continued innovation and broad adoption by mainstream companies. The Iceberg ecosystem has clearly crossed the adoption chasm, positioning it to drive the next wave of innovation in data management.


Stay tuned for feature releases, product roadmap,
support, events and more!

© Quesma Inc. 2025
