Iceberg V3 Geometry, One Year Later: From Standards to Implementations

Iceberg V3 Geometry, One Year Later: From Standards to Implementations

One year ago, we wrote about why Apache Iceberg and GeoParquet could reshape the geospatial data stack. And that post itself was the continuation of a much longer effort. We first introduced GeoParquet back in 2022, when the idea of treating geospatial as a first-class citizen in the open data ecosystem was still mostly a proposal.

Standards work is slow, and it requires patience: years of working-group discussions, specification drafts, and coordination across engine vendors before a single line of the promise becomes something you can actually run. So it is genuinely exciting to reach the point where the implementations are finally arriving.

At the time of last year’s post, the story was about standards. Native geospatial support was arriving in Apache Parquet and Apache Iceberg, and for the first time the open lakehouse ecosystem had a credible path to make spatial data a first-class citizen: native geometry types, interoperable metadata, multi-engine access, and file-level pruning for spatial predicates.

For those of us involved in the GeoParquet working group and the Iceberg geospatial effort, that was an important milestone. The goal was to let geospatial data behave like the rest of your analytical data: open, governed, queryable across engines, and optimized by metadata rather than by bespoke spatial infrastructure.

But a standard is only the beginning.

The second half of the story is implementation.

A year after Iceberg V3 was approved, we wanted to understand how much of that promise had actually made it into the engines people use every day. Which engines can read Iceberg V3 tables with native geometry columns? Which can run spatial predicates correctly? And which can go all the way and use Iceberg’s geometry bounds to prune files automatically?

So we built a public testbed and ran the experiments.

From promise to implementation

The original promise was simple: combine GeoParquet’s native geometry typing at the file level with Iceberg’s table metadata, schema evolution, catalogs, and manifest-level statistics.

In theory, that gives us an open geospatial lakehouse stack where engines can understand geometry columns natively and use spatial bounds to skip irrelevant files.

In practice, each engine has to implement several layers correctly:

  • read the Iceberg V3 table metadata;
  • understand the native geometry and geography Iceberg types;
  • read the corresponding GeoParquet 2.0 geometry encoding;
  • execute spatial predicates correctly;
  • and use Iceberg’s geometry bounds for file pruning.

That is why we evaluated support as a ladder, from basic table readability to full spatial pruning.

And geometry is only one row in a much larger story. The Iceberg Matrix does a great job of capturing just how rich, and how uneven, the broader Iceberg ecosystem has become, tracking feature support across dozens of engines, from row-level operations and partitioning to catalogs and the new V3 data types.

The Iceberg Matrix: a feature-by-engine compatibility table covering row-level operations, partitioning, table management, V3 data types, and catalog support across AWS, GCP, Azure, Databricks, Snowflake, and third-party engines.
Iceberg feature support across engines. Source: icebergmatrix.org. Our testbed zooms into a single cell of this matrix: the Geometry / Geo Types row under V3 data types. It asks not just "supported or not," but whether spatial predicates and metadata pruning actually work end to end.

This post focuses on that one row, because native geometry is where the long-standing gap between GIS and the lakehouse finally closes, or doesn’t.

What we built

We created iceberg-geo-testbed, a small public repository with reproducible geospatial Iceberg fixtures and per-engine probes.

The testbed includes:

  • a static Iceberg REST catalog (more on that soon) exposing both V2 and V3 tables on public GCS and S3;
  • GeoParquet 2.0-typed Parquet files as the data layer, with geometry stored using Parquet’s native Geometry(crs=) logical type rather than plain BINARY;
  • per-engine runners that probe each table the way a real user would: open it, scan it, run a spatial predicate, and check whether pruning happens.

Each fixture contains the same data: 10 geographically disjoint regions, with 1,000 points per region, for a total of 10,000 rows across 10 Parquet files.

A California-window predicate gives us a clean signal. An engine that fully understands the table and uses the manifest geometry bounds should narrow the scan to 1 file out of 10.

We graded each engine on a simple ladder:

  • L0 — the table cannot be read at all;
  • L1 — a full scan works;
  • L2 — spatial predicates return the correct rows;
  • L3 — file-level pruning fires, and the manifest narrows the scan.

The short version

The architecture works.

Snowflake proves it. Native Iceberg V3 geometry can be read, queried, and pruned correctly using manifest geometry bounds. That is the end-to-end behavior the specification was designed for.

DuckDB is very close. It can read the geometry and run spatial predicates with an in-flight fix, but it does not yet prune files using the geometry bounds.

Most other engines are not blocked on Iceberg V3 itself. They can read V3 tables without geometry columns. What they lack is support for the new geometry and geography types.

That distinction matters. The ecosystem is waiting for engines to finish implementing the spatial part of the standard, not for the table format itself.

Where V3 native geometry stands today

EngineV3 native geometry status
SnowflakeL3 — works end-to-end. Managed and externally-written paths both deliver correct spatial predicates and manifest geometry-bound pruning. Snowflake is the clear leader today.
DuckDB 1.5.3L2 — reads correctly and spatial predicates work with PR #1013. File pruning is the next step.
BigQuery / BigLakeL0 for geometry. BigQuery can read non-geometry V3 fixtures, but rejects the native Iceberg geometry type.
Dataproc Serverless 2.3L0 for geometry. V3 tables without geometry work; native geometry currently fails through the Spark Iceberg runtime.
EMR Serverless 7.13L0 for geometry. Same Spark Iceberg runtime gap as Dataproc.
Sedona + Iceberg-Spark 1.7.1L0 for geometry. The same upstream Spark gap blocks reads, and the write path is also missing the required type mapping.
DatabricksL0 for Iceberg geometry. Geometry and geography exist in Delta, but the Iceberg-compatible writer rejects them. We have heard rumors that native Iceberg geometry support may be coming soon.
Oracle ADB 26aiL0. Oracle’s Iceberg reader could not read any of the Iceberg tables we tested, V2 or V3, so the V3 geometry question is blocked upstream.

The detail and per-capability breakdowns live in STATUS_V3.md.

SnowflakeSnowflake: the first complete implementation

Snowflake is the current leader.

It delivers Iceberg V3 geometry end to end, both on its managed write path and on externally-written V3 tables. We verified the whole stack:

  • spatial predicates return the correct rows;
  • manifest geometry-bound pruning fires;
  • a California-window predicate scans roughly 1/10 of the bytes of a full geometry-column scan on our 10-file fixture;
  • externally-written V3 tables work as long as the writer is compliant with the V3 specification.

That last point is important. Snowflake did not require a Snowflake-specific table shape. The unmanaged read path worked once the Iceberg metadata was V3-compliant: the snapshot block needed first-row-id and added-rows, and the manifest needed populated value_counts, null_value_counts, and ID-column bounds.

There are a couple of practical notes if you try this yourself.

First, for Snowflake-managed Iceberg tables, you need to pass ICEBERG_VERSION=3 explicitly. V2 remains the default for new Iceberg tables, and the error message does not make the missing opt-in especially obvious.

Second, for spatial queries, use TO_GEOMETRY(wkt, 4326) rather than bare TO_GEOMETRY(wkt). The geometry column is SRID 4326. A bare envelope defaults to SRID 0 and the predicate fails with an incompatible SRID error.

But those are small operational details. The main finding is much bigger: Snowflake proves the architecture works.

DuckDBDuckDB: very close

DuckDB is the most encouraging open-source result.

DuckDB 1.5.3 already reads V3 geometry correctly, and support is progressing quickly. We identified a remaining issue around spatial predicates and manifest geometry bounds, and the DuckDB maintainers responded rapidly with an upstream fix proposal in duckdb-iceberg#1002 and PR #1013.

The current gaps look incremental rather than architectural, and DuckDB appears very close to full end-to-end V3 geometry support.

The PR deliberately skips decoding the geometry bound, so DuckDB falls back to a full scan rather than pruning on it. That puts DuckDB at L2.

Closing the gap to L3 requires one more step: decoding the V3 packed_xy_le manifest bound and feeding it into the file-pruning predicate. The encoding is documented in the repo, and the path from L2 to L3 looks relatively short.

Google BigQueryBigQuery: V3 is there, geometry is not

One of the most important findings is that several engines do support Iceberg V3, just not V3 geometry yet.

To check this, we built a minimal V3 fixture with no geometry columns: an id string and an integer field. BigQuery, Dataproc Serverless 2.3, and EMR Serverless 7.13 all read that table cleanly and returned the expected 10,000 rows.

That tells us something useful. These engines are not missing Iceberg V3 entirely. The specific implementation gap is the native geometry type.

BigQuery’s failure mode is precise: it rejects the native V3 geometry type at external table creation time:

Unknown Iceberg type "geometry(OGC:CRS84)"

The same external table pattern works on the non-geometry V3 fixture, confirming that the V3 reader exists. The missing piece is the geometry type implementation.

That is a more useful conclusion than saying BigQuery “does not support V3.” It does. It just does not yet support the spatial part of V3.

Apache SparkSpark and EMR: the same upstream runtime gap

The Spark-based engines tell a similar story, but with a shared root cause. On the geometry fixture, they fail with:

UnsupportedOperationException: Cannot convert unknown type to Spark: geometry

This affects Sedona with Iceberg-Spark 1.7.1, Dataproc Serverless 2.3, and EMR Serverless 7.13. The gap appears to be upstream in iceberg-spark-runtime. One implementation there would move multiple engines forward at once. That is exactly the kind of leverage point that makes the spatial part of the standard worth finishing.

DatabricksDatabricks: geometry exists, but not through Iceberg yet

Databricks is the most asymmetric case.

GEOMETRY(SRID) and GEOGRAPHY(SRID) already work in Delta. The platform has native spatial types. But the Iceberg-compatible writer rejects those types with a compatibility violation.

So this does not look like a lack of geometry support in Databricks as a platform. It looks like a boundary issue between Delta’s geometry support and the Iceberg compatibility layer.

That distinction matters. The ingredients are there, but they do not yet cross into Iceberg.

OracleOracle: blocked before the V3 question

In our testing, we were not able to get Oracle’s Iceberg reader to successfully read any of the Iceberg tables we pointed it at: our tables, Snowflake-produced tables, V2 tables, or V3 tables. The failure was the same across them:

ORA-20000: Failed to generate column list

We spent time checking storage access, producer differences, metrics configuration, and authentication, but we could not isolate the issue further. It is possible that we missed some Oracle-specific requirement or configuration detail, so we do not want to overstate the conclusion here. What we can say confidently is that, with the configurations and fixtures we tested, we were unable to get baseline Iceberg table reads working, which means we could not meaningfully evaluate Oracle’s V3 geometry support.

What about Iceberg V2?

Iceberg V3 is the destination, but many users need something that works across engines today.

For that, we have been using a V2 convention that we describe as GeoIceberg V2: flat double bbox columns, a WKB geometry column, and a geo table property.

It is not as clean as native V3 geometry, but it has a very practical advantage: it works on every engine that reads Iceberg V2, and it gets file-level pruning for free because standard Iceberg writers already record manifest bounds on double columns.

In other words:

  • V3 is the right long-term model;
  • V2 with explicit bbox columns is the bridge for broad compatibility today.

That bridge matters because users should not have to choose between open table formats and spatial performance while the ecosystem catches up.

Along the way, we also hit a separate issue: even when an engine can theoretically read an Iceberg table, it often cannot discover tables from a truly open catalog.

The clearest example is Databricks. There is no generic Iceberg REST consumer. CREATE CONNECTION TYPE iceberg returns CONNECTION_TYPE_NOT_SUPPORTED; only named partners such as Glue, HMS, Snowflake, or Databricks itself can back a foreign Iceberg table.

Other warehouses have similar constraints: their connectors often require authentication models that public buckets or static catalogs cannot satisfy.

This is separate from geometry, but it matters for open data. If open Iceberg datasets are going to be as easy to share as Parquet files, catalog access needs to become more flexible.

We plan to cover this separately, including the Cloudflare Worker we built to bridge the gap for warehouses that cannot read static catalogs directly.

Recommendations

If you are choosing a geospatial format today, the practical answer depends on your target engines.

Files on disk: use GeoParquet 2.0

For standalone files, GeoParquet 2.0 is the right target. It gives you native Parquet geometry typing and broad readability. If you are publishing spatial files, use it. And if you want an easy way to inspect, validate, and explore GeoParquet datasets, geoparquet.io is a great companion tool. CARTO support for GeoParquet 2.0 is also already available in beta, making it easier to work with these datasets directly in cloud-native spatial workflows.

Tables queried through Snowflake: use Iceberg V3

If your table will be queried through Snowflake, Iceberg V3 geometry works today. Snowflake delivers the full story: native geometry, correct spatial predicates, and manifest-based pruning.

Tables that need broad engine support today: target Iceberg V3

A few months ago, the practical recommendation for broad interoperability would probably have been to stay on a V2 bridge pattern: WKB geometry, explicit bbox columns, and custom metadata.

At this point, the ecosystem momentum is clearly behind native Iceberg V3 geometry.

Snowflake already delivers the full end-to-end implementation. DuckDB is extremely close, with spatial predicates already working and geometry-bound pruning likely not far behind. Databricks also appears close to enabling Iceberg-native geometry support, building on the geometry types that already exist in Delta Lake.

That changes the calculus. If you are starting new geospatial lakehouse projects today, especially long-lived ones, it increasingly makes sense to target V3 directly rather than investing in transitional V2 conventions.

There will still be environments where the V2 bridge remains useful for compatibility reasons, particularly in Spark-based deployments waiting on upstream Iceberg runtime support. But the center of gravity has shifted toward native geometry support.

Long-lived tables: plan for V3

If you are building tables that will live for years, V3 should now be considered the default strategic direction.

The architecture is proven. Snowflake demonstrates the complete model in production today. DuckDB is rapidly converging on full support. Databricks is expected to move into this space soon as well, which would significantly accelerate ecosystem adoption.

The remaining gaps are increasingly implementation details rather than open design questions.

I, Javier de la Torre, will also be at the Data + AI Summit in San Francisco. If you are working on Iceberg, GeoParquet, spatial analytics, or open geospatial lakehouse architectures and want to discuss any of this in person, feel free to reach out.

Why this matters

The work on GeoParquet 2.0 and Iceberg V3 set out to remove one of the long-standing barriers between GIS and modern cloud analytics.

For non-spatial data, the lakehouse architecture already gives us open formats, remote catalogs, metadata-based pruning, schema evolution, and broad interoperability across engines. Geospatial data should not require a parallel stack, a proprietary index, or bespoke integration work every time a user wants to ask a spatial question.

That is why native geometry support in Parquet and Iceberg matters.

The good news is that the architecture is now proven. Snowflake shows that the full path can work: native geometry columns, spatial predicates, and file-level pruning from Iceberg metadata. DuckDB shows that open-source engines are close behind.

The less good news is that support remains uneven. Several engines can read Iceberg V3 tables, but fail specifically when the table contains the new geometry type.

That means the bottleneck has shifted.

A year ago, the question was whether the open lakehouse stack had the right model for geospatial data.

Today, the answer is yes.

The question now is how quickly the ecosystem can implement it.

Try it yourself

Everything in this post is reproducible from the public repository. The fixtures live on a public GCS bucket, and you can point any engine that understands static Iceberg metadata at them.

See the repository for the latest fixtures, engine runners, and reproduction instructions:

https://github.com/jatorre/iceberg-geo-testbed

The status matrices stay current in STATUS_V3.md.

If you maintain an engine and want to flip a cell from L0 to L2 or L3, the relevant pieces are well isolated. Geometry type support, bbox derivation, and the V3 geometry-bound deserializer are relatively contained changes with a large ecosystem payoff.

The standard is here.

Now comes the implementation work.

At CARTO, we are also actively working to bring these standards into production workflows: expanding support for GeoParquet 2.0 across our platform and advancing support for Iceberg V3 catalogs with geospatial data. We expect to share more updates soon as these capabilities mature and become available to users.

And if you are a developer who cares about open geospatial standards, native geometry in the lakehouse, or the engine-level work described in this post, we would love to hear from you. Come build this with us.

Hear from our experts!

Request a Demo