Iceberg V3 Geometry, One Year Later: From Standards to Implementations

May 29, 2026

14 mins read

Iceberg V3 Geometry, One Year Later: From Standards to Implementations

One year ago, we wrote about why Apache Iceberg and GeoParquet could reshape the geospatial data stack. And that post itself was the continuation of a much longer effort. We first introduced GeoParquet back in 2022, when the idea of treating geospatial as a first-class citizen in the open data ecosystem was still mostly a proposal.

Standards work is slow, and it requires patience: years of working-group discussions, specification drafts, and coordination across engine vendors before a single line of the promise becomes something you can actually run. So it is genuinely exciting to reach the point where the implementations are finally arriving.

At the time of last year’s post, the story was about standards. Native geospatial support was arriving in Apache Parquet and Apache Iceberg, and for the first time the open lakehouse ecosystem had a credible path to make spatial data a first-class citizen: native geometry types, interoperable metadata, multi-engine access, and file-level pruning for spatial predicates.

For those of us involved in the GeoParquet working group and the Iceberg geospatial effort, that was an important milestone. The goal was to let geospatial data behave like the rest of your analytical data: open, governed, queryable across engines, and optimized by metadata rather than by bespoke spatial infrastructure.

But a standard is only the beginning.

The second half of the story is implementation.

A year after Iceberg V3 was approved, we wanted to understand how much of that promise had actually made it into the engines people use every day. Which engines can read Iceberg V3 tables with native geometry columns? Which can run spatial predicates correctly? And which can go all the way and use Iceberg’s geometry bounds to prune files automatically?

So we built a public testbed and ran the experiments.

From promise to implementation

The original promise was simple: combine GeoParquet’s native geometry typing at the file level with Iceberg’s table metadata, schema evolution, catalogs, and manifest-level statistics.

In theory, that gives us an open geospatial lakehouse stack where engines can understand geometry columns natively and use spatial bounds to skip irrelevant files.

In practice, each engine has to implement several layers correctly:

read the Iceberg V3 table metadata;
understand the native geometry and geography Iceberg types;
read the corresponding GeoParquet 2.0 geometry encoding;
execute spatial predicates correctly;
and use Iceberg’s geometry bounds for file pruning.

That is why we evaluated support as a ladder, from basic table readability to full spatial pruning.

And geometry is only one row in a much larger story. The Iceberg Matrix does a great job of capturing just how rich, and how uneven, the broader Iceberg ecosystem has become, tracking feature support across dozens of engines, from row-level operations and partitioning to catalogs and the new V3 data types.

The Iceberg Matrix: a feature-by-engine compatibility table covering row-level operations, partitioning, table management, V3 data types, and catalog support across AWS, GCP, Azure, Databricks, Snowflake, and third-party engines. — Iceberg feature support across engines. Source: icebergmatrix.org. Our testbed zooms into a single cell of this matrix: the **Geometry / Geo Types** row under V3 data types. It asks not just "supported or not," but whether spatial predicates and metadata pruning actually work end to end.

This post focuses on that one row, because native geometry is where the long-standing gap between GIS and the lakehouse finally closes, or doesn’t.

What we built

We created iceberg-geo-testbed, a small public repository with reproducible geospatial Iceberg fixtures and per-engine probes.

The testbed includes:

a static Iceberg REST catalog (more on that soon) exposing both V2 and V3 tables on public GCS and S3;
GeoParquet 2.0-typed Parquet files as the data layer, with geometry stored using Parquet’s native Geometry(crs=) logical type rather than plain BINARY;
per-engine runners that probe each table the way a real user would: open it, scan it, run a spatial predicate, and check whether pruning happens.

Each fixture contains the same data: 10 geographically disjoint regions, with 1,000 points per region, for a total of 10,000 rows across 10 Parquet files.

A California-window predicate gives us a clean signal. An engine that fully understands the table and uses the manifest geometry bounds should narrow the scan to 1 file out of 10.

We graded each engine on a simple ladder:

L0 — the table cannot be read at all;
L1 — a full scan works;
L2 — spatial predicates return the correct rows;
L3 — file-level pruning fires, and the manifest narrows the scan.

The short version

The architecture works.

Snowflake proves it. Native Iceberg V3 geometry can be read, queried, and pruned correctly using manifest geometry bounds. That is the end-to-end behavior the specification was designed for.

Databricks delivers it too. On Databricks Runtime 18.2 (Apache Spark 4.1), native Iceberg geometry reads, queries, and prunes end-to-end as well, so there are now two engines with the complete implementation.

DuckDB is very close. It can read the geometry and run spatial predicates with an in-flight fix, but it does not yet prune files using the geometry bounds.

Most other engines are not blocked on Iceberg V3 itself. They can read V3 tables without geometry columns. What they lack is support for the new geometry and geography types.

That distinction matters. The ecosystem is waiting for engines to finish implementing the spatial part of the standard, not for the table format itself.

Where V3 native geometry stands today

Engine	V3 native geometry status
Snowflake	L3 — works end-to-end. Managed and externally-written paths both deliver correct spatial predicates and manifest geometry-bound pruning. Snowflake is the clear leader today.
DuckDB 1.5.3	L2 — reads correctly and spatial predicates work with PR #1013. File pruning is the next step.
Databricks (Runtime 18.2)	L3 — works end-to-end. Native managed Iceberg `GEOMETRY`/`GEOGRAPHY` on Databricks Runtime 18.2 (Spark 4.1): correct spatial predicates and manifest geometry-bound pruning. See the Databricks section below.
BigQuery / BigLake	L0 for geometry. BigQuery can read non-geometry V3 fixtures, but rejects the native Iceberg geometry type.
Dataproc Serverless 2.3	L0 for geometry. V3 tables without geometry work; native geometry currently fails through the Spark Iceberg runtime.
EMR Serverless 7.13	L0 for geometry. Same Spark Iceberg runtime gap as Dataproc.
Sedona + Iceberg-Spark 1.7.1	L0 for geometry. The same upstream Spark gap blocks reads, and the write path is also missing the required type mapping.
Oracle ADB 26ai	L0. Oracle’s Iceberg reader could not read any of the Iceberg tables we tested, V2 or V3, so the V3 geometry question is blocked upstream.

The detail and per-capability breakdowns live in STATUS_V3.md.

Snowflake: the first complete implementation

Snowflake is the current leader.

It delivers Iceberg V3 geometry end to end, both on its managed write path and on externally-written V3 tables. We verified the whole stack:

spatial predicates return the correct rows;
manifest geometry-bound pruning fires;
a California-window predicate scans roughly 1/10 of the bytes of a full geometry-column scan on our 10-file fixture;
externally-written V3 tables work as long as the writer is compliant with the V3 specification.

That last point is important. Snowflake did not require a Snowflake-specific table shape. The unmanaged read path worked once the Iceberg metadata was V3-compliant: the snapshot block needed first-row-id and added-rows, and the manifest needed populated value_counts, null_value_counts, and ID-column bounds.

There are a couple of practical notes if you try this yourself.

First, for Snowflake-managed Iceberg tables, you need to pass ICEBERG_VERSION=3 explicitly. V2 remains the default for new Iceberg tables, and the error message does not make the missing opt-in especially obvious.

Second, for spatial queries, use TO_GEOMETRY(wkt, 4326) rather than bare TO_GEOMETRY(wkt). The geometry column is SRID 4326. A bare envelope defaults to SRID 0 and the predicate fails with an incompatible SRID error.

But those are small operational details. The main finding is much bigger: Snowflake proves the architecture works.

DuckDB: very close

DuckDB is the most encouraging open-source result.

DuckDB 1.5.3 already reads V3 geometry correctly, and support is progressing quickly. We identified a remaining issue around spatial predicates and manifest geometry bounds, and the DuckDB maintainers responded rapidly with an upstream fix proposal in duckdb-iceberg#1002 and PR #1013.

The current gaps look incremental rather than architectural, and DuckDB appears very close to full end-to-end V3 geometry support.

The PR deliberately skips decoding the geometry bound, so DuckDB falls back to a full scan rather than pruning on it. That puts DuckDB at L2.

Closing the gap to L3 requires one more step: decoding the V3 packed_xy_le manifest bound and feeding it into the file-pruning predicate. The encoding is documented in the repo, and the path from L2 to L3 looks relatively short.

BigQuery: V3 is there, geometry is not

One of the most important findings is that several engines do support Iceberg V3, just not V3 geometry yet.

To check this, we built a minimal V3 fixture with no geometry columns: an id string and an integer field. BigQuery, Dataproc Serverless 2.3, and EMR Serverless 7.13 all read that table cleanly and returned the expected 10,000 rows.

That tells us something useful. These engines are not missing Iceberg V3 entirely. The specific implementation gap is the native geometry type.

BigQuery’s failure mode is precise: it rejects the native V3 geometry type at external table creation time:

Unknown Iceberg type "geometry(OGC:CRS84)"

The same external table pattern works on the non-geometry V3 fixture, confirming that the V3 reader exists. The missing piece is the geometry type implementation.

That is a more useful conclusion than saying BigQuery “does not support V3.” It does. It just does not yet support the spatial part of V3.

Spark and EMR: the same upstream runtime gap

The Spark-based engines tell a similar story, but with a shared root cause. On the geometry fixture, they fail with:

UnsupportedOperationException: Cannot convert unknown type to Spark: geometry

This affects Sedona with Iceberg-Spark 1.7.1, Dataproc Serverless 2.3, and EMR Serverless 7.13. The gap appears to be upstream in iceberg-spark-runtime. One implementation there would move multiple engines forward at once. That is exactly the kind of leverage point that makes the spatial part of the standard worth finishing.

Databricks: full support available now

Databricks now delivers native Iceberg V3 geometry end to end.

On a Databricks Runtime 18.2 (Apache Spark 4.1) cluster, we created a native managed Iceberg table with a GEOMETRY(4326) column and walked the full ladder:

the table writes;
ST_AsText returns clean points;
a California-window ST_Intersects predicate returns the correct rows;
manifest geometry-bound pruning fires, narrowing the scan to 1 file out of 10.

That is L3, the complete end-to-end behavior. GEOMETRY(4326) and GEOGRAPHY(4326) both work, and a projected SRID such as GEOMETRY(3857) is preserved as well. (One small detail: bare GEOMETRY is not a valid column type — the (SRID) parameter is required.)

This is not limited to all-purpose clusters. On a serverless SQL warehouse running a recent DBSQL channel, we also created and wrote a GEOMETRY(4326) Iceberg V3 table successfully. The capability tracks the runtime version rather than the compute type, so if you do not see it yet, check that your cluster or warehouse is on a recent enough runtime.

Under the hood, Databricks delivers this through its Delta plus Iceberg (UniForm) layer rather than a separate engine. That is why our earlier testing, on an older runtime, saw the Iceberg-compatible writer reject the spatial types: geometry existed in Delta but did not cross into Iceberg. On Runtime 18.2 that boundary is resolved. The geometry types now flow through to Iceberg, and external engines reading the table see a native V3 geometry column.

With this, there are two engines, Snowflake and Databricks, delivering the full Iceberg V3 geometry story today.

Oracle: blocked before the V3 question

In our testing, we were not able to get Oracle’s Iceberg reader to successfully read any of the Iceberg tables we pointed it at: our tables, Snowflake-produced tables, V2 tables, or V3 tables. The failure was the same across them:

ORA-20000: Failed to generate column list

We spent time checking storage access, producer differences, metrics configuration, and authentication, but we could not isolate the issue further. It is possible that we missed some Oracle-specific requirement or configuration detail, so we do not want to overstate the conclusion here. What we can say confidently is that, with the configurations and fixtures we tested, we were unable to get baseline Iceberg table reads working, which means we could not meaningfully evaluate Oracle’s V3 geometry support.

What about Iceberg V2?

Iceberg V3 is the destination, but many users need something that works across engines today.

For that, we have been using a V2 convention that we describe as GeoIceberg V2: flat double bbox columns, a WKB geometry column, and a geo table property.

It is not as clean as native V3 geometry, but it has a very practical advantage: it works on every engine that reads Iceberg V2, and it gets file-level pruning for free because standard Iceberg writers already record manifest bounds on double columns.

In other words:

V3 is the right long-term model;
V2 with explicit bbox columns is the bridge for broad compatibility today.

That bridge matters because users should not have to choose between open table formats and spatial performance while the ecosystem catches up.

Interoperability across engines

Single-engine support is only half the promise. The other half is interoperability: can one engine read a table that another engine wrote or governs? We tested this across Snowflake, Databricks, and DuckDB. The picture is encouraging, with one important caveat we will come to.

Reading across the warehouses

Between Snowflake and Databricks, in both directions, reads work, each through the platform’s governed integration path:

Snowflake reads an externally-written V3 table. As long as the writer produces V3-spec-compliant metadata, Snowflake reads it through an object-store catalog integration plus an external volume, with native spatial predicates and pruning.
Databricks reads a Snowflake V3 table. Through query federation, Databricks pulls the data over JDBC; the native geometry column comes across as GeoJSON text and re-parses cleanly into a geometry on the Databricks side. The deeper, Iceberg-native catalog-federation path (reading the files directly from storage) is also supported, though in our Snowflake-on-GCP setup it falls back to JDBC because of a storage-scheme mismatch (gcs:// versus gs://).
Snowflake reads a Databricks Unity Catalog table. Using Snowflake’s catalog integration for the Unity Catalog Iceberg REST endpoint, Snowflake authenticated, discovered the namespaces and tables, and obtained vended storage credentials, end to end. In our specific test the final file read was blocked by a storage-permission scope on the free-tier workspace we used; on customer-managed storage that last step completes.

The headline is that the open-catalog and credential-vending machinery between two major warehouses actually works, and geometry rides along: a native V3 geometry column written by one engine is visible to the other.

Open catalogs are still partly gated

There is one recurring friction point, and it is about catalog consumption rather than geometry.

Databricks does not consume a generic, open Iceberg REST catalog. There is no generic Iceberg REST connector in Unity Catalog (CREATE CONNECTION TYPE iceberg returns CONNECTION_TYPE_NOT_SUPPORTED); only named partners such as Glue, Hive Metastore, Snowflake, or Databricks itself can back a foreign Iceberg table. We also tried the open-source escape hatch, a non-Unity cluster with the Iceberg Spark runtime installed, and the generic catalog plugin still did not surface. By contrast, DuckDB reads the very same static catalog, served as plain files on a public bucket, with a single anonymous command.

This matters for open data. If open Iceberg datasets are going to be as easy to share as Parquet files, generic catalog access needs to become more flexible across the warehouses. We will cover the catalog side in more depth separately, including the small Cloudflare Worker we built to bridge static catalogs to engines that insist on an authenticated endpoint.

It always comes back to the CRS

The most important interoperability finding is not about formats or catalogs. It is about the coordinate reference system.

For the overwhelmingly common case, WGS84 longitude/latitude, everything interoperates. Snowflake, Databricks, and DuckDB all converge on a bare geometry type, which the Iceberg V3 specification defines as the default OGC:CRS84. Same coordinates, same axis order, same results.

The divergence appears with projected coordinate reference systems. For the identical CRS, say Web Mercator, the engines write different identifier strings into the Iceberg type:

Snowflake writes geometry(srid:3857);
Databricks writes geometry(EPSG:3857), using authority-qualified identifiers (EPSG, ESRI, or OGC) keyed to the SRID;
DuckDB stores whatever string it reads, verbatim.

Both forms are “valid,” because the specification treats the CRS as an opaque string and does not mandate a single way to represent it. The result is predictable: two production warehouses already disagree on how to name the same projected CRS. A reader keyed to one scheme will not recognize the other, and any cross-engine CRS-equality check or reprojection that relies on string matching will treat srid:3857 and EPSG:3857 as different systems, even though they are identical.

This is exactly the kind of gap that, left unaddressed, quietly fragments an otherwise-open ecosystem, the same way inconsistent metadata fragmented early geospatial formats before GeoParquet standardized it.

So we are going to work on consistency. The landscape today is genuinely fragmented: a CRS can be named as an authority code (EPSG:3857), as an SRID (srid:3857), as OGC:CRS84, or as a full coordinate-system description. OGC’s current published standard for the latter is WKT-CRS, and OGC is also developing a JSON encoding for CRS, though that does not yet appear to be a published standard. We are engaging through OGC to help reduce this divergence, and we expect to put forward a recommendation as soon as possible, so that engines converge on a consistent way to represent a coordinate reference system in Iceberg and GeoParquet. Native geometry in the lakehouse only delivers on its promise if a coordinate reference system means the same thing everywhere.

Recommendations

If you are choosing a geospatial format today, the practical answer depends on your target engines.

Files on disk: use GeoParquet 2.0

For standalone files, GeoParquet 2.0 is the right target. It gives you native Parquet geometry typing and broad readability. If you are publishing spatial files, use it. And if you want an easy way to inspect, validate, and explore GeoParquet datasets, geoparquet.io is a great companion tool. CARTO support for GeoParquet 2.0 is also already available in beta, making it easier to work with these datasets directly in cloud-native spatial workflows.

Tables queried through Snowflake: use Iceberg V3

If your table will be queried through Snowflake, Iceberg V3 geometry works today. Snowflake delivers the full story: native geometry, correct spatial predicates, and manifest-based pruning.

Tables that need broad engine support today: target Iceberg V3

A few months ago, the practical recommendation for broad interoperability would probably have been to stay on a V2 bridge pattern: WKB geometry, explicit bbox columns, and custom metadata.

At this point, the ecosystem momentum is clearly behind native Iceberg V3 geometry.

Snowflake and Databricks both deliver the full end-to-end implementation today. DuckDB is extremely close, with spatial predicates already working and geometry-bound pruning likely not far behind.

That changes the calculus. If you are starting new geospatial lakehouse projects today, especially long-lived ones, it increasingly makes sense to target V3 directly rather than investing in transitional V2 conventions.

There will still be environments where the V2 bridge remains useful for compatibility reasons, particularly in Spark-based deployments waiting on upstream Iceberg runtime support. But the center of gravity has shifted toward native geometry support.

Long-lived tables: plan for V3

If you are building tables that will live for years, V3 should now be considered the default strategic direction.

The architecture is proven. Snowflake and Databricks both demonstrate the complete model today. DuckDB is rapidly converging on full support. With two major warehouses already there, ecosystem adoption is accelerating.

The remaining gaps are increasingly implementation details rather than open design questions.

I, Javier de la Torre, will also be at the Data + AI Summit in San Francisco. If you are working on Iceberg, GeoParquet, spatial analytics, or open geospatial lakehouse architectures and want to discuss any of this in person, feel free to reach out.

Why this matters

The work on GeoParquet 2.0 and Iceberg V3 set out to remove one of the long-standing barriers between GIS and modern cloud analytics.

For non-spatial data, the lakehouse architecture already gives us open formats, remote catalogs, metadata-based pruning, schema evolution, and broad interoperability across engines. Geospatial data should not require a parallel stack, a proprietary index, or bespoke integration work every time a user wants to ask a spatial question.

That is why native geometry support in Parquet and Iceberg matters.

The good news is that the architecture is now proven. Snowflake and Databricks both show that the full path can work: native geometry columns, spatial predicates, and file-level pruning from Iceberg metadata. DuckDB shows that open-source engines are close behind.

The less good news is that support remains uneven. Several engines can read Iceberg V3 tables, but fail specifically when the table contains the new geometry type.

That means the bottleneck has shifted.

A year ago, the question was whether the open lakehouse stack had the right model for geospatial data.

Today, the answer is yes.

The question now is how quickly the ecosystem can implement it.

Try it yourself

Everything in this post is reproducible from the public repository. The fixtures live on a public GCS bucket, and you can point any engine that understands static Iceberg metadata at them.

See the repository for the latest fixtures, engine runners, and reproduction instructions:

https://github.com/jatorre/iceberg-geo-testbed

The status matrices stay current in STATUS_V3.md.

If you maintain an engine and want to flip a cell from L0 to L2 or L3, the relevant pieces are well isolated. Geometry type support, bbox derivation, and the V3 geometry-bound deserializer are relatively contained changes with a large ecosystem payoff.

The standard is here.

Now comes the implementation work.

At CARTO, we are also actively working to bring these standards into production workflows: expanding support for GeoParquet 2.0 across our platform and advancing support for Iceberg V3 catalogs with geospatial data. We expect to share more updates soon as these capabilities mature and become available to users.

And if you are a developer who cares about open geospatial standards, native geometry in the lakehouse, or the engine-level work described in this post, we would love to hear from you. Come build this with us.