Zero-copy, zero contest
Columnar formats have become the standard for analytics, and Apache Arrow has become the standard columnar format for data in motion. Systems from BigQuery to DuckDB output Arrow natively; pandas, Polars, and dbt Fusion are built around Arrow; and Hugging Face, Apache Parquet, and Apache Iceberg use Arrow too. ADBC (Arrow Database Connectivity) connects these systems, providing fast, Arrow-native access to your database or data warehouse.
First released in 2022, ADBC works like JDBC and ODBC, providing standard APIs and an ecosystem of drivers for interacting with data systems. These drivers enable you to execute queries, ingest data, browse catalogs, and more. But unlike those old row-oriented standards, ADBC is Arrow-native and designed for full cross-language interoperability.
The guiding principle behind Arrow is that the fastest way to get something done is to not do it. Arrow’s binary format is the same in memory, on disk, and on the network, enabling zero-copy operations where data moves between systems without any intermediate conversions. ADBC applies this principle to database connectivity. When the database is Arrow-native, ADBC can pass query results straight through to your application as-is, whereas JDBC and ODBC require transposing columns to rows (and then back again).
The benchmarks
We’ve previously shown the kind of optimization effort needed to efficiently exchange Arrow columnar data with systems that don’t natively support Arrow—like MySQL and SQL Server. This time, we’ll compare ADBC with ODBC on two Arrow-native systems: BigQuery and DuckDB. They represent two extremes: the former is a cloud data warehouse, where network I/O can be expected to dominate; the latter is an embeddable analytics database, where any time spent converting data is time not spent running your query.
Note: This is not intended to compare performance across different data systems or evaluate performance of any single data system. The focus here is on different ways to connect to a particular system. We’ve omitted times in favor of ratios for this reason. We also wanted to test clients for more systems, but many vendors don’t allow benchmarking. So while we can’t run benchmarks on every system, the patterns observed with BigQuery and DuckDB provide a useful indication of what ADBC delivers more broadly.
The benchmarks below reflect the time required to query 6 million rows of data. (Lower is better.)
| Backend | ADBC | ODBC (turbodbc) | ODBC (arrow-odbc) | ODBC (pyodbc) | Native | Native Library |
|---|---|---|---|---|---|---|
| BigQuery | 1× | 2.69× | 2.68× | 4.17× | 1.73× | sqlalchemy-bigquery |
| DuckDB | 1× | driver crashes[1] | driver crashes[1] | 1750×+ [2] | 1× | duckdb-python |
Code for these tests can be found on GitHub.
The takeaway: ADBC is faster. For BigQuery, the gap is moderate—around 2.7× with the best ODBC wrapper. This makes sense: network I/O dominates for a cloud warehouse, and a well-optimized ODBC client can convert data to Arrow concurrently with fetching. Even so, ADBC has a clear edge. For DuckDB, the advantage is enormous: we gave up on waiting for the ODBC client to even complete.
Beyond speed: developer experience
Not only is ADBC faster—dramatically so in the DuckDB case—it’s far easier to use. To set up the ODBC tests, we had to scour web pages for ODBC drivers, configure .ini files, and test three different language bindings to see which one performed best. Turbodbc can deliver good performance, but it doesn’t provide Python wheels, so we had to build it from source. Both turbodbc and arrow-odbc have numerous tuning options that affect both performance and correctness (meaning that a misconfigured option can produce incorrect results). And we got crashes when using the DuckDB ODBC driver with turbodbc and arrow-odbc.
Meanwhile, ADBC and dbc are far easier. With one dbc command, you can install any of a dozen ADBC drivers on Linux, macOS, and Windows. Then, you can grab your favorite language’s ADBC library with a simple pip install or go get or cargo add or R install.packages command, or use integrations with pandas and Polars, without having to wrangle .ini files. ADBC provides an experience made for 2026, not 1992, and we’re working on widening that gap.
What about native client libraries?
We’ve also included figures for the “native” client libraries for these systems. When a vendor’s own library also outputs Arrow natively—as duckdb-python does—it can achieve the same zero-copy benefit as ADBC, which is why DuckDB’s native library matches ADBC at 1×. That isn’t guaranteed, however: vendor libraries don’t always take full advantage of their system’s Arrow support, as the BigQuery numbers show. Plus ADBC has advantages beyond just speed. ADBC gives you consistent APIs, like JDBC and ODBC did—despite their downsides, that core idea is still useful. And we’ve built an ecosystem of optimized and validated drivers that are ready for you to use. This means you have fewer architectural decisions to make, and your coding agents have fewer chances to get lost or confused. (And stay tuned for ways to help your agents use ADBC and dbc more easily.)
With ADBC, you get performance that matches or beats individual vendor client libraries, in a single, consistent Arrow-native interface. And compared to ODBC, the advantages in both speed and developer experience are clear. Download dbc and try it yourself: columnar.tech/dbc.
Footnotes
1. With both turbodbc and arrow-odbc, the client crashed.
2. After about 40 minutes, we gave up on the test.