A Deep Dive into ADBC Driver Optimization
Recently we announced new releases of the ADBC drivers for four database systems: Google BigQuery, Microsoft SQL Server, MySQL, and Trino. These updates were focused on driver performance. ADBC, along with companion tools like dbc, brings several advantages to users and application developers: a consistent interface across multiple databases, easy installation of drivers on all platforms, and an Arrow-native design for interoperability with modern data platforms. Just as important, of course, is that ADBC delivers best-in-class performance, enabled both by its Arrow-native architecture and by the ongoing optimization work of our contributor community.
Let's look at each driver in turn and the optimizations we made to improve its performance.
BigQuery: one weird trick
We were surprised when our friends at Bruin told us that the ADBC driver for BigQuery was slower than the python-bigquery-sqlalchemy package. After all, BigQuery is the only system of the four we'll look at here that is Arrow-native: through the Storage Read API, clients can fetch Arrow data directly, and in parallel. We weren't transforming or processing the result data, just making requests to BigQuery and handing the result to the user, so how could there be a performance issue?
First, we had to replicate this, so we added BigQuery to our benchmark suite. Indeed, the ADBC driver was slower than the Python package. On a sample dataset of about 6 million rows, we found that it took 95.2 seconds for the ADBC driver to read the dataset, vs 56.4 seconds for the Python package. To make sure we weren't doing something suboptimal in the ADBC driver itself, we benchmarked a small, pure-Go program that directly used Google's Go client (which the ADBC driver wraps) to read the same dataset. That was still slower than the Python code, at which point we suspected the Go client.
We compared the source code of the Go and Python clients, and at first thought it was due to differing request patterns. The Go code tries to be nice to the API server and limits concurrency based on the number of cores, while the Python code immediately spams out a request for every result stream. But aligning this and other behavior differences in Go with Python didn't seem to make a difference.
In the end, we noticed that the Python code explicitly enabled Arrow compression, but the Go code didn't. This is a feature of Arrow IPC where data buffers are individually compressed with LZ4 or Zstandard. The Python client always asks the BigQuery API server to LZ4-compress the data, while the Go client never does (and has no option to enable it). We patched the BigQuery Go client library to request ZSTD compression, and finally got performance on par with Python. Rerunning the comparison, the ADBC driver now takes 31.0 ± 2.1 seconds, versus 56.4 ± 1.3 seconds for the Python package.
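For the curious, here's a rough sketch of where that option lives when creating a read session against the Storage Read API directly from Go. The project, dataset, and table names are placeholders, the exact generated types and import paths depend on the client library version, and this illustrates the request shape rather than the driver's actual code:

```go
package main

import (
	"context"
	"fmt"
	"log"

	bqstorage "cloud.google.com/go/bigquery/storage/apiv1"
	"cloud.google.com/go/bigquery/storage/apiv1/storagepb"
)

func main() {
	ctx := context.Background()
	client, err := bqstorage.NewBigQueryReadClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Ask the Storage Read API for ZSTD-compressed Arrow buffers. The stock
	// high-level Go client never sets BufferCompression, which is the gap we
	// patched around. Project/dataset/table names below are placeholders.
	req := &storagepb.CreateReadSessionRequest{
		Parent: "projects/my-project",
		ReadSession: &storagepb.ReadSession{
			Table:      "projects/my-project/datasets/my_dataset/tables/my_table",
			DataFormat: storagepb.DataFormat_ARROW,
			ReadOptions: &storagepb.ReadSession_TableReadOptions{
				OutputFormatSerializationOptions: &storagepb.ReadSession_TableReadOptions_ArrowSerializationOptions{
					ArrowSerializationOptions: &storagepb.ArrowSerializationOptions{
						BufferCompression: storagepb.ArrowSerializationOptions_ZSTD,
					},
				},
			},
		},
		MaxStreamCount: 4,
	}
	session, err := client.CreateReadSession(ctx, req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("read streams:", len(session.GetStreams()))
}
```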
Why didn't we use the Python library from the get-go, or even directly use gRPC to talk to the BigQuery API servers? Typically, we prefer to build ADBC drivers on top of existing, high-quality, and well-supported connector libraries or SDKs where available, so we didn't want to directly use gRPC. And we generally implement drivers in Go or Rust (or occasionally C++), since these toolchains let us build performant drivers that can be used within a variety of ecosystems, from Python to Java to R. While Python usually does have good connector libraries for various vendors, it's difficult to build the driver in Python and still have it usable by other ecosystems. We do benchmark against other Python connectors, however.
We've reported this to Google as googleapis/google-cloud-go#13742. In the meantime, we've updated the ADBC driver to use a patched client library, fixing the issue for users.
Microsoft SQL Server: a thousand cuts
MSSQL, as it's often called, is not Arrow-native. The ADBC driver has to convert rows to columns. We decided to compare ourselves to the alternatives and see how our driver stacked up. There are many competitors here, with all of C++, Go, and Rust represented:
- Turbodbc, a tried-and-true Python package using C++ to wrap ODBC drivers in a Python DB-API 2.0 interface with extensions to output Arrow data.
- arrow-odbc, a newer Rust library that also wraps ODBC drivers to output Arrow data, providing Python bindings on top.
- DuckDB MSSQL Extension ("DuckDB + mssql"), an even newer DuckDB extension in C++, with its own client that directly implements the SQL Server wire protocol, Tabular Data Stream ("TDS").
- mssql-python, a Python package from Microsoft. (Internally, it appears to wrap an ODBC driver manager and Microsoft's proprietary ODBC driver.) This library does not provide Arrow or columnar data, only rows, so the benchmark is the time to fetch the result set as a list of Python objects.
- The ADBC driver, which wraps the go-mssqldb client library officially maintained by Microsoft.
Note that Turbodbc and arrow-odbc aren't specific to SQL Server, and don't necessarily optimize for it in particular. We've tested both of them here with Microsoft's own proprietary ODBC driver for MSSQL (Linux; msodbcsql18 version 18.6.1.1-1). On the other hand, the ADBC driver and DuckDB + mssql directly use SQL Server's wire protocol, and both support features like bulk copy for improved performance.
Surprisingly, Turbodbc dominated the benchmark, despite being a generic adapter. And another puzzle soon arose: the ADBC driver was much faster than this first benchmark suggested, but only if used directly in Go, instead of being called from Python.
| Client | Query timing (no latency) |
|---|---|
| Turbodbc (5.1.2) | 3.0 ± 0.2s |
| arrow-odbc (9.3.3) | 4.5 ± 0.1s |
| DuckDB 1.4.4 + mssql 0.1.17 | 4.7 ± 0.4s |
| ADBC | 9.6 ± 0.1s |
| mssql-python[1] (1.2.0) | 23.9 ± 0.2s |
The first step this time was to profile the ADBC driver using standard Go tools. To do this, we wrote a small benchmark harness in pure Go that called the driver directly. We then noticed that this pure-Go program was unexpectedly much faster than the Python harness, suggesting issues both in the driver itself and in the interop between the Go driver (compiled to a shared library) and the applications calling it. We'll look at the profile first, then the Python interop issue.
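As a rough illustration (not our actual harness), a standard Go benchmark is all the tooling needs; queryAndDrain below is a hypothetical stand-in for executing the query through the driver and consuming the entire result:

```go
package driverbench

import "testing"

// queryAndDrain is a hypothetical stand-in for running the query through the
// ADBC driver and draining every record batch of the result.
func queryAndDrain() int {
	total := 0
	for i := 0; i < 1_000_000; i++ {
		total += i
	}
	return total
}

// Run with:
//   go test -bench=BenchmarkFetch -cpuprofile=cpu.out
//   go tool pprof cpu.out
func BenchmarkFetch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = queryAndDrain()
	}
}
```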
Go footgun: accidentally copying large structs
The result parsing code deep in go-mssqldb was spending a lot of time in an internal Go function called runtime.duffcopy:

The function parseRow is called for each row of data, and inside it, a two-value range was actually copying a struct on each loop iteration (in this case, for every column). Worse, the struct turned out to be quite large. Making rows * columns copies would probably kill performance![2] And later on in the function, there was also a method being called on the struct that took a value receiver instead of a pointer receiver, again copying the struct.
A small refactor to (1) only loop over indices and explicitly take pointers to the struct and (2) use a pointer receiver eliminated runtime.duffcopy from the profile and netted a significant performance benefit:
| Client | Query timing (no latency) |
|---|---|
| Turbodbc (5.1.2) | 3.0 ± 0.2s |
| arrow-odbc (9.3.3) | 4.5 ± 0.1s |
| DuckDB + mssql (1.4.4) | 4.7 ± 0.4s |
| ADBC (fixed copying) | 6.4 ± 0.2s |
| ADBC (original) | 9.6 ± 0.1s |
| mssql-python (1.2.0) | 23.9 ± 0.2s |
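The pattern is easy to demonstrate outside go-mssqldb. In the toy sketch below, bigColumn is our own stand-in for the library's large per-column struct (not the real type): the "before" loop copies it twice per column, while the "after" loop only ever touches it through a pointer.

```go
package main

import "fmt"

// bigColumn stands in for go-mssqldb's per-column metadata struct; the real
// type is different, but similarly large, which is what made the implicit
// copies expensive.
type bigColumn struct {
	name string
	pad  [512]byte // padding to make each copy visibly costly
}

// Value receiver: every call copies the entire struct.
func (c bigColumn) NameByValue() string { return c.name }

// Pointer receiver: no copy.
func (c *bigColumn) NameByPointer() string { return c.name }

func parseRowBefore(cols []bigColumn) {
	// A two-value range copies each element into col on every iteration...
	for _, col := range cols {
		_ = col.NameByValue() // ...and the value receiver copies it again.
	}
}

func parseRowAfter(cols []bigColumn) {
	// Ranging over indices and taking an explicit pointer avoids both copies.
	for i := range cols {
		col := &cols[i]
		_ = col.NameByPointer()
	}
}

func main() {
	cols := make([]bigColumn, 100)
	cols[0].name = "l_orderkey"
	parseRowBefore(cols)
	parseRowAfter(cols)
	fmt.Println("done")
}
```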
We submitted both of these improvements to go-mssqldb, where contributors from Microsoft quickly reviewed and merged them (microsoft/go-mssqldb#308, microsoft/go-mssqldb#316). Thanks to shueybubbles and dlevy-msft-sql for the quick review!
CGo footgun: goroutine spam
The vexing question was why the driver seemed to be slower when called from Python. To take Python out of the equation, we wrote a small C++ program to run a query and iterate through the result, then profiled that with Linux perf:

With some investigation, we think what's happening here is that go-mssqldb spawns a ton of goroutines (one per row). The spawned goroutines are placed on a processor-local queue by default, and normally, they could be picked up by the same OS thread, avoiding scheduler overhead. But when we call a Go function through FFI, the Go callee gets locked to a single OS thread, so the new goroutines have to wait to be picked up by a different OS thread, incurring more scheduling overhead.
That would explain why the problem didn't happen with a pure-Go harness, as the calling goroutine would then not be locked to an OS thread. We hacked up the driver code to push all the work onto a background goroutine, leaving the calling thread to just pull results from a channel. This fixed the performance discrepancy. Turning that hack into an actual change, we can see that we've now ended up slightly behind Turbodbc and ahead of all other competitors:
| Client | Query timing (no latency) |
|---|---|
| Turbodbc (5.1.2) | 3.0 ± 0.2s |
| ADBC (fixed goroutine spam) | 3.3 ± 0.0s |
| arrow-odbc (9.3.3) | 4.5 ± 0.1s |
| DuckDB + mssql (1.4.4) | 4.7 ± 0.4s |
| ADBC (fixed copying) | 6.4 ± 0.2s |
| ADBC (original) | 9.6 ± 0.1s |
| mssql-python (1.2.0) | 23.9 ± 0.2s |
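As a rough sketch of that channel-based workaround (with toy types, not the driver's real ones): the FFI-locked thread does nothing but drain a channel, while an ordinary, unlocked goroutine does the row parsing and whatever goroutine spawning that entails.

```go
package main

import (
	"fmt"
	"runtime"
)

// batch stands in for a decoded chunk of rows (e.g. an Arrow record batch).
type batch struct{ rows int }

// produce simulates the driver's row-parsing loop. Running it on its own
// goroutine keeps that work off the FFI-locked calling thread.
func produce(n int) <-chan batch {
	out := make(chan batch, 4)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			out <- batch{rows: 1024}
		}
	}()
	return out
}

func main() {
	// Simulate the CGo situation: the caller is pinned to one OS thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// The locked thread only drains the channel; the heavy lifting happens
	// on the unlocked background goroutine.
	total := 0
	for b := range produce(100) {
		total += b.rows
	}
	fmt.Println("rows fetched:", total)
}
```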
We can actually replicate the goroutine issue in pure Go:
```go
package main

import (
	"flag"
	"runtime"
)

func main() {
	var lock bool
	flag.BoolVar(&lock, "lock", false, "Lock the OS thread")
	flag.Parse()
	if lock {
		// Simulate being called through CGo/FFI, where the Go callee is
		// pinned to the calling OS thread for the duration of the call.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
	}

	done := make(chan struct{}, 1)
	for i := range 500000 {
		// Spawn one goroutine per iteration (as go-mssqldb does per row) and
		// wait for it. With -lock, the locked thread cannot run the new
		// goroutine, so each one must be scheduled onto another OS thread.
		go func(v int) {
			done <- struct{}{}
		}(i)
		<-done
	}
}
```
Using perf, we can observe the same issues with runtime.schedule and runtime.park_m that we see above.

While spawning this many goroutines is probably unusual, it's a lesson we've learned and will keep in mind when developing in Go.
Unsolved problem: UTF-16 to UTF-8 conversions
If you've looked at the performance trace above, there's one last problem that may have stood out immediately. But unfortunately, this is where our optimizations end, at least for now.
A good chunk of time is spent converting strings from UTF-16 (as NVARCHAR is encoded on the wire) to UTF-8 (as required by Arrow). And then more time is spent copying data around and allocating strings, causing more garbage collection pressure, which is also quite visible in the profile. This happens because the go-mssqldb library first creates Go values which we then append to Arrow arrays. The ideal solution would be to (1) implement a fast UTF-16 to UTF-8 conversion that (2) directly decodes from wire format into Arrow data buffers to avoid allocating intermediate Go values. This would require significantly hacking up go-mssqldb, however; something we will consider, but which may not be palatable to upstream. While not shown in this profile, our benchmarks also show that NVARCHAR(MAX) columns are much slower than fixed-width string columns, which is another point for us to investigate.
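For reference, here's a minimal pure-Go sketch of the second half of that idea: decoding UTF-16LE wire bytes and appending UTF-8 directly into a reusable byte buffer, which could back an Arrow string builder. (A real implementation would also avoid the intermediate []uint16 allocation shown here.)

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// appendUTF16LEToUTF8 decodes UTF-16LE bytes (how SQL Server sends NVARCHAR
// on the wire) and appends UTF-8 directly into dst. This is a plain-Go sketch
// of the "skip the intermediate Go string" idea, not the driver's code path.
func appendUTF16LEToUTF8(dst []byte, wire []byte) []byte {
	// Intermediate allocation a real implementation would want to avoid.
	u16 := make([]uint16, 0, len(wire)/2)
	for i := 0; i+1 < len(wire); i += 2 {
		u16 = append(u16, binary.LittleEndian.Uint16(wire[i:]))
	}
	var buf [utf8.UTFMax]byte
	for _, r := range utf16.Decode(u16) {
		n := utf8.EncodeRune(buf[:], r)
		dst = append(dst, buf[:n]...)
	}
	return dst
}

func main() {
	// "héllo" encoded as UTF-16LE.
	wire := []byte{0x68, 0x00, 0xe9, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x6f, 0x00}
	fmt.Println(string(appendUTF16LEToUTF8(nil, wire)))
}
```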
Out of curiosity, we looked at what Turbodbc does here, and it turns out to use simdutf. We haven't profiled it to confirm, but we suspect that this is faster than the pure-Go conversion code in go-mssqldb. Unfortunately, it's not straightforward to use simdutf from Go, as the overhead of CGo calls may swamp any performance benefit. But now that Go 1.26 has introduced native Go APIs for platform-specific SIMD, we may explore porting it to pure Go.
In the end, we were able to significantly improve performance, even compared to alternatives that implement their own TDS client. But it turns out that an ODBC-based stack can still deliver great performance in this case. We think the gap can be closed once we get the chance to invest more in the driver and client library.
MySQL and Trino
The performance story for the MySQL and Trino drivers centered on bulk ingest. Both drivers were inserting data row by row, which is not ideal, so we implemented a standard optimization: putting multiple rows in each INSERT. (For example, pandas implements this in to_sql with method="multi".)
For MySQL, this was straightforward. We prepare a SQL statement with multiple rows' worth of placeholders and execute it with batches of Arrow data. To measure the impact, we benchmarked the driver by ingesting the TPC-H lineitem table at various scales before and after the optimization. Ingesting 600k rows previously took 455 seconds; it now takes 7 seconds.
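For illustration, generating such a statement might look roughly like the sketch below; the table and column names are just examples, and the real driver additionally handles identifier quoting, type mapping, and binding the Arrow values.

```go
package main

import (
	"fmt"
	"strings"
)

// multiRowInsert builds an INSERT with rowCount rows of '?' placeholders,
// e.g. INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ...
// Illustrative only; not the driver's actual implementation.
func multiRowInsert(table string, cols []string, rowCount int) string {
	row := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(cols)), ", ") + ")"
	var b strings.Builder
	fmt.Fprintf(&b, "INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ", "))
	for i := 0; i < rowCount; i++ {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString(row)
	}
	return b.String()
}

func main() {
	fmt.Println(multiRowInsert("lineitem", []string{"l_orderkey", "l_quantity"}, 3))
}
```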
When it came to Trino, some fun surprises were in store for us. We could not just batch by row count since Trino enforces a default query size limit of 1MB, and Trino counts the size of serialized parameters towards this limit. In other words, the parameters are effectively "stringified" into the SQL statement itself, and so we can't just use a fixed number of rows on each call, but have to vary the statement based on the actual values being inserted.
A static row count would still work perfectly for fixed-width types like small integers, but would fail the moment a user tried to ingest a batch containing large strings or complex objects that pushed the total SQL text over that 1MB ceiling. To solve this, we implemented a size-aware strategy: for each insert, we pack a variable number of rows into a newly generated SQL statement that is "just right" to maximize throughput while staying strictly under Trino's limit. The benchmark results show that ingesting 60k rows previously took 1834 seconds but now takes 60 seconds.
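A toy version of that size-aware packing, operating on already-serialized row literals rather than Arrow batches, might look like this:

```go
package main

import "fmt"

// packRows greedily packs serialized row literals into batches whose total
// size stays under maxBytes (Trino's default query size limit is roughly
// 1MB). A sketch only; the real driver works on Arrow data and also accounts
// for the size of the statement prefix.
func packRows(rows []string, maxBytes int) [][]string {
	var batches [][]string
	var cur []string
	size := 0
	for _, r := range rows {
		if len(cur) > 0 && size+len(r) > maxBytes {
			batches = append(batches, cur)
			cur, size = nil, 0
		}
		cur = append(cur, r)
		size += len(r)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	rows := []string{"(1, 'a')", "(2, 'bb')", "(3, 'ccc')", "(4, 'dddd')"}
	for i, b := range packRows(rows, 20) {
		fmt.Println("batch", i, b)
	}
}
```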
Wrapping up
Our journey to optimize these drivers underscored a frustrating reality of modern data infrastructure: we often spend more time formatting and converting data than actually moving it. If these databases directly supported Arrow, much of that overhead would disappear. We could ingest columnar data into the database as-is, without having to worry about things like the size of data values when serialized to strings. We could take data buffers directly from application memory and send them over the network, and the database would receive those same bytes and interpret them directly without having to parse strings. No conversion, no formatting, and no CPU cycles wasted on either side. That's the promise of Arrow-native interoperability, and we can't wait for the ecosystem to catch up. In the meantime, ADBC abstracts away these details for you, and we'll keep working to deliver the best performance and the easiest experience across all of these databases.
Next steps
Footnotes
1. This library does not produce Arrow data. The time reported is only the time to generate a Python list of the result set.
2. We were surprised that the Go compiler couldn't elide the copy, but this is part of the language semantics.