Announcing Columnar
TL;DR
Today's dominant data connectivity standards are grossly inefficient and fundamentally incapable of meeting emerging needs. We founded Columnar to solve this. With $4M in seed funding and a team of core Apache Arrow developers, we're launching today with new ADBC drivers, a driver installer CLI, and an open source collaboration with Databricks, dbt Labs, Improving, Microsoft, and Snowflake.
You're a data engineer on a mission. Your objective is to travel 30 years back in time and execute a SQL query that will save the world. Your DeLorean time machine lands in October 1995, twin trails of fire lighting up the night. You step out in a suburban office park and break in through an unlocked window. Sitting down at an oak laminate computer desk, you boot up Windows 95 on a boxy beige tower and dial into the network, holding your breath as the modem lets out a chittering shriek. Gazing into the cathode-ray tube, you search for where to run your world-saving SELECT statement.
What you see bears almost no resemblance to the tools you use as a data engineer in 2025. Python? Unheard of outside academic and hobbyist circles. In a pinch, you could use Java, but in 1995, the first version is only in beta. Even the earliest column-oriented databases exist only as university research projects. You start to sweat. Then you spot it—a thick paperback on the shelf above the monitor: Inside ODBC: The Developer's Guide to the Industry Standard for Database Connectivity. Yes—ODBC is one thing that's barely changed in the ensuing 30 years.
You launch the ODBC Data Source Administrator and find the DSN you're looking for. Feeling powerful, you open Microsoft Query. This looks familiar—it still ships with Excel in 2025. You select the DSN, open the SQL dialog, key in your SELECT statement, and press Execute. Nervously waiting for the query to finish, you grab a 3.5-inch floppy and insert it into the drive. When the query finishes, you move the file to the disk, eject it, pocket it, shut down the computer, and return to the DeLorean.
Back in 2025, having saved the world, you're hailed as a hero. But something is gnawing at you—a question you can't answer: Why, in 2025, are we still using ODBC?
It's a great question. In this improbable time-travel scenario, ODBC's 30-year legacy of dominance, in spite of its ossification, was a boon. But in most other respects, it's a bane. And we don't mean to pick on ODBC; the same is true of JDBC (introduced in 1996) and Python's DB-API 2.0 (1999). These vintage row-oriented connectivity standards went from being on point in their era to being a chokepoint today.
Why? In the intervening years, data volumes exploded by more than ten thousand times. Average network speeds increased by a similar multiple. Column-oriented technologies upended storage and processing, enabling sub-second analytical queries on data volumes unimaginable in 1995. Real-time pipelines drove new requirements for low latency and data freshness. Meanwhile, data analytics evolved from an obscure back-office function into the engine of modern business operations and the backbone of global-scale customer-facing applications—and now AI applications.
But standards are sticky. Long after they become antiquated and disliked, they can endure, kept in orbit by network effects. Technically superior challengers arise and find a niche among early adopters, but rarely escape the cold start problem. So it's a big deal when a new challenger reaches escape velocity.
Enter Apache Arrow. In the early days, it was not obvious that Arrow would eventually offer an alternative to the incumbent data connectivity standards. Created in 2016 by a group of open source developers led by Wes McKinney and Jacques Nadeau, Arrow initially provided a binary format and low-level toolkit that enabled fast columnar interchange between modules written in different languages. Emerging at a critical moment when developers across the industry were struggling with low-level data interoperability and performance bottlenecks, Arrow became a smash hit, implemented in almost every major language and used inside virtually every data stack. Its Python library alone is on track to be downloaded more than 2.5 billion times this year. A group of visionary companies fueled Arrow's growth, including Bloomberg, Dremio, G-Research, InfluxData, NVIDIA, Posit, Two Sigma, and Voltron Data.
As Arrow became a fixture in lower-level data infrastructure, it climbed upward, solving progressively higher-level problems. By early 2022, a group of us working at Voltron Data recognized that a new Arrow-based standard was needed in the connectivity layer, and ADBC was born.
ADBC is a modern alternative to ODBC and JDBC for analytic applications. It's a connectivity standard that delivers data in Arrow columnar format instead of a slow row-oriented format. It carries forward the strengths of the legacy standards while correcting their anachronisms. Within months of its introduction, ADBC was picked up by Snowflake and DuckDB, reducing query result retrieval times by more than 90 percent in many applications. Microsoft, Databricks, SDF (now part of dbt Labs), and others soon joined in.
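To make that concrete, here's a minimal sketch of ADBC's DBAPI-style Python interface, using the self-contained SQLite driver for illustration (assuming the adbc_driver_sqlite and pyarrow packages from PyPI; the same pattern applies to the Snowflake, PostgreSQL, and other drivers):

```python
# A minimal ADBC sketch: query a database and get results back
# as an Arrow table, columnar end to end.
import adbc_driver_sqlite.dbapi

with adbc_driver_sqlite.dbapi.connect(":memory:") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1 AS n, 'hello' AS s")
        # No row-by-row conversion in between: the result arrives
        # in Arrow columnar format, ready for analytics.
        table = cur.fetch_arrow_table()
        print(table)
```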
But like many innovations, Arrow's ascent into the connectivity layer has advanced in fits and starts, facing challenges along the way and leaving opportunities untapped. To illustrate this, we turn to the second chapter in our story.
You (our time-traveling data engineer) are back at work in the present day—but you can't shake the question about ODBC's longevity. The more you consider it, the more baffling it seems. In the platform your team uses, most data sources are column-oriented, and so are most query destinations. Yet in between, ODBC and other legacy standards can't transfer results in a columnar format. Instead, they burn processor cycles converting columns into rows, only for you to burn more cycles converting those rows back into columns.
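To see the waste in miniature, here's an illustrative sketch using Python's built-in sqlite3 module as a stand-in for any row-oriented ODBC or DB-API interface: a columnar source gets pivoted into row tuples, only to be pivoted straight back into columns.

```python
# The round trip legacy row-based APIs impose on columnar data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "y")])

rows = conn.execute("SELECT a, b FROM t").fetchall()  # columns -> rows
columns = list(zip(*rows))                            # rows -> columns again
print(columns)  # [(1, 2), ('x', 'y')]
```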
Looking for a way to avoid this waste of time and energy, you come across a Hacker News post about ADBC, a modern data connectivity standard that keeps the data in columnar format. You learn that it's built into the latest release of Microsoft Power BI. You try it out. For the handful of connectors that use ADBC (BigQuery, Databricks, Dremio, Snowflake), results load lightning fast. But you're frustrated that other connectors are still built on ODBC. It's a similar story with the new dbt Fusion engine that your team is upgrading to: blistering speed with the adapters that are built on ADBC, but only a few are available.
The underlying problem, you discover, is that each database, query engine, and cloud platform needs its own ADBC driver, and only a few developers are currently building them. But you're in luck: drivers exist for the platforms you rely on. Itching to speed up some data pipeline code, you dive deeper. Downloading, installing, loading, and configuring multiple ADBC drivers in different languages, you hit several stumbling blocks. When you finally get everything working, the payoff is worth it, but it shouldn't be so hard.
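For a sense of what "loading and configuring" involves, here's a sketch using the ADBC driver manager's Python bindings to load a driver by name; the driver and option names are illustrative and assume the adbc_driver_manager package plus a SQLite ADBC driver on the library search path:

```python
import adbc_driver_manager.dbapi as dbapi

# The driver manager resolves "adbc_driver_sqlite" to the platform's
# shared library (e.g., libadbc_driver_sqlite.so) and loads it at runtime.
with dbapi.connect(
    driver="adbc_driver_sqlite",
    db_kwargs={"uri": ":memory:"},  # driver-specific configuration
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 42 AS answer")
        print(cur.fetch_arrow_table())
```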
Meanwhile, your team is going gaga over DuckDB. Some queries that traditionally ran on a cloud data warehouse now run faster and cheaper on DuckDB. And it works great with Arrow and ADBC. With open table formats like Iceberg now unifying the bottom of the stack, a multi-engine architecture with Arrow and ADBC unifying the top of the stack seems like the future. But the reality today falls short of this vision.
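A quick sketch of that synergy (assuming the duckdb and pyarrow packages): DuckDB can scan a PyArrow table in place and hand results back as Arrow, so data stays columnar as it moves between engines.

```python
import duckdb
import pyarrow as pa

# DuckDB's replacement scans let SQL reference the local Arrow table
# by variable name, with no copy into the engine.
events = pa.table({"name": ["a", "b", "a"], "amount": [10, 20, 30]})
totals = duckdb.sql(
    "SELECT name, sum(amount) AS total FROM events GROUP BY name"
).arrow()  # results come back as an Arrow table
print(totals)
```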
Tinkering with an AI side project, you're struck by the unrealized potential of Arrow in this space, too. Connecting your model to a database with MCP, you learn that the protocol uses text-based JSON-RPC for all communication. Underwhelmed by its sluggish performance, you search for how to pass data through MCP in Arrow's binary columnar format. You find out that base64 encoding (a woefully inefficient standard introduced more than 30 years ago) is a recommended approach—and you're flashing back to the 90s again.
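To see why that's underwhelming, here's a sketch of the workaround (assuming pyarrow): the table is serialized to Arrow's IPC stream format, then base64-encoded so the binary payload can travel inside a JSON-RPC text message, inflating it by roughly a third and adding encode/decode passes on both ends.

```python
import base64
import io
import pyarrow as pa
import pyarrow.ipc

table = pa.table({"n": [1, 2, 3]})

# Serialize to Arrow's binary IPC stream format...
buf = io.BytesIO()
with pa.ipc.new_stream(buf, table.schema) as writer:
    writer.write_table(table)

# ...then base64-encode so it can ride inside a JSON text protocol.
payload = base64.b64encode(buf.getvalue()).decode("ascii")

# The receiver undoes both steps to recover the original table.
roundtrip = pa.ipc.open_stream(base64.b64decode(payload)).read_all()
assert roundtrip.equals(table)
```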
This story isn't finished. Finding ourselves engrossed by these problems and energized by the opportunity to solve them, David Li, Matt Topol, and I founded Columnar, with Wes McKinney joining us as an advisor. Columnar is writing the next chapter of data connectivity, with speed, simplicity, and security on every page.
Today we're taking a big first step by releasing dbc, an open source command-line tool for installing and managing ADBC drivers. dbc is powered by our driver registry, a global CDN serving signed and notarized pre-built drivers for all major platforms. dbc aims to make ADBC as easy as 1-2-3.
Today we're also releasing four new ADBC drivers—for Amazon Redshift, MySQL, Microsoft SQL Server, and Trino—all installable with dbc. But we have more work to do to make ADBC drivers available for more of the databases, query engines, and data platforms you use. We can't fully close this gap without the cooperation of other vendors and open source projects. So we're pleased to announce the ADBC Driver Foundry, a collaboration with Databricks, dbt Labs, Improving, Microsoft, Snowflake, and the Apache Arrow developer community that aims to increase the availability of safe, high-quality, open source ADBC drivers. We're working to make the Driver Foundry the best and easiest place to build, maintain, test, and release ADBC drivers. We're grateful for the commitment from our early collaborators and eager to welcome more driver developers.
To pursue our vision of a modern connectivity layer, Columnar has raised a $4M seed round led by Bessemer Venture Partners with participation from Breakers, K5 Tokyo Black, Next Play Ventures, and Composed Ventures. Joining the round are a group of angel investors with deep experience in data infrastructure, including Olivier Pomel and Julien Le Dem (Datadog), Tristan Handy and Lukas Schulte (dbt Labs), Caitlin Colgrove (Hex), Joshua Bloom (UC Berkeley), Matthaus Krzykowski (dltHub), Swaroop Jagadish (DataHub), Paul Dix (InfluxData), Scott Breitenother (Kilo Code), and Will Manning (Spiral).
With our work on ADBC as a foundation, we look forward to sharing more about what Columnar is building next. As frontier AI research shifts focus from scaling to data access and data quality, efficient connectivity is becoming a new driver of progress. As the storage layer in enterprise data stacks coalesces around cloud object storage, columnar file formats, and open table formats, the nexus of value is shifting to the connectivity layer. As the Cambrian explosion of new analytic query engines collides with the pressures of platform consolidation, the connectivity layer—stuck too long in the 90s—needs to jump back to the future.
Next steps