We built an ADBC interface for COBOL
COBOL is not a legacy language. It is a production language. It underpins core systems in banking, insurance, government, and payments—environments where software is expected to outlive not just frameworks, but entire hardware generations. There is more COBOL running today than most people are comfortable thinking about.
Meanwhile, the data stack has moved in a very different direction. Columnar memory formats, vectorized execution, and efficient open standards like Apache Arrow and ADBC have made it possible to move analytical data between systems with far less overhead than traditional row-oriented APIs.
These two worlds do not usually meet on equal terms.
Most modernization efforts force a choice: preserve the legacy system or adopt modern infrastructure. In practice, that means translation layers, duplicated pipelines, or rewriting stable systems just to make them visible to newer tooling. This is expensive and fragile, and it rarely ends.
We took a different approach. With a little help from Claude Opus 4.6, we built an ADBC driver manager[1] for COBOL—a standards-based interface that lets COBOL programs connect to any ADBC driver (SQLite, DuckDB, Snowflake, PostgreSQL, BigQuery, and many more) and participate directly in modern analytical workflows.
Instead of treating COBOL systems as isolated endpoints that need extraction pipelines before analysis can begin, we exposed them to the same driver model already used by Go, Python, Rust, and every other language with ADBC bindings. The code is on GitHub if you want to skip ahead and try it.
The surprising part was not getting it to work. The surprising part was how natural it felt.
Architecture
Three layers:
A thin C wrapper (adbc_wrapper.c) sits between ADBC’s C API and COBOL.[2] It allocates ADBC and Arrow structs on the heap, returns them to COBOL as opaque pointers, and exposes accessor functions using only types COBOL can represent natively:
| C type | COBOL type |
|---|---|
| `void*` | `USAGE POINTER` |
| `int32_t` | `PIC S9(9) COMP-5` |
| `int64_t` | `PIC S9(18) COMP-5` |
| `char*` | `PIC X(n)` |
| `double` | `COMP-2` |
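To make the opaque-pointer idea concrete, here is a minimal sketch of the pattern in C. It is not taken from adbc_wrapper.c: the `demo_result` struct and every `demo_result_*` function are hypothetical names invented for illustration. The point is only that the struct lives on the heap, COBOL holds it as a bare pointer, and every accessor speaks in the flat types from the table above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct that COBOL cannot describe itself. */
typedef struct {
    int64_t rows_affected;
    int32_t status;
} demo_result;

/* Allocate on the heap (zero-initialized). The caller treats the return
 * value as an opaque handle; in COBOL it would live in a USAGE POINTER. */
void *demo_result_new(void) {
    return calloc(1, sizeof(demo_result));
}

/* Setter and accessors take the handle plus flat scalar types only. */
void demo_result_set(void *handle, int32_t status, int64_t rows) {
    assert(handle != NULL);
    demo_result *r = (demo_result *)handle;
    r->status = status;
    r->rows_affected = rows;
}

int32_t demo_result_status(const void *handle) {
    assert(handle != NULL);
    return ((const demo_result *)handle)->status;
}

int64_t demo_result_rows(const void *handle) {
    assert(handle != NULL);
    return ((const demo_result *)handle)->rows_affected;
}

/* Release the handle; free(NULL) is a no-op, so this is safe on NULL. */
void demo_result_free(void *handle) {
    free(handle);
}
```

The calling side never needs the struct definition, so the layout can change without touching any COBOL source.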
Shared copybooks define the data structures every program uses. The central one, adbc-context.cpy, holds four opaque pointers (database, connection, error, fetch buffer) and execution variables (status code, row count, error message). A condition name on the status field makes error handling read like English:
IF NOT ADBC-OK PERFORM ERR-EXIT END-IF
A library of reusable subprograms provides high-level operations: adbc-connect, adbc-ingest, adbc-query-and-print, adbc-query-and-fetch, adbc-execute, adbc-execute-params, adbc-begin, adbc-commit, adbc-rollback, adbc-get-table-types, adbc-list-tables, adbc-cleanup. Each one manages its own transient resources internally. The calling program only manages the long-lived connection context.
The result: a COBOL program never touches an Arrow struct. It defines data with ordinary COBOL constructs—OCCURS tables, PIC X fields, COMP-5 integers—and the C wrapper handles translation to and from Arrow’s columnar format.
Connecting to a database
Four lines:
CALL "adbc-connect"
USING ADBC-CONTEXT ADBC-EXEC-VARS
WS-DRIVER WS-OPTIONS
IF NOT ADBC-OK PERFORM ERR-EXIT END-IF
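The driver and options are null-terminated string constants. A Z"..." literal appends the terminating null byte for you; the concatenated options string appends X"00" explicitly: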
01 WS-DRIVER PIC X(10) VALUE Z"snowflake".
01 WS-OPTIONS PIC X(260) VALUE
"uri=snowflake://acme_etl:password@ykvmhp-jn72843"
& "/ACME_PROD/PUBLIC"
& "?warehouse=ANALYTICS_WH" & X"00".
Swap the driver name and connection string and you’re talking to a different database. Behind the call, adbc-connect allocates the handles, sets the driver, parses options, loads the driver plugin, and opens the connection.
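One detail of option parsing is worth sketching, because the connection string above contains a second `=` inside the URI (`?warehouse=ANALYTICS_WH`). A parser has to split on the first `=` only. The helper below, `split_option`, is a hypothetical name and not the wrapper's actual code; it assumes a single `key=value` pair per string and makes no claim about how the wrapper separates multiple options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "key=value" at the FIRST '=' so that any
 * later '=' characters stay inside the value.
 * Returns 0 on success, -1 if there is no '=', the key is empty, or
 * either output buffer is too small. */
int split_option(const char *opt, char *key, size_t key_cap,
                 char *val, size_t val_cap) {
    assert(opt != NULL && key != NULL && val != NULL);
    const char *eq = strchr(opt, '=');
    if (eq == NULL || eq == opt) {
        return -1;
    }
    size_t klen = (size_t)(eq - opt);
    size_t vlen = strlen(eq + 1);
    if (klen + 1 > key_cap || vlen + 1 > val_cap) {
        return -1;
    }
    memcpy(key, opt, klen);
    key[klen] = '\0';
    memcpy(val, eq + 1, vlen + 1); /* +1 copies the terminating null */
    return 0;
}
```

A key and value split this way have the shape that ADBC's C API takes in `AdbcDatabaseSetOption(database, key, value, error)`, which is presumably where such pairs end up before the database is initialized.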
Driver installation is handled by dbc, a package manager for ADBC drivers:
dbc install sqlite
dbc install duckdb
dbc install snowflake
No manual downloads, no shared library paths, no configuration files. dbc resolves the correct driver binary for your platform and places it where the ADBC driver manager expects to find it.
Ingesting data
COBOL programs define structured data using OCCURS tables—fixed-layout arrays of records with explicitly typed fields. Here is one holding a few well-known board games—eight columns covering names, inventors, years, player counts, and prices, plus a one-byte null marker for the nullable price field:
01 GAME-COUNT PIC 9(4) COMP-5 VALUE 0.
01 GAMES-TABLE.
05 GAME-ROW OCCURS 1 TO 100
DEPENDING ON GAME-COUNT.
10 GAME-ID PIC S9(18) COMP-5.
10 GAME-NAME PIC X(20).
10 GAME-INVENTOR PIC X(25).
10 GAME-YEAR PIC S9(18) COMP-5.
10 GAME-MIN-AGE PIC S9(18) COMP-5.
10 GAME-MIN-PLAYERS PIC S9(18) COMP-5.
10 GAME-MAX-PLAYERS PIC S9(18) COMP-5.
10 GAME-LIST-PRICE-NULL PIC X.
10 GAME-LIST-PRICE COMP-2.
A column spec string tells the C wrapper how to interpret this memory:
01 COL-SPEC PIC X(100) VALUE
"id:l,name:u20,inventor:u25,year:l,"
& "min_age:l,min_players:l,"
& "max_players:l,list_price:?g" & X"00".
Type codes: l = int64, i = int32, g = double (float64), f = float (float32), uN = UTF-8 string of N bytes. Prefixing a type with ? makes the column nullable: the one-byte marker that precedes the field holds "1" for NULL and "0" for non-null.[3]
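As a rough illustration of what parsing such a spec involves, the sketch below walks the string and records each field's width, nullability, and byte offset within a row. The `field_spec` struct and `parse_col_spec` function are hypothetical names, not the wrapper's real ones, and the sketch assumes the COBOL group has no slack bytes between fields (no SYNCHRONIZED clause), so offsets are simply running sums.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one column in a row-major COBOL table. */
typedef struct {
    char   name[32];
    char   code;      /* 'l', 'i', 'g', 'f', or 'u' */
    size_t width;     /* bytes the value occupies in each row */
    size_t offset;    /* byte offset of the value within a row */
    int    nullable;  /* 1 if a one-byte marker precedes the value */
} field_spec;

/* Parse "name:type,name:type,...". Returns the field count, or -1 on a
 * malformed spec. On success *stride receives the bytes per row. */
int parse_col_spec(const char *spec, field_spec *out, int max_fields,
                   size_t *stride) {
    assert(spec != NULL && out != NULL && stride != NULL);
    size_t pos = 0;
    int n = 0;
    const char *p = spec;
    while (*p != '\0') {
        if (n == max_fields) return -1;
        field_spec *f = &out[n];
        const char *colon = strchr(p, ':');
        if (colon == NULL) return -1;
        size_t nlen = (size_t)(colon - p);
        if (nlen == 0 || nlen >= sizeof f->name) return -1;
        if (memchr(p, ',', nlen) != NULL) return -1; /* field lacked a ':' */
        memcpy(f->name, p, nlen);
        f->name[nlen] = '\0';
        p = colon + 1;
        f->nullable = (*p == '?');
        if (f->nullable) p++;
        f->code = *p;
        if (*p == 'l' || *p == 'g') {
            f->width = 8;                 /* int64 or double */
            p++;
        } else if (*p == 'i' || *p == 'f') {
            f->width = 4;                 /* int32 or float */
            p++;
        } else if (*p == 'u' && isdigit((unsigned char)p[1])) {
            char *end;
            f->width = (size_t)strtoul(p + 1, &end, 10);
            if (f->width == 0) return -1;
            p = end;
        } else {
            return -1;
        }
        if (f->nullable) pos += 1;        /* marker byte sits before the value */
        f->offset = pos;
        pos += f->width;
        n++;
        if (*p == ',') p++;
        else if (*p != '\0') return -1;
    }
    *stride = pos;
    return n;
}
```

Under those assumptions, the board-game spec yields eight fields and a 94-byte row: 8 + 20 + 25 + 8 + 8 + 8 + 8 bytes for the first seven columns, one marker byte, and 8 bytes for the price.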
Then to ingest, call adbc-ingest:
CALL "adbc-ingest"
USING ADBC-CONTEXT ADBC-EXEC-VARS
INGEST-TABLE COL-SPEC
GAMES-TABLE
IF NOT ADBC-OK PERFORM ERR-EXIT END-IF
The wrapper parses the spec, computes the row stride, and builds Arrow arrays that point into the COBOL table’s memory. For integers and doubles, the Arrow buffers reference the COBOL data directly—no copy. For strings, it trims COBOL’s trailing space-padding and builds the offset/data buffers Arrow requires. Nullable fields additionally feed Arrow’s validity bitmap from the one-byte null markers. The arrays get bound to an ADBC statement and executed as a bulk ingest.
Row count and layout are derived from the spec and the byte size of the table.
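The arithmetic behind that derivation can be sketched in a few lines of C. All three function names here are hypothetical. Arrow expects each column's values in one contiguous buffer, so this sketch copies each field out of the strided rows; that is a simplification for illustration, and the wrapper's own buffer handling may differ. The validity bitmap follows Arrow's convention: least-significant bit first, with a set bit meaning the value is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rows = table bytes / row stride. Returns -1 if the table does not
 * contain a whole number of rows. */
long row_count(size_t table_bytes, size_t stride) {
    assert(stride > 0);
    if (table_bytes % stride != 0) {
        return -1;
    }
    return (long)(table_bytes / stride);
}

/* Copy one int64 field out of every row into a contiguous column.
 * memcpy is used because fields in a packed row may be unaligned. */
void gather_int64(const unsigned char *rows, size_t stride, size_t offset,
                  size_t nrows, int64_t *column) {
    assert(rows != NULL && column != NULL);
    for (size_t r = 0; r < nrows; r++) {
        memcpy(&column[r], rows + r * stride + offset, sizeof(int64_t));
    }
}

/* Build an Arrow-style validity bitmap from one-byte markers, where the
 * character '1' means NULL. Bit r is set when row r is NOT null.
 * bitmap must hold (nrows + 7) / 8 bytes. Returns the null count. */
size_t build_validity(const unsigned char *rows, size_t stride,
                      size_t marker_offset, size_t nrows, uint8_t *bitmap) {
    assert(rows != NULL && bitmap != NULL);
    size_t nulls = 0;
    memset(bitmap, 0, (nrows + 7) / 8);
    for (size_t r = 0; r < nrows; r++) {
        if (rows[r * stride + marker_offset] == '1') {
            nulls++;
        } else {
            bitmap[r / 8] |= (uint8_t)(1u << (r % 8));
        }
    }
    return nulls;
}
```

For three rows where only the second price is NULL, the bitmap byte comes out as binary 101: rows 0 and 2 are valid, row 1 is not.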
Querying
01 SQL-SELECT PIC X(32) VALUE Z"SELECT * FROM games".
CALL "adbc-query-and-print"
USING ADBC-CONTEXT ADBC-EXEC-VARS
SQL-SELECT WS-PRINT-MODE
Output:
id name inventor year min_age min_players max_players list_price
--------------------------------------------------------------------------------------
1 Monopoly Elizabeth Magie 1904 8 2 6 19.99
2 Scrabble Alfred Mosher Butts 1938 8 2 4 NULL
3 Clue Anthony E. Pratt 1944 8 2 6 9.99
Column widths auto-size to the data. A universal type formatter handles over twenty Arrow types,[4] so the same function works regardless of what the driver returns.
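Auto-sizing is simple enough to sketch. The function below is a hypothetical illustration, not the wrapper's formatter: it assumes every cell has already been rendered to a string, and it sets each column's width to the longest of its header and its cells.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compute the print width of each column. cells is row-major, so the
 * cell for row r and column c is cells[r * ncols + c]. widths must have
 * room for ncols entries. */
void column_widths(const char *const *headers, const char *const *cells,
                   size_t nrows, size_t ncols, size_t *widths) {
    assert(headers != NULL && cells != NULL && widths != NULL);
    /* Start from the header so a column is never narrower than its name. */
    for (size_t c = 0; c < ncols; c++) {
        widths[c] = strlen(headers[c]);
    }
    /* Widen to fit the longest cell in each column. */
    for (size_t r = 0; r < nrows; r++) {
        for (size_t c = 0; c < ncols; c++) {
            size_t len = strlen(cells[r * ncols + c]);
            if (len > widths[c]) {
                widths[c] = len;
            }
        }
    }
}
```

Each width can then be passed to `printf` through the `%-*s` conversion to left-align a cell within its column.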
Transactions
CALL "adbc-begin" USING ADBC-CONTEXT ADBC-EXEC-VARS
MOVE "Charles Darrow" TO UP-INVENTOR
MOVE 1904 TO UP-YEAR
MOVE "Monopoly" TO UP-NAME
CALL "adbc-execute-params"
USING ADBC-CONTEXT ADBC-EXEC-VARS
SQL-UPDATE UPDATE-PARAM-SPEC
UPDATE-PARAMS
CALL "adbc-rollback" USING ADBC-CONTEXT ADBC-EXEC-VARS
Parameter values live in an ordinary COBOL group item. A spec string describes the layout—same idea as ingest. The wrapper builds a single-row Arrow array from the group, binds it, and executes.[5]
Why COBOL’s data model fits
We expected heavy translation. What we found was a surprisingly small impedance mismatch.
COBOL’s OCCURS tables are contiguous arrays of fixed-layout records. Every field has an explicit type and a known byte width. No tagged unions, no optional wrapping, no dynamic dispatch. The structure is flat, predictable, and measurable at compile time.
Arrow’s columnar format makes the same bets: fixed-width types in contiguous buffers, explicit schemas, no per-value type tags. The only structural difference is orientation—COBOL is row-major, Arrow is column-major—but that transformation is straightforward when the types already align.
It is tempting to frame this as a coincidence. But both systems were designed to represent structured business data with minimal abstraction overhead. One was designed in 1959. The other in 2016. They arrived at remarkably similar conclusions about how typed records should be laid out in memory.
The one real mismatch is strings. COBOL pads every string field to its declared width with trailing spaces. Arrow stores variable-length UTF-8 with an offset table. The wrapper trims on ingest and pads on retrieval. It is the only place where the two models genuinely disagree—and it is about fifty lines of C.
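A minimal sketch of that trim-and-pad logic, in C, looks something like this. The function names are hypothetical and the code is not lifted from the wrapper. It assumes 32-bit Arrow string offsets, and it assumes that a value longer than the COBOL field is truncated on retrieval, which is a choice made here for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of a fixed-width field once trailing spaces are dropped. */
size_t trimmed_len(const char *field, size_t width) {
    while (width > 0 && field[width - 1] == ' ') {
        width--;
    }
    return width;
}

/* Ingest direction: pack nrows fixed-width fields (one per row, rows
 * `stride` bytes apart) into Arrow-style buffers. offsets needs
 * nrows + 1 entries; data needs up to nrows * width bytes.
 * Returns the total number of data bytes written. */
int32_t pack_strings(const char *fields, size_t width, size_t stride,
                     size_t nrows, int32_t *offsets, char *data) {
    assert(fields != NULL && offsets != NULL && data != NULL);
    int32_t pos = 0;
    offsets[0] = 0;
    for (size_t r = 0; r < nrows; r++) {
        const char *f = fields + r * stride;
        size_t len = trimmed_len(f, width);
        memcpy(data + pos, f, len);
        pos += (int32_t)len;
        offsets[r + 1] = pos;
    }
    return pos;
}

/* Retrieval direction: copy one Arrow string value into a fixed-width
 * field, space-padding the remainder. Values longer than the field are
 * truncated (an assumption of this sketch). */
void unpack_string(const int32_t *offsets, const char *data, size_t row,
                   char *field, size_t width) {
    assert(offsets != NULL && data != NULL && field != NULL);
    size_t len = (size_t)(offsets[row + 1] - offsets[row]);
    if (len > width) {
        len = width;
    }
    memcpy(field, data + offsets[row], len);
    memset(field + len, ' ', width - len);
}
```

Three ten-byte fields holding Monopoly, Scrabble, and Clue pack into twenty data bytes with offsets 0, 8, 16, 20, and unpacking any of them restores the original space-padded field.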
What this enables
The driver manager runs on macOS, Linux, and Windows. The C layer is endian-portable, so big-endian targets like mainframes are in reach. It supports any database with an ADBC driver—dbc search lists fourteen and counting. The same COBOL program that queries SQLite in-memory can point at Snowflake by changing one string and running dbc install snowflake.
This is not a proof of concept. It handles bulk ingest, parameterized queries, transactions, catalog introspection, and result sets with complex nested types. It compiles with GnuCOBOL and links against the standard ADBC driver manager. CI passes on all three platforms. There are tests. An ABI probe utility lets you validate the COBOL/C boundary on a new runtime before trying the full driver path.
COBOL systems are usually treated as endpoints that need extraction pipelines and middleware before they can participate in modern data workflows. An ADBC interface makes them peers—able to query, ingest, and exchange data through the same driver model as every other language in the ecosystem.
We are not suggesting you rewrite your data platform in COBOL. But if sixty-seven years of production stability is any indication, the language is not going anywhere. It might as well have good database drivers.
The COBOL interface is open source under the Apache 2.0 license: github.com/columnar-tech/adbc-cobol. To get started with ADBC drivers on any platform or language, install dbc.
Footnotes
1. A [driver manager](/glossary/#adbc-driver-manager) is a library that sits between a client application and ADBC drivers, loading and routing calls to the right driver at runtime.
2. GnuCOBOL can call C functions directly, but cannot use function pointers, nested struct access, or pointer-to-pointer patterns, all of which ADBC relies on.
3. These mostly follow the [Arrow C Data Interface format strings](https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings). This wrapper adds `uN` so fixed-width COBOL strings carry their declared byte width, and a leading `?` to indicate nullable writes.
4. Integers of every width (signed and unsigned, 8- through 64-bit), floats, doubles, booleans, UTF-8 strings, binary blobs, dates, timestamps at multiple precisions, 128-bit decimals, and nested types like lists and structs.
5. Monopoly was originally invented by Elizabeth Magie, who patented The Landlord’s Game in 1904. Charles Darrow learned the game decades later and sold it to Parker Brothers, who credited him as the sole inventor, one of the more well-known cases of misattribution in American product history. The demo updates the inventor to Darrow, then rolls it back. History, at least in this database, is corrected.