Introdution
The other day, I was thinking about how I could get the bytes of a record of a recently inserted or updated row in SQLite
. The motivation for that is that I wanted to create a hash of that row, essentially, to be able to build a Merkle Tree of the corresponding table as rows get inserted or updated.
The closest API that SQLite
offers to what I was looking for is the sqlite3_update_hook:
The sqlite3_update_hook()
interface registers a callback function with the database connection identified by the first argument to be invoked whenever a row is updated, inserted or deleted in a rowid table.
The issue with that API is that it only returns the rowid of the row. That means I would have to fetch all the columns for the rows inside the column. And even with that approach, I would still not get the raw bytes of the row record. I would just get the driver's representation of that row.
There are probably plenty of approaches to how I could build such a tree, but as far as I'm concerned, the SQLite
API does not offer what I was exactly looking for. So, I decided to dig deeper into the source code to see if there could be a world where I would understand it, and not only that, I could actually make some changes to it to provide what I was hoping for.
I have always felt intimidated by reading C
language code. So I thought it would be just one of those times of opening a C
language file and giving up minutes later. But it turns out this time was different.
Navigating the SQLite source code
I've cloned SQLite
source code using fossil
and started navigating the files structure.
If you're somewhat familiar with how databases work you can probably imagine what some files are responsible for. Not bad. I decided to jump straight to the insert.c
file to see if I could find something interesting there.
If you skim through all function implementation you'll probably hit sqlite3Insert
. Above the function signature, we see:
** This routine is called to handle SQL of the following forms:
**
** insert into TABLE (IDLIST) values(EXPRLIST),(EXPRLIST),...
** insert into TABLE (IDLIST) select
** insert into TABLE (IDLIST) default values
**
Maybe inside this function, there was something that I could tweak. I was able to make some guesses about what's happening in there, but what caught my eye was the number of function calls to functions of a name similar to sqlite3vdbeXXX
.
That reminded me that SQLite
uses a virtual machine called vdbe. That means all SQL statements are translated to the language of this virtual machine first. Then, the execution engine executes the virtual machine code. Let's look at an example of how a simpler INSERT
is translated into bytecode:
sqlite> create table a (a int, b text);
sqlite> explain INSERT INTO a VALUES (1, 'Hello');
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 8 0 0 Start at 8
1 OpenWrite 0 2 0 2 0 root=2 iDb=0; a
2 Integer 1 2 0 0 r[2]=1
3 String8 0 3 0 Hello 0 r[3]='Hello'
4 NewRowid 0 1 0 0 r[1]=rowid
5 MakeRecord 2 2 4 DB 0 r[4]=mkrec(r[2..3])
6 Insert 0 4 1 a 57 intkey=r[1] data=r[4]
7 Halt 0 0 0 0
8 Transaction 0 1 1 0 1 usesStmtJournal=0
9 Goto 0 1 0 0
I concluded that all sqlite3Insert
is really doing is translating a parsed INSERT
statement to a bunch of virtual machine operations, according to all rules of SQLite
insertion.
This was not really the place I was looking for. What I really needed was the place where the record is created before insertion. That could only be the place that is executing the virtual machine code, probably the place that is executing the Insert (
opcode, I thought. OP_INSERT
)
Looking at Figure 1, the vdbe.c
file felt like a reasonable place to look for that. I went straight there.
What I found there was an 8000 lines switch( pOp->opcode )
statement and with a simple CMD+F OP_INSERT
I found the place that was handling the Insert
operation execution.
The first line of the case I found a hint of what I was looking for:
Mem *pData; /* MEM cell holding data for the record to be inserted */
So pData
points to the record data to be inserted. And you can see at L5402, pData = &aMem[pOp->p2];
, how it is setting pData
value to the virtual machine memory aMem
address at the position pointed by virtual machine register p2
.
Quick recap: at insert.c
we've learned that the INSERT
statement was translated into a bunch of virtual machine code. The data from the INSERT
went to the virtual machine through those sqlite3vdbeXXX
calls. I assume the call that registered the OP_INSERT
opcode and the data into the virtual machine is the one at line 2593:
sqlite3VdbeAddOp3(v, OP_Insert, iDataCur, aRegIdx[i], regNewData);
And here is a nice description of regNewData
:
** The regNewData parameter is the first register in a range that contains
** the data to be inserted or the data after the update. There will be
** pTab->nCol+1 registers in this range. The first register (the one
** that regNewData points to) will contain the new rowid, or NULL in the
** case of a WITHOUT ROWID table. The second register in the range will
** contain the content of the first table column. The third register will
** contain the content of the second table column. And so forth.
**
** The regOldData parameter is similar to regNewData except that it contains
** the data prior to an UPDATE rather than afterwards. regOldData is zero
** for an INSERT. This routine can distinguish between UPDATE and INSERT by
** checking regOldData for zero.
So, at this point we are executing that machine code with that data. Scrolling a bit down, let's see how pData
is used. At L5448-L5449, we see:
x.pData = pData->z;
x.nData = pData->n;
And x
is:
BtreePayload x; /* Payload to be inserted */
Awesome. Scrolling a bit more, we see:
rc = sqlite3BtreeInsert(pC->uc.pCursor, &x,
(pOp->p5 & (OPFLAG_APPEND|OPFLAG_SAVEPOSITION|OPFLAG_PREFORMAT)),
seekResult
);
Cool, we found the place where the raw record byte is being inserted. But, how do we know that it is the record formatted as documented here? If you look closely at the virtual machine code from our example INSERT
, before the Insert
opcode there is a MakeRecord
opcode, that is responsible for building the record.
You can check the OP_MakeRecord
implementation at vdbe.c file and see the following comment:
Convert P2 registers beginning with P1 into the [record format] use as a data record in a database table or as a key in an index.
And the last lines of the case statement, we find this:
/* Invoke the update-hook if required. */
if( rc ) goto abort_due_to_error;
if( pTab ){
assert( db->xUpdateCallback!=0 );
assert( pTab->aCol!=0 );
db->xUpdateCallback(db->pUpdateArg,
(pOp->p5 & OPFLAG_ISUPDATE) ? SQLITE_UPDATE : SQLITE_INSERT,
zDb, pTab->zName, x.nKey);
}
break;
Yes! The update hook!
So, here I have everything I need in my hands. The update hook and the record bytes. So, I just need to update the callback to add the record, and voila.
Making changes to SQLite
That's exactly what I did:
db->xUpdateCallback(db->pUpdateArg,
(pOp->p5 & OPFLAG_ISUPDATE) ? SQLITE_UPDATE : SQLITE_INSERT,
zDb, pTab->zName, x.nKey, pData->z, pData->n);
Passing the payload (pData->z
) and its size (pData->n
).
And, of course a bunch of more changes in multiple places to account for function signature change were necessary.
Here's fossil status
after the changes:
EDITED src/main.c
EDITED src/sqlite.h.in
EDITED src/sqlite3ext.h
EDITED src/sqliteInt.h
EDITED src/tclsqlite.c
EDITED src/vdbe.c
And the diff of changes in case you're following along. I've compiled my changes following the instructions.
Forking Go's SQLite driver
All right! Now it's time to create a simple test of my change in a Go program. I'm most familiar with mattn/go-sqlite3 driver for interacting with SQLite
. This project offers SQLite
API access through Go
by importing the SQLite
amalgamation file and working through C
bindings.
So I forked that repo, and imported my own compiled amalgamation file. And did the necessary updates in the Go
API to have access to that new value. My linter messed up with the diff. But only a few changes were needed:
- A change at the updateHookTrampoline that now receives the record as
*C.char
and its size asint
, and casts it to[]byte
passing it to callback:
func updateHookTrampoline(handle unsafe.Pointer, op int, db *C.char, table *C.char, rowid int64, data *C.char, size int) {
callback := lookupHandle(handle).(func(int, string, string, int64, []byte))
callback(op, C.GoString(db), C.GoString(table), rowid, C.GoBytes(unsafe.Pointer(data), C.int(size)))
}
- And a change to RegisterUpdateHook.
Putting everything together
So, now we have everything in place to test this out. Let's run a simple example inspired by SQLite Internals: Pages & B-trees blog post.
package main
import (
"database/sql"
"fmt"
"log"
"os"
"github.com/mattn/go-sqlite3"
)
func main() {
sqlite3conn := []*sqlite3.SQLiteConn{}
sql.Register("sqlite3_with_hook_example",
&sqlite3.SQLiteDriver{
ConnectHook: func(conn *sqlite3.SQLiteConn) error {
sqlite3conn = append(sqlite3conn, conn)
conn.RegisterUpdateHook(func(op int, db string, table string, rowid int64, data []byte) {
switch op {
case sqlite3.SQLITE_INSERT:
fmt.Printf("%x\n", data)
}
})
return nil
},
})
os.Remove("./foo.db")
srcDb, err := sql.Open("sqlite3_with_hook_example", "./foo.db")
if err != nil {
log.Fatal(err)
}
defer srcDb.Close()
srcDb.Ping()
_, err = srcDb.Exec(`CREATE TABLE sandwiches (
id INTEGER PRIMARY KEY,
name TEXT,
length REAL,
count INTEGER
);`)
if err != nil {
log.Fatal(err)
}
_, err = srcDb.Exec("INSERT INTO sandwiches (name, length, count) VALUES ('Italian', 7.5, 2);")
if err != nil {
log.Fatal(err)
}
}
Don't forget to add replace github.com/mattn/go-sqlite3 => github.com/brunocalza/go-sqlite3 v0.0.0-20220926005737-36475033d841
to go.mod
.
If you run that, you'll get:
05001b07014974616c69616e401e00000000000002
Which is exactly the raw record of the row ('Italian', 7.5, 2)
as described at Efficient Sandwich Encoding, without the primary key and the length of the record (two first bytes).
And here I finish the journey. It was really fun to know I was able to understand some parts of SQLite
source code, although I did not understand most of it, and make some changes and see those changes reflected through Go
's driver.
To be honest this approach of changing the source code of a database is too risky. Keeping up-to-date with new versions (and also I'd have to also keep an up-to-date fork of the Go
's driver) is too problematic.
I work on a web3 protocol called Tableland. Problems similar to this are always frequent in our Engineering team. If you enjoy this kind of stuff, get in touch through our Github or Discord, or @brunocalza.