The Anatomy of a QuestDB Table on Disk

Jun 11, 2026

As I continue exploring QuestDB, one natural thing to investigate is how data is stored on disk. In this post, we explore QuestDB’s native columnar format, how data is stored on the bit level, and some interesting techniques that are common in other databases, such as German strings for VARCHAR storage and dictionary encodings for SYMBOL storage.

Let’s use the following table as our example:

ts	price	sym	note
2026-06-10T10:00:00Z	1.0	BTC	hi
2026-06-10T10:00:01Z	2.5	ETH	this note is too long to inline
2026-06-10T10:00:02Z	-3.0	BTC	x

Here’s the SQL:

CREATE TABLE trades (ts TIMESTAMP, price DOUBLE, sym SYMBOL, note VARCHAR) TIMESTAMP(ts) PARTITION BY DAY WAL;
INSERT INTO trades VALUES ('2026-06-10T10:00:00Z', 1.0, 'BTC', 'hi');
INSERT INTO trades VALUES ('2026-06-10T10:00:01Z', 2.5, 'ETH', 'this note is too long to inline');
INSERT INTO trades VALUES ('2026-06-10T10:00:02Z', -3.0, 'BTC', 'x');

We create a table trades partitioned by day on the timestamp ts column.

Our server data is stored on /tmp/questdb-data/ and here’s the filesystem view of our table:

/tmp/questdb-data/
└── db/
    └── trades~12/                
        │
        ├── _meta
        ├── _txn
        ├── _cv
        ├── _name
        ├── _todo_
        │
        ├── sym.c      
        ├── sym.k
        ├── sym.o
        ├── sym.v           
        │
        ├── 2026-06-10/           
        │   ├── ts.d
        │   ├── price.d
        │   ├── sym.d
        │   ├── note.d
        │   └── note.i
        │
        ├── txn_seq/*
        │
        └── wal1/*

We’ll mostly focus on the files responsible for storing the columns’ data: sym.*, ts.d, price.d, note.d, and note.i.

You can already see how each of our table’s columns is stored separately on disk, typical of columnar databases, as suggested in C-Store: A Column-oriented DBMS.

Because our data is partitioned by day and all of our rows have timestamps on June 10, 2026, a directory 2026-06-10 is created to store the data that lives in that partition.

All of QuestDB’s data is stored sorted by our designated timestamp column, meaning writing data is mostly an append on each one of those columns. The writer can be thought of as doing something like this:

try (Writer w = engine.getWriter(schema)) {
    w.row(1781085600000000)
        .putDouble(1, 1.0)
        .putSymbol(2, "BTC")
        .putVarchar(3, "hi")
        .append();

    w.row(1781085601000000)
        .putDouble(1, 2.5)
        .putSymbol(2, "ETH")
        .putVarchar(3, "this note is too long to inline")
        .append();

    w.row(1781085602000000)
        .putDouble(1, -3.0)
        .putSymbol(2, "BTC")
        .putVarchar(3, "x")
        .append();

    w.commit();
}

We won’t get into details of how the data is committed and how it becomes visible for readers; this is just to get an idea of how it was transformed from our INSERT and is now being written to each one of those files, because that helps us understand what we find inside those files.

Fixed-size columns

Let’s now look into ts and price columns. The ts column has the TIMESTAMP type and price column the DOUBLE type. Both of them have a fixed size of 8 bytes. For fixed-size columns, the values are simply packed back-to-back in little-endian with no extra metadata (e.g., headers, checksums). For price.d, which is the file that stores price data, we expect to see (you can use a calculator to confirm):

1.0 -> 000000000000F03F
2.5 -> 0000000000000440
-3.0 -> 00000000000008C0

which is, exactly, what we see:

xxd  -l 24 -g 8 -c 8 db/trades~12/2026-06-10/price.d
00000000: 000000000000f03f  .......?
00000008: 0000000000000440  .......@
00000010: 00000000000008c0  ........

Similarly for the ts.d file:

xxd  -l 24 -g 8 -c 8 db/trades~12/2026-06-10/ts.d
00000000: 00a8804ee3530600  ...N.S..
00000008: 40ea8f4ee3530600  @..N.S..
00000010: 802c9f4ee3530600  .,.N.S..

And that is exactly the timestamp values we inserted:

row	raw bytes in file	as 64-bit int	seconds since epoch	UTC timestamp
0	`00a8804ee3530600`	`0x000653e34e80a800`	1781085600	2026-06-10T10:00:00.000000Z
1	`40ea8f4ee3530600`	`0x000653e34e8fea40`	1781085601	2026-06-10T10:00:01.000000Z
2	`802c9f4ee3530600`	`0x000653e34e9f2c80`	1781085602	2026-06-10T10:00:02.000000Z

One thing to keep in mind is that the file size is not 24 bytes. You could expect that since we stored 3 values of 8 bytes. It is actually 4096 bytes:

ls -lH db/trades~12/2026-06-10/{price,ts}.d 
-rw-r--r-- 1 brunocalza brunocalza 4096 Jun 10 16:28 db/trades~12/2026-06-10/price.d
-rw-r--r-- 1 brunocalza brunocalza 4096 Jun 10 16:28 db/trades~12/2026-06-10/ts.d

And the reason for that is those files are mmapped, and file sizes are rounded to the OS page size, which in my case is 4096 bytes. So, you could be thinking, how does the storage engine know, then, how many “rows” are stored in the columnar file? That information is actually in the _txn file and can be decoded with:

printf '%d\n' 0x$(xxd -e -s 216 -l 8 -g 8 db/trades~12/_txn | awk '{print $2}')
3

All column data files (.d) should store the same number of rows.

The VARCHAR type

The note column has the VARCHAR type, where values are strings of varying length. QuestDB follows an approach similar to the design presented in Umbra: A Disk-Based System with In-Memory Performance, which is widely adopted in other engines, such as DuckDB, Apache Arrow, Polars, and Facebook Velox, and it is called by Andy Pavlo “German strings”. ¹

Two files are used to store VARCHAR values: note.i and note.d. Each row gets a 16-byte entry in .i (VARCHAR_AUX_WIDTH_BYTES); strings of at most 9 bytes (VARCHAR_MAX_BYTES_FULLY_INLINED) are appended entirely inside .i; and strings bigger than that are appended to note.d, and their offset plus a 6-byte prefix of the string is appended in .i.

Here’s the format of .i files:

Inlined values

Let’s now understand in detail how those bytes are stored. For the inlined variant, the header has 1 byte, where 4 bits are used to store the size of the string and 4 bits are used to store the following flags (VarcharTypeDriver.java):

public static final int VARCHAR_HEADER_FLAG_NULL = 4;
..
private static final int HEADER_FLAG_ASCII = 2;
private static final int HEADER_FLAG_INLINED = 1;

There are three flags, and they always live in the lowest 4 bits of the header:

HEADER_FLAG_INLINED: bit 0 (value 1), meaning string lives entirely in this aux entry
HEADER_FLAG_ASCII: bit 1 (value 2), meaning all bytes are ASCII
VARCHAR_HEADER_FLAG_NULL: bit 2 (value 4), meaning the value is NULL

Bit 3 is not used.

Here’s how the header is built on lines L127-L132,

int flags = HEADER_FLAG_INLINED;
if (value.isAscii()) {
    flags |= HEADER_FLAG_ASCII;
}
// size is known to be at most 4 bits
auxMem.putByte((byte) ((size << HEADER_FLAGS_WIDTH) | flags));

For our “hi” value, we’d have the size 0b0010 and flags 0b0011, so the header is 0b00100011, or 0x23. The ASCII value of “hi” is 0x6869. The 48-bit offset for inlined values is a marker of the current end of the .d file. I’m not entirely sure how it is used, though. For this first entry, it is simply 0x000000000000. So, we expect to see the following for the first 16 bytes in note.i: 23 686900000000000000 000000000000. And it is precisely what we see:

xxd -p -l 16 db/trades~12/2026-06-10/note.i | sed -E 's/^(..)(.{18})(.{12})$/\1 \2 \3/'
23 686900000000000000 000000000000

Not inlined

For not-inlined values, we have a header of 4 bytes, where 28 bits are used to store the size of the string and 4 bits to store the same three flags.

For our “this note is too long to inline” value, of length 31, we have 0b0000000000000000000000011111 for the 28-bit size part and 0b0010 for the flags part. The header is 0b00000000000000000000000111110010, or 0xf2010000 in little-endian.

The next 6 bytes are used to store the prefix “this n”, so we expect to see 0x74686973206e. The reason for storing the prefix in the inlined file is to short-circuit some comparisons. We can first compare the length and the 6-byte prefix when doing string comparison, and only when there’s a match, we dereference the pointer to the full string payload to continue the comparison.

The 48-bit offset for this entry will be 0x0 as well, because note.d is empty and our string will be located right in the beginning of the file. We expect to find the following for the second entry: f2010000 74686973206e 000000000000. And there it is:

xxd -p -c 16 -l 32 db/trades~12/2026-06-10/note.i | sed -E '1 s/^(..)(.{18})(.{12})$/\1 \2 \3/; 2 s/^(.{8})(.{12})(.{12})$/\1 \2 \3/'
23 686900000000000000 000000000000
f2010000 74686973206e 000000000000

In note.d, strings are packed back to back:

xxd  -l 31 -g 16 db/trades~12/2026-06-10/note.d
00000000: 74686973206e6f746520697320746f6f  this note is too
00000010: 206c6f6e6720746f20696e6c696e65     long to inline

For completeness, let’s look at all of our three entries in note.i:

xxd -p -c 16 -l 48 db/trades~12/2026-06-10/note.i \
    | sed -E '1 s/^(..)(.{18})(.{12})$/\1 \2 \3/;
              2 s/^(.{8})(.{12})(.{12})$/\1 \2 \3/;
              3 s/^(..)(.{18})(.{12})$/\1 \2 \3/'
23 686900000000000000 000000000000
f2010000 74686973206e 000000000000
13 780000000000000000 1f0000000000

Now we see that the 48-bit offset of the third entry is set to 0x1f, which is the current end of note.d.

The SYMBOL type

The sym column has the SYMBOL type. Let’s understand what that means. We inserted the values BTC, ETH, and BTC. If we expect to see those same strings over and over, there’s an optimization that can be done. Those strings can be encoded into integers, e.g. BTC is 0, and ETH is 1, and we store those integers instead. Filtering and grouping on integers is much faster compared to regular strings. This technique is called dictionary encoding, and it is essentially what happens when the SYMBOL type is used.

Let’s start with the sym.d file. This is just a 4-byte integer column of the encoded values:

xxd -l 12 -g 4 db/trades~12/2026-06-10/sym.d
00000000: 00000000 01000000 00000000           ............

We can see the 0, 1, and 0 stored in the first 12 bytes.

Now, given an encoded value, we need to figure out the string. That information is stored on sym.c and sym.o. Whenever a new symbol is encountered, it is appended once, to sym.c, as a length-prefixed UTF-16 string: 4 bytes for the char count, then 2 bytes per char. Here are our two sym.c entries:

xxd -p -c 10 -l 20 db/trades~12/sym.c | sed -E 's/^(.{8})(....)(....)(....)$/\1 \2 \3 \4/'
03000000 4200 5400 4300
03000000 4500 5400 4800

But with only that information, we’d have to perform a linear search to figure out the symbol given an integer. The sym.o file solves that problem. sym.o starts with a 64-byte header that we’ll ignore. Then, after that, it stores N + 1 entries of 8 bytes, where N is the number of symbols. The N + 1 is just to make the math simple. Each entry marks the beginning and the end of a symbol in sym.c. We saw that both BTC and ETH symbols occupied 10 bytes in sym.c, meaning the symbol BTC starts at 0 and ends at 0x0a and ETH starts at 0x0a and ends at 0x14. And it is exactly that information that is seen in sym.o:

xxd -s 64 -l 24 -c 8 db/trades~12/sym.o
00000040: 0000 0000 0000 0000  ........
00000048: 0a00 0000 0000 0000  ........
00000050: 1400 0000 0000 0000  ........

Resolving the key 1, for example, involves getting the offset in sym.o with the following math:

public static long keyToOffset(int key) {
    return HEADER_SIZE + key * 8L;
}

With the offset in hand, you can directly read the symbol in sym.c.

There’s another question that needs answering: given a symbol, do I have a key for it? That is the purpose of the sym.k and sym.v files. You can think of them as files that store the information necessary to perform the reverse lookup: string -> key. But for this blog post, this is probably enough. To finish, let’s look at what happens when we insert old records.

Inserting out-of-order data

We’ve talked about how the data in QuestDB is always sorted. Let’s insert a new row with a timestamp one second less than our first row.

INSERT INTO trades VALUES ('2026-06-10T09:59:59Z', 1.0, 'ETH', 'another too long note');

Looking at how that affected our data on disk:

xxd  -l 32 -g 8 -c 8 db/trades~12/2026-06-10.1/price.d 
00000000: 000000000000f03f  .......?
00000008: 000000000000f03f  .......?
00000010: 0000000000000440  .......@
00000018: 00000000000008c0  ........


xxd  -l 32 -g 8 -c 8 db/trades~12/2026-06-10.1/ts.d 
00000000: c065714ee3530600  .eqN.S..
00000008: 00a8804ee3530600  ...N.S..
00000010: 40ea8f4ee3530600  @..N.S..
00000018: 802c9f4ee3530600  .,.N.S..


xxd -p -c 16 -l 64 db/trades~12/2026-06-10.1/note.i \
    | sed -E '1 s/^(.{8})(.{12})(.{12})$/\1 \2 \3/;
              2 s/^(..)(.{18})(.{12})$/\1 \2 \3/;
              3 s/^(.{8})(.{12})(.{12})$/\1 \2 \3/;
              4 s/^(..)(.{18})(.{12})$/\1 \2 \3/'
52010000 616e6f746865 000000000000
23 686900000000000000 150000000000
f2010000 74686973206e 150000000000
13 780000000000000000 340000000000


xxd  -l 50 -g 16 db/trades~12/2026-06-10.1/note.d
00000000: 616e6f7468657220746f6f206c6f6e67  another too long
00000010: 206e6f746574686973206e6f74652069   notethis note i
00000020: 7320746f6f206c6f6e6720746f20696e  s too long to in
00000030: 6c69                              li


xxd  -l 16 -g 4 -c 4 db/trades~12/2026-06-10.1/sym.d
00000000: 01000000  ....
00000004: 00000000  ....
00000008: 01000000  ....
0000000c: 00000000  ....


xxd  -l 20 -g 10 -c 10 db/trades~12/sym.c
00000000: 03000000420054004300  ....B.T.C.
0000000a: 03000000450054004800  ....E.T.H.


xxd -s 64 -l 24 -c 8 db/trades~12/sym.o
00000040: 0000 0000 0000 0000  ........
00000048: 0a00 0000 0000 0000  ........
00000050: 1400 0000 0000 0000  ........

The first thing to notice is that the partition directory is now 2026-06-10.1. The partition was rewritten, and to not affect concurrent readers, a new version was created. Beyond that, you can clearly see how those files were rearranged to make sure the values are sorted in timestamp order. The most recent write, which has the oldest timestamp, is the first entry in data files. The entries we had before come right after it. That means there’s a price to pay whenever we insert an out-of-order entry. But that’s a story for another post.

That’s all I had for this post. We have explored how certain QuestDB types are stored on disk and some database encoding techniques are applied. Have in mind that QuestDB offers a lot more types, and I’m sure other techniques are applied, but with the types we explored, we can get a good sense of how data is being stored.

https://cedardb.com/blog/german_strings/ ↩

#databases #questdb #storage-engine #columnar #german-strings #dictionary-encoding #mmap