I recently came across the concept of memory-mapped files while watching a lecture on database storage from Andy Pavlo's Intro to Database Systems course. One of the main problems a database storage engine has to solve is how to deal with data on disk that is bigger than the available memory. At a high level, the main purpose of a disk-oriented storage engine is to manipulate data files on disk. But if we assume that the data on disk will eventually grow bigger than the available memory, we cannot simply load the whole data file into memory, make the change, and write it back to disk.

This is not a new problem in Computer Science. When operating systems were being developed in the early 1960s, a similar problem came up: how can we run programs stored on disk that are larger than the available memory? A solution was devised by a group in Manchester and implemented on the Atlas Computer in 1961. It was called virtual memory. Virtual memory gives a running program the illusion that it has enough memory, even when the computer physically does not.

We are not going to go deep into how virtual memory works. Just keep in mind that when a program accesses memory, it is accessing virtual memory. The data the program is trying to access may not actually be in physical memory, but that does not matter: the operating system pretends that it is by fetching it from disk and placing it in memory, evicting an older chunk that is unlikely to be used soon.

So, one of the ways a database storage engine can solve the larger-than-memory problem is to make use of virtual memory through memory-mapped files.

On Linux, we can do this with the mmap system call, which lets you map a file directly into your process's address space, even when the file is bigger than physical memory (as long as it fits in the virtual address space). If your program needs to manipulate the file, all it needs to do is manipulate the memory. The operating system handles the writes to disk for you.

On some occasions, programmers find this method more convenient than the usual system calls: open, read, write, lseek and close.
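For contrast, here is a minimal sketch of a one-byte edit done the traditional way, the kind of change the mmap demonstration below makes with a single assignment. It only uses Go's standard library; the helper names newScratchFile and patchWithSyscalls are made up for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// newScratchFile is a hypothetical helper that creates a temporary file
// holding content and returns its path.
func newScratchFile(content string) (string, error) {
	f, err := os.CreateTemp("", "syscalls-demo-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return f.Name(), err
}

// patchWithSyscalls overwrites the byte at offset off without mmap and
// returns the file content afterwards.
func patchWithSyscalls(path string, off int64, b byte) (string, error) {
	f, err := os.OpenFile(path, os.O_RDWR, 0o644) // open
	if err != nil {
		return "", err
	}
	defer f.Close() // close

	if _, err := f.Seek(off, io.SeekStart); err != nil { // lseek
		return "", err
	}
	if _, err := f.Write([]byte{b}); err != nil { // write
		return "", err
	}

	// Rewind and read everything back to check the result.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	out, err := io.ReadAll(f) // read, repeated until EOF
	return string(out), err
}

func main() {
	path, err := newScratchFile("hello")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	got, err := patchWithSyscalls(path, 0, 'X')
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // Xello
}
```

Every access needs an explicit offset and a buffer, which is the bookkeeping that a memory mapping hides behind ordinary slice indexing.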

A simple demonstration

Here is a small example of how you can take advantage of this in Go using the package mmap-go:

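package main

import (
	"fmt"
	"log"
	"os"

	"github.com/edsrzf/mmap-go"
)

func main() {
	f, err := os.OpenFile("./file", os.O_RDWR, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Map the whole file read-write. The variable is named m so that it
	// does not shadow the mmap package.
	m, err := mmap.Map(f, mmap.RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer m.Unmap()
	fmt.Println(string(m))

	// Writing to the slice changes the file; Flush forces the write-back.
	m[0] = 'X'
	if err := m.Flush(); err != nil {
		log.Fatal(err)
	}
}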

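The beauty is that we could have a much bigger file, and the approach would still work: the operating system pages data in and out as needed, so we would not have to manage memory ourselves to keep it from filling up. (The one exception in the example is the string conversion used for printing, which copies the whole mapping.)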
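To get a feel for that, here is a rough sketch that maps a large sparse file and touches only its last byte. It uses the standard library's syscall.Mmap instead of mmap-go so that it is self-contained, and it assumes a 64-bit Linux machine; touchLastByte and the 256 MiB size are arbitrary choices for the illustration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// touchLastByte creates a sparse file of the given size, maps it, and
// writes one byte at the very end. The kernel only needs to fault in the
// pages we actually touch, not the whole file.
func touchLastByte(size int64) (byte, error) {
	f, err := os.CreateTemp("", "mmap-big-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// Truncate extends the file with a hole: no data blocks are written.
	if err := f.Truncate(size); err != nil {
		return 0, err
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, int(size),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return 0, err
	}
	defer syscall.Munmap(data)

	data[len(data)-1] = 'Z'

	// Read the byte back through the file descriptor, not the mapping.
	buf := make([]byte, 1)
	if _, err := f.ReadAt(buf, size-1); err != nil {
		return 0, err
	}
	return buf[0], nil
}

func main() {
	const size = 256 << 20 // 256 MiB, an arbitrary demo size
	b, err := touchLastByte(size)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%c\n", b)
}
```

The program never allocates a 256 MiB buffer of its own; the mapping only reserves address space.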

Detailing mmap capabilities

We're going to explore more of mmap's functionality through the API provided by mmap-go. The native system call probably offers more features than this library implements.

The prot argument

Here is the mmap.Map signature:

func Map(f *os.File, prot, flags int) (MMap, error) 

Let's look at prot first. The prot argument lets you specify the protection level of your mapping: RDONLY, RDWR and EXEC are the options provided by mmap-go (alongside COPY, which we will come back to when discussing flags). These levels are pretty straightforward: RDONLY means you can only read from the mapping, RDWR means you can also write to it, and EXEC means you can execute code stored in it. Here is the description of prot from the Linux man page:

The prot argument describes the desired memory protection of the
mapping (and must not conflict with the open mode of the file).
It is either PROT_NONE or the bitwise OR of one or more of the
following flags:

PROT_EXEC
    Pages may be executed.

PROT_READ
    Pages may be read.

PROT_WRITE
    Pages may be written.

PROT_NONE
    Pages may not be accessed.

In Go's golang.org/x/sys/unix package, those flags are: unix.PROT_EXEC, unix.PROT_READ, unix.PROT_WRITE and unix.PROT_NONE.
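As a small illustration of how these protection bits combine, the sketch below allocates one anonymous page as readable and writable, stores a value, and then drops the write permission with mprotect. It uses the equivalent constants from the standard syscall package rather than mmap-go or x/sys/unix, assumes Linux, and protDemo is a name made up for the example.

```go
package main

import (
	"fmt"
	"syscall"
)

// protDemo allocates one anonymous page read-write, stores a value,
// then removes the write permission and reads the value back.
func protDemo() (byte, error) {
	pageSize := syscall.Getpagesize()

	// prot is the bitwise OR of the permissions we want.
	prot := syscall.PROT_READ | syscall.PROT_WRITE
	page, err := syscall.Mmap(-1, 0, pageSize, prot,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return 0, err
	}
	defer syscall.Munmap(page)

	page[0] = 42

	// From here on the page is read-only: a store such as page[0] = 1
	// would now raise a segmentation violation.
	if err := syscall.Mprotect(page, syscall.PROT_READ); err != nil {
		return 0, err
	}
	return page[0], nil
}

func main() {
	v, err := protDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 42
}
```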

Experimenting with the PROT_EXEC flag

I became intrigued by the EXEC flag and wanted to see an example of how it works. I Googled and could not find one, so I searched GitHub for PROT_EXEC and found a good example in C: MMapExecDemo. I replicated this example in Go using mmap-go.

The first step was to write the function that I wanted to place in memory allocated by mmap, compile it, and get its assembly opcodes.

I created the inc function in the inc.go file:

package inc

func inc(n int) int {
	return n + 1
}

I compiled it with go tool compile -S -N inc.go, then got its disassembly by calling go tool objdump -S inc.o:

func inc(n int) int {
  0x22b                 48c744241000000000      MOVQ $0x0, 0x10(SP)
        return n + 1
  0x234                 488b442408              MOVQ 0x8(SP), AX
  0x239                 48ffc0                  INCQ AX
  0x23c                 4889442410              MOVQ AX, 0x10(SP)
  0x241                 c3                      RET

With this, we can represent our function as bytes in our code:

code := []byte{
	0x48, 0xc7, 0x44, 0x24, 0x10, 0x00, 0x00, 0x00, 0x00,
	0x48, 0x8b, 0x44, 0x24, 0x08,
	0x48, 0xff, 0xc0,
	0x48, 0x89, 0x44, 0x24, 0x10,
	0xc3,
}

We allocate our memory with mmap.

memory, err := mmap.MapRegion(nil, len(code), mmap.EXEC|mmap.RDWR, mmap.ANON, 0)
if err != nil {
    panic(err)
}

In this call, we're using a more general function called MapRegion, which lets you specify how much memory to map (Map uses the size of the underlying file) and the offset into the file.

In the beginning, we said that the main purpose of mmap was to create a mapping between a file and memory. But in this call we are not passing any file. mmap can also be used as a regular memory allocator by passing nil as the *os.File argument and mmap.ANON as the flags argument. We will talk more about mmap.ANON later. Since we are not mapping any file, the offset is 0.

So we have memory allocated with the same size as our code, len(code). Since we set the mmap.RDWR flag, we can copy our code into it.

copy(memory, code)

We have the code of our inc function in memory. In order to execute it, we have to cast that memory address to a function with a signature that matches the signature of our compiled inc.

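// memory is a slice, so &memory points at a slice header whose first word
// is the address of the mapped bytes. A Go func value has the same shape:
// a pointer to a block whose first word is the code address.
memoryPtr := &memory
ptr := unsafe.Pointer(&memoryPtr)
inc := *(*func(int) int)(ptr)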

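When we call inc, we are executing the code we put in memory. That only works because of the mmap.EXEC flag. If that flag were not set, a segmentation violation would occur. Keep in mind that these opcodes read the argument from the stack, which matches the calling convention of the toolchain that produced them; since Go 1.17, amd64 builds pass arguments in registers, so the bytes would likely have to be regenerated there.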

fmt.Println(inc(10)) // Prints 11

I don't know if this is a real use case. I just wanted to see what it meant to execute code that you put in memory. And there are probably other ways of achieving the same with regular memory allocation and calls to mprotect.
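Here is a rough sketch of what the mprotect route could look like: the memory starts out writable, the code is copied in, and only then are the pages switched to read and execute, so they are never writable and executable at the same time. It still relies on an anonymous mmap to get page-aligned memory, uses the standard syscall package, and assumes Linux. The loadExecutable helper is hypothetical, and the sketch deliberately stops short of calling the loaded bytes.

```go
package main

import (
	"fmt"
	"syscall"
)

// loadExecutable copies code into a fresh anonymous mapping and then
// switches the pages from read-write to read-execute with mprotect.
func loadExecutable(code []byte) ([]byte, error) {
	region, err := syscall.Mmap(-1, 0, len(code),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	copy(region, code)

	if err := syscall.Mprotect(region, syscall.PROT_READ|syscall.PROT_EXEC); err != nil {
		syscall.Munmap(region)
		return nil, err
	}
	return region, nil
}

func main() {
	code := []byte{0xc3} // a lone x86-64 RET, used only as placeholder bytes
	region, err := loadExecutable(code)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(region)
	fmt.Printf("%d byte(s) loaded, first byte %#x\n", len(region), region[0])
}
```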

One question that may come up is: the code is already in the code variable, so can't we just execute it? No, because the memory allocated for code is not executable. Can we make it executable? I tried to use mprotect on it but still got a segmentation violation.

Here is the full working gist.

The flags argument

We can have many processes mapping the same memory region. The flags argument lets us decide who sees the updates made to the mapping. There are many flags, and you can check them out in the mmap man page. The important ones are unix.MAP_SHARED, unix.MAP_PRIVATE and unix.MAP_ANON.

MAP_SHARED means that changes to the mapping are visible to all processes mapping the same region and are also carried through to the underlying file, although by default we do not control when.

MAP_PRIVATE means the changes are private: other processes will not see them, and they are not carried through to the underlying file.

MAP_ANON means that the mapping is not backed by any file. Combined with MAP_SHARED, it is useful for communication between a process and its child processes through shared memory.

I got confused by the mmap-go implementation. The only flag it provides is mmap.ANON, which we used in the example above. Mappings are shared by default; if you want your mapping to be private, you set mmap.COPY in the prot argument instead. In any case, you can always use the flags provided by the unix package directly.
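The difference between shared and private mappings is easy to observe. The sketch below writes the first byte of the same file through a MAP_PRIVATE mapping and then through a MAP_SHARED one, checking what the file contains after each write. It uses the standard syscall package instead of mmap-go and assumes Linux; fileAfterWrite and compareVisibility are names invented for the example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fileAfterWrite maps path using the given visibility flag, overwrites the
// first byte through the mapping, and returns what the file itself holds.
func fileAfterWrite(path string, flag int, b byte) (string, error) {
	f, err := os.OpenFile(path, os.O_RDWR, 0o644)
	if err != nil {
		return "", err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return "", err
	}
	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ|syscall.PROT_WRITE, flag)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(data)

	data[0] = b

	inFile, err := os.ReadFile(path)
	return string(inFile), err
}

// compareVisibility runs both variants against a fresh temporary file.
func compareVisibility() (private, shared string, err error) {
	f, err := os.CreateTemp("", "mmap-flags-*")
	if err != nil {
		return "", "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err = f.WriteString("hello"); err != nil {
		return "", "", err
	}

	if private, err = fileAfterWrite(f.Name(), syscall.MAP_PRIVATE, 'P'); err != nil {
		return "", "", err
	}
	shared, err = fileAfterWrite(f.Name(), syscall.MAP_SHARED, 'S')
	return private, shared, err
}

func main() {
	private, shared, err := compareVisibility()
	if err != nil {
		panic(err)
	}
	fmt.Printf("after private write: %s\n", private) // hello
	fmt.Printf("after shared write:  %s\n", shared)  // Sello
}
```

The private write lands on a copy of the page that belongs to the process alone, so the file keeps its original content; the shared write goes to the same pages the kernel uses to serve regular reads of the file.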

Locking and flushing

Two other nice methods, Lock and Flush, are provided by the mmap-go API. The Lock method calls the mlock system call, which prevents the mapping from being paged out to disk. The Flush method calls the msync system call, which forces the data in memory to be written to disk. Together they are a good way to try to get more control over how and when data is flushed to disk.
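For the curious, this is roughly what those two methods boil down to at the system call level. The sketch uses the standard syscall package on Linux; as far as I can tell that package has no named Msync wrapper there, so the example goes through the raw system call number, which is an assumption worth double-checking on your platform. The helpers msync, lockAndFlush and newScratchFile are made up for the illustration, and a failing mlock (for instance because of a low RLIMIT_MEMLOCK) is treated as non-fatal.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// msync issues msync(2) with MS_SYNC through the raw system call number.
func msync(data []byte) error {
	_, _, errno := syscall.Syscall(syscall.SYS_MSYNC,
		uintptr(unsafe.Pointer(&data[0])), uintptr(len(data)), uintptr(syscall.MS_SYNC))
	if errno != 0 {
		return errno
	}
	return nil
}

// newScratchFile is a hypothetical helper that creates a temporary file.
func newScratchFile(content string) (string, error) {
	f, err := os.CreateTemp("", "mmap-flush-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return f.Name(), err
}

// lockAndFlush maps path, tries to pin the pages in RAM, changes the first
// byte and blocks until msync returns. It reports whether mlock succeeded
// and the file content afterwards.
func lockAndFlush(path string, b byte) (locked bool, content string, err error) {
	f, err := os.OpenFile(path, os.O_RDWR, 0o644)
	if err != nil {
		return false, "", err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return false, "", err
	}
	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return false, "", err
	}
	defer syscall.Munmap(data)

	// mlock keeps the pages resident; failure is not fatal for this demo.
	locked = syscall.Mlock(data) == nil
	if locked {
		defer syscall.Munlock(data)
	}

	data[0] = b
	if err := msync(data); err != nil {
		return locked, "", err
	}
	out, err := os.ReadFile(path)
	return locked, string(out), err
}

func main() {
	path, err := newScratchFile("hello")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	locked, content, err := lockAndFlush(path, 'F')
	if err != nil {
		panic(err)
	}
	fmt.Printf("locked=%v content=%s\n", locked, content)
}
```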

Wrapping up

I felt kind of stupid for learning about mmap only after so long. I don't remember it being brought up in my college classes. For some reason, I was amazed by it and its capabilities and decided to dig deeper. I like databases and I'm aiming to get a better grasp of them, so mmap could not go unnoticed in my learning. In future posts, I'll try to cover the benefits and drawbacks of using mmap, which projects use it, and what kinds of problems it is suited for.

Even though mmap can be used to solve the database problem we stated in the beginning, and many modern databases use it, Andy Pavlo advocates against it and has three lectures on how databases that don't use mmap manage their data.

If you like this kind of content, follow me on twitter. You may find more related stuff there.