I recently learned about memory-mapped files while watching a lecture on database storage from Andy Pavlo's course Intro to Database Systems. One of the main problems a database storage engine has to solve is how to deal with data on disk that is bigger than the available memory. At a high level, the main purpose of a disk-oriented storage engine is to manipulate data files on disk. But if we assume that the data on disk will eventually grow bigger than the available memory, we cannot simply load the whole data file into memory, make the change, and write it back to disk.
This is not a new problem in Computer Science. When operating systems were being developed in the early 1960s, a similar problem came up: how can we run programs stored on disk that are larger than the available memory? A solution to this problem was devised by a group in Manchester and implemented on the Atlas Computer in 1961. It was called virtual memory. Virtual memory gives a running program the illusion that it has enough memory, even when the computer does not physically have it.
We are not going to go deep into how virtual memory works. Just keep in mind that when a program accesses memory, it is accessing virtual memory. The data the program is trying to access may not actually be in physical memory, but that does not matter. The operating system will pretend that it is by fetching it from disk and placing it in memory, replacing an old chunk of memory that is unlikely to be used soon.
So, one of the ways a database storage engine can solve the larger than memory problem is to make use of virtual memory and the concept of memory-mapped files.
On Linux, we can do this with the mmap system call, which lets you map a file, no matter how big, directly into memory. If your program needs to manipulate the file, all it needs to do is manipulate the memory. The operating system handles the writes to disk for you.
On some occasions, programmers find this method more convenient than the usual system calls: open, read, write, lseek and close.
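For contrast, here is a minimal sketch of what the explicit-I/O route could look like in Go using only the standard library. The helper names (overwriteFirstByte, demoClassicIO) and the temporary file are made up for illustration; the point is the amount of offset bookkeeping needed just to patch one byte and read the result back.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// overwriteFirstByte (hypothetical helper) replaces the first byte of the
// file at path with explicit open/lseek/write/read/close style calls and
// returns the file's new contents.
func overwriteFirstByte(path string, b byte) (string, error) {
	f, err := os.OpenFile(path, os.O_RDWR, 0644) // open
	if err != nil {
		return "", err
	}
	defer f.Close() // close

	if _, err := f.Seek(0, io.SeekStart); err != nil { // lseek to offset 0
		return "", err
	}
	if _, err := f.Write([]byte{b}); err != nil { // write one byte
		return "", err
	}
	// The write moved the file offset, so rewind before reading.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	data, err := io.ReadAll(f) // read until end of file
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// demoClassicIO creates a throwaway file containing "hello" and patches it.
func demoClassicIO() (string, error) {
	tmp, err := os.CreateTemp("", "classic-io-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString("hello"); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	return overwriteFirstByte(tmp.Name(), 'X')
}

func main() {
	out, err := demoClassicIO()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With a memory mapping, the seek/write/seek/read sequence collapses into ordinary slice indexing.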
A simple demonstration
Here is a small example of how you can take advantage of this in Go using the package mmap-go:
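package main

import (
	"fmt"
	"log"
	"os"

	"github.com/edsrzf/mmap-go"
)

func main() {
	f, err := os.OpenFile("./file", os.O_RDWR, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// Map the whole file into memory for reading and writing.
	m, err := mmap.Map(f, mmap.RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer m.Unmap()
	fmt.Println(string(m)) // the mapping behaves like a []byte
	m[0] = 'X'             // writing to memory changes the file
	if err := m.Flush(); err != nil {
		log.Fatal(err)
	}
}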
The beauty is that we could have a much bigger file, and the solution would still work. We would not have to worry about managing memory in order to avoid it filling up.
Detailing mmap capabilities
We're going to explore more of what mmap can do from the point of view of the API provided by mmap-go. The native syscall probably offers more features than this library exposes.
The prot argument
Here is the mmap.Map signature:
func Map(f *os.File, prot, flags int) (MMap, error)
Let's look at prot first. The prot argument lets you specify the protection level of your mapping: RDONLY, RDWR and EXEC are among the options provided by mmap-go. These levels are pretty straightforward: RDONLY means you can only read from the mapping, RDWR means you can also write, and EXEC means you can execute code in that mapping. Here is the description of prot from the Linux man page:
The prot argument describes the desired memory protection of the
mapping (and must not conflict with the open mode of the file).
It is either PROT_NONE or the bitwise OR of one or more of the
following flags:
PROT_EXEC
Pages may be executed.
PROT_READ
Pages may be read.
PROT_WRITE
Pages may be written.
PROT_NONE
Pages may not be accessed.
In the unix package, those flags are: unix.PROT_EXEC, unix.PROT_READ, unix.PROT_WRITE and unix.PROT_NONE.
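To make the "must not conflict with the open mode of the file" part concrete, here is a small sketch. It uses the standard library's syscall package instead of unix so that it has no external dependencies; the PROT_* and MAP_* constants carry the same names there. The function name protDemo and the temporary file are assumptions made for illustration. It maps a file opened read-only with PROT_READ, then asks for a writable shared mapping of the same descriptor, which Linux should reject with EACCES.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// protDemo (hypothetical helper) returns the text seen through a read-only
// mapping and whether a PROT_READ|PROT_WRITE shared mapping of a descriptor
// opened read-only was rejected with EACCES.
func protDemo() (string, bool, error) {
	const content = "hello mmap"
	tmp, err := os.CreateTemp("", "prot-demo-*")
	if err != nil {
		return "", false, err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString(content); err != nil {
		tmp.Close()
		return "", false, err
	}
	if err := tmp.Close(); err != nil {
		return "", false, err
	}

	f, err := os.Open(tmp.Name()) // opened with O_RDONLY
	if err != nil {
		return "", false, err
	}
	defer f.Close()
	fd := int(f.Fd())

	// A read-only mapping is compatible with a read-only descriptor.
	ro, err := syscall.Mmap(fd, 0, len(content), syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return "", false, err
	}
	text := string(ro)
	if err := syscall.Munmap(ro); err != nil {
		return "", false, err
	}

	// prot is a bitwise OR of flags. Asking for write access on a shared
	// mapping conflicts with the read-only open mode.
	rw, werr := syscall.Mmap(fd, 0, len(content),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if werr == nil {
		syscall.Munmap(rw)
	}
	return text, errors.Is(werr, syscall.EACCES), nil
}

func main() {
	text, rejected, err := protDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println(text, rejected)
}
```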
Experimenting with the PROT_EXEC flag
I became intrigued by the EXEC flag and wanted to see an example of how it works. I googled it and could not find any example. So I searched GitHub for PROT_EXEC and found a good example in C: MMapExecDemo. I replicated this example in Go using mmap-go.
The first step was to write the function that I wanted to place in memory allocated by mmap, compile it, and get its machine code.
I created the inc function in the file inc.go:
package inc
func inc(n int) int {
return n + 1
}
I compiled it with go tool compile -S -N inc.go, then got its assembly by calling go tool objdump -S inc.o:
func inc(n int) int {
0x22b 48c744241000000000 MOVQ $0x0, 0x10(SP)
return n + 1
0x234 488b442408 MOVQ 0x8(SP), AX
0x239 48ffc0 INCQ AX
0x23c 4889442410 MOVQ AX, 0x10(SP)
0x241 c3 RET
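With this, we can represent our function as bytes in our code. Note that these opcodes read the argument from the stack and write the result back to the stack, which matches the stack-based calling convention Go used at the time. Go 1.17 and later pass arguments in registers on amd64, so this exact byte sequence is tied to the older convention.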
code := []byte{
0x48, 0xc7, 0x44, 0x24, 0x10, 0x00, 0x00, 0x00, 0x00,
0x48, 0x8b, 0x44, 0x24, 0x08,
0x48, 0xff, 0xc0,
0x48, 0x89, 0x44, 0x24, 0x10,
0xc3,
}
We allocate our memory with mmap.
memory, err := mmap.MapRegion(nil, len(code), mmap.EXEC|mmap.RDWR, mmap.ANON, 0)
if err != nil {
panic(err)
}
In this call, we're using a more general function, MapRegion, that lets you specify how much memory to map (Map uses the size of the underlying file) and the offset into the file.
In the beginning, we said that the main purpose of mmap is to create a mapping between a file and memory. But in this call we are not passing any file. mmap can also be used as a regular memory allocator by passing nil as the *os.File argument and mmap.ANON as the flags argument. We will talk more about mmap.ANON later. Since we are not mapping any file, the offset is 0.
So we have allocated memory of the same size as our code, len(code). Since we set the mmap.RDWR flag, we can copy our code into memory.
copy(memory, code)
We now have the code of our inc function in memory. In order to execute it, we have to cast that memory address to a function with a signature that matches the signature of our compiled inc.
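// A Go func value is a pointer to a word holding the code address.
// The first word of the slice header is the address of the mapped bytes.
memoryPtr := &memory
ptr := unsafe.Pointer(&memoryPtr)
inc := *(*func(int) int)(ptr)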
When we call inc, we are executing the code we put in memory. That only works because of the mmap.EXEC flag. If that flag were not set, a segmentation violation would occur.
fmt.Println(inc(10)) // Prints 11
I don't know if this is a real use case. I just wanted to see what it means to execute code that you put in memory. There are probably other ways of achieving the same thing with regular memory allocation and calls to mprotect.
One question that may come up is: the code is already in the code variable, so can't we just execute it? No, because the memory backing code is not executable. Can we make it executable? I tried calling mprotect on it but still got a segmentation violation.
Here is the full working gist.
The flags argument
Many processes can map the same memory region. This argument lets us decide how visible the updates made to the mapping are. There are many flags, and you can check them out in the mmap man page. The important ones are unix.MAP_SHARED, unix.MAP_PRIVATE and unix.MAP_ANON.
MAP_SHARED means that changes to the mapping are visible to all processes mapping the same region and are also carried through to the underlying file, although by default we cannot control when.
MAP_PRIVATE means the changes are private and other processes will not see them. They are also not carried through to the underlying file.
MAP_ANON means that the mapping is not backed by any file. Combined with MAP_SHARED, it is useful for sharing memory between a process and its child processes.
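The difference between MAP_PRIVATE and MAP_SHARED can be observed with a short sketch. Again this uses the standard library's syscall package rather than unix or mmap-go, and the function name sharedVsPrivate and the temporary file are assumptions for illustration. It writes through a private mapping and then through a shared one, reading the file back after each write. The shared result relies on Linux serving file reads and shared mappings from the same page cache.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sharedVsPrivate (hypothetical helper) returns the file contents observed
// after writing through a MAP_PRIVATE mapping and then through a MAP_SHARED one.
func sharedVsPrivate() (string, string, error) {
	const content = "hello"
	tmp, err := os.CreateTemp("", "flags-demo-*")
	if err != nil {
		return "", "", err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()
	if _, err := tmp.WriteString(content); err != nil {
		return "", "", err
	}
	fd := int(tmp.Fd())
	prot := syscall.PROT_READ | syscall.PROT_WRITE

	// Private mapping: the write lands in a copy-on-write page.
	priv, err := syscall.Mmap(fd, 0, len(content), prot, syscall.MAP_PRIVATE)
	if err != nil {
		return "", "", err
	}
	priv[0] = 'P'
	afterPrivate, err := os.ReadFile(tmp.Name())
	if err != nil {
		return "", "", err
	}
	if err := syscall.Munmap(priv); err != nil {
		return "", "", err
	}

	// Shared mapping: the write goes to the pages backing the file.
	shared, err := syscall.Mmap(fd, 0, len(content), prot, syscall.MAP_SHARED)
	if err != nil {
		return "", "", err
	}
	shared[0] = 'S'
	afterShared, err := os.ReadFile(tmp.Name())
	if err != nil {
		return "", "", err
	}
	if err := syscall.Munmap(shared); err != nil {
		return "", "", err
	}
	return string(afterPrivate), string(afterShared), nil
}

func main() {
	p, s, err := sharedVsPrivate()
	if err != nil {
		panic(err)
	}
	fmt.Println(p, s)
}
```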
I got confused by the mmap-go library implementation. The only flag it provides is mmap.ANON, which we used in the example above. If you want your mapping to be private, you pass mmap.COPY in the prot argument instead. In any case, you can always use the flags provided by the unix package.
Locking and flushing
Two other nice methods, Lock and Flush, are provided by the mmap-go API. The Lock method calls the mlock system call, which prevents the mapping from being paged out to disk. The Flush method calls the msync system call, which forces the data in memory to be written to disk. This is a good way to get more control over how and when data is flushed to disk.
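As a rough illustration of what these two methods do underneath, here is a sketch that calls mlock and msync directly on Linux through the standard library's syscall package. This is not how mmap-go itself is written; the function name lockAndFlush and the temporary file are assumptions, and since the syscall package has no msync wrapper, the sketch issues the raw system call. Locking is treated as best effort because it can fail when the process's locked-memory limit is low.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// lockAndFlush (hypothetical helper) maps a throwaway file, pins the mapping
// in RAM if allowed, modifies it, forces it to disk with msync, and returns
// the file contents read back afterwards.
func lockAndFlush() (string, error) {
	const content = "hello"
	tmp, err := os.CreateTemp("", "flush-demo-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()
	if _, err := tmp.WriteString(content); err != nil {
		return "", err
	}

	data, err := syscall.Mmap(int(tmp.Fd()), 0, len(content),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(data)

	// mlock: ask the kernel not to page this region out (best effort).
	if err := syscall.Mlock(data); err == nil {
		defer syscall.Munlock(data)
	}

	data[0] = 'X'

	// msync with MS_SYNC: block until the dirty pages are written out.
	_, _, errno := syscall.Syscall(syscall.SYS_MSYNC,
		uintptr(unsafe.Pointer(&data[0])), uintptr(len(data)),
		uintptr(syscall.MS_SYNC))
	if errno != 0 {
		return "", errno
	}

	out, err := os.ReadFile(tmp.Name())
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := lockAndFlush()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```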
Wrapping up
I felt kind of silly for learning about mmap only after so long. I don't remember it being brought up in my college classes. For some reason, I was amazed by it and its capabilities and decided to dig deeper. I like databases and I'm aiming to get a better grasp of them, which means mmap cannot be left out of my learning. In future posts, I'll try to cover the benefits and drawbacks of using mmap, which projects use it, and what kind of problems it is suited for.
Even though mmap can be used to solve the database problem we stated in the beginning, and many modern databases use it, Andy Pavlo advocates against it and has three lectures on how databases that don't use mmap manage data.
If you like this kind of content, follow me on twitter. You may find more related stuff there.