Dealing with large data structures efficiently in embedded systems

I’m currently dealing with a programming problem where I need concurrent access to several 64MB, file-backed data structures on an Embedded Linux system that has only 64MB of RAM.  The data structures are fairly sparse (mostly zero data), and I typically only need to access small portions of them at any particular time.  There is always the brute-force approach of writing code to manually load sections of the file as you need them.  But with a little thought, the realisation hits: “this is what operating systems do.”  This article explores using memory mapping and the sparse file support in file systems to solve this problem very efficiently.

Sparse File Support

Most Unix file systems support sparse files.  This means that sections of a file that contain only zeros are not stored on the disk.  Consider the following example, where we create a 200MB file that is all zeros:

root@cm-x270:/media/mmcblk0p1# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mmcblk0p1         3917212    202448   3515776   5% /media/mmcblk0p1

root@cm-x270:/media/mmcblk0p1# dd if=/dev/zero of=sparse-file bs=1 \
count=1 seek=200M

root@cm-x270:/media/mmcblk0p1# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mmcblk0p1         3917212    202456   3515768   5% /media/mmcblk0p1

root@cm-x270:/media/mmcblk0p1# ls -l
-rw-r--r--    1 root     root    209715201 May 22 08:54 sparse-file

root@cm-x270:/media/mmcblk0p1# time od sparse-file
0000000  000000 000000 000000 000000 000000 000000 000000 000000
*
1440000000
real    0m 17.23s
user    0m 14.21s
sys     0m 2.66s

root@cm-x270:/media/mmcblk0p1# du sparse-file
68      sparse-file

In this case we created a 200MiB file on an SD card formatted as ext3.  Even though the file is 200MiB in size, it only uses a few KB of disk space.  The same test also worked fine with a JFFS2 file system.  With sparse file support, we get a cheap form of run-length compression (at least for zeros) with no effort.

mmap()

The mmap() system call is used to map a file, or portions of a file into memory.  The data in the file can then be accessed directly in memory.  Because Linux is a demand paged system, portions of the file are paged in as needed, so the entire file does not need to be present in RAM at one time.  There are several advantages to using mmap():

  1. mmap() avoids the extra copy that occurs with read() or write(), as the data does not need to be copied into a user-space buffer.
  2. Beyond the initial page faults, there is very little overhead: no system call is needed for each access.
  3. You can directly access any part of the file without calling lseek() and keeping track of where you are.
  4. The operating system takes care of paging in the sections of the file you are using, and discarding sections that are not in use when memory is low.  This includes flushing dirty portions back to disk.

What this means is that mmap() gives you an easy way to work on large, file-backed data structures: the OS takes care of loading the portions you need, and saving the modified portions back to disk.  As most embedded systems are 32-bit, virtual address space does put an upper limit on the file sizes you can map.

Test Application

Next I wrote a small application to create and modify files using mmap().  I wanted to experiment with creating files of various sizes, writing non-zero data at various intervals, and testing the performance of this.  The test application source is located here.

Usage: sparse_file_mmap_test [OPTION]

  -s, --filesize   File size to allocate
  -o, --offset     Will write a few bytes every offset bytes
  -m, --modify     Modify existing file
  -d, --data       Data to write to file at offset location (0-255)

The application creates a file of size filesize, and writes the value of data to the file every offset bytes.  There is also an option to modify existing files, so we can test the performance of opening a large file, making a small change, and then closing it.

Test Results

The results are pretty amazing, and the performance is beyond my expectations.  This type of problem is where you learn to appreciate the performance of an advanced operating system and file system.  There is a reason for the complexity!  There are three things I wanted to test: 1) does the sparse file support work as expected? 2) can mmap() be used to easily modify large files? 3) can mmap() be used to work on data structures that are larger than physical RAM?

1. Sparse File Support

The basic tests above confirm that sparse file support works for an empty file, but what about a file that has some data every X bytes?  Below are the test results for creating a 10MB file, and writing data at various offset intervals.

Offset (bytes)    Size on Disk (KB)
          1024                 9784
          2048                 9784
          4096                 9784
          8192                 4904
         16384                 2464
         32768                 1244
         65536                  632
       1048576                   60
       2097152                   40
       4194304                   32

As soon as the offset exceeded the MMU page size (4KB), the sparse file effect kicked in and worked as expected.

2. Can mmap() be used to efficiently modify large files?

The test here was to open a large file, make a small change in the middle, and then close it.  In this case, a 100MB file was created with a data value of 1 written every 1MB; the same file was then re-opened and a data value of 2 was written every 2MB.  The od utility was used to verify that the file contents were correct.

root@cm-x270:/media/mmcblk0p1# time sparse_file_mmap_test_arm -s104857600 \
-o1048576 -d1
Sparse File mmap() test
Filesize = 104857600, offset = 1048576, data = 1
size = 102400 KB
size on disk (KB):
508     sparse-file
real    0m 1.10s
user    0m 0.00s
sys     0m 0.43s

root@cm-x270:/media/mmcblk0p1# time sparse_file_mmap_test_arm -s104857600 \
-o2097152 -d2 -m
Sparse File mmap() test
Filesize = 104857600, offset = 2097152, data = 2
size = 102400 KB
size on disk (KB):
508     sparse-file
real    0m 0.37s
user    0m 0.01s
sys     0m 0.05s

root@cm-x270:/media/mmcblk0p1# time od -x sparse-file
0000000     0002    0000    0000    0000    0000    0000    0000    0000
0000020     0000    0000    0000    0000    0000    0000    0000    0000
*
4000000     0001    0000    0000    0000    0000    0000    0000    0000
4000020     0000    0000    0000    0000    0000    0000    0000    0000

....
*
614000000     0001    0000    0000    0000    0000    0000    0000    0000
614000020     0000    0000    0000    0000    0000    0000    0000    0000
*
620000000
real    0m 12.93s
user    0m 7.33s
sys     0m 1.70s

Creating and modifying large, file-backed data structures is very fast and efficient: on the order of one second for a 100MB file that contains data every 1MB.  Dumping the data with od took considerably longer (13 seconds), as all 100MB of data had to be read and formatted.

Possible Issues

There are several things to watch out for when using sparse files and mmap():

  1. With sparse files, there is the potential to run out of disk space, because the files use less space on disk than their logical size suggests.  It is a good idea to monitor disk space when working with sparse files.
  2. mmap() requires virtual address space for the size of the file it maps.  On embedded systems this is less of an issue, because physical RAM tends to be much smaller than the 4GB virtual address space.  On a system with only 64MB of RAM, mmap()’ing files in the tens of MB makes a lot of sense: it ensures that you will not run the system out of memory, and there should be plenty of virtual address space to map in files of this size.

Conclusion

mmap() and sparse file support provide a very convenient solution for dealing with large, file-backed data structures.  One example is a large matrix, such as the map data used in mapping applications.  Writing a from-scratch solution to this problem would be a fairly large and difficult task.  Processing large amounts of data efficiently is becoming more and more important in many embedded systems, and this example provides another compelling reason why implementing modern, data-centric embedded systems using Linux makes a lot of sense.