Setting up a Go development environment

Go has a pretty neat development environment, and it's helpful to set up a standard GOPATH on your workstation up front.  This is what I do:

  • mkdir ~/go
  • add the following to .bashrc (or some file that configures your env on login)
    • export GOPATH=~/go
    • export PATH=$GOPATH/bin:$PATH

Now, after you log in, you can do things like:
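For instance (the package path below is made up, and the exports repeat the .bashrc settings so the snippet stands alone):

```shell
# Repeat the .bashrc settings so this snippet is self-contained
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$PATH

# Create the standard GOPATH layout
mkdir -p $GOPATH/src $GOPATH/pkg $GOPATH/bin

echo $GOPATH    # should print something like /home/you/go

# go get github.com/someuser/sometool   # fetches, builds, installs to $GOPATH/bin
# sometool                              # runs, since $GOPATH/bin is on PATH
```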


A Review of Graphical Application Solutions for Embedded Linux Systems

One of the decisions we face when building Embedded Linux systems is what components to use. With Open Source software, there is often more than one good option. Graphical libraries are no exception. In this article, we’ll examine GTK+, Qt, EFL, Android, and HTML/Javascript.

There are many factors that go into a choice like this, but some of them are:

  • Does the application need to run on Windows or MacOS?
  • Does the GUI need to be viewed remotely over a network?
  • Are dynamic effects (think iPhone) desired?
  • Does the application need to run on low-end CPUs (ones without a GPU)?

Putting some effort into selecting the right GUI technology is important for the following reasons:

  • Most of the development effort for the product will likely be in the graphical application, so it makes sense to maximize the productivity of the application developers.  Typically there might be 3-10 application developers on a project for every system software developer.
  • You want a technology that will scale with new revisions and improvements of the product.
  • You want a technology that will be supported and improved long term.

With Open Source software, things are always changing.  Leading technologies change.  This can be illustrated by following Intel’s support for their open source software platforms.  While some of this may be driven by politics, we can also see clear technical and licensing reasons why these shifts were made.

  • Moblin (GTK+), became public around 2007, merged into MeeGo in 2010
  • MeeGo (Qt), announced in 2010-02, cancelled 2011-09
  • Tizen (HTML/Javascript, EFL), announced in 2011-09

We can see similar shifts in other companies:

  • Nokia started with GTK+ as the GUI technology for their tablet computers, and then shifted to Qt
  • Samsung has been using GTK+ on top of DirectFB, and now is moving toward EFL/HTML/Javascript for some phones
  • Many phone manufacturers are producing Android products
  • Palm moved from a proprietary GUI to HTML in webOS


GTK+

GTK+ is part of the GNOME desktop project and is perhaps the most used graphical library for desktop Linux applications; in the past it has also been very popular in embedded systems.  Nokia invested heavily in GTK+ for its early tablet products (N770, N800, N900, etc.).  However, with the advent of the iPhone and faster processors with GPUs, everything has changed.  The standard is now dynamic GUIs with sliding effects and similar animations.  The Clutter project is a library that can be used to build dynamic GUIs and fits in with the GNOME stack.  GTK+ supports Windows and MacOS, but probably not as well as Qt.


Qt

Qt is a very mature project that is also used extensively for desktop applications, and more recently on some of Nokia's phones.  Qt was originally developed by the Norwegian company Trolltech, and was offered only under either a proprietary license or the GPL.  This meant that if you wanted to write a proprietary application using Qt, you had to purchase Trolltech's commercial license.  This factor alone probably made GTK+ a much more popular solution for many years for Embedded Linux GUIs, including most cell phone stacks.  In 2008, Trolltech was acquired by Nokia, and shortly after that, Qt was offered under the LGPL license, giving it the same licensing terms as GTK+.

One of Qt's compelling features, introduced in Qt 4.7, is its QML (or Qt Quick) technology.  QML allows you to write declarative GUIs in a Javascript-like syntax with many automatic bindings.  There is good support for dynamic operations like sliding effects, and the performance is reasonable, even on low-end systems without a GPU.
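As a rough sketch of what QML looks like (the element names are standard Qt Quick 1.0, but the specific layout here is made up for illustration):

```qml
import QtQuick 1.0

Rectangle {
    width: 240; height: 320

    Rectangle {
        id: panel
        width: parent.width; height: 60
        color: "steelblue"

        // A Behavior turns any change to "y" into a smooth slide
        Behavior on y { NumberAnimation { duration: 300 } }

        MouseArea {
            anchors.fill: parent
            onClicked: panel.y = (panel.y == 0) ? 100 : 0  // slide on tap
        }
    }
}
```

The declarative style means the animation is attached to the property itself; the click handler just assigns a new value and the binding does the rest.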

In the future, Qt 5.0 will require OpenGL, and hence will be relegated to high-end ARM CPUs with a GPU, or to desktop systems.

Qt’s cross platform support is excellent, and provides good native support for Linux, MacOS, and Windows.

Recently, Nokia has made efforts to set up Qt as more of a community project, instead of retaining exclusive control over it.


EFL

EFL (the Enlightenment Foundation Libraries) is a project that started out as the Enlightenment window manager and grew into a set of general-purpose libraries.  It claims to be more efficient than GTK+ and Qt, and to work with or without hardware acceleration.  Recently, EFL seems to have garnered the commercial interest of Samsung, Intel, and others involved in the Tizen project.  According to a presentation by Carsten Haitzler (one of EFL's founders and developers), Samsung was using GTK+ and DirectFB, but switched to EFL after seeing the performance.  Perhaps the most compelling story for EFL is its high performance over a range of hardware capabilities (from simple phones with low-end processors to high-end smartphones running OpenGL).  Parts of EFL are also used in the commercial GUI toolkit FancyPants.


Android

Android is an interesting GUI solution, especially as many developers have experience working on Android applications.  Android now seems to be used in many applications where Windows CE was used in the past, probably due to its polished application development tool-set.


HTML/Javascript

Application development with HTML and Javascript is one of the more interesting developments, because many embedded systems are headless (they don't have a local display).  Couple this with the fact that many users now have smartphones or tablets readily available, and it may make sense to simply use an external device for displaying the UI.  There is often a requirement to access the UI of a device remotely, and in this case HTML/Javascript works very well.  If the UI needs to be displayed locally, then a fairly powerful CPU is required (an ARM Cortex-A8, etc.) to run a modern web browser.  If there is no physical display on the device and the UI is accessed remotely on a computer or mobile device, then a less powerful CPU suffices, because the embedded device does not actually have to do any of the rendering.  HTML/Javascript also has the benefit of being a very popular technology, so there are many experienced developers.


Each of the above technologies has benefits and drawbacks.  Understanding your project’s requirements, and what each solution offers is key to making the best decision.


OpenEmbedded srctree and gitver

Recently an OpenEmbedded class named srctree became usable.  The srctree.bbclass enables operation inside an existing source tree for a project, rather than using the fetch/unpack/patch idiom.  The srctree.bbclass, in combination with the OpenEmbedded gitver.bbclass and git submodules, provides a very interesting way to build custom software with OpenEmbedded.

One of the classic problems with OpenEmbedded is how application and kernel developers use it.  While OpenEmbedded excels at automating image builds, it is less friendly when used as a cross-development tool.  Historically there are several options for iterative development:

  1. develop in the working directory: cd <tmp>/work/arm…/<my recipe>; ../temp/run.do_compile …
  2. a variation of #1: bitbake -c devshell <my recipe>
  3. manually set up an environment that uses the toolchain generated by OE.  As example see this script in the BEC OE template.
  4. a variation of #3: export a SDK that includes a toolchain and libs

While the above solutions work OK, the process can be a little cumbersome.  Unless your OE recipe pulls software directly from the tip of an SVN repository, you may have to manually update the recipe after you make changes, create patch files, etc.  There is also the problem that if your recipe fetches the latest from SVN, it drastically slows down recipe parsing, as the repository has to be checked for a new version every time the recipe is processed.

The optimal solution would be to simply check a software component out of a version control system and build it directly using OpenEmbedded.  Icing on the cake would be if the generated package automatically extracted version information from the version control system.  This would facilitate iterative development for software components that need to be cross-compiled.

Although srctree can be used with any directory of source code, it really works best with a git repository.  The gitver.bbclass provides a GITVER variable which is "a (fairly) sane version, for use in ${PV}, extracted from the ${S} git checkout, assuming it is one" (text from the recipe).  gitver uses the 'git describe' command to extract the last tag and uses that as the version.

The best way to illustrate the use of these tools is an example:

The easiest way to try this is to clone the example project (autotools-demo) into your openembedded/recipes directory:

$ cd openembedded/recipes
$ git clone
$ cd autotools-demo
$ git describe
$ git tag -l

Notice that git describe simply returns the latest tag.  The recipe can be located in the same directory as the source code and has the following contents:

# recipe to build the autotools-demo in the working tree

inherit srctree autotools gitver

PV = "${GITVER}"

Can’t get much easier than that!  If you build the recipe, you end up with a package:

$ bitbake autotools-demo
$ ls tmp/deploy/glibc/ipk/armv5te/autotools-demo_1.1-r0.6_armv5te.ipk

Now, what happens if you make changes and commit them?

$ cd .../autotools-demo
$ (make a change and commit)
$ git describe
1.1-1-gfbc1ecc (notice the count and hash automatically appended)
$ (make another change and commit)
$ git describe
1.1-2-g7ad3715 (notice the count is now 2)

If we bitbake the recipe now, we end up with a package named:


The gitver class in OpenEmbedded automatically takes care of creating a usable PV (package version) that always increments.

So in summary, srctree and gitver give developers a convenient way to handle custom components that change often in an Embedded Linux build without increasing parse times, requiring manual tweaks to version numbers, or creating a separate workspace for each version of the application that is built.  As practices such as continuous integration become more common, OpenEmbedded features like this are increasingly needed.  An added benefit is that the OpenEmbedded recipe can now be stored in the same location as the source code.  Perhaps in the future, most applications will include an OpenEmbedded recipe as part of their source code, and git submodules could be used to simply populate the components you want to use.

2017-11-08 update: A tool named devtool is now the preferred way to do much of the above.


Qt Creator for C/C++ development

Recently, I've been evaluating Qt Creator for general C/C++ development.  I'm currently involved in the development of a rather large C++ application that is approaching 200,000 lines of code and 1000 source modules.  In the past, I've typically used Vim for editing, and Eclipse as a gdb front-end when needed.  Qt Creator is a rather new tool developed by Nokia's Qt team.  What initially attracted my attention was that one of the reasons the project was started was that no existing tool effectively handled the Qt codebase, which is quite large.  Things I like about Qt Creator:

  • it works fairly well with any C++ Make based project.  This includes projects built with autotools as well as the Qt Qmake build system.
  • easy to import existing projects
  • it is very fast.  Indexing 200,000 lines of code happens in around a minute or so.
  • provides a Vim compatibility mode that actually works
  • provides fast methods for finding files, symbols, and seeing where symbols are used
  • did I mention yet it is fast?

I also recorded a screencast that demos Qt Creator with a large project (it can be viewed in Firefox).  As always, I'm interested in what others find interesting in this or other tools.  Future efforts will be to use Qt Creator to build and remotely debug ARM binaries; I am interested in what others have done in this regard.

If you do try Qt Creator, I recommend the latest pre-release snapshot.


Best practices for building Gtk+ applications with OpenEmbedded

I recently wrote an article about best practices for building Qt applications with OpenEmbedded, and it occurred to me that I should write an equivalent article for Gtk+ applications.  The same points apply: put your application source in an SCM system, and put the install logic in the application source (read the above article).  The difference is that Gtk+ applications typically use autotools, where Qt applications use qmake, to build the application.  This article details how a minimal GTK+ application should be set up and built using OpenEmbedded.

Application Source

I created a sample GTK+ hello application (gtk_hello).  It is hosted in an SVN repository, so you can simply "svn co" the URI to check out the code.  If you are running Ubuntu, you can install the necessary tools to build a native Gtk+ application with:

  • sudo apt-get install libgtk2.0-dev build-essential autoconf automake pkg-config

To build on your x86 Linux PC, run the following steps:
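The steps are the standard autotools sequence, sketched here on the assumption that the checkout provides an autogen.sh script (some projects use autoreconf -i instead), with the prefix pointed at a local install directory:

```
$ ./autogen.sh
$ ./configure --prefix=$(pwd)/install
$ make
$ make install
```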

This compiles and installs the application into the "install" directory.  If you look in this directory, you will notice the application binary is installed in the "install/bin" directory.  Typically, the install prefix is set to /usr (so binaries land in /usr/bin), but in this example we set it to a local install directory so we don't need to run "make install" as root, yet we can still verify the install logic works properly.

OpenEmbedded Recipe

Now that you have verified the application builds and installs properly on an x86 PC, it is trivial to build the application in OpenEmbedded.  Create a recipe in your OE recipes directory with the following contents:

DESCRIPTION = "Sample Gtk+ Hello application, used to demonstrate build system"
AUTHOR = "Cliff Brake <>"

SRCREV = "17"
PV = "0.0+svn${SRCREV}"
PR = "r0"

DEPENDS = "gtk+"

SRC_URI = "svn://;module=gtk_hello;proto=http;rev=17"

S = "${WORKDIR}/gtk_hello/"

inherit autotools

Now, run: bitbake gtk-hello.  That is it!  Building Linux applications is easy if you simply use the standard tools, whether autotools, qmake, etc.  Too often there is a tendency to set up your own compile steps with ${CC} variables, etc.  While this seems like the simple approach at first glance (autotools is too hard), it quickly becomes unmaintainable and in the end is a lot more work than simply learning the basics of the industry-standard tools.  See the previous autotools article for more information.


Best practices for building Qt applications with OpenEmbedded

This article describes how to cross compile a Qt application (named qt_tutorial) with OpenEmbedded, and several best practices you should consider.  OpenEmbedded currently includes fairly good support for building Qt — both Qt Embedded and Qt X11.   OE also includes a number of Qt classes that make building Qt applications easy.  One of the main considerations with embedded Linux application development is keeping the build system flexible so that you can easily build on a PC or for your embedded hardware.  This means that hand-crafted Makefiles with hardcoded paths to cross compilers do not qualify.

Put Your Application in a SCM system

No matter what type of application you are building, it is a good idea to put your source in an SCM (revision control) system and fetch it directly using OE.  OpenEmbedded supports fetching sources from a number of revision control systems including Subversion, Git, Bazaar, Mercurial, CVS, etc.   The not-so-obvious reason we do this is so you can easily check out the source code and build it on a PC as well as for your target system in OE.  In this example we fetch the application source from an SVN repository:

SRC_URI = "svn://;module=qt_tutorial;proto=http"

For the above repository, the direct URI to the source would be:

Put the Install logic in the Application Source

Most Linux applications support "make install", and this is the case with both autotools and qmake (Qt's build tool).  We could put logic in the OE recipe to install the application, something like the following:

do_install() {
	install -d ${D}/${bindir}
	install -m 0755  ${S}/qt_tutorial ${D}/${bindir}
}

But the problem with this approach is that you can't install the application in other environments (like a native x86 PC build) unless you are building with OE.  A better approach is to put the logic to install the application in the application source, so that it can be installed both in your PC environment and in your OpenEmbedded build.  To accomplish this, you can set up the project as follows.

application qmake project file:

TARGET = qt_tutorial

# Input
HEADERS += lcdrange.h
SOURCES += lcdrange.cpp main.cpp

target.path = /usr/bin
INSTALLS += target
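With the target.path/INSTALLS lines in place, a native PC build can use the same install rule; a sketch of the commands (the INSTALL_ROOT value is arbitrary):

```
$ qmake
$ make
$ INSTALL_ROOT=$(pwd)/install make install   # binary lands in ./install/usr/bin
```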

OpenEmbedded recipe:

inherit qt4e

PV = "1.0+svnr${SRCREV}"
PR = "r1"

SRC_URI = "svn://;module=qt_tutorial;proto=http"

S = "${WORKDIR}/qt_tutorial"

FILES_${PN}-dbg += "${bindir}/.debug"

do_install() {
	export INSTALL_ROOT=${D}
	make install
}

Now, the same mechanism is used to install the application on both a PC native build, as well an OpenEmbedded build.  If you look in the Makefile generated by qmake, you see the following:

install_target: first FORCE
	@$(CHK_DIR_EXISTS) $(INSTALL_ROOT)/usr/bin/ || $(MKDIR) $(INSTALL_ROOT)/usr/bin/

install:  install_target  FORCE

INSTALL_ROOT can be set to force the entire install to land in a subdirectory.  This is required for build systems that generate packages, like OpenEmbedded.  To build this example, put the recipe file in your OE recipes tree, and run: bitbake qt-tutorial.  This will fetch the source code and build a package.  To run the tutorial, install the package on a system that includes Qt Embedded, and then run: qt_tutorial -qws.


Embedded Linux versus Windows CE

Occasionally I am asked how Embedded Linux compares with Windows CE.  I have spent the past 5 years doing mostly embedded Linux development, and the previous 5 years doing mostly WinCE development with a few exceptions, so my thoughts are no doubt a little biased toward what I understand best.  So take this with a grain of salt 🙂  In my experience, the choice is often made largely on perception and culture, rather than concrete data.  And, making a choice based on concrete data is difficult when you consider the complexity of a modern OS, all the issues associated with porting it to custom hardware, and unknown future requirements.  Even from an application perspective, things change over the life of a project.  Requirements come and go.  You find yourself doing things you never thought you would, especially if they are possible.  The ubiquitous USB and network ports open a lot of possibilities — for example adding Cell modem support or printer support. Flash based storage makes in-field software updates the standard mode of operation.  And in the end, each solution has its strengths and weaknesses — there is no magic bullet that is the best in all cases.

When considering Embedded Linux development, I often use the iceberg analogy; what you see going into a project is the part above the water.  These are the pieces your application interacts with, drivers you need to customize, the part you understand.  The other 90% is under water, and herein lies a great deal of variability.  Quality issues with drivers or not being able to find a driver for something you may want to support in the future can easily swamp known parts of the project.  There are very few people who have a lot of experience with both WinCE and Linux solutions, hence the tendency to go with what is comfortable (or what managers are comfortable with), or what we have experience with.  Below are thoughts on a number of aspects to consider:


Questions in this realm include CPU support, driver quality, in-field software updates, filesystem support, driver availability, etc.  One of the changes that has happened in the past two years is that CPU vendors are now porting Linux to their new chips as the first OS.  Previously, the OS porting was typically done by Linux software companies such as MontaVista, or by community efforts.  As a result, the Linux kernel now supports most mainstream embedded CPUs with few additional patches.  This is radically different from the situation 5 years ago.  Because many people are using the same source code, issues get fixed, and the fixes are often contributed back to the mainstream source.  With WinCE, the BSP/driver support tends to be more of a reference implementation that OEMs and users take and fix issues in, and that is where the fixes tend to stay.

From a system perspective, it is very important to consider flexibility for future needs.  Just because it is not a requirement now does not mean it will not be a requirement in the future.  Obtaining driver support for a peripheral may be nearly impossible, or be too large an effort to make it practical.

Most people give very little thought to the build system, or never look much beyond the thought that "if there is a nice GUI wrapped around the tool, it must be easy".  OpenEmbedded is a very popular way to build Embedded Linux products, and has recently been endorsed as the technology base of MontaVista's Linux 6 product, yet it is generally considered "hard to use" by new users.  While WinCE build tools look simpler on the surface (the 10% above water), you still have the problem of what happens when you need to customize something, implement complex features such as software updates, etc.

To build a production system with production-grade features, you still need someone on your team who understands the OS and can work at the detail level of both the operating system and the build system.  With either WinCE or Embedded Linux, this generally means companies either need to have experienced developers in house, or hire experts to do portions of the system software development.  System software development is not the same as application development, and is generally not something you want to take on with no experience unless you have a lot of time.  It is quite common for companies to hire expert help for the first couple of projects, and then do follow-on projects in-house.

Other features to consider are parallel build support (with quad-core workstations becoming the standard, is it a big deal whether a full build takes 1.2 hours versus 8?) and how flexible the build system is at pulling and building source code from diverse sources, such as different revision control systems.

Embedded processors are becoming increasingly complex.  It is no longer good enough just to have the CPU running.  If you consider the OMAP3 CPU family from TI, you have to ask the following questions: are there libraries available for the 3D acceleration engine, and can I even get them without committing to millions of units per year?  Is there support for the DSP bridge?  What is the cost of all this?  On a recent project I was involved in, a basic WinCE BSP for the Atmel AT91SAM9260 cost $7000.  In terms of developer time this is not much, but you also have to consider the ongoing costs of maintenance, upgrading to new versions of the operating system, etc.


Both Embedded Linux and WinCE support a range of application libraries and programming languages.  C and C++ are well supported.  Most business-type applications are moving to C# in the WinCE world.  Linux has Mono, which provides extensive support for .NET technologies and runs very well on embedded Linux systems.  There are numerous Java development environments available for Embedded Linux.  One area where you do run into differences is graphics libraries.  Generally the Microsoft graphical APIs are not well supported on Linux, so if you have a large application team of die-hard Windows GUI programmers, then perhaps WinCE makes sense.  However, there are many options for GUI toolkits that run on both Windows PCs and Embedded Linux devices.  Some examples include GTK+, Qt, wxWidgets, etc.  The Gimp is an example of a GTK+ application that runs on Windows, and there are many others.  There are C# bindings to both GTK+ and Qt.  Another feature that seems to be coming on strong in the WinCE space is the Windows Communication Foundation (WCF), but again, there are projects to bring WCF to Mono, depending on which portions you need.  Embedded Linux support for scripting languages like Python is very good, and Python runs very well on 200MHz ARM processors.

There is often the perception that WinCE is real-time and Linux is not.  Linux real-time support is decent in stock kernels with the CONFIG_PREEMPT option, and excellent with the addition of a relatively small real-time patch.  You can easily attain sub-millisecond timing with Linux.  This is something that has changed in the past couple of years with the merging of real-time functionality into the stock kernel.


In a productive environment, most advanced embedded applications are developed and debugged on a PC, not on the target hardware.  Even in setups where remote debugging on a target system works well, debugging an application on a workstation works better.  So the fact that one solution has nice on-target debugging where the other does not is not really relevant.  For data-centric systems, it is common to have simulation modes where the application can be tested without a connection to real I/O.  With both Linux and WinCE, application programming for an embedded device is similar to programming for a PC.  Embedded Linux takes this a step further: because embedded Linux technology is the same as desktop and server Linux technology, almost everything developed for the desktop/server (including system software) is available for embedded use for free.  This means very complete driver support (see the USB cell modem and printer examples above), robust file system support, memory management, etc.  The breadth of options for Linux is astounding, but some may consider this a negative point and would prefer a more integrated solution like Windows CE, where everything comes from one place.  There is a loss of flexibility, but in some cases the tradeoff might be worth it.  For an example of the sheer number of packages that can be built for Embedded Linux systems, see the list of packages OpenEmbedded can build.


It is important to consider trends for embedded devices with small displays being driven by Cell Phones (iPhone, Palm Pre, etc).  Standard GUI widgets that are common in desktop systems (dialog boxes, check boxes, pull down lists, etc) do not cut it for modern embedded systems.  So, it will be important to consider support for 3D effects, and widget libraries designed to be used by touch screen devices.  The Clutter library is an example of this.


Going back to the issue of debugging tools, most people stop at the scenario where the device is sitting next to a workstation in the lab.  But what about when you need to troubleshoot a device that is being beta-tested half-way around the world?  That is where a command-line debugger like gdb is an advantage, not a disadvantage.  And how do you connect to the device if you don't have support for cell modems in New Zealand, or an efficient connection mechanism like ssh for shell access and transferring files?


Selecting any advanced technology is not a simple task, and is fairly difficult to do even with experience.  So it is important to be asking the right questions, and looking at the decision from many angles.  Hopefully this article can help in that.  For additional assistance, please do not hesitate to contact BEC Systems — we’re here to help.


Dealing with large data structures efficiently in embedded systems

I’m currently dealing with a programming problem where I need access to several 64MB, file-backed data structures concurrently on an Embedded Linux system that only has 64MB of RAM.  The data structures are fairly sparse (mostly zero data), and I typically only need to access small portions of them at any particular time.  There is always the brute-force approach where you write code to manually load sections of the file as you need them.  But with a little thought, the realisation hits “this is what operating systems do.”  This article explores using memory mapping, and the sparse file support in file systems to solve this problem in a very efficient manner.

Sparse File Support

Most Unix file systems support sparse files.  This means that sections of data that are all zeros are not actually stored on the disk.  Consider the following example, where we create a 200MB file that is all zeros:

root@cm-x270:/media/mmcblk0p1# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mmcblk0p1         3917212    202448   3515776   5% /media/mmcblk0p1

root@cm-x270:/media/mmcblk0p1# dd if=/dev/zero of=sparse-file bs=1 \
count=1 seek=200M

root@cm-x270:/media/mmcblk0p1# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mmcblk0p1         3917212    202456   3515768   5% /media/mmcblk0p1

root@cm-x270:/media/mmcblk0p1# ls -l
-rw-r--r--    1 root     root    209715201 May 22 08:54 sparse-file

root@cm-x270:/media/mmcblk0p1# time od sparse-file
0000000  000000 000000 000000 000000 000000 000000 000000 000000
real    0m 17.23s
user    0m 14.21s
sys     0m 2.66s

root@cm-x270:/media/mmcblk0p1# du sparse-file
68      sparse-file

In this case we created a 200MB file on an SD card formatted as ext3.  Even though the file is 200MB in size, it uses only a few KB of disk space (du reports 68KB).  The same test also worked fine on a JFFS2 filesystem.  With sparse file support, we get a cheap form of run-length compression (at least for zeros) with no effort.


mmap()

The mmap() system call is used to map a file, or portions of a file, into memory.  The data in the file can then be accessed directly in memory.  Because Linux is a demand-paged system, portions of the file are paged in as needed, so the entire file does not need to be present in RAM at one time.  There are several advantages to using mmap():

  1. mmap() avoids the extraneous copy that occurs when using read() or write(), as the data does not need to be copied to a user-space buffer.
  2. there is very little overhead.
  3. you can directly access any part of the file without doing an lseek() and keeping track of where you are.
  4. the operating system takes care of paging in the sections of the file you are using, and discarding sections that are not in use when memory is low.  This includes flushing dirty portions to disk.

What this means is that mmap() gives you an easy way to work on large, file-backed data structures, and the OS takes care of loading the portions you need and saving the modified portions back to disk.  As most embedded systems are 32-bit, there is a limit to the file size you can map, since virtual memory space is limited.

Test Application

Next I wrote a small application that is used to create and modify files using mmap().  I wanted to experiment with creating files of various sizes, writing non-zero data at various intervals, and testing the performance.  The test application source is located here.

Usage: sparse_file_mmap_test [OPTION]

  -s, --filesize   File size to allocate
  -o, --offset     Will write a few bytes every offset bytes
  -m, --modify     Modify existing file
  -d, --data       Data to write to file at offset location (0-255)

The application creates a file of size filesize, and writes the value of data to the file every offset bytes.  There is also an option to modify existing files, so we can test the performance of opening a large file, making a small change, and then closing it.

Test Results

The results are pretty amazing, and the performance is beyond my expectations. This type of problem is where you learn to appreciate the performance of an advanced operating system, and file system.  There is a reason for the complexity!  There are 3 things I wanted to test: 1) does the sparse file support work as expected? 2) can mmap be used to easily modify large files? 3) can mmap be used to work on data structures that are larger than physical RAM?

1. Sparse File Support

The basic tests above confirm that sparse file support works for an empty file, but what about a file that has some data every X bytes?  Below are the test results for creating a 10MB file, and writing data at various offset intervals.

Offset (bytes)    Size on disk (KB)
1024              9784
2048              9784
4096              9784
8192              4904
16384             2464
32768             1244
65536             632
1048576           60
2097152           40
4194304           32

As soon as the offset was greater than the MMU page size (4KB), then the sparse file effect started to kick in and worked as expected.

2. Can mmap() be used to efficiently modify large files?

The test here was to open a large file, make a small change in the middle, and then close it.  In this case, a 100MB file was created with a data value of 1 written every 1MB; the same file was then re-opened and a data value of 2 was written every 2MB.  The od utility was used to examine the file and verify the contents were correct.

root@cm-x270:/media/mmcblk0p1# time sparse_file_mmap_test_arm -s104857600 \
-o1048576 -d1
Sparse File mmap() test
Filesize = 104857600, offset = 1048576, data = 1
size = 102400 KB
size on disk (KB):
508     sparse-file
real    0m 1.10s
user    0m 0.00s
sys     0m 0.43s

root@cm-x270:/media/mmcblk0p1# time sparse_file_mmap_test_arm -s104857600 \
-o2097152 -d2 -m
Sparse File mmap() test
Filesize = 104857600, offset = 2097152, data = 2
size = 102400 KB
size on disk (KB):
508     sparse-file
real    0m 0.37s
user    0m 0.01s
sys     0m 0.05s

root@cm-x270:/media/mmcblk0p1# time od -x sparse-file
0000000     0002    0000    0000    0000    0000    0000    0000    0000
0000020     0000    0000    0000    0000    0000    0000    0000    0000
4000000     0001    0000    0000    0000    0000    0000    0000    0000
4000020     0000    0000    0000    0000    0000    0000    0000    0000

614000000     0001    0000    0000    0000    0000    0000    0000    0000
614000020     0000    0000    0000    0000    0000    0000    0000    0000
real    0m 12.93s
user    0m 7.33s
sys     0m 1.70s

Creating and modifying large file-backed data structures is very fast and efficient, and takes on the order of 1 second for a 100MB file that contains data every 1MB.  Dumping the data with od took considerably longer (13 seconds) as 100MB of data needed to be processed.

Possible Issues

There are several things to watch out for when using sparse files and mmap():

  1. With sparse files, there is the potential to run out of disk space, since the files occupy less space on disk than their logical size.  It is a good idea to monitor free disk space when working with sparse files, so you don’t exhaust it.
  2. mmap() requires virtual address space equal to the size of the file it maps.  With embedded systems, this is less of an issue, because the physical RAM size tends to be much smaller than the 4GB virtual address space.  On a system with only 64MB of RAM, mmap()’ing files tens of MB in size makes a lot of sense: it ensures you will not run the system out of memory, and there is plenty of virtual address space to map files of this size.
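To keep an eye on the first issue, a small helper can report how much space a file actually consumes on disk (this is a generic sketch, not part of the test application; note that st_blocks is always counted in 512-byte units, regardless of the filesystem block size):

```c
#include <sys/stat.h>

/* Return the space a file actually consumes on disk, in bytes,
 * as opposed to its logical size (st_size).  Returns -1 on error. */
long long file_size_on_disk(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;

    /* st_blocks is in 512-byte units per POSIX */
    return (long long)st.st_blocks * 512;
}
```

Comparing this value against st.st_size shows how sparse a file is, the same information `du` versus `ls -l` gives you from the shell.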


mmap() and sparse file support provide a very convenient solution for dealing with large, file-backed data structures.  One example of this type of data structure is any type of large matrix such as maps used in mapping applications.  Writing a “from scratch” solution to solve this problem would be a fairly large and difficult task.  Processing large amounts of data efficiently is becoming more and more important in many embedded systems. This example provides another compelling reason why implementing modern, data-centric embedded systems using Linux makes a lot of sense.


GTK performance on PXA270 vs. OMAP3

Several of my customers have built applications using the GTK+ toolkit.  While GTK+ works fairly well for what we have done, I have been wondering how the performance compares on the new OMAP3 processors from TI.  As we are evaluating the OMAP3 for several projects, I did a simple comparison with an existing application.  Below is a video that shows a fairly complex application running on both a PXA270 and an OMAP3530.  While the PXA270 gets the job done, the result on the OMAP3 is much more pleasing.  With the advent of an OMAP3 module available for $117 in volume, the OMAP3 seems likely to be a popular solution for upcoming Embedded Linux projects.


How to implement an interrupt driven GPIO input in Linux

With Linux, some of the things that seem like they should be easy are not — at least at first glance.  For example, how do you read an interrupt driven GPIO input in a Linux application?  With simpler microcontroller systems, this is straightforward, but with a system like Linux, you have to navigate through several layers of software (and for very good reasons).  You can’t handle interrupts directly in a Linux application, so this means you need a kernel component involved.  This operation of reading a GPIO resembles a key press, so the Linux input subsystem might be a good place to start looking.  Once we take that route, we discover the gpio_keys driver.

Configuring the gpio_keys driver

The gpio_keys driver is configured with a few lines of code in your Linux board configuration file.  An example is below:

static struct gpio_keys_button svs_button_table[] = {
  { .code = KEY_RECORD,
    .gpio = PP_GPIO_MIC_EN,
    .active_low = 1,
    .desc = "MIC_EN",
    .type = EV_KEY,
    .wakeup = 0,
  },
};
In this application, we are reading button presses on a cell phone style headset.  Once the gpio_keys driver is configured, a new entry will show up in /dev/input/eventX.  An application can then do a blocking read on this device.
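For completeness, the board file also has to hand this table to the driver through a "gpio-keys" platform device.  A sketch is below; the structure and device names are illustrative, and the exact registration details vary by kernel version:

```c
/* board file sketch: wrap the button table in platform data and
 * register it so the gpio_keys driver binds to it */
static struct gpio_keys_platform_data svs_button_data = {
  .buttons  = svs_button_table,
  .nbuttons = ARRAY_SIZE(svs_button_table),
};

static struct platform_device svs_button_device = {
  .name = "gpio-keys",
  .id   = -1,
  .dev  = {
    .platform_data = &svs_button_data,
  },
};

/* in the board init function: */
/* platform_device_register(&svs_button_device); */
```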

Reading the GPIO in an application

To read the GPIO, we simply do a blocking read on the new /dev/input/eventX device.  The read will block until there is a change in GPIO state.  An example is shown below:

#include <glib.h>
#include <linux/input.h>

#define MIC_INPUT_DEV  "/dev/input/event0"

static gboolean mic_button_callback(GIOChannel *source, GIOCondition condition, gpointer data)
{
  struct input_event ev;
  gsize bytes_read;

  g_io_channel_read_chars(source, (gchar *)&ev, sizeof(ev), &bytes_read, NULL);

  if (bytes_read > 0) {
    if (bytes_read != sizeof(ev)) {
      s_debug(1, "warning, only read %i bytes from mic input", (int)bytes_read);
      return TRUE;
    }
  } else {
    return TRUE;
  }

  if (ev.type != EV_SYN && ev.value == 1) {
    /* button pressed, do something ... */
  }

  return TRUE;
}

void mic_button_init()
{
  GIOChannel *micbutton = g_io_channel_new_file(MIC_INPUT_DEV, "r", NULL);

  if (micbutton == NULL) {
    s_debug(TRUE, "Error initializing mic button");
    return;
  }

  /* treat the stream as raw binary data, not text */
  g_io_channel_set_encoding(micbutton, NULL, NULL);

  g_io_add_watch(micbutton, G_IO_IN, mic_button_callback, NULL);
}

The above example also shows how to incorporate the GPIO read into a GLib mainloop so that you don’t need to create a separate thread. (As an aside, GLib mainloop programming is worth learning!)  Using this method, reading a GPIO interrupt is easy and requires very few lines of code.  This is typical of complex systems like Linux: if you know how to do something, it is relatively easy, but getting started down the right path is sometimes the challenge.


Using the Vala Programming Language in Embedded Systems

Recently I’ve been following the Vala programming language and using it some in embedded systems.  Vala is a new programming language that aims to bring modern programming features to GNOME developers without imposing additional runtime requirements and without using a different ABI compared to applications and libraries written in C.  A few notes and observations about Vala:

  • language syntax that resembles C#, so you can write code faster with fewer mistakes
  • Vala compiles to C, so it starts fast and runs fast like native applications
  • binaries are not platform independent (unlike Mono or Java)
  • lots of bindings already exist, as it is very easy to write bindings to existing C libraries
  • easy to write programs that are mixed C and Vala
  • documentation is still a bit sparse, so you end up reading the binding files to figure out how to use the libraries

Much of the core Vala functionality is built on top of GLib.  Having programmed extensively with GLib, I can say programming in Vala is a lot more fun, and a lot less tedious.  Vala also provides dynamic D-Bus bindings, which makes it very nice for writing system daemons or other bits of software that need to implement a D-Bus server.

As far as real-world experience: in a recent project, we ported a system monitoring application from C# to Vala.  There were no serious problems with the original application, but it started slowly (7 seconds or so), and the customer wanted to reduce the boot time of the system.  Once you get past the core language syntax, the libraries for Vala are all different from C#’s, so most of the library calls needed to be tweaked a little.  Most of the functionality was implemented in Vala, but a few minor functions were implemented in C.  It now starts fast, which is what we needed.

We are also using the fsod application from the OpenMoko project in a customer project.  This is an excellent example of a well written, advanced Vala project that uses features such as plugins and D-Bus.

Watching the releases from the Vala project, it is amazing the progress this language is making.  I’m sure we’ll be hearing more about Vala in the future.


Printing from Embedded Systems

How does one implement support for printing in embedded systems? I recently had the opportunity to add printing support to an embedded Linux system.  The device is an industrial touch screen powered by a Compulab cm-x270 module (PXA270 CPU), and runs a GTK+ application.  The customer is implementing a device calibration system where customers bring their equipment in to get calibrated, and the system prints out a report on a local printer.  This article describes how components of Hewlett Packard’s HPLIP solution along with Cairo can be used to implement printer support in a non-desktop Linux system.


The requirements for this project were fairly basic — we needed to print a single page report that contained text and some elementary graphics.  We initially wanted to support several low-end Inkjet and Laser USB printers.  As the system was powered by a fairly slow ARM processor (slow compared to modern desktop systems), the solution needed to be efficient, and not require excessive amounts of memory.

Desktop Linux Print Flow

With desktop Linux systems, the standard printing flow looks something like this:

Application -> PS output -> Ghostscript -> Rasterized output -> Printer Driver (filter) -> I/O Backend -> Printer

CUPS is typically used to manage this flow, provide queuing, etc.

Print Architecture

As the application is written in GTK+, we decided to generate the report using Cairo, a 2D graphics library.  Cairo is easy to use and well suited for this application.  With a Cairo-generated report, we already had a raster image of the report, so the PS output/Ghostscript steps were not really needed and only added more processing to the data flow.  Also, the PXA270 CPU does not have an FPU, and PostScript processing can be floating-point intensive.  So the need was now to figure out how to get a Cairo-generated raster image to a USB printer.

Hewlett Packard offers lots of interesting software for their printers.  Their APDK is an OS-independent driver library in source code form.  However, implementing this would have required quite a bit of integration, plus writing the I/O layer.  It seemed like there should be something available that would work with less effort.  The HPLIP project is a comprehensive set of software for Linux printing, and is used by desktop Linux systems.  It is not obvious at a glance how all the components of HPLIP fit together, but after spending some time digging through source code and asking questions on several mailing lists and forums, we were able to figure out that the basic flow is:

Ghostscript -> HPIJS (driver) -> HP backend -> Printer

In this application, we simply launched the HPIJS driver directly from our application instead of Ghostscript.


There were several challenges using HPLIP in our ARM system.  HPLIP is not cross-compilation friendly out of the box, so we had to fix a few issues, and ended up disabling all the Python pieces as we only needed the HPIJS and backend components.  We also had to figure out the data flow from the application to HPIJS, and then to the HP backend.  In a nutshell, the process is:

  • start the hp backend process by forking, and obtain a file handle to STDIN for the hp backend process
  • start the HPIJS process, send various parameters to it including the file handle for the backend STDIN
  • send HPIJS the print raster and tell it to print
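The fork/exec plumbing in the first step can be sketched generically.  This is not the HPLIP code itself; the child command below is a stand-in for launching the hp backend:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn a child process with its stdin connected to a pipe we write to.
 * Returns the write end of the pipe, or -1 on error, and stores the
 * child's pid in *pid_out if non-NULL. */
int spawn_with_stdin_pipe(char *const argv[], pid_t *pid_out)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                    /* child: read end becomes stdin */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                    /* only reached if exec failed */
    }

    close(fds[0]);                     /* parent keeps only the write end */
    if (pid_out)
        *pid_out = pid;
    return fds[1];
}
```

The file descriptor returned here is the “file handle to STDIN” that gets passed along so print data can be streamed to the backend process.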

There are some additional details such as handling margins, error conditions, cancel support, etc.  Some of the details can be gleaned from the Ghostscript source code, and the IJS reference implementation provides some very useful library code for implementing the IJS client functionality in the application that is doing the printing.  When finished, our printing module was 403 source lines of code (SLOC) — not bad considering the functionality.

How well does it work?

Overall, we are very pleased with the result.  With a simple industrial terminal, we can now support just about every USB printer made by HP, with the exception of some of their very low-end LaserJet printers that require a binary plugin.  The HP backend can be used to detect what printer is attached at run time, so everything is plug-and-play with no user configuration.  Let me repeat, as this is significant: on a simple industrial terminal, we can support about every HP USB printer available with no user configuration.  This is easier for users than their desktop, as they simply need to buy a printer and plug it in!  Kudos to HP for their excellent open source software.  Because the source code is available, we were able to customize it for our application with very little support from HP.  Performance-wise, the printing process is quite fast; the printer starts almost immediately after the user initiates the print.  Now if I could just get HP to build an ARM version of their binary plugin for the few low-end lasers we can’t support … but this is not a critical issue for this vertical application.

Future Direction

Hopefully, this type of solution can evolve into a standard printing solution for Embedded Linux systems.  Epson also supports the IJS driver model, so adding support for their printers should be possible.  It may eventually make sense to integrate portions of CUPS for queuing and other management tasks.  Some of the tasks I hope to implement in the future:

  • Clean up the HPLIP build for inclusion in OpenEmbedded.
  • Clean up the IJS reference code library build and packaging in OpenEmbedded.
  • Figure out portions of CUPS that may make sense.
  • Keep conversations going with the Linux printing group so we can someday have a “standard” printing solution for embedded Linux systems.

Thanks to Matt Gessner for helping implement this solution, and for providing feedback on this article.


Should you be using monotonic timers?

In a previous article, I covered some of the basics of Linux timers.  Any time you are doing any type of fixed time delay in a program, you should be using monotonic timers, so the delay will not be affected by system time changes.  In an effort to save cost, some embedded systems today do not have a battery-backed RTC, and instead get the time via GPS, NTP servers, or other clever means.  This means your applications had better be able to handle the system time changing, as the system time may not be set until well after the unit boots.  This article describes how you can quickly test your system for timer problems.

There are two cases where delays may fail if you are using non-monotonic timers.  The first is if the time jumps forward by a large amount: delays will expire immediately.  The other is if the time jumps backward by a large amount: delays will never expire.  To test for these situations, write a simple test application (sample included below) that rapidly changes the time, and then exercise your application while the time is changing.

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
  time_t system_time;
  struct timeval tv;
  int count = 0;

  printf("Starting test application...\n");

  while (1) {
    /* Sleep for 0.01 seconds */
    usleep(10 * 1000);

    /* Jump the system time forward 30 seconds (requires root) */
    system_time = time(NULL);
    system_time += 30;
    tv.tv_sec = system_time;
    tv.tv_usec = 0;
    settimeofday(&tv, NULL);

    printf("Cycle %d - system date/time set to %s", ++count, ctime(&system_time));
  }

  return 0;
}

How to implement realtime periodic tasks in Linux applications

Have you ever wondered what is the best way to implement periodic tasks in Linux applications — something better than usleep()?  This article covers a number of issues related to this subject including real-time tasks, the different timers available, timer resolution, and how to implement periodic tasks accurately so that error is not accumulated.  The recent inclusion of the high-resolution timers in the mainstream kernel makes this a very interesting subject.

High Resolution Timers

Historically, Linux has done all timing off the OS timer tick, which is usually between 1ms and 10ms.  While this is adequate for many tasks, it is really nice to have a high resolution timer for timing tasks.  With the integration of the high resolution timers into the mainstream Linux source tree, this is now possible.  From a user space perspective, there are no API changes; the only difference you will notice is that you can now sleep for less than the OS timer tick period.  clock_getres() can be used to check the timer resolution and will tell you instantly if you have high resolution timer support: if the clock resolution is 1ns, you do.  Realistically, you can’t delay for 1ns in a Linux application, but delays in the range of 100us should be possible, and depending on the configuration, much better performance is possible.  The kernel config entry for high resolution timers is CONFIG_HIGH_RES_TIMERS.

The difference between CLOCK_REALTIME and CLOCK_MONOTONIC

Some of the Linux timer functions take a clockid_t argument that can be specified as CLOCK_REALTIME or CLOCK_MONOTONIC.  The big difference between the two is that changing the system time will have an effect on CLOCK_REALTIME, thus affecting your timers.  Changing the system time has no effect on CLOCK_MONOTONIC; it always counts upward.  For periodic tasks, CLOCK_MONOTONIC is usually more appropriate.  The best way to get burned using CLOCK_REALTIME is when your application takes a timestamp, does something, takes another timestamp, and then compares them.  If you are using CLOCK_REALTIME and the system time gets changed between the two timestamps, your comparison will not be valid.  For most timeouts and relative timekeeping in Linux, use CLOCK_MONOTONIC.  This issue becomes more important as many systems now have a process that periodically sets the time automatically from network time servers, and you have no idea when this might happen.


Kernel Preemption

Kernel preemption makes a huge difference in the performance of realtime applications by allowing the kernel to be preempted by higher-priority application processes.  Historically, any kernel code that was runnable ran before the kernel returned control to applications.  This all changes with kernel preemption, which has been available in mainline kernels for some time now (since around 2.6.16, as I recall).  The worst offenders I’ve found for locking up the kernel for long periods have been flash drivers, especially proprietary ones.  But even jffs2 can cause problems in realtime applications without kernel preemption.

The PREEMPT_RT patch

Much of the realtime work being done for Linux is maintained in the PREEMPT_RT patch.  Bits of this patch have already been merged into the mainline kernel, but there is still a lot of very useful functionality in the patch and it should be considered if you are doing any type of realtime work.  More details will be presented in a future article.

Absolute vs Relative timekeeping

When implementing a periodic task, you really want to base your timekeeping on an absolute time, rather than relative delays such as usleep().  It is not possible to achieve precise periodic activation with a relative sleep: you must first get the current time, calculate how long to sleep, and then call the relative sleep function.  If your process gets preempted between acquiring the timestamp and sleeping, your relative sleep time will be wrong.  This problem is solved by clock_nanosleep(), which can be called with the TIMER_ABSTIME value in the flags argument.  With TIMER_ABSTIME set, clock_nanosleep() sleeps until the absolute timer value is reached, so it does not matter if you get preempted between taking the timestamp and the sleep.


In summary, if you want to implement accurate, realtime, periodic tasks in a Linux application, consider the following:

  • Use high resolution timers.  Although Linux is still limited in its response time, at least the scheduling resolution is now quite high.
  • Enable kernel preemption.  This makes sure long kernel operations don’t get in the way of your realtime application process.
  • Use CLOCK_MONOTONIC for relative timekeeping.  You don’t want your application to lock up due to the system time changing.
  • Use clock_nanosleep() instead of relative delays like usleep().  This is the only way to accurately schedule periodic tasks.
  • If needed, apply the PREEMPT_RT patch.

The clock_nanosleep man page also includes a lot of useful information about timer functions.