Tips for reading a serial data stream in Python

Interfacing with an RS-232 serial device is a common task when using Python in embedded applications. The easiest way to get Python talking to serial ports is to use the pyserial project, found at http://pyserial.sourceforge.net/. This module works on most platforms and is straightforward to use (see the examples on the project web site). However, getting the module's read function to behave optimally takes a little study and thought. This article looks at how the pyserial read function works, the issues you might encounter, and how to optimize serial reads.

We start with several goals for how the application should behave in relation to the serial port:

The application must block while waiting for data.
For performance reasons, we want to read decent-sized chunks of data at a time if possible. Python function calls are expensive, so performance will be best if we can read more than one byte at a time.
We want any data received to be returned in a timely fashion.

A key parameter in the pyserial Serial class is the timeout parameter. This parameter is defined as:

 timeout=None,           #set a timeout value, None for waiting forever

The Serial class read function also accepts a size parameter that indicates how many characters should be read. Below is the source for the read function on POSIX systems (Linux, etc.):

    def read(self, size=1):
        """Read size bytes from the serial port. If a timeout is set it may
           return less characters as requested. With no timeout it will block
           until the requested number of bytes is read."""
        if not self.fd: raise portNotOpenError
        read = ''
        inp = None
        if size > 0:
            while len(read) < size:
                #print "\tread(): size",size, "have", len(read)    #debug
                ready,_,_ = select.select([self.fd],[],[], self.timeout)
                if not ready:
                    break   #timeout
                buf = os.read(self.fd, size-len(read))
                read = read + buf
                if self.timeout >= 0 and not buf:
                    break  #early abort on timeout
        return read
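
To experiment with this behavior without serial hardware, the same select-then-read loop can be tried on an ordinary pipe. The following is a minimal Python 3 sketch, not pyserial's code: the function name read_with_timeout is made up for illustration, it runs on POSIX systems only, and it simplifies the original by always stopping when the other end of the pipe is closed.

```python
import os
import select


def read_with_timeout(fd, size=1, timeout=None):
    """Read up to `size` bytes from file descriptor `fd`.

    Mirrors the loop above: wait with select(), then os.read() whatever
    is available, until `size` bytes arrive or a wait times out.
    """
    data = b""
    while len(data) < size:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # timed out waiting for more data
        buf = os.read(fd, size - len(data))
        if not buf:
            break  # end of file: the other end was closed
        data += buf
    return data


# Stand-in for a serial port: a pipe with three bytes waiting in it.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"abc")

# We ask for 10 bytes but only 3 are available, so the call returns the
# partial data once the 0.2 second timeout expires.
partial = read_with_timeout(read_fd, size=10, timeout=0.2)
print(partial)

os.close(read_fd)
os.close(write_fd)
```

Note that the timeout applies to each wait for more data, not to the call as a whole, so a slow trickle of bytes can keep a single read going for longer than the timeout value.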

The easiest way to use this module is to set the timeout to None and the read size to 1. Each byte is then returned as soon as it is received, but this setup is very inefficient when transferring large amounts of data because of the Python overhead incurred on every call.
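
The cost of reading one byte at a time is easy to see by counting calls. The sketch below uses a hypothetical CountingPort class as a stand-in for a serial port (it is not part of pyserial) and reads 1000 bytes with one read(1) call per byte.

```python
import io


class CountingPort:
    """Hypothetical stand-in for a serial port that counts read() calls."""

    def __init__(self, payload):
        self._stream = io.BytesIO(payload)
        self.read_calls = 0

    def read(self, size=1):
        self.read_calls += 1
        return self._stream.read(size)


def read_byte_by_byte(port, total):
    """Collect `total` bytes using one read(1) call per byte."""
    received = bytearray()
    while len(received) < total:
        byte = port.read(1)
        if not byte:
            # The stand-in ran out of data. A real port opened with
            # timeout=None would block here instead.
            break
        received += byte
    return bytes(received)


port = CountingPort(b"x" * 1000)
data = read_byte_by_byte(port, 1000)
print(len(data), port.read_calls)
```

Every byte costs a full trip through read, so the number of Python-level calls grows linearly with the amount of data.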

To meet our goal of reading multi-byte blocks of data at a time, we need to pass the read function a size greater than 1. However, if timeout is set to None, the read will block until size bytes have been read, which does not meet the goal of returning any data read in a timely fashion. The solution then is to:

Set the read size high enough to get good performance.
Set the timeout low enough that any data received is returned in a reasonable time frame, yet high enough that the application spends most of its time blocked when there is no data.

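As an example, a size of 1000 and a timeout of 1 second seem to work well. With these settings a read returns as soon as 1000 bytes have arrived, or once a full second passes with no new data, whichever comes first. Used this way, the pyserial module is efficient and still returns all data promptly.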
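
A read loop built around these settings might look like the sketch below. The read_chunks generator and the FakePort test double are illustrative names, not part of pyserial. The only assumption about the port object is that read(size) returns whatever bytes arrived before the timeout, possibly none.

```python
def read_chunks(port, chunk_size=1000):
    """Yield each non-empty chunk returned by port.read(chunk_size).

    `port` is assumed to behave like a pyserial Serial object opened with
    a short timeout, so read() returns whatever arrived (possibly nothing)
    once the timeout expires.
    """
    while True:
        chunk = port.read(chunk_size)
        if chunk:
            yield chunk
        # An empty result just means the timeout expired; read again.


class FakePort:
    """Hypothetical stand-in that hands out queued chunks, then nothing."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def read(self, size=1):
        return self._chunks.pop(0)[:size] if self._chunks else b""


# The empty chunk in the middle simulates a read that timed out.
fake = FakePort([b"first burst", b"", b"second burst"])
reader = read_chunks(fake)
print(next(reader))
print(next(reader))

# With pyserial installed, usage might look like this (the port name,
# baud rate, and handle() function are placeholders):
#
#     import serial
#     port = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)
#     for chunk in read_chunks(port):
#         handle(chunk)
```

Because the loop never exits on its own, a real application would typically run it in a dedicated thread or add its own stop condition.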
