How to Optimize Python reads for general data

Posted by Cliff Brake on 2009-01-09

The read() method of a Python file object behaves a little differently from the POSIX read() call used in C. This article describes some of these differences and how to optimize reads for general continuous data streams, such as data arriving from a collection device through a pipe.

Python read()

The Python read function seems to be optimized for reading files and text-oriented streams. By default, a read() call blocks until EOF is encountered. This is very handy for reading files from disk: you can slurp a whole file up with a single read() call. If you pass a size parameter to read(), it blocks until size bytes have been received (or EOF is reached). This is less than ideal for a continuous data stream, where some of the data may sit unreturned until the size threshold is reached.
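A minimal sketch of the difference, written for Python 3 (an assumption; the post predates it, but buffered read(size) behaves the same way). A hypothetical writer thread sends two chunks through a pipe with a pause between them. The buffered file object's read(1000) keeps waiting until it has 1000 bytes or hits EOF, while os.read(), a thin wrapper around the read(2) call C programmers know, returns as soon as the first chunk is available. The helpers `writer` and `read_from_pipe` are illustrative names, not part of any library.

```python
import os
import threading
import time

def writer(fd, chunks, delay):
    """Hypothetical producer: write each chunk, pause, then close (EOF)."""
    for chunk in chunks:
        os.write(fd, chunk)
        time.sleep(delay)
    os.close(fd)

def read_from_pipe(size, buffered):
    r, w = os.pipe()
    t = threading.Thread(target=writer, args=(w, [b"first", b"second"], 0.2))
    t.start()
    if buffered:
        # Buffered file object: keeps reading until `size` bytes or EOF.
        with os.fdopen(r, "rb") as f:
            data = f.read(size)
    else:
        # Single read(2) call: returns as soon as any data is available.
        data = os.read(r, size)
    t.join()  # let the writer finish before closing our end of the pipe
    if not buffered:
        os.close(r)
    return data

print(read_from_pipe(1000, buffered=True))
print(read_from_pipe(1000, buffered=False))
```

Run as-is, the first call should wait for both chunks and EOF (b'firstsecond'), while the second typically returns just b'first'.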

Non-blocking read()

To make read() return whatever data is available, even if it is less than the requested size, put the file object in non-blocking mode. This can be done with the fcntl module:

	import fcntl
	import os
	flags = fcntl.fcntl(fp, fcntl.F_GETFL)  # fp: an already-open file object, e.g. a pipe
	fcntl.fcntl(fp, fcntl.F_SETFL, flags | os.O_NONBLOCK)
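A small sketch of what non-blocking mode changes, using os.read() on a pipe in Python 3 (an assumption; there, a read that would block raises BlockingIOError). The helper `try_read` is hypothetical and simply maps that exception to None.

```python
import fcntl
import os

def try_read(fd, size):
    """Return the bytes available right now, or None if none are ready."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None  # the read would have blocked: no data yet

r, w = os.pipe()
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

empty = try_read(r, 1000)    # nothing written yet: returns at once
os.write(w, b"abc")
partial = try_read(r, 1000)  # only 3 of the requested 1000 bytes
os.close(w)
eof = try_read(r, 1000)      # writer closed: b"" signals EOF
os.close(r)
print(empty, partial, eof)   # None b'abc' b''
```

On Python 3.5 and later, os.set_blocking(fd, False) is a shorthand for the two fcntl calls.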

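However, the application now no longer blocks while waiting for data: when nothing is available, the read returns immediately (typically with an EAGAIN error) instead of waiting, so a loop that keeps retrying it spins and wastes CPU.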
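To make the cost concrete, here is a sketch of the naive polling loop in Python 3 (an assumption), counting how many read attempts it makes while a hypothetical producer waits 0.1 s before writing. The function `busy_poll` is an illustrative name.

```python
import fcntl
import os
import threading

def busy_poll(fd, size):
    """Naive polling loop: retry a non-blocking read until data arrives.
    Returns (data, number_of_attempts)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return os.read(fd, size), attempts
        except BlockingIOError:
            pass  # no data yet: loop again immediately, burning CPU

r, w = os.pipe()
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# Hypothetical producer: writes one message after a 0.1 s delay.
threading.Timer(0.1, os.write, args=(w, b"hello")).start()
data, attempts = busy_poll(r, 1000)
print(data, attempts)  # attempts is usually very large
os.close(w)
os.close(r)
```

The exact count depends on the machine, but every one of those attempts is wasted work that a blocking wait would avoid.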

select()

Enter the select call.  The Python select module does much the same thing as the C select() function.  In this case, it can be used to block waiting for data from a non-blocking file object with the added benefit of a timeout.  So the resulting code might look like:

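	import fcntl
	import logging
	import os
	import select

	fp = os.popen("<application that returns data to stdout>", 'r')
	fd = fp.fileno()

	flags = fcntl.fcntl(fd, fcntl.F_GETFL)
	fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

	total = 0
	while True:
		# wait up to 5 seconds for the pipe to become readable
		readable, _, _ = select.select([fd], [], [], 5)
		if not readable:
			continue  # timed out, no data yet
		data = os.read(fd, 1000)  # returns whatever is available, up to 1000 bytes
		if not data:
			break  # EOF: the application closed its stdout
		total += len(data)
		logging.debug("received %i bytes of data, total = %i", len(data), total)
		# <do something with data>
	fp.close()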

The select() call above blocks until data is available from the fp object or the 5-second timeout expires. Reading continuous data streams in Python is quite possible, but it usually requires putting the file object in non-blocking mode and using select() to block while waiting for data.
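To see the timeout behavior in isolation, here is a small Python 3 sketch on a pipe (an assumption; a 0.2-second timeout stands in for the 5 seconds used in the loop). With nothing written, select() waits out the timeout and reports no readable descriptors; once data is waiting, it returns immediately.

```python
import os
import select
import time

r, w = os.pipe()

# Nothing has been written yet, so select() waits for the full timeout.
start = time.monotonic()
readable, _, _ = select.select([r], [], [], 0.2)
waited = time.monotonic() - start
print(readable, round(waited, 1))  # [] after roughly 0.2 s

# Once data is waiting, select() returns at once with r marked readable.
os.write(w, b"data")
readable, _, _ = select.select([r], [], [], 0.2)
chunk = os.read(r, 1000) if r in readable else b""
print(chunk)  # b'data'

os.close(w)
os.close(r)
```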

  • Nelson said,

    You might want to use os.open also, intended for low level use.