Interpreting Serial Data

Originally written on August 23, 2014 by Tom Igoe
Last modified on August 27, 2016 by Tom Igoe

 

Introduction

Serial data is passed byte by byte from one device to another, but it’s up to you to decide how each device (computer or microcontroller) should interpret those bytes, when the beginning of a message is, when the end is, and what to do with the bytes in between.  Serial communication protocols define the structure of communication between devices. These notes explain how serial data is interpreted by computers.

To get the most out of these notes, you should know what a microcontroller is and have an understanding of the basics of microcontroller programming. You should also understand the basics of serial communication. You should have some understanding of programming a personal computer as well, ideally in a language that can access the serial ports of the computer, like Processing, node.js, Python, or Java.

ASCII and Binary

Related video: ASCII Bytes Explained

Imagine that you’re sending the value of one sensor from a microcontroller to a personal computer. If the sensor’s value is always 255 or less, you know it can fit in a single byte. This kind of message is easy. Just send the value over and over, and the receiver can read the latest byte to have your whole message. In Arduino, you can do this using the Serial.write() command, like so:

void setup() {
  Serial.begin(9600);
}

void loop() {
  // read the sensor:
  int sensorValue = analogRead(A0);
  // divide by 4 to reduce the range to 0-255:
  sensorValue = sensorValue / 4;
  // send it:
  Serial.write(sensorValue);
}

Imagine a typical stream of bytes sent by the program above:

23 23 23 23 24 24 25 25 26 28 27 27 27

Every value you send can range from 0 to 255, the full range that can fit in a byte. A protocol like this is often called a binary protocol, because each byte’s value has no meaning other than the number itself.

Now imagine you want to send two sensor readings.  You might think “No problem; just add another analogRead() and another Serial.write().” But the resulting stream of data would look the same. Each byte would range from 0 to 255, and you’d have no way to know which byte represents which sensor. You need some punctuation bytes.

If you’re sending more than one value (and you usually are), then the receiving computer has to know when the message starts and when it ends, and it needs to know how to put the bytes together into a message.

Computers use numbers to represent alphanumeric characters (letters and numbers and punctuation) in bytes. One of the earliest widely adopted standard codes for doing this is ASCII, which stands for American Standard Code for Information Interchange. ASCII assigns each number, letter, or punctuation mark a specific byte value from 0 to 127. For example, capital A is ASCII value 65. Capital B is 66. A space is ASCII value 32. The numeral 0 is ASCII 48. ASCII includes only the characters for the English alphabet, though, so a newer standard, Unicode, includes the ASCII character set and adds codes for other alphabets as well. The most common encoding of Unicode, UTF-8, is backward compatible with ASCII: any ASCII text is also valid UTF-8 with the same byte values.

The ASCII table and Unicode set can be found in many computer manuals’ indexes, and all over the place online. Here’s one online version.  These codes are used for number-to-character translation in every computer operating system and programming language.

Because ASCII and Unicode assign each alphanumeric character a unique value, including punctuation symbols, you can now differentiate the values in your serial stream by ASCII-encoding them.  Change the Serial.write() in the program above to a Serial.print(), and add one more line:

void setup() {
  Serial.begin(9600);
}

void loop() {
  // read the sensor:
  int sensorValue = analogRead(A0);
  // divide by 4 to reduce the range to 0-255:
  sensorValue = sensorValue / 4;
  // send it:
  Serial.print(sensorValue);
  Serial.print(",");
}

Now you’ve got a string of bytes representing numeric characters, AND a byte representing a comma.  What would the values of the bytes be?

50 51 44 50 51 44 50 51 44 50 51 44 50 52 44 50 52 44 50 53 44 50 53 44 50 54 44 50 56 44 50 55 44 50 55 44 50 55 44

Why is it so much longer, and what are all those 44s doing in there so regularly? If you look at the ASCII table, you’ll see why. The sensor values are ASCII-encoded now. That means that the value 23 is represented by the character “2” followed by the character “3”. In ASCII, “2” has the value 50 and “3” has the value 51. And “,” has the value 44. So the numbers above, read as ASCII, translate to:

"23,23,23,23,24,24,25,25,26,28,27,27,27,"

It takes more bytes to send data this way, but you have a much more readable protocol.
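
To compare the two encodings side by side, here is a minimal sketch in plain Java (not Arduino code). The class and method names are hypothetical, chosen only for illustration. It shows the bytes that would go over the wire for the value 23 in each protocol.

```java
// Hypothetical helper class for illustration; not part of Arduino or Processing.
class EncodingDemo {
  // Binary protocol: the value itself is the single byte sent.
  static byte[] encodeBinary(int value) {
    return new byte[] { (byte) value };
  }

  // ASCII protocol: each decimal digit is sent as its character code,
  // followed by a comma (ASCII 44) as punctuation.
  static byte[] encodeAscii(int value) {
    String text = value + ",";
    return text.getBytes(java.nio.charset.StandardCharsets.US_ASCII);
  }

  // Format bytes as space-separated decimal numbers from 0 to 255.
  // Java bytes are signed, so mask with 0xFF to get the unsigned value.
  static String toDecimalList(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) sb.append(' ');
      sb.append(bytes[i] & 0xFF);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toDecimalList(encodeBinary(23))); // prints 23
    System.out.println(toDecimalList(encodeAscii(23)));  // prints 50 51 44
  }
}
```

One byte becomes three, but the receiver can now tell where each value ends.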

Serial Terminal Programs

Related video: Serial 4 – Devices and Bytes

Most serial terminal programs assume that when you’re receiving serial data, it should be interpreted as ASCII characters. This is why you’ll see random characters when you open the Serial Monitor in Arduino after uploading the first program above: the Arduino’s using a binary protocol, but the Serial Monitor thinks it’s an ASCII protocol.

The freeware program CoolTerm is a useful Serial Terminal application, because it can show you both ASCII and raw binary values. Download it, then open it.  Click the Options icon, then choose your serial port from the Serial Port menu:
CoolTerm options menu

Click OK, then click Connect. You should see the same random characters you were seeing in the Arduino IDE’s Serial Monitor. (Make sure you have the Serial Monitor closed before you connect in CoolTerm, because a serial port can only be controlled by one program at a time.) But if you click on the View Hex icon, you’ll see a very different view:

CoolTerm "view hex" view

Now you’re looking at the ASCII characters for each byte in the right-hand column, and the raw binary values in the center column (in hexadecimal notation). This is really handy when you’re trying to interpret a binary protocol.

Click the Disconnect icon in CoolTerm (because a serial port can only be controlled by one program at a time) and upload the second Arduino program above. Once the program’s uploaded, connect in CoolTerm again and look at how the hex view has changed:

CoolTerm "view hex" view of the second program's output

With this version of the program, the ASCII view is more readable. In the hex view, you can tell the commas from the regular data, because each comma appears as 0x2C, or 44 in decimal, or “,” in ASCII. When the sensor values are two digits long, that’s every third byte.

Related video: Serial 7 – Reading Strings

Make the following final improvement to the program above:

void setup() {
  Serial.begin(9600);
}

void loop() {
  // read the sensor:
  int sensorValue = analogRead(A0);
  // divide by 4 to reduce the range to 0-255:
  sensorValue = sensorValue / 4;
  // send it:
  Serial.print(sensorValue);
  Serial.print(",");

  // read another sensor:
  int sensorValue2 = analogRead(A1);
  // divide by 4 to reduce the range to 0-255:
  sensorValue2 = sensorValue2 / 4;
  // send it:
  Serial.println(sensorValue2);
}

When you view the output of this, you’ll see you get a string of two numbers, with a carriage return and a linefeed after each string. In hex view, these will be represented as 0x0D (carriage return) followed by 0x0A (linefeed). This is a simple multi-value protocol. You’ve got a comma separating the values, and a carriage return and linefeed separating each set of values. This protocol is both easy to read, and easy for most personal computer programming environments to interpret. For more on that, see the Duplex Serial Lab.
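
To see which bytes one complete message contains, here is a minimal sketch in plain Java with a hypothetical helper, toHex(). It assumes example readings of 23 and 145, and a message that ends with a carriage return and a linefeed, which is what Arduino’s Serial.println() sends.

```java
// Hypothetical helper class for illustration; it mimics a terminal's hex view.
class HexViewDemo {
  // Convert an ASCII message to space-separated two-digit hexadecimal byte values.
  static String toHex(String message) {
    byte[] bytes = message.getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) sb.append(' ');
      sb.append(String.format("%02X", bytes[i] & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // two example sensor values, a comma between them, then "\r\n" at the end:
    System.out.println(toHex("23,145\r\n")); // prints 32 33 2C 31 34 35 0D 0A
  }
}
```

The 2C is the comma between the two values, and the final 0D 0A pair marks the end of the message.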

Managing the Serial Buffer

Before you read this section, try the Duplex Serial Lab. This section will make more sense when you’ve seen the programs in that lab in operation.

This form of serial communication is called “asynchronous” because the two devices involved share no clock signal, and their sending and receiving are not synchronized. The sender can send when the receiver is not listening, and vice versa. Both sides typically maintain a serial buffer, which is a place in memory to store incoming serial data before it is used, as mentioned in the serial basics notes. On computers with an operating system, this serial buffer is maintained even when your program isn’t running. So when you stop your program and re-start it, there may be data in the buffer from a previous run of the program. This can cause errors. In fact, one of the most common and infuriating serial errors you can see in Processing is this:

Error, disabling serialEvent() for <your port name>

There are many causes for this error. All it tells you is that you did something wrong in the serialEvent() function.

Make Sure There’s Data To Read

The first thing you can do wrong is to assume there’s data there to read when there isn’t. Imagine the following situation:

Your microcontroller is continually sending a string of three sensor values to a program on your personal computer, ASCII-encoded, comma-separated, and terminated by a newline, like so:

234,23,142\n

The controller is not waiting for a response from your desktop program. It’s just continually sending as fast as it can. The desktop program reading this data is waiting until it sees a newline character, then reading the whole buffer. Then it splits the incoming string into an array and converts the values to integers. In Processing, it might look like this:


// in the setup(), configure the serial object to generate serial events
// only when a newline arrives, like so:
myPort.bufferUntil('\n');

// then your serialEvent function looks like this:
void serialEvent(Serial myPort){
  String inputString = myPort.readString();
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
}

The desktop program isn’t reading as frequently as the microcontroller is sending, and the personal computer’s serial buffer contains the unread bytes. You stop the desktop program, and there’s still a byte in the buffer, like this:

\n

The next time you start your program, there’s a newline in the buffer even if the Arduino’s not running, so serialEvent() is called. But there’s no string preceding the newline character, so when you try to split the string into an array, you get an error. The serial communication between the devices was working properly, but because it’s asynchronous, it’s your job as programmer to check that the data in the buffer is what you think it is. You might address this problem by checking that there’s a valid string to read first:

  String inputString = myPort.readString();
  if (inputString != null) {
    // when you know you've got a good string, take action on it:
    // split the string into an array:
    String[] sensorReadings = split(inputString, ",");
  }

Clear Any Old Data Before Reading

Although this solution works if you’re reading the serial buffer as a string, it doesn’t solve every problem. Another way to avoid this particular problem is to make sure that you’ve cleared out the serial buffer at the beginning of your program before you start reading new data. Once you’ve configured your serial object, clear the buffer like so:

  // in your setup() after you create a new serial object called myPort:
  myPort.clear();
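
To make the stale-data problem concrete, here is a minimal simulation in plain Java. SerialLineBuffer is a hypothetical class written for this illustration. It is not Processing’s Serial class, and it only approximates the idea of buffering until a newline arrives.

```java
// Hypothetical stand-in for a serial buffer; for illustration only.
class SerialLineBuffer {
  private final StringBuilder pending = new StringBuilder();

  // Append newly arrived ASCII data to whatever is already waiting.
  void receive(String incoming) {
    pending.append(incoming);
  }

  // Discard everything that is waiting, like clearing the buffer at startup.
  void clear() {
    pending.setLength(0);
  }

  // Return the next complete message without its line ending,
  // or null if no newline has arrived yet.
  String readLine() {
    int end = pending.indexOf("\n");
    if (end < 0) return null;
    String line = pending.substring(0, end).trim();
    pending.delete(0, end + 1);
    return line;
  }

  public static void main(String[] args) {
    SerialLineBuffer buffer = new SerialLineBuffer();
    buffer.receive("\n");             // stale newline left from a previous run
    buffer.receive("234,23,142\n");   // a complete new message
    buffer.receive("17,");            // a partial message still in transit
    String line;
    while ((line = buffer.readLine()) != null) {
      if (line.isEmpty()) continue;   // skip the stale, empty message
      System.out.println(line);       // prints 234,23,142
    }
  }
}
```

Calling clear() before the first receive() would remove the stale newline, so the empty-message check would never be triggered by old data.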

Make Sure All The Data Has Been Received

Even if you’ve cleared the buffer and made sure there’s data to read, you can still get errors if that data doesn’t match your expectations. In the example above, your data sentence includes three ASCII-encoded numbers and a newline. But what if the microcontroller doesn’t send that? Maybe there’s an error in its program, or maybe there’s still data in the serial buffer (because you didn’t clear the buffer). You should check to make sure that everything you expect is present before you operate on it. For example, imagine you’re splitting the input data into an array of three strings, then copying those strings into global variables like so:

// global variables:
int xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
void serialEvent(Serial myPort){
  String inputString = myPort.readString();
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
  xPosition = int(sensorReadings[0]);  // copy the first element into xPosition
  yPosition = int(sensorReadings[1]);  // copy the second element  into yPosition
  zPosition = int(sensorReadings[2]);  // copy the third element into zPosition
}

This works great until you don’t have three elements in the array. If there wasn’t a full sentence of data, the array might have only one or two elements, and your error is back. To solve this, make sure the array is at least as long as the number of elements you’re trying to read from it, like so:

// global variables:
int xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
void serialEvent(Serial myPort){
  String inputString = myPort.readString();
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
  if (sensorReadings.length > 2) {       // check that the array has at least three elements
    xPosition = int(sensorReadings[0]);  // copy the first element into xPosition
    yPosition = int(sensorReadings[1]);  // copy the second element  into yPosition
    zPosition = int(sensorReadings[2]);  // copy the third element into zPosition
  }
}

If any of the data is missing for any reason, this if statement will prevent an error by skipping the part where you look for data that’s not there. When you combine it with the check for a null string above, and the clearing of the serial buffer, you make your serial reading much more stable. When combined with a handshaking methodology as seen in the Serial Duplex Lab as well as in the Serial Duplex Lab using P5.js, you also ensure that the serial buffer is only filled when you’re ready for new data. All of these are good serial communication practices and will make your projects more stable.
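
The checks described above can be combined into one function. What follows is a minimal sketch in plain Java rather than Processing, with a hypothetical helper called parse(). It also rejects fields that aren’t numbers, which is an extra assumption beyond the checks in these notes.

```java
// Hypothetical helper class for illustration; not part of Processing.
class SensorMessageParser {
  // Return the three sensor values, or null when the message is not usable.
  static int[] parse(String inputString) {
    // check that there is a string to read at all:
    if (inputString == null) return null;
    // remove the line ending, then split the string into an array:
    String[] fields = inputString.trim().split(",");
    // check that the array has at least three elements:
    if (fields.length < 3) return null;
    int[] values = new int[3];
    try {
      for (int i = 0; i < 3; i++) {
        values[i] = Integer.parseInt(fields[i].trim());
      }
    } catch (NumberFormatException e) {
      // extra check (an assumption, not from the notes): reject non-numeric fields
      return null;
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(parse("234,23,142\n"))); // prints [234, 23, 142]
    System.out.println(parse("\n") == null);     // prints true
    System.out.println(parse("234,23") == null); // prints true
  }
}
```

Returning null for every kind of bad message lets the caller handle them all with a single check before copying values into its own variables.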