Interpreting Serial Data

Introduction

Serial data is passed byte by byte from one device to another, but it’s up to you to decide how each device (computer or microcontroller) should interpret those bytes, when the beginning of a message is, when the end is, and what to do with the bytes in between.  Serial communication protocols define the structure of communication between devices. These notes explain how serial data is interpreted by computers.

To get the most out of these notes, you should know what a microcontroller is and have an understanding of the basics of microcontroller programming. You should also understand the basics of serial communication. You should have some understanding of programming a personal computer as well, ideally in a language that can access the serial ports of the computer, like p5.js, the Processing programming environment, the node.js programming environment, Python, or Java.

This is a useful page to return to when you have a specific advanced serial communication problem to solve.

ASCII and Binary

Related video: ASCII Bytes Explained

Imagine that you’re sending the value of one sensor from a microcontroller to a personal computer. If the sensor’s value is never greater than 255, you know it can fit in a single byte. This kind of message is easy. Just send the value over and over, and the receiver can read the latest byte to have your whole message. In Arduino, you can do this using the Serial.write() command, as shown in the Arduino sketch below:

Binary Serial Arduino Program

void setup() {
  Serial.begin(9600);
}

void loop() {
  // read the sensor:
  int sensorValue = analogRead(A0);
  // divide by 4 to reduce the range to 0-255:
  sensorValue = sensorValue / 4;
  // send it:
  Serial.write(sensorValue);
}

Imagine a typical stream of bytes sent by the program above:

23 23 23 23 24 24 25 25 26 28 27 27 27

Every value you send can range from 0 to 255, the full range that can fit in a byte. A protocol like this is often called a binary protocol, because each byte has no meaning other than the value of the number itself.
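The arithmetic behind fitting a reading into one byte can be checked on the receiving side. The following is a minimal sketch in plain JavaScript, not part of the original notes. It assumes a 10-bit analog reading (0 to 1023), and toByte() is a hypothetical helper name.

```javascript
// Assumption: a 10-bit analog reading, 0 to 1023.
// Dividing by 4 and dropping the remainder mirrors the integer
// division in the Arduino sketch above.
function toByte(reading) {
  return Math.floor(reading / 4);
}

console.log(toByte(0));     // 0
console.log(toByte(92));    // 23
console.log(toByte(1023));  // 255

// A byte can't hold more than 255. Storing 300 in a byte-sized
// array keeps only the low 8 bits: 300 - 256 = 44.
const wrapped = Uint8Array.of(300)[0];
console.log(wrapped);       // 44
```

The last lines show why the scaling matters: a value that doesn’t fit in a byte doesn’t cause an error here, it silently becomes a different number.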

Now imagine you want to send two sensor readings.  You might think “No problem; just add another analogRead() command and another Serial.write() command.” But the resulting stream of data would look the same. Each byte would range from 0 to 255, and you’d have no way to know which byte represents which sensor. You need some punctuation bytes.

If you’re sending more than one value (and you usually are), then the receiving computer has to know when the message starts and when it ends, and it needs to know how to put the bytes together into a message.

Computers use numbers to represent alphanumeric characters (letters and numbers and punctuation) in bytes. An early standard code for doing this, still in wide use, is called ASCII, which stands for American Standard Code for Information Interchange. ASCII assigns each number or letter a specific byte value from 0 to 127. For example, capital A is ASCII value 65. Capital B is 66. A space is ASCII value 32. The numeral 0 is ASCII 48. ASCII includes only the characters for the English alphabet, though, so a newer standard, Unicode, includes the ASCII character set and adds codes for other alphabets as well. The most common encoding of Unicode, UTF-8, is backward compatible with ASCII: the first 128 characters have the same byte values in both.

The ASCII table and Unicode set can be found in many computer manuals’ indexes, and all over the place online. These codes are used for number-to-character translation in every computer operating system and programming language.
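You can also ask a programming language for these codes directly instead of looking them up. Here is a short sketch in plain JavaScript using the built-in charCodeAt() and String.fromCharCode() functions. The helper name codeOf() is made up for this example.

```javascript
// Return the numeric code assigned to a single character.
function codeOf(character) {
  return character.charCodeAt(0);
}

console.log(codeOf("A"));              // 65
console.log(codeOf("B"));              // 66
console.log(codeOf(" "));              // 32
console.log(codeOf("0"));              // 48

// And the reverse: turn a numeric code back into its character.
console.log(String.fromCharCode(65));  // A
```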

Because ASCII and Unicode assign each alphanumeric character a unique value, including punctuation symbols, you can now differentiate the values in your serial stream by ASCII-encoding them.  Change the Serial.write() in the program above to a Serial.print() command, and add one more line:

ASCII-encoded Serial Arduino Program

void setup() {
  Serial.begin(9600);
}

void loop() {
  // read the sensor:
  int sensorValue = analogRead(A0);
  // divide by 4 to reduce the range to 0-255:
  sensorValue = sensorValue / 4;
  // send it:
  Serial.print(sensorValue);
  Serial.print(",");
}

Now you’ve got a string of bytes representing numeric characters, AND a byte representing a comma.  What would the values of the bytes be?

50 51 44 50 51 44 50 51 44 50 51 44 50 52 44 50 52 44 50 53 44 50 53 44 50 54 44 50 56 44 50 55 44 50 55 44 50 55 44

Why is it so much longer, and what are all those 44s doing in there so regularly?  If you look at the ASCII table, you’ll see why. The sensor values are ASCII-encoded now.  That means that the value 23 is represented by the character “2” followed by the character “3”. In ASCII, “2” has the value 50 and “3” has the value 51. And “,” has the value 44.  So the numbers above, read as ASCII, translate to:

"23,23,23,23,24,24,25,25,26,28,27,27,27"

It takes more bytes to send data this way, but you have a more human-readable protocol, because most receiving programs will default to displaying byte values using the ASCII or Unicode characters assigned to them.
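The translation from byte values to text and back to numbers can be sketched in a few lines of plain JavaScript. This is an illustration rather than part of the original sketches. It uses a shortened version of the stream above, and the variable names are arbitrary.

```javascript
// A shortened version of the byte stream above: "23,", "24,", "28,"
const bytes = [50, 51, 44, 50, 52, 44, 50, 56, 44];

// Translate each byte to the character it represents:
const text = String.fromCharCode(...bytes);
console.log(text);    // 23,24,28,

// Split on the commas, drop the empty piece after the last comma,
// and convert each piece back to a number:
const values = text
  .split(",")
  .filter((piece) => piece.length > 0)
  .map(Number);
console.log(values);  // [ 23, 24, 28 ]

// Going the other way: the characters of "23" become bytes 50 and 51.
const encoded = Array.from("23", (ch) => ch.charCodeAt(0));
console.log(encoded); // [ 50, 51 ]
```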

Serial Terminal Programs

Related video: Serial 4 – Devices and Bytes

Most serial terminal programs assume that when you’re receiving serial data, it should be interpreted as ASCII characters. This is why you’ll see random characters when you open the Serial Monitor in Arduino after uploading the binary serial program above: the Arduino’s using a binary protocol, but the Serial Monitor thinks it’s an ASCII protocol.

The freeware program CoolTerm is a useful serial terminal application, because it can show you both ASCII and raw binary values. Download it, then open it. Click the Options icon, then choose your serial port from the Serial Port menu, as shown in Figure 1:

Screenshot of the CoolTerm options menu, showing the port name that's the same as your Arduino, and a rate of 9600 bps
Figure 1. CoolTerm options menu

Click OK, then click Connect. You should see the same random characters you were seeing in the Arduino IDE’s Serial Monitor. (Make sure you have the Serial Monitor closed before you connect in CoolTerm, because serial ports can only be controlled by one program at a time.) But if you click on the View Hex icon, you’ll see a very different view, shown in Figure 2:

CoolTerm "view hex" view
Figure 2. The CoolTerm serial terminal application showing the hexadecimal view.

Now you’re looking at the ASCII characters for each byte in the right hand column, and the raw binary values in the center column (in hexadecimal notation), as shown in Figure 2. This is really handy when you’re trying to interpret a binary protocol.

Click the Disconnect icon in CoolTerm (because serial ports can only be controlled by one program at a time) and upload the ASCII-encoded Serial Arduino program above. Once the program’s uploaded, connect in CoolTerm again and look at how the hex view has changed, as shown in Figure 3:

The CoolTerm serial terminal application showing the hexadecimal view. The screen is filled with hexadecimal values in the center, along with the ASCII characters corresponding to those values running down the side. This time you see the numeric values of the ASCII characters.
Figure 3. The CoolTerm serial terminal application showing the hexadecimal view. This time you see the numeric values of the ASCII characters.

With this version of the program, the ASCII view is more readable. In the hex view, you can tell the commas from the regular data, because every third byte is 0x2C, or 44 in decimal, or “,” in ASCII.
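If you don’t have a terminal with a hex view handy, you can produce a similar view yourself. The following plain JavaScript sketch formats byte values as two-digit hexadecimal numbers. The function name toHex() is made up for this example.

```javascript
// Format an array of byte values the way a hex view would show them.
function toHex(bytes) {
  return bytes
    .map((b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
}

// The characters "2", "3" and ",":
console.log(toHex([50, 51, 44]));  // 0x32 0x33 0x2C

// Carriage return and linefeed:
console.log(toHex([13, 10]));      // 0x0D 0x0A
```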

Related video: Serial 7 – Reading Strings

Make the following final improvement to the program above:

ASCII-Encoded Serial Arduino Sketch with Line Break

void setup() {
  Serial.begin(9600);
}

void loop() {
  // read the sensor:
  int sensorValue = analogRead(A0);
  // divide by 4 to reduce the range to 0-255:
  sensorValue = sensorValue / 4;
  // send it:
  Serial.print(sensorValue);
  Serial.print(",");

  // read another sensor:
  int sensorValue2 = analogRead(A1);
  // divide by 4 to reduce the range to 0-255:
  sensorValue2 = sensorValue2 / 4;
  // send it:
  Serial.println(sensorValue2);
}

When you view the output of this, you’ll see you get a string of two numbers, with a carriage return and a linefeed after each string. In hex view, these will be represented as 0x0D (carriage return) followed by 0x0A (linefeed). This is a simple multi-value protocol. You’ve got a comma separating the values, and a carriage return and linefeed separating each set of values. This protocol is both easy to read, and easy for most personal computer programming environments to interpret. For more on that, see the Serial Duplex Lab using p5.js and p5.webserial.
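To get a feel for how a receiving program might interpret this protocol, here is a minimal sketch in plain JavaScript. It doesn’t use any serial library. It assumes the incoming text arrives in arbitrary chunks, and makeLineParser() and feed() are hypothetical names chosen for the example.

```javascript
// Collect incoming text until a full line (ending in "\r\n") has arrived,
// then split that line on commas and convert the pieces to numbers.
function makeLineParser() {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const readings = [];
    let end = buffer.indexOf("\r\n");
    while (end !== -1) {
      const line = buffer.slice(0, end);
      // keep whatever follows the line ending for next time:
      buffer = buffer.slice(end + 2);
      if (line.length > 0) {
        readings.push(line.split(",").map(Number));
      }
      end = buffer.indexOf("\r\n");
    }
    return readings;
  };
}

const feed = makeLineParser();
console.log(feed("23,4"));          // []  (no complete line yet)
console.log(feed("5\r\n24,46\r"));  // [ [ 23, 45 ] ]
console.log(feed("\n"));            // [ [ 24, 46 ] ]
```

Because serial data can arrive a few bytes at a time, the parser only acts on complete lines and holds on to any partial line until the rest shows up.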

Managing the Serial Buffer

Before you read this section, try the Serial Duplex Lab using p5.js and p5.webserial. This section will make more sense when you’ve seen the programs in that lab in operation.

This form of serial communication is called “asynchronous” because the data between the two devices involved is not synchronized. The sender can send when the receiver is not listening, and vice versa. Both sides typically maintain a serial buffer, which is a place in memory to store incoming serial data before it is used, as mentioned in the serial basics notes. On computers with an operating system, this serial buffer is maintained even when your program isn’t running. So when you stop your program and restart it, there may be data in the buffer from a previous run of the program. This can cause errors. Most programming environments automatically flush this buffer every time you restart the program, but if the one you are using does not, then look for a function that flushes the buffer. It’s often called flush(), and it’s good to run it right after you open the serial port. In Processing and p5.webserial it’s called clear().

In Processing:

// in your setup() after you create a new serial object called myPort:
myPort.clear();

In p5.js with p5.webserial:

function openPort() {
  // wait for the serial.open promise to return,
  // then call the initiateSerial function
  serial.open().then(initiateSerial);

  // once the port opens, clear any old data from the buffer:
  function initiateSerial() {
    serial.clear();
  }
}

Make Sure There’s Any Data To Read

The first thing you can do wrong is to assume there’s data there to read when there isn’t. Imagine the following situation:

Your microcontroller is continually sending a string of three sensor values to a program on your personal computer, ASCII-encoded, comma-separated, and terminated by a newline, like so:

234,23,142\n

The controller is not waiting for a response from your desktop program, it’s just continually sending as fast as it can. The desktop program reading this data is waiting until it sees a newline character, then reading the whole buffer. Then it splits the incoming string into an array and converts the values to integers. You want to read only when a newline character comes in.

In p5.js using the p5.webserial library it might look like this:

function serialEvent(){
  let inputString = serial.readStringUntil("\r\n");
  // split the string into an array:
  let sensorReadings = split(inputString, ",");
}

In Processing, it might look like this:

// in the setup(), configure the serial object to generate serial events
// only when a newline arrives, like so:
myPort.bufferUntil('\n');

// then your serialEvent function looks like this:
void serialEvent(Serial myPort) {
  String inputString = myPort.readString();
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
}

The receiving program may not be reading as frequently as the microcontroller is sending, and the personal computer’s serial buffer contains the unread bytes. You stop the desktop program, and there’s still a byte in the buffer, like this:

\n

The next time you start your program, there’s a newline in the buffer even if the Arduino’s not running, so a serial data event is generated and your serialEvent() function is called. But there’s no string preceding the newline character, so when you try to split the string into an array, you get an error. The serial communication between the devices was working properly, but because it’s asynchronous, it’s your job as the programmer to check that the data in the buffer is what you think it is. You might address this problem by checking that there’s a valid string to read first:

In p5.js with p5.webserial:

let inputString = serial.readStringUntil("\r\n");
if (inputString != null) {
  // when you know you've got a good string, take action:
  // split the string into an array:
  let sensorReadings = split(inputString, ",");
}

In Processing:

String inputString = myPort.readString();
if (inputString != null) {
  // when you know you've got a good string, take action on it:
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
}

Similar problems can happen in all serial environments.

Clear Any Old Data Before Reading

Although this solution works if you’re reading the serial buffer as a string, it doesn’t solve every problem. Another way to avoid this particular problem is to make sure that you’ve cleared out the serial buffer at the beginning of your program before you start reading new data. Once you’ve configured your serial object and opened the port, clear the buffer by calling serial.clear().

Make Sure All The Data Has Been Received

Even if you’ve cleared the buffer and made sure there’s data to read, you can still get errors if that data doesn’t match your expectations. In the example above, your data sentence includes three ASCII-encoded numbers and a newline. But what if the microcontroller doesn’t send that? Maybe there’s an error in its program, or maybe there’s still data in the serial buffer (because you didn’t clear the buffer). You should check to make sure that everything you expect is present before you operate on it. For example, imagine you’re splitting the input data into an array of three strings, then copying those strings into global variables like so:

In Processing:

// global variables:
int xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
void serialEvent(Serial myPort) {
  String inputString = myPort.readString();
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
  // copy the first element into xPosition:
  xPosition = int(sensorReadings[0]);
  // copy the second element into yPosition:
  yPosition = int(sensorReadings[1]);
  // copy the third element into zPosition
  // (trim() removes the trailing newline first):
  zPosition = int(trim(sensorReadings[2]));
}

In p5.js with p5.webserial:

// global variables:
let xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
function serialEvent(){
  let inputString = serial.readStringUntil("\r\n");
  // split the string into an array:
  let sensorReadings = split(inputString, ",");
  // copy the first element into xPosition:
  xPosition = Number(sensorReadings[0]);
  // copy the second element  into yPosition:  
  yPosition = Number(sensorReadings[1]); 
  // copy the third element into zPosition: 
  zPosition = Number(sensorReadings[2]);  
}

This works great until you don’t have three elements in the array. If there wasn’t a full sentence of data, the array might have only one or two elements, and your error is back. To solve this, make sure the length of the array is as long as the number of elements you’re trying to read from it like so:

In Processing:

// global variables:
int xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
void serialEvent(Serial myPort) {
  String inputString = myPort.readString();
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
  if (sensorReadings.length > 2) {
    // copy the first element into xPosition:
    xPosition = int(sensorReadings[0]);
    // copy the second element into yPosition:
    yPosition = int(sensorReadings[1]);
    // copy the third element into zPosition
    // (trim() removes the trailing newline first):
    zPosition = int(trim(sensorReadings[2]));
  }
}

In p5.js with p5.webserial:

// global variables:
let xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
function serialEvent(){
  let inputString = serial.readStringUntil("\r\n");
  // split the string into an array:
  let sensorReadings = split(inputString, ",");
  if (sensorReadings.length > 2) {  
    // copy the first element into xPosition:
    xPosition = Number(sensorReadings[0]);
    // copy the second element  into yPosition:  
    yPosition = Number(sensorReadings[1]); 
    // copy the third element into zPosition: 
    zPosition = Number(sensorReadings[2]);
  }  
}

If all of the data isn’t there for any reason, this if statement will prevent an error by skipping the part where you look for data that’s not there.  When you combine it with the check for a null string above, and the clearing of the serial buffer, you make your serial reading much more stable.  The final result might look like this:

In Processing:

// global variables:
int xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
void serialEvent(Serial myPort) {
  String inputString = myPort.readString();
  // if there's no string to read, stop here:
  if (inputString == null) return;
  // split the string into an array:
  String[] sensorReadings = split(inputString, ",");
  if (sensorReadings.length > 2) {
    // copy the first element into xPosition:
    xPosition = int(sensorReadings[0]);
    // copy the second element into yPosition:
    yPosition = int(sensorReadings[1]);
    // copy the third element into zPosition
    // (trim() removes the trailing newline first):
    zPosition = int(trim(sensorReadings[2]));
  }
}

In p5.js with p5.webserial:

// global variables:
let xPosition, yPosition, zPosition;

// then your serialEvent function looks like this:
function serialEvent(){
  let inputString = serial.readStringUntil("\r\n");
  if (!inputString) return;
  // split the string into an array:
  let sensorReadings = split(inputString, ",");
  if (sensorReadings.length > 2) {  
    // copy the first element into xPosition:
    xPosition = Number(sensorReadings[0]);
    // copy the second element  into yPosition:  
    yPosition = Number(sensorReadings[1]); 
    // copy the third element into zPosition: 
    zPosition = Number(sensorReadings[2]);
  }  
}

When combined with a handshaking methodology as seen in the Serial Duplex Lab using p5.js and p5.webserial, you also ensure that the serial buffer is only filled when you’re ready for new data. All of these practices are good serial communication practices and will make your projects more stable.
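The checks described in this section can also be collected into one function that you can test without any hardware attached. This is a sketch in plain JavaScript under a few assumptions: parseReadings() is a hypothetical helper, it requires exactly the expected number of values (a little stricter than the length check above), and it also rejects pieces that aren’t numbers, which the examples above don’t do.

```javascript
// Return an array of numbers if the line holds exactly expectedCount
// numeric values, or null if anything is missing or malformed.
function parseReadings(inputString, expectedCount) {
  // no string at all (null, undefined, or empty):
  if (!inputString) return null;
  // remove the line ending, then split on commas:
  const parts = inputString.trim().split(",");
  // wrong number of values:
  if (parts.length !== expectedCount) return null;
  // convert each piece; treat an empty piece as invalid:
  const values = parts.map((part) =>
    part.trim() === "" ? NaN : Number(part)
  );
  if (values.some((value) => Number.isNaN(value))) return null;
  return values;
}

console.log(parseReadings("234,23,142\r\n", 3)); // [ 234, 23, 142 ]
console.log(parseReadings("\n", 3));             // null
console.log(parseReadings("234,23", 3));         // null
console.log(parseReadings("234,abc,142", 3));    // null
```

In a serialEvent() function you would call this on the incoming string and update your global variables only when the result isn’t null.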

Parsing Text in Arduino

There are some tools in the Serial library of Arduino that make it simpler to parse text strings. For example if you know you are getting a string of numbers separated by commas, you can use Serial.parseInt() like so:

void setup() {
  // initialize serial
  Serial.begin(9600);
}

void loop() {
  if (Serial.available()) {
    int x = Serial.parseInt();
    int y = Serial.parseInt();
    int z = Serial.parseInt();
    Serial.print("x = ");
    Serial.print(x);
    Serial.print(", y = ");
    Serial.print(y);
    Serial.print(", z = ");
    Serial.println(z);
  }
}

Try sending comma-separated strings of three numbers to this from the Serial Monitor. You should get an output like this:

x = 34, y = 45, z = 56

Sometimes you will get 0 values if a non-numeric character arrives (like a newline or carriage return). You can catch these by checking for the right number of bytes. For example, 34,56,78 followed by carriage return and newline is 10 bytes. So if you know you can expect at least 10 bytes, you could use if (Serial.available() >= 10). There are a number of other useful finding and parsing functions in the Serial library, including:

  • Serial.find()
  • Serial.findUntil()
  • Serial.parseInt()
  • Serial.parseFloat()
  • Serial.readBytes()
  • Serial.readBytesUntil()
  • Serial.readString()
  • Serial.readStringUntil()

For more on these, see the full Arduino Serial reference.
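When you’re working out how many bytes to wait for, as in the Serial.available() check described above, it can help to count the bytes of a sample message on your computer first. Here is a short plain JavaScript sketch using the built-in TextEncoder. The function name byteCount() is made up for this example.

```javascript
// Count how many bytes a message occupies when encoded as UTF-8
// (for ASCII characters, that's one byte per character).
function byteCount(message) {
  return new TextEncoder().encode(message).length;
}

// Three two-digit numbers, two commas, carriage return and newline:
console.log(byteCount("34,56,78\r\n"));    // 10

// A space after each comma adds a byte each time:
console.log(byteCount("34, 56, 78\r\n"));  // 12
```

Note that the count changes with the number of digits in each value, so a fixed byte count is only a rough guard.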

Data Protocols

What you’ve seen in these notes has been a common data protocol called Comma Separated Values (CSV), in which the values of different data items, whether numeric or otherwise, are encoded as text and separated by commas. Here are a few examples:

23,35,423,24,554,23,51,443,63,74,0,43
cat, dog, chicken, eel, pig, donkey, monkey, ocelot
23.5, 96.8, 2.343, 0.65, -17.34

CSV is probably the simplest of data formats, because Unicode is ubiquitous, and almost all programming APIs have a function for splitting strings on a delimiter like a comma. Other common data formats carry more structure, such as a name for each value.
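Splitting a CSV line takes only a few lines in most languages. Here is a minimal sketch in plain JavaScript that handles the three examples above. parseCsvLine() is a hypothetical helper, and it assumes simple data with no quoted fields or commas inside values.

```javascript
// Split a line on commas, trim the spaces around each item,
// and convert items that look like numbers into numbers.
function parseCsvLine(line) {
  return line.split(",").map((item) => {
    const text = item.trim();
    const number = Number(text);
    return text !== "" && !Number.isNaN(number) ? number : text;
  });
}

console.log(parseCsvLine("23,35,423"));          // [ 23, 35, 423 ]
console.log(parseCsvLine("cat, dog, chicken"));  // [ 'cat', 'dog', 'chicken' ]
console.log(parseCsvLine("23.5, 0.65, -17.34")); // [ 23.5, 0.65, -17.34 ]
```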

URL Encoding

You may have seen something like this at the end of a web URL:

?lat=40.8028523&long=-73.9562655&name=coco%20nail%20salon

This is called URL encoding, and it’s a common way of formatting data so that it can be included in a request made with the Hypertext Transfer Protocol, HTTP. You’ll notice that it always starts with a ?. Web locations can’t include a ?, so it delimits the beginning of a query string. Items are separated by & symbols, and each item’s key (or name) comes first, followed by an = sign, then the item’s value. Characters that can’t appear in a URL are written as a % followed by their value in hexadecimal, so each %20 above is a space (0x20, or ASCII 32). This is more complex than CSV, but it’s possible to come up with an algorithm to read it, and functions to do so are common in web-native programming APIs.
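JavaScript is one of those web-native environments. As an illustration, the built-in URLSearchParams class can read the query string above, and encodeURIComponent() can produce the %20 encoding. The variable names here are arbitrary.

```javascript
const query = "?lat=40.8028523&long=-73.9562655&name=coco%20nail%20salon";

// URLSearchParams ignores the leading "?" and decodes each value:
const params = new URLSearchParams(query);
const place = {
  lat: Number(params.get("lat")),
  long: Number(params.get("long")),
  name: params.get("name"),
};
console.log(place.name);  // coco nail salon
console.log(place.lat);   // 40.8028523

// Encoding goes the other way: spaces become %20.
const encodedName = encodeURIComponent("coco nail salon");
console.log(encodedName); // coco%20nail%20salon
```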

JavaScript Object Notation

A common format used in JavaScript is JavaScript Object Notation, or JSON. A JSON string looks like this:

{ "name": "Sandy",
  "age": 99,
  "employed": true
}

A JSON object is enclosed in braces {}. Within each key-value pair, the key and the value are separated by a colon (:), and pairs are separated by commas. Keys are strings in double quotes. Values can be strings, numbers, true or false, null, arrays, or other objects.
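In JavaScript, the built-in JSON.parse() and JSON.stringify() functions convert between JSON text and objects. Here is a short sketch using the example above. The variable names are arbitrary.

```javascript
// JSON text, as it might arrive in a string:
const text = '{ "name": "Sandy", "age": 99, "employed": true }';

// Parse the text into an object and read its values:
const person = JSON.parse(text);
console.log(person.name);             // Sandy
console.log(typeof person.age);       // number
console.log(typeof person.employed);  // boolean

// Go the other way: turn an object into JSON text.
const outgoing = JSON.stringify({ x: 34, y: 45, z: 200 });
console.log(outgoing);                // {"x":34,"y":45,"z":200}
```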

Since p5.js is a JavaScript API, it’s handy to send data to it serially as JSON. Here’s a sample Arduino program to send JSON to p5.js:

void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.println("{\"x\": 34, \"y\": 45, \"z\": 200}");
  delay(100);
}

Now here’s a serialEvent() function you can use in p5.js with p5.webserial to receive the data. Take this sketch and replace its serialEvent function with the following:

function serialEvent() {
  // read a string from the serial port
  // until you get carriage return and newline:
  var inString = serial.readStringUntil("\r\n");
  // check to see that there's actually a string there:
  if (inString) {
    // parse the string as JSON:
    var sensors = JSON.parse(inString);
    locH = sensors.x;
    locV = sensors.y;
    circleColor = sensors.z;
    console.log(sensors);
  }
}

What’s happening here is that p5.js is expecting a JSON-formatted string and parsing it with JSON.parse(), and then you get to use the values in it just like those of any other JavaScript object. It’s simpler than parsing all the pieces out with split(). Of course, it’s a pain to have to format a complex JSON-formatted string in Arduino, but there is a library that’ll help, called Arduino_JSON. You can install it from the Library Manager (search for the name with the underscore, Arduino_JSON). There are several examples that come with it, but here’s a simple one for putting sensor values into a JSON string:

#include <Arduino_JSON.h>
// make a new JSONVar object called device:
JSONVar device;

void setup() {
  Serial.begin(9600);
}

void loop() {
  // make a data item called x in the JSONVar object:
  device["x"] = analogRead(A0);
  delay(1);
  // make a data item called y in the JSONVar object:
  device["y"] = analogRead(A1);
  delay(1);
  // make a data item called z in the JSONVar object:
  device["z"] = analogRead(A2);
  // print out the JSONVar object:
  Serial.println(device);
}

Add three analog sensors to pins A0 through A2 (potentiometers will do the job). Then try this with the same sketch. p5.js will read the data from the sensors and use them to set the position and color of the ball.

You can see the sketch running on GitHub at this link. You can see the source files for copying into the p5.js editor at this link.
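As with CSV, a JSON message can arrive incomplete, and JSON.parse() throws an error when it’s given text that isn’t valid JSON. One way to guard against that is sketched below in plain JavaScript. parseSensorJson() is a hypothetical helper, and it assumes the message should contain numeric x, y, and z values like the Arduino examples above.

```javascript
// Return the parsed object if the string is valid JSON with numeric
// x, y, and z values; otherwise return null.
function parseSensorJson(inString) {
  if (!inString) return null;
  let sensors;
  try {
    sensors = JSON.parse(inString);
  } catch (error) {
    // the string was incomplete or not JSON at all:
    return null;
  }
  if (sensors === null || typeof sensors !== "object") return null;
  for (const key of ["x", "y", "z"]) {
    if (typeof sensors[key] !== "number") return null;
  }
  return sensors;
}

console.log(parseSensorJson('{"x": 34, "y": 45, "z": 200}')); // { x: 34, y: 45, z: 200 }
console.log(parseSensorJson('"y": 45, "z": 200}'));           // null
console.log(parseSensorJson('{"x": 34}'));                    // null
```

In serialEvent() you could call a function like this and update the position and color variables only when the result isn’t null.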

Conclusion

There are lots of different data formats out there, for different applications. Learning how they are structured and how to format them will simplify communication from device to device. Most protocols are either binary, as you saw at the beginning of these notes, or ASCII-encoded, like the ones in the latter half of these notes. Of the ASCII-encoded protocols, CSV is the simplest and most common, but all of them can be learned and interpreted with a little work. When you are using a well-known protocol, it’s always worth checking to see if there’s a library to format or interpret it in whatever programming environment you are using.