Adapted from Variables
Introduction
This tutorial explains how computer programs organize information in computer memory using variables. All computer programming languages use variables to manage memory, so it’s useful to understand this no matter what programming language or computer you’re using. Although the following was written with microcontrollers and physical computing applications in mind, it applies to programming in general.
The programming language examples below use a syntax based on the programming language C. That same syntax is used by many other languages, including Arduino (which is based on C and C++), Java, Processing (which is written in Java), JavaScript, and others.
What Is Computer Memory, Anyway?
A computer’s memory is basically a matrix of switches, laid out in a regular grid, not unlike the switches you see on the back of a lot of electronic gear, as shown in Figure 1.
Each switch represents the smallest unit of memory, a bit. If the switch is on, the bit’s value is 1. If it’s off, the value is 0. Each bit has an address in the grid. We can envision a grid that represents that memory like this:
bit0 | bit1 | bit2 | bit3 | bit4 | bit5 | bit6 | bit7 |
bit8 | bit9 | bit10 | bit11 | bit12 | bit13 | bit14 | bit15 |
– | – | – | – | – | – | – | – |
– | – | – | – | – | – | – | – |
– | – | – | – | – | – | – | – |
– | – | – | – | – | – | – | – |
– | – | – | – | – | – | – | – |
– | – | – | – | – | – | – | – |
So if a bit can be only 0 or 1, how do we get values greater than 1?
When you count normally, you count in groups of ten. This is because you have ten fingers. So to represent two groups of ten, you write “20”, meaning “2 tens and 0 ones”. This counting system is called base ten, or decimal notation. Each digit place in base ten represents a power of ten: 100 is 10^2, 1000 is 10^3, and so on.
Now, imagine you had only two fingers. You might count in groups of two. This is called base two, or binary notation. So two, which you write as “2” in base ten, would be “10” in base two, meaning one group of two and zero ones. Each digit place in base two represents a power of two: 100 is 2^2, or 4 in base ten; 1000 is 2^3, or 8 in base ten; and so forth.
Any number you represent in decimal notation can be converted into binary notation by simply regrouping it in groups of two. Once you’ve got the number in binary form, you can store it in computer memory, letting each binary digit fill a bit of memory. So the number 238 in decimal notation would be 11101110 in binary notation. The bits in memory used to store 238 would look like this:
2^7 (128) | 2^6 (64) | 2^5 (32) | 2^4 (16) | 2^3 (8) | 2^2 (4) | 2^1 (2) | 2^0 (1) |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
Arranging Memory into Variable Space
Programming languages organize computer memory by breaking the grid of bits up into smaller chunks and labeling them with names. Those names are called variables, and they refer to a location in the computer’s memory. When you ask for the value of a variable, you’re asking what the states of the switches in that location in memory are.
If you think of your program as a set of instructions, then variables are the words that you use to describe what those instructions act upon.
For example:
“When the user has pushed the button two times…”
For this you need a variable called buttonPushed, and you need to check when it’s equal to 2:
if (buttonPushed == 2) { /* respond to the second push here */ }
When you want to store a piece of information, like the number of times a button has been pushed, in the computer’s memory, you give it a name and a data type, which states how much memory you intend to use. You usually give it an initial value as well. This is called declaring the variable, and it looks like this:
int sensorValue = 234;
byte buttonPushed = 15;
long timeSinceStart = 10324;
boolean isOpen = false;
Data Types
Every variable has a data type. The data type of a variable determines how much of the computer’s memory the variable will occupy. Different programming languages have different data types. The examples above use Arduino’s data types, which are based on those of the C programming language.
The first one, int sensorValue, is an integer data type. On 8-bit boards like the Uno, ints take up 16 bits (32 bits on 32-bit boards like the Nano 33 IoT), so they can hold 2^16 different values (2^32 on a 32-bit board). Ints can only contain whole numbers, but they can be positive or negative, so on an Uno the variable sensorValue above could range from -32,768 to 32,767. That’s a range from -2^15 to 2^15 - 1, with one bit effectively used to store the plus or minus sign.
The second, buttonPushed, is a byte data type. Bytes take up 8 bits, and can therefore store 2^8, or 256, different values, from 0 to 255. Bytes in Arduino are unsigned, meaning that they can only hold zero or positive numbers.
The third, timeSinceStart, is a long integer type, for storing very large values. In this instance, it might be storing the number of milliseconds since your program started, which can get big very quickly. Long ints are signed, and can range from -2,147,483,648 to 2,147,483,647. That’s 2^32 possible values.
The fourth, isOpen, is a boolean variable. Booleans can only be true or false, and ideally take up just one bit in memory (though most programming languages use a whole byte for convenience).
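Stored in memory, the four variables declared above might look like this, one row per byte (the isOpen row shows the single bit a boolean ideally needs):
Variable | bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
---|---|---|---|---|---|---|---|---|
sensorValue (int, 234) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
sensorValue, continued | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
buttonPushed (byte, 15) | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
timeSinceStart (long, 10324) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
timeSinceStart, continued | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
timeSinceStart, continued | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
timeSinceStart, continued | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
isOpen (boolean, false) | 0 | – | – | – | – | – | – | – |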
When you declare a variable, the program picks the next available address and sets aside as many bits as are needed for the data type you declare. If you declare a byte, for example, it sets aside 8 bits. An int gets 16 bits on an Uno (32 on the Nano 33 IoT). A string gets one byte (eight bits) for every character of the string, plus one extra byte, set to 0, to mark the end of the string.
What About Fractional Numbers?
The variable types above are all for whole numbers, or integers. But how do you store a number like 3.1415 or 2.71828 or other fractional numbers? These are called floating-point numbers in the programming world, and they’re stored in a special type of variable called a float. In Arduino, floats are 32-bit numbers, stored in 4 bytes of memory. Eight of those bits (the exponent) record where the point falls; the rest store the sign and the significant digits.
Depending on the processor you are using, the data types might be different sizes. Table 4 shows the sizes of the data types on the Uno, which has an 8-bit processor, and on the Nano 33 IoT, which has a 32-bit processor.
Data Type | Uno | Nano 33 IoT |
---|---|---|
byte | 1 byte | 1 byte |
int | 2 bytes | 4 bytes |
float | 4 bytes | 4 bytes |
char | 1 byte | 1 byte |
long | 4 bytes | 4 bytes |
short | 2 bytes | 2 bytes |
double | 4 bytes | 8 bytes |
bool | 1 byte | 1 byte |
Here’s a snippet of code to find out a given data type’s size on an Arduino:
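Serial.begin(9600);           // in setup(): open the serial connection
Serial.println(sizeof(int));  // prints how many bytes an int takes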
Replace int above with byte, bool, double, float, long, or the data type whose size you want to know.
Choosing the Right Data Type
How do you know what data type to choose when declaring variables? It depends on two factors: what you’re going to use the variables for, and what functions you plan to use on them. First, consider how large the numbers you store are likely to get. For example, if you’re counting button pushes, you’re unlikely to get more than a few hundred in a few minutes, so an int might be fine, or even a byte if the count will stay below 256. But a count of the milliseconds since some past event can get very large very quickly, so you might need a long int.
Different built-in functions of a programming language require different data types as parameters and return different data types as results, so when you can, use data types that match the functions you plan to use. For example, if you were using a variable to store the result of Arduino’s millis() function, you should use an unsigned long, because that’s the data type millis() returns.
Doing Math With Variables
When you add, subtract, multiply, or divide with variables in a computer program, the results you get depend on the variable types you used. For example, if you ran the code below:
int voltage = 5;
int divider = 2;
int newVoltage = voltage / divider;
You might think that newVoltage = 2.5, right? Wrong. Because you used ints, the fractional part is gone, so the result would be 2. Here’s another:
byte buttonPushes = 254;
buttonPushes = buttonPushes + 4;
After this, you’d expect that buttonPushes = 258, right? Wrong again! Because you used a byte, you can’t store a value larger than 255, so when the result is larger than that, the number rolls over to the lowest possible value again. The result would be 2.
Wait, what?
Look at it this way. The highest value you can store in a byte is 255. Therefore, if you try to store 256, it rolls over to 0. 257 rolls over to 1. And 258 rolls over to 2. So 254 + 4 in a byte variable yields 2. If you used an int instead of a byte, then you’d get the result you expect (258) because an int can hold values larger than 255.
Numeric Notation Systems
There are three notation systems used most commonly in programming languages to represent numbers: binary (base two), decimal (base ten), and hexadecimal (base sixteen). In hexadecimal notation, the letters A through F represent the decimal numbers 10 through 15. There is also ASCII, the American Standard Code for Information Interchange, which represents most alphanumeric characters of the Roman alphabet as number values. More on ASCII can be found in the pages on serial communication. For more, see this online table representing the decimal numbers 0 to 255 in decimal, binary, hexadecimal, and ASCII. While you can work mostly in decimal notation, there are times when it’s more convenient to represent numbers in bases other than base ten.
Table 5 shows a few number values in the different bases, and the different notation forms:
Decimal value | Hexadecimal | Binary |
---|---|---|
3 | 0x03 | 0b11 |
12 | 0x0C | 0b1100 |
45 | 0x2D | 0b101101 |
234 | 0xEA | 0b11101010 |
1000 | 0x3E8 | 0b1111101000 |
Because the values are all bits in the computer’s memory, you can use all of these notation systems interchangeably. Here are a few examples:
if (colorValue == 0xFF) {  // check to see if the color value is 255
  // do something with the color here
}
// add 5 to 0x90. Result will be 0x95:
int channelNumber = 5;
int midiCommand = 0x90 + channelNumber;
Variable scope
Variables are local to a particular function in your code if they are declared in that function. Local variables can’t be used by functions outside the one that declares them, and the memory space allotted to them is released when the function ends. Variables are global when they are declared at the beginning of a program, outside all functions. Global variables are accessible to all functions in the program, and their value is maintained for the duration of the program. Usually you use global variables for values that will need to be kept in memory for future use by other functions, and local variables when you know the value won’t be used outside that function. In general, it’s better to default to local variables when you can, to manage memory more efficiently. Here’s a typical example:
int oldButtonPush = 0;  // global variable
void setup() {
  Serial.begin(9600);
  pinMode(3, INPUT);    // the button is on digital pin 3
}
void loop() {
  int buttonPush = digitalRead(3);  // local variable
  if (buttonPush != oldButtonPush) {
    // the button changed. Do something here,
    // then store the current button push state
    // in the global variable for the next time
    // through the loop:
    oldButtonPush = buttonPush;
  }
}
In this example, the variable buttonPush is local to the loop() function. You couldn’t read it in setup() or in any other function. The variable oldButtonPush, on the other hand, is global and can be read by any function. In the example above, the local variable is used to read the latest state of a digital input, and then the value is copied into the global variable so that the next reading can be compared to the old one.
Constants
In addition to variables, every programming language also includes constants, which are simply variables that don’t change. They’re a useful way to label and change numbers that get used repeatedly within your program. For example, imagine you’re writing a program that runs a servo motor. Servo motors have a minimum and maximum pulse width that doesn’t change, although each servo’s minimum and maximum might be somewhat different. Rather than change every occurrence of the minimum and maximum numbers in the program, we make them constants, so we only have to change the number in one place.
You don’t have to use constants in your programs, but they’re handy to know about, and you will encounter them in other people’s programs.
In C and therefore in Arduino, there are two ways you can declare constants. You can use the const keyword, like so:
const int LEDPin = 3;
const int sensorMax = 253;
Or you can use #define:
#define LEDPin 3
#define sensorMax 253
Defines are always preceded by a #, and don’t have a semicolon at the end of the line. They typically come at the beginning of the program. They actually work a bit like aliases: you define a name for a number, and before compiling, the preprocessor replaces every occurrence of that name in the program with the number. This way, defines don’t take up any memory, but you get all the convenience of a named constant. However, the Arduino core libraries contain many defines of their own, and because the replacement is blind text substitution, a name you #define can collide with a name used elsewhere and change code you didn’t intend to touch. For that reason it’s generally preferable to use const instead of #define for constants.
For more on variables in Arduino, see the variable reference page.