# Geography of the Internet

A network is a collection of things connected to each other. Put more formally, networks are usually described as a series of nodes connected by links. A node can be anything or anyone that can communicate with other nodes over the links between them. A network of people in a room might communicate through notes passed between them, or by pulling on strings connecting them. A network of sensors might communicate with each other over wires linking them, or over a radio link. The links that connect the nodes in a network are the medium of communication, and they affect how we communicate in a network.

The internet is just one of many networks. A couple of ideas explained in the book Linked, by Albert-László Barabási, are useful in understanding the geography of networks, whether we’re talking about the internet or transportation networks, infrastructure networks like the power grid, or even the biological networks that link plants and animals.

## Network Topologies

The first kind of network to consider, described by Paul Erdős and Alfred Rényi, is the random network. In a random network, links between nodes are placed randomly. There’s no particular reason behind the connection between any two nodes. Not every node is linked to every other node, but all nodes will have more or less the same number of links. A random network can be shown as a distributed collection of nodes connected by links, as Figure 1 below shows.

When you want to send a message across a random network, you need to know the distance across the network. This isn’t a physical distance, but a distance measured in links. Any network can be measured in terms of the number of hops it takes to get a message from one node to another. Figure 1 shows a distributed network of a few dozen nodes. Each node is linked to those nodes close to it, but none are linked directly to nodes far away. If you want to get a message from node A, which is on the top left of the network, to node Z on the bottom right, the message has to go to the nearest node that’s closer to Z than A itself. In this diagram, B is closer to Z than A. Likewise, C is closer to Z than B is, so the message goes from B to C. That process continues until the message gets to Z. In Figure 1, the distance, in links, between A and Z is fourteen hops (the diagram skips a few letters for the sake of space). This is what’s called a distributed network.
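Hop distance can be computed rather than counted by eye. The following is a minimal sketch, assuming a small made-up network (not the one in Figure 1) and using breadth-first search, which finds the fewest hops between two nodes. The function name `hop_distance` and the link list are illustrative only.

```python
from collections import deque

def hop_distance(links, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable."""
    # Build an adjacency map from the undirected link list.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search visits nodes in order of increasing hop count.
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None

# A small hypothetical network with two equally short routes from A to Z.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "D"), ("D", "Z")]
print(hop_distance(links, "A", "Z"))  # 4
```

Note that this measures the shortest possible route. The node-by-node forwarding described above may take a longer route if each node only knows about its own neighbors.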

There are other possible topologies for networks. Paul Baran, a researcher at the RAND Corporation in 1964, was tasked with developing communications networks that could survive outages anywhere in the network. He described three possible network topologies. Distributed networks, as mentioned above, are linked randomly, and links are more or less evenly distributed between nodes. Centralized networks are networks in which all nodes are linked to one central node, and all communication flows through the center. In a diagram, they look like a star. These afford tight control over communication, but are very inflexible. Without the central node, the network is gone. Decentralized networks are collections of several hubs of nodes. Each hub has a central node, but the hubs are all linked together through other links. The graph resembles a collection of stars, each linked by a line to a star further out. In a decentralized network, removal of any one hub node affects only those nodes that depend on it for a connection to the larger network. Figure 2 shows Baran’s 1964 diagram of these three topologies.

Baran’s diagram simplifies the topologies. Baran’s decentralized network drawing, for example, is actually centralized: one hub node at the center of the diagram connects several other hub nodes together. Removing that one node separates the network into several smaller, unconnected networks.
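The effect of removing a hub can be checked by counting connected components before and after the removal. This is a sketch under assumptions: the layout below is a hypothetical stand-in for a decentralized drawing with one central hub (`H0`), not a transcription of Baran's figure, and the helper names are illustrative.

```python
def components(nodes, links):
    """Group nodes into connected components using depth-first search."""
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, group = [n], set()
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            stack.extend(neighbors[cur] - group)
        seen |= group
        groups.append(group)
    return groups

def without(node, nodes, links):
    """Return the node and link lists with one node (and its links) removed."""
    kept_nodes = [n for n in nodes if n != node]
    kept_links = [(a, b) for a, b in links if node not in (a, b)]
    return kept_nodes, kept_links

# Hypothetical layout: hub H0 joins hubs H1..H3, and each of those serves two leaves.
nodes = ["H0", "H1", "H2", "H3", "a", "b", "c", "d", "e", "f"]
links = [("H0", "H1"), ("H0", "H2"), ("H0", "H3"),
         ("H1", "a"), ("H1", "b"), ("H2", "c"), ("H2", "d"),
         ("H3", "e"), ("H3", "f")]

print(len(components(nodes, links)))                  # 1
print(len(components(*without("H0", nodes, links))))  # 3
```

Adding even one direct link between two of the outer hubs would keep those two sub-networks connected to each other after `H0` is removed.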

The internet’s structure is based on these topologies. End devices like your computer or mobile devices are connected to your router, which is connected to other routers, which are connected to other routers, and so forth.  Whether you’re on a home network, academic network, business network, or mobile network, it’s routers all the way down. Figure 3 shows a simplified network of the internet. There is no one central router or central node to the network. Different network providers connect to each other, and form a distributed center of the internet. They are connected through multiple links to each other.

The central network in Figure 2 shows two nodes at the top of the diagram that are both connected to the central node, and directly to each other as well. A message from either of the two sub-networks at the top of that diagram could take two possible paths to the bottom of the diagram: it could either go directly through its hub to the center hub, or it could go to the neighboring hub and then to the center. The greater the density of links in a network, the more possible paths there are for the message to get through to the end point, and the less critical any node is to the functioning of the network. This was one of Baran’s most important insights on networks, and it is still commonly applied today. In Figure 3, the network providers at the center of the diagram are connected not only to their customers, but to each other as well. Each network provider connects to multiple other network providers, because the more possible paths there are for your messages, the more reliable their service will be.

The most connected network would be one in which every node is connected directly to every other node. In such a network, no message is reliant on anyone but the sender and receiver to be delivered successfully. This is called a complete network. However, it would take a lot of links to make that happen. Figure 4 shows two complete networks, one with five nodes, and one with ten. Counting the links in the five-node network is fairly easy: a five-pointed star of links connects the nodes across the network, and a pentagon of links connects each to its nearest neighbor, for ten links in all. The second network has ten nodes, but there are so many links that it’s difficult to count them all. Each node has to have nine links to reach all the others. For a network of n nodes to be complete, it must have (n² - n)/2 links. With ten nodes, that comes to 45 links.
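The link count for a complete network can be cross-checked by listing every pair of nodes. This short sketch compares the formula against a brute-force enumeration; the function name `complete_link_count` is illustrative.

```python
from itertools import combinations

def complete_link_count(n):
    """Links needed so each of n nodes connects directly to every other node."""
    return (n * n - n) // 2

for n in (5, 10, 100):
    # Cross-check the formula by enumerating every unordered pair of nodes.
    enumerated = sum(1 for _ in combinations(range(n), 2))
    print(n, complete_link_count(n), enumerated)
# 5 10 10
# 10 45 45
# 100 4950 4950
```

The count grows roughly with the square of the number of nodes, which is why complete networks become impractical quickly.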

In most physical networks, links are expensive and difficult to maintain, so you don’t see many complete networks in practice. Network designers and operators look for ways to provide adequate link density so that messages get through reliably with a minimum of links to maintain.

## Small Worlds Networks

Chapter 4 of Linked introduces Mark Granovetter’s notion of small worlds networks. These networks are not so random, as some connections are strong ties, and some are weak. Figure 5 below shows a network in which each node is connected strongly to its nearest four neighbors. The distance across such a small worlds network can get relatively long, but it only takes one or two ties across the network to shrink the distance considerably. Duncan Watts and Steve Strogatz, building on Granovetter’s ideas, developed the notion of clustering on networks, in which the nodes in one subnet might have strong ties to each other, and weak ties across the larger network. These weak ties perform the important function of connecting the larger network. They make it possible for a network to be much more interconnected with relatively few links.
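The shrinking effect of a few long-range ties can be measured directly. The sketch below assumes a ring of 40 nodes, each linked to its four nearest neighbors, and then adds two arbitrary cross-ring links. The ring size and the choice of shortcut links are illustrative assumptions, not taken from Figure 5.

```python
from collections import deque

def ring_lattice(n):
    """Each node links to its two nearest neighbors on either side (four in total)."""
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for step in (1, 2):
            j = (i + step) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors

def average_hops(neighbors):
    """Mean shortest-path length over all reachable ordered pairs, via BFS."""
    total, pairs = 0, 0
    for start in neighbors:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

net = ring_lattice(40)
before = average_hops(net)

# Add two hypothetical long-range "weak ties" across the ring.
for a, b in [(0, 20), (10, 30)]:
    net[a].add(b)
    net[b].add(a)
after = average_hops(net)

print(round(before, 2), round(after, 2))
```

Only two links were added to the original eighty, yet the average distance drops, because many pairs of nodes can now route through a shortcut.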

As an example of what weak ties can do, Granovetter showed through his research how many people find jobs not through their strong ties to family or friends, but through weak ties to friends of friends, acquaintances, and so forth. In internet terms, many connections between local networks might be thought of as weak ties.

The small worlds model is not the only one that affects the connectivity of a network, but it is one of the easiest for people to understand, because it relates to many of our everyday experiences. Barabási discusses other models in Linked. It’s a worthwhile read for those interested in the dynamics of networks. The concepts summarized here give you a place to start thinking about factors to consider when you design or use networks like the Internet, or mobile networks, or Bluetooth networks, or any of the many other digital communications networks.

## How Are Computer Networks Structured?

Communications networks are complex things, made up of many elements. The network described in Figure 3, for example, includes personal computers, mobile phones, home and business routers, servers, mobile base stations, and many more pieces of equipment, made by dozens of different manufacturers. No one company manufactures all of this equipment, or even has the expertise to do so. To make such a system possible, standards of interconnection are needed. A number of different organizations participate in the development of these standards. Organizations like the International Organization for Standardization (ISO), the Institute of Electrical and Electronics Engineers (IEEE) and the United Nations’ International Telecommunication Union (ITU) all negotiate between governments, corporations and other industry stakeholders to develop and maintain these standards.

Of the numerous standards in the telecommunications industry, one of the first that’s helpful to know about in understanding networks is the Open Systems Interconnection (OSI) model.

### The Open Systems Interconnection (OSI) Model

The Open Systems Interconnection (OSI) model was developed from parallel ideas at the ISO and the ITU, and was formalized as ITU-T recommendation X.200. This model breaks a network into several layers, each of which describes a subset of the activities that happen on the network. By breaking the various functions of a network up into discrete chunks, it enables different companies to concentrate on different layers, and know that their hardware and software will interoperate with other companies’ equipment.

Table 1 summarizes the OSI model in terms of the tasks that each layer’s protocols handle.

Table 1. The Open Systems Interconnect (OSI) Network Model

| Layer | What it does | Questions it answers |
|---|---|---|
| Application | Uses or generates network data from user activities. Provides the interfaces for users to interact over networks. | What are you doing with the data transmitted or received? |
| Presentation | Handles the formatting and presentation of incoming data to applications. Encryption is usually handled at this level as well. | How is the data formatted and/or encrypted? |
| Session | Manages the connection between sender and receiver. Maintains the state of the connection, and start and end of connections. | How do you say hello and goodbye? |
| Transport | Manages the type of transmission, order of packets received, acknowledgement and re-transmission, reliability, and flow control. | How are you sending data packets? Should the receiver acknowledge them? |
| Network | Handles addressing and traffic management between networks. Organizes data in packets, each with an Internet Protocol (IP) address. | What’s your network address? How do you connect to other networks? |
| Datalink | Manages traffic on the physical transmission medium and addressing of physical devices. Organizes data in frames. Each network interface’s unique Media Access Control (MAC) address is associated with this layer. | What’s your physical address, regardless of which network you’re on? |
| Physical | Provides a physical medium for transmission: fiber optic, copper, radio, etc. Specifies physical characteristics: voltages, connectors, etc. | What physical medium are you transmitting on? |

The application layer is closest to what most of us experience. Applications, such as web browsers and mail clients, exist here. Most of us don’t think much beyond this layer. Designers of applications usually think about the presentation layer as well, since that defines how data is formatted. They take the other layers for granted.

The session layer defines how programs on a network open and close their communications with each other. Do they send data in short bursts, or do they maintain an ongoing connection?  The transport layer underlies that, defining how two programs confirm what was sent and what was received. The layers below the transport layer define how the network is structured, logically and physically.

You can’t send a message if you don’t have an address for the receiver. The network layer defines this. The internet protocol (IP) is perhaps the most ubiquitous network-layer protocol. It defines what kinds of devices get to define a network, and how they assign addresses. A router on an IP network is a device which defines the network and the range of addresses assigned to other devices when they’re connected to that network. In doing so, it also defines the maximum number of devices on the network. For example, a router might give itself the address 10.0.0.1, and then define that all other devices on that network get addresses from 10.0.0.2 to 10.0.0.254 (the last address in such a range, 10.0.0.255, is conventionally reserved for broadcast).
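Python's standard `ipaddress` module can illustrate how a router's address range bounds the number of devices. This sketch assumes the 10.0.0.x example above is a network with a 24-bit prefix (written `10.0.0.0/24`), which the text implies but does not state. In such a network the first and last addresses are conventionally reserved for the network itself and for broadcast.

```python
import ipaddress

# Assumed: the 10.0.0.x example corresponds to the prefix 10.0.0.0/24.
network = ipaddress.ip_network("10.0.0.0/24")
router = ipaddress.ip_address("10.0.0.1")

# hosts() skips the network address (10.0.0.0) and the broadcast address (10.0.0.255).
assignable = [h for h in network.hosts() if h != router]

print(assignable[0], assignable[-1], len(assignable))  # 10.0.0.2 10.0.0.254 253
print(ipaddress.ip_address("10.0.0.42") in network)    # True
print(ipaddress.ip_address("10.0.1.7") in network)     # False
```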

On the data link layer, each network interface, such as the WiFi radio or Ethernet jack in your computer, gets a unique hardware address called the Media Access Control (MAC) address. One device may have multiple MAC addresses if it has multiple network interfaces. For example, a laptop that has a WiFi radio and an Ethernet jack has two network interfaces, and therefore two MAC addresses.

Underlying all the others, the physical layer defines how devices use the physical media over which they communicate. If communication is over wires, what voltage and amperage are allowed? If it’s over radio, what frequencies? How are the connectors shaped? These questions, and others pertaining to the physical details of communication, are defined in the physical layer.

## Who Assigns Addresses?

Addresses on a network at the network layer, and addresses of network interfaces at the datalink layer, need to be agreed upon, or things can get confusing fast. Who assigns those addresses?

On the datalink layer, manufacturers of devices license blocks of addresses from the Institute of Electrical and Electronics Engineers (IEEE), which registers MAC addresses. Each address is six bytes long and identifies a network interface of a device. The first three bytes of a MAC address are called the Organizationally Unique Identifier (OUI). These identify companies which make devices that use MAC addresses. OUIs are registered by the IEEE. The second three bytes of a MAC address uniquely identify your network interface. The bytes of a MAC address are usually written in hexadecimal (base 16) notation.

Imagine your WiFi radio’s MAC address is 48:d7:05:aa:bb:cc. The first three bytes are the Organizationally Unique Identifier (OUI): 48:d7:05. The final three bytes are your unique device: aa:bb:cc.
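Splitting a MAC address into its two halves takes only a few lines. This is a minimal sketch that assumes the colon-separated hexadecimal notation used above; the function name `split_mac` is illustrative, and `bytes.hex()` with a separator requires Python 3.8 or later.

```python
def split_mac(mac):
    """Split a colon-separated MAC address into (OUI, device part) byte strings."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly six bytes")
    return octets[:3], octets[3:]

oui, device = split_mac("48:d7:05:aa:bb:cc")
print(oui.hex(":"), device.hex(":"))  # 48:d7:05 aa:bb:cc
```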

IP address space is administered by the Internet Assigned Numbers Authority (IANA), through Public Technical Identifiers (PTI). Incorporated in 2016, PTI is an affiliate of the Internet Corporation for Assigned Names and Numbers (ICANN). IANA manages address space through the Regional Internet Registries (RIRs). Internet Service Providers license addresses from the RIRs. Figure 6 and Table 2 show the various RIRs.

Table 2. Regional Internet Registries

| Registry | Area Covered |
|---|---|
| AFRINIC | Africa Region |
| APNIC | Asia/Pacific Region |
| ARIN | Canada, USA, and some Caribbean Islands |
| LACNIC | Latin America and some Caribbean Islands |
| RIPE NCC | Europe, the Middle East, and Central Asia |

Table and figure from IANA.

The Internet Protocol (IP) is maintained by the Internet Engineering Task Force (IETF). The first major version, IPv4, was published as RFC 791 in 1981 and is still in use today. Its successor, IPv6, was designed to support a much larger number of addresses. Published as RFC 2460 in 1998, but not ratified as an Internet standard until 2017, IPv6 is still not as ubiquitous as IPv4. Both are commonly used today.

Each address for IPv4 is four bytes long and identifies a device on a network. IPv4 space is divided into public IP addresses and private IP addresses. There are also blocks reserved for multicast use, and for future use.

IPv4 addresses are divided into four bytes, each representing a range of addresses. The first byte determines the largest block of addresses in a given network, and the last determines the smallest block of addresses. For example, a network defined as 128.xxx.xxx.xxx could have up to 2²⁴ (about 16.7 million) possible devices on it, while 128.122.6.xxx could only have up to 2⁸, or 256, devices on it.
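The block sizes follow from how many bits are left free once the leading bytes are fixed. This sketch uses Python's standard `ipaddress` module and writes the two example networks in prefix notation (`/8` and `/24`), which is an assumption about how the `xxx` placeholders map to prefix lengths.

```python
import ipaddress

# Fixing the first byte leaves 24 bits free; fixing the first three leaves 8.
large = ipaddress.ip_network("128.0.0.0/8")
small = ipaddress.ip_network("128.122.6.0/24")

print(large.num_addresses == 2 ** 24, large.num_addresses)  # True 16777216
print(small.num_addresses == 2 ** 8, small.num_addresses)   # True 256
print(small.subnet_of(large))                               # True
```

These counts are the raw sizes of each block. In practice a couple of addresses in each network are set aside (for example, for broadcast), so the number of usable device addresses is slightly smaller.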

Public IP addresses are unique on the internet. There can only be one device on the internet with the address 93.184.216.34, for example, an address that has been associated with the name example.com. Web servers have public addresses so that they can be widely reached. Private address ranges are ranges of IP addresses designed to be used for local network addresses only. They mean nothing outside the domain of a given router. For example, your home router might have the address 10.0.1.1, and it might assign your computers the addresses 10.0.1.2, 10.0.1.3, 10.0.1.4, and so forth. But these numbers are not reachable by a device outside your router’s network. Common private IP address ranges are:

• 10.0.0.0 to 10.255.255.255, a 24-bit block of addresses
• 172.16.0.0 to 172.31.255.255, a 20-bit block of addresses
• 192.168.0.0 to 192.168.255.255, a 16-bit block of addresses
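Whether an address falls in one of these private ranges can be tested with Python's standard `ipaddress` module. The sketch below writes the three blocks in prefix notation as defined in RFC 1918; the function name `is_rfc1918` is illustrative.

```python
import ipaddress

# The three private blocks listed above, in prefix notation (per RFC 1918).
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 24 free bits
    ipaddress.ip_network("172.16.0.0/12"),   # 20 free bits
    ipaddress.ip_network("192.168.0.0/16"),  # 16 free bits
]

def is_rfc1918(address):
    """Return True if the IPv4 address lies in one of the three private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)

for addr in ("10.0.1.2", "172.20.5.9", "192.168.1.10", "93.184.216.34", "172.32.0.1"):
    print(addr, is_rfc1918(addr))
# 10.0.1.2 True
# 172.20.5.9 True
# 192.168.1.10 True
# 93.184.216.34 False
# 172.32.0.1 False
```

The last example shows that the 172.x range is narrower than it looks: only 172.16.x.x through 172.31.x.x are private.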

On the network layer, address assignment and address resolution tie a device’s MAC address (from the datalink layer) to an IP address. When a new device connects to a network, it announces its MAC address and requests an IP address, typically using the Dynamic Host Configuration Protocol (DHCP). Depending on the rules of the network, the router either grants an IP address or denies it. Many routers default to permissive addressing, meaning that they will assign an IP address to any legitimate request. However, many enterprise networks, such as academic networks and business networks, maintain tables of registered MAC addresses that can be granted an IP address. Any MAC address not in that table will not be assigned an IP address. Once addresses are assigned, devices use the Address Resolution Protocol (ARP) to look up the MAC address that corresponds to a given IP address on the local network.
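The difference between permissive addressing and a registered-device table can be modeled with a toy assignment table. This is a sketch only: the `ToyRouter` class, its methods, and the second MAC address are hypothetical, and the code models the grant-or-deny decision rather than any real protocol exchange.

```python
import ipaddress

class ToyRouter:
    """Hypothetical address-assignment table; not a real protocol implementation."""

    def __init__(self, cidr, allowed_macs=None):
        hosts = list(ipaddress.ip_network(cidr).hosts())
        self.address = hosts[0]       # the router keeps the first host address
        self.free = hosts[1:]         # remaining addresses available to hand out
        self.allowed = allowed_macs   # None means permissive: accept any MAC
        self.leases = {}              # MAC address -> assigned IP address

    def request(self, mac):
        """Return an IP for this MAC, or None if it is denied or the pool is empty."""
        if mac in self.leases:
            return self.leases[mac]
        if self.allowed is not None and mac not in self.allowed:
            return None
        if not self.free:
            return None
        self.leases[mac] = self.free.pop(0)
        return self.leases[mac]

# A permissive home router grants an address to any device that asks.
home = ToyRouter("10.0.1.0/24")
print(home.request("48:d7:05:aa:bb:cc"))    # 10.0.1.2

# A restricted network only serves MAC addresses in its registration table.
campus = ToyRouter("10.0.1.0/24", allowed_macs={"48:d7:05:aa:bb:cc"})
print(campus.request("48:d7:05:aa:bb:cc"))  # 10.0.1.2
print(campus.request("00:11:22:33:44:55"))  # None
```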

Just because an internet service provider (ISP)  licenses a chunk of IP address space, that doesn’t mean they can provide physical access to the network. Conversely, just because a network provider has fiber to your building, it doesn’t mean they can provide you with IP addresses. The two often go hand-in-hand, but not always.

In order to connect to the internet, a private network has to have a public gateway. Its router performs this function, and therefore has two network addresses, one public and one private. Figure 7 shows a typical home setup, in which the router has both a public and a private IP address, and assigns private IP addresses to the laptop and tablet connecting to it. The router’s public IP address is in the range of the ISP’s subnet.

One large network may combine several smaller ones, and it can include a mix of public and private networks. Ultimately every device with a private address will be “represented” to the rest of the internet by the first router above it with a public address. Figure 8 shows this in action. The central router has the address 128.122.x.x, and can therefore form the largest network. Each router attached to this router can form one or more public or private networks of its own, using addresses within the central router’s range.
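One common way a router "represents" private devices is network address translation (NAT), a mechanism the text above does not name. The sketch below is a toy model under that assumption: the `ToyNat` class, the port numbers, and the public address (taken from a range reserved for documentation) are all hypothetical.

```python
class ToyNat:
    """Hypothetical translation table mapping private (ip, port) pairs to public ports."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Rewrite an outgoing source address to the router's public address."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port):
        """Find which private device a reply belongs to, or None if unknown."""
        return self.inbound.get(public_port)

nat = ToyNat("203.0.113.7")  # address from a range reserved for documentation
print(nat.translate_out("10.0.1.2", 51000))  # ('203.0.113.7', 40000)
print(nat.translate_out("10.0.1.3", 51000))  # ('203.0.113.7', 40001)
print(nat.translate_in(40001))               # ('10.0.1.3', 51000)
print(nat.translate_in(45000))               # None
```

From the outside, both private devices appear as the single public address. Only the router's table can tell their traffic apart.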

Finally,  Autonomous Systems are networks of networks. They are joined using the Border Gateway Protocol (BGP). Autonomous System routers maintain routing tables not only for their own network, but for the networks to which they connect. They tell each other how traffic should be routed. When they fail, major traffic problems occur. Major internet service providers maintain these kinds of routers. Figure 8 below shows a network of autonomous systems, featuring some actual AS numbers that you can look up.
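A routing table lookup can be sketched as choosing the most specific matching prefix for a destination. This is a simplified illustration of table lookup only, not of BGP itself: the prefixes and AS numbers below come from ranges reserved for documentation, and the function name `next_as` is hypothetical.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> neighboring AS to forward to.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "AS64496"),          # default route
    (ipaddress.ip_network("198.51.100.0/24"), "AS64497"),
    (ipaddress.ip_network("198.51.100.128/25"), "AS64498"),  # more specific route
]

def next_as(destination):
    """Pick the route whose prefix matches the destination with the most bits."""
    ip = ipaddress.ip_address(destination)
    matches = [(net, asn) for net, asn in ROUTES if ip in net]
    if not matches:
        return None
    net, asn = max(matches, key=lambda pair: pair[0].prefixlen)
    return asn

print(next_as("198.51.100.200"))  # AS64498
print(next_as("198.51.100.5"))    # AS64497
print(next_as("192.0.2.1"))       # AS64496
```

If the more specific route were withdrawn or misconfigured, traffic for 198.51.100.200 would fall back to the broader entries, which hints at how routing table errors can redirect traffic on a large scale.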