Introduction To Networking

Introduction

Networking is a very broad subject, and this guide will inevitably only scratch the surface.

With the advent of IPv6, many things that we all knew about networking are no longer accurate, so an update is timely.

This document focuses on TCP/IP networking, rather than OSI protocols.

The Protocol Stack

Network connections work using protocols: standards for the encoding and transfer of data from one computer system to another. These protocols are organised into layers, with one protocol utilising the services of a lower level protocol to fulfill its function.

The table below shows the protocols in the TCP/IP networking model, with examples that you may be familiar with:

Layer         Example Protocols
Application   HTTP, HTTPS, FTP, SSH, RDP, VNC, DNS, SNMP, SMTP, POP3, IMAP
Transport     TCP, UDP
Network       IP (IPv4 & IPv6), ICMP & ICMPv6, IPsec
Link          Ethernet, ARP, NDP
Physical      MAC (Ethernet, Wi-Fi, DSL, ISDN, FDDI), PPP

Packets and Layering

Networking protocols send and receive data in discrete packages, called packets.

If you are browsing a web-site using the HTTP protocol (part of the Application layer), the data between your browser and the web-server is packaged into HTTP packets. These are then transported by a protocol in the Transport layer, the next layer down (in this scenario, TCP), so the HTTP packets are further packaged into TCP packets. The TCP packets are then passed to the appropriate Network layer protocol (e.g. IPv4), which again packages the TCP packets into IPv4 packets, and so on.

Outgoing packets generated by higher level protocols may sometimes be too large for handling by a lower level protocol, in which case they are broken up into smaller chunks, and reassembled by the same protocol at the receiving end.
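The idea of layering and of breaking packets into chunks can be sketched in a few lines of Python. This is only a toy illustration: the bracketed "headers", the address 192.0.2.10 and the function names are invented for readability, and do not reflect the real TCP or IP header formats.

```python
# Toy illustration of layering: each layer wraps the data from the layer above
# in its own header. These header formats are invented, NOT the real layouts.

def encapsulate(http_data: bytes) -> bytes:
    """Wrap application data in a toy TCP header, then a toy IPv4 header."""
    tcp_packet = b"[TCP dst_port=80]" + http_data
    ip_packet = b"[IPv4 dst=192.0.2.10]" + tcp_packet
    return ip_packet


def fragment(data: bytes, max_size: int) -> list[bytes]:
    """Split data into chunks no larger than max_size bytes."""
    return [data[i:i + max_size] for i in range(0, len(data), max_size)]


def reassemble(chunks: list[bytes]) -> bytes:
    """Join the chunks back together, in order, at the receiving end."""
    return b"".join(chunks)


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
packet = encapsulate(request)
chunks = fragment(packet, max_size=16)
print(packet)
print(len(chunks), "chunks; reassembled correctly:", reassemble(chunks) == packet)
```

Real protocols also number the chunks, so that they can be put back in the right order even if they arrive out of sequence; the sketch simply assumes that they arrive in order.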

The Physical Layer

The Physical layer provides the basic data transmission capability, whether via electrical cables, fibre-optic cables, or radio signals, to allow data connections.

Most readers will recognise at least some of the examples of physical layer protocols. Most of you will have Wi-Fi at home and at the office, and maybe also Ethernet; some people still have a connection to the Internet which uses DSL. PPP (Point to Point Protocol) is for connecting two routers, and is sometimes used for secure VPN (Virtual Private Network) connections.

For normal consumers, it is not important to understand the Physical layer protocols. Wi-Fi standards exist, and at least one is supported by any Wi-Fi router that you buy; PCs (desktop and laptop) support multiple standards, so that they will usually work with whichever Wi-Fi router you buy. Ethernet devices and cables are standard, and the only thing you need to be concerned with is the bandwidth.

The Link Layer

The comments above also apply to the Link layer. Protocols and devices are standardised, and if your devices have the right sockets, the correct protocols are supported.

The Network Layer

With the Network layer, things start to get more interesting.

Effectively, the Internet exists only from the Network layer upwards. The two protocol layers below that are simply a means of connecting to the Internet Protocol (IP).

Most home networks still use IPv4, which is the old standard for network data exchange, using addresses of the form "xxx.yyy.zzz.nnn", where xxx, yyy, zzz and nnn are numbers in the range 0-255 (e.g. "192.168.0.237"). Home networks typically only support what were traditionally called class C networks (a /24 network in current notation), where the first 3 fields of the addresses on your network are the same for all devices, usually "192.168.0". This leaves 256 values for the last field, of which two are reserved (the network address and the broadcast address), allowing you to have up to 254 devices connected (usually enough for a home network). IPv4 is supported by your Wi-Fi router and/or Internet modem. IPv4 is also used on the public Internet.

The problem with IPv4 is the shortage of addresses; the world started running out of free IPv4 addresses a few years ago. Part of the workaround introduced to cope with this shortage of addresses is NAT (Network Address Translation) routing. Your home network uses NAT routing for IPv4 attached devices. NAT routing uses only one IPv4 address on the public Internet, and routes all connections from the devices on your home network by making them all appear to come from this one externally visible address (this is called masquerading: the addresses in the data packets, outgoing and incoming, are rewritten by the NAT router). NAT routing works for both TCP and UDP traffic, because the router remembers which internal device opened each connection or sent each datagram, and sends the replies back to that device.

NAT routing brings with it some inherent security: an external device cannot normally access devices on your home network via your router or Internet modem, because it has no way to address them (there is only access from outside if a connection is opened from within your network, or if port forwarding has been configured); so NAT routing has a similar security impact to a firewall.
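The address rewriting done by a NAT router can be sketched as a pair of lookup tables. This is a simplified model under stated assumptions, not how any particular router is implemented: the class name NatRouter, the public address 203.0.113.5 and the starting port 40000 are all invented for the example.

```python
import itertools


class NatRouter:
    """Toy model of NAT masquerading (illustrative only)."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._out = {}   # (private_ip, private_port) -> public_port
        self._in = {}    # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int) -> tuple[str, int]:
        """Rewrite the source of an outgoing packet to the public address."""
        key = (private_ip, private_port)
        if key not in self._out:
            public_port = next(self._next_port)
            self._out[key] = public_port
            self._in[public_port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port: int):
        """Return the private destination for a reply, or None if unknown."""
        return self._in.get(public_port)


nat = NatRouter("203.0.113.5")
print(nat.outbound("192.168.0.237", 51000))  # ('203.0.113.5', 40000)
print(nat.inbound(40000))                     # ('192.168.0.237', 51000)
print(nat.inbound(40001))                     # None: nothing inside asked for this
```

The last line shows where the security effect comes from: traffic arriving on a port for which there is no entry in the table has nowhere to go.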

IPv4 addresses on your home or office network are usually assigned dynamically by a DHCP server. DHCP is an Application layer protocol. A DHCP server is included in your WiFi router or Internet modem. You can usually configure the DHCP server to either assign a fixed address to certain devices, or at least allow you to have some devices with static addresses, which you would want for any servers (web-servers, print-servers, file-servers, database servers, etc.) on your home or work network.

IPv6, the new networking standard, works very differently. The addresses are much longer, to ensure that there will be no shortage of addresses: an example IPv6 address is "2a02:2455:264:800:a5de:29bf:c947:ac72". The first 4 fields are common to all devices on your home or office network; the remaining 4 fields can be assigned by the router, or statically assigned. Devices can have multiple IPv6 addresses. IPv6 addresses are global in scope, and reachable by any device on the Internet, so unless you have one or more firewalls, there is less security than IPv4.
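If you want to experiment with addresses and network prefixes, Python's standard ipaddress module is handy. The short sketch below uses the example addresses from this section; the "/24" and "/64" prefix lengths are my assumption about a typical home set-up.

```python
import ipaddress

# A typical home IPv4 network: the first three fields are shared by all devices.
home_v4 = ipaddress.ip_network("192.168.0.0/24")
print(home_v4.num_addresses)                               # 256 addresses in total
print(len(list(home_v4.hosts())))                          # 254 usable by devices
print(ipaddress.ip_address("192.168.0.237") in home_v4)    # True

# An IPv6 network: the first four fields (64 bits) form the shared prefix.
addr_v6 = ipaddress.ip_address("2a02:2455:264:800:a5de:29bf:c947:ac72")
home_v6 = ipaddress.ip_network("2a02:2455:264:800::/64")
print(addr_v6 in home_v6)                                  # True
print(home_v6.num_addresses)                               # 18446744073709551616 (2**64)
```

The IPv4 network has two fewer usable addresses than its total because the first and last addresses are reserved (for the network itself and for broadcasts).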

IPv4 is slowly being phased out, although no official date has been set for its retirement.

More importantly, the number of web-sites and web-services continues to grow, and the pool of IPv4 addresses is largely exhausted, so it is to be expected that in future more and more web-sites and web-services will only be available on IPv6 addresses, and if your home or office network doesn't support IPv6, you will be unable to access these sites/services.

Of course, you know that you don't need to type in IP addresses to reach a server on the Internet (that would be a huge pain for those long IPv6 addresses). The IP address is looked up (invisibly) using the DNS service.

IPv4 and IPv6 Interoperability

A device connected to the Internet with an IPv4 address can communicate with other devices on the Internet which have IPv4 addresses. Similarly, a device connected with an IPv6 address can communicate with another device with an IPv6 address. Direct communication between devices with IPv4 addresses and devices with IPv6 addresses is not possible (it requires a translation service, such as NAT64, somewhere in the network). In effect there are now two distinct Internets, with no direct interconnection, even though traffic for both protocols is carried over the same backbone.

If your PC has both an IPv4 and an IPv6 address, then you will be able to communicate with all addresses, but not all home and office networks support both. I have found this to be a problem with some office networks. If your PC has only an IPv6 address, and the web-site that you want to access has only an IPv4 address, then you will not be able to access that web-site. Many web-hosting companies still only provide their customers with IPv4 addresses, meaning that their web-sites are inaccessible to users with only IPv6 addresses (my web-hosting service provides customers with both IPv4 and IPv6 addresses, with the relevant DNS entries).

Simultaneous Operation of IPv4 and IPv6

Whilst it is possible (and easy) to use IPv4 and IPv6 at the same time on your machines, sometimes things do not run smoothly. The root cause seems to be the limitations of most WiFi adaptors in connecting using multiple addresses. Since most people connect to their network via WiFi, this often causes problems. For example, on the laptop on which I am writing this, I have both an Ethernet cable connection and a WiFi connection, but when both are enabled, WiFi is used as the default connection. When I boot the machine, both IPv4 and IPv6 work, but over time one or the other (usually IPv4) becomes inactive. That means that I am unable to access IPv4 addresses on the Internet, and web-servers which have no IPv6 address are inaccessible; a reboot solves this, at least for a while.

Other Network Layer Protocols

There are a few other Network layer protocols that you may know, or need to use at some point.

ICMP (the IPv4 version) and ICMPv6 (the IPv6 version) are Internet Control Message Protocols, used for error messaging (e.g. host unreachable or service unavailable), and by the ping (checking whether a host is reachable) and traceroute (reporting on the route to a requested destination host) commands, which are often used to diagnose problems with connectivity.
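To give an idea of what ping actually sends, the sketch below builds an ICMP (IPv4) echo request packet by hand. Sending it would need a raw socket, which usually requires administrator rights, so the example only constructs the bytes. The function names and the "ping" payload are my own choices for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum, as used by ICMP (see RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=1, sequence=1)
print(packet.hex())
print("checksum verifies:", internet_checksum(packet) == 0)
```

The receiving host checks the packet in the same way as the last line: running the checksum over a valid packet, including its checksum field, gives zero.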

IPsec is a secure protocol used for VPN connections. You may have used a VPN access program, e.g. from Cisco or IBM, or a free open source application; these are mostly based on IPsec, although some use proprietary protocols.

Routing

Routing is the process of selecting a path for traffic in a network or between or across multiple networks.

Routing is done at the Network layer, although the exchange of routing information between networked devices (using protocols such as RIP, BGP and EGP) is done at the Application layer, and network discovery (using protocols such as ARP and NDP) is done at the Link layer.

The Transport Layer

There are two important Transport layer protocols used on the Internet: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

TCP

TCP is the most widely used Transport layer protocol, which is why the Internet is often referred to as a TCP/IP network.

TCP is a connection based protocol (rather like making a phone call). When your PC creates a TCP connection, the connection is bidirectional, reliable (if a packet is lost, it is resent, multiple times if necessary), ordered (data is delivered in the order in which it was sent), and error-checked (if data is corrupted, this is detected, and the data is resent). Because the connection is bidirectional, the host that you connect to can send data back to your PC or other device.
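A TCP connection can be tried out entirely on your own machine, using the loopback address 127.0.0.1. The following is a minimal sketch using Python's standard socket module; the function name reply_once and the upper-casing of the reply are just for illustration.

```python
import socket
import threading


def reply_once(server: socket.socket) -> None:
    """Accept one connection and send back an upper-case copy of the data."""
    conn, _addr = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


# Listen on the loopback address; port 0 asks the operating system for any free port.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
worker = threading.Thread(target=reply_once, args=(server,))
worker.start()

# The client opens a connection, sends data, and reads the reply on the same connection.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello over tcp")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)   # b'HELLO OVER TCP'
```

Note that the reply comes back over the connection that the client opened; the server never needs to know how to reach the client independently.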

Higher level protocols such as HTTP (for web-browsers) use TCP to transport their data.

UDP

UDP is a connectionless protocol. It has no handshaking, and is thus unreliable, with no guarantee of delivery, ordering, or duplicate protection.
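For comparison with TCP, here is a minimal sketch of sending a single UDP datagram between two sockets on the same machine, again using Python's standard socket module. The variable names are arbitrary.

```python
import socket

# A "receiver" socket bound to a free port on the loopback address.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)
port = receiver.getsockname()[1]

# The sender needs no connection set-up: it simply addresses each datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over udp", ("127.0.0.1", port))

data, source = receiver.recvfrom(1024)
print(data, "from", source)

sender.close()
receiver.close()
```

There is no accept or connect step here, and nothing tells the sender whether the datagram arrived; any acknowledgement has to be added by the application itself.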

For IPv4 clients behind a NAT router (as in most home and office networks), UDP works for exchanges that are started from inside the network: the NAT router (usually your Internet modem) remembers the outgoing datagram for a short time, and forwards replies from the destination host back to the device that sent it. Datagrams that arrive unsolicited from outside are a problem, because the NAT router will not know to whom to forward them, unless port forwarding has been configured. For IPv6 clients, this is not a problem because IPv6 addresses are global in scope.

Ports

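Connections via TCP and UDP are made to a specified port number. Specific services listen on specific ports, for example:

Service            Port(s)
FTP                20, 21
SSH                22
SMTP               25
HTTP               80
POP3               110 (995 with TLS)
IMAP               143 (993 with SSL)
HTTPS              443
SMB/Microsoft-DS   445
CUPS (IPP)         631
NFS                2049
RDP                3389
VNC                5900 upwards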

Some services use a range of port numbers. Examples are VNC, which uses different port numbers for connection to different hosts, and torrents, which use multiple ports simultaneously for higher bandwidth by downloading different segments of a file on different ports.

The port number is mostly invisible to users, because the program which establishes the connection knows which port to use for the service that it uses. Some applications, such as FTP clients and SSH clients allow you to specify alternative ports, in cases where the desired service is offered on a non-standard port number.
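The way a program chooses between a standard port and one given by the user can be sketched as follows. The DEFAULT_PORTS table is a small illustrative subset, and effective_port is a made-up helper name, not part of any library.

```python
from urllib.parse import urlsplit

# Default ports for a few URL schemes (a small illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ssh": 22}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("https://example.com/"))         # 443
print(effective_port("http://example.com:8080/"))     # 8080
print(effective_port("ssh://user@example.com:2222"))  # 2222
```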

If you are running any kind of server on your network (Linux, Windows, etc.) you will need to know the port numbers used by the services that it shares (printing, file sharing, web-server, etc.) in order to configure the server's firewall to allow in traffic on the relevant ports. By default, most firewalls allow any outbound connections (initiated on that machine) but block all inbound connections (initiated on machines trying to use the services of that server), and if you don't explicitly allow the incoming traffic on those ports, the client machines will not be able to access the services. Such firewall configuration is not usually needed for devices such as network printers and NAS (Network Attached Storage).

Port Forwarding

Most routers and Internet modems allow the configuration of port forwarding, which is essential if you want to make services available on the Internet over IPv4, where you only have one externally visible IP address, and can be especially useful when providing multiple instances of the same service.

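For example, if you want VNC access from the Internet to two machines on your network, each VNC server needs its own externally visible port number, and the router is configured to forward each of those ports to the right machine (see the VNC section below).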

Port forwarding is not necessary for IPv6 traffic, because IPv6 addresses are globally unique, and the destination machine will always be directly reachable from the Internet (if firewalls allow).
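Conceptually, an IPv4 port forwarding configuration is just a table that the router consults for each incoming connection. The sketch below models such a table; the internal addresses and the choice of VNC ports are hypothetical, chosen only to illustrate the idea, and real routers are configured through their own admin pages rather than in Python.

```python
# Hypothetical forwarding rules: external port -> (internal address, internal port).
FORWARDING_RULES = {
    5900: ("192.168.0.10", 5900),   # machine1, first VNC server
    5901: ("192.168.0.10", 5901),   # machine1, second VNC server
    5902: ("192.168.0.11", 5902),   # machine2, first VNC server
    5903: ("192.168.0.11", 5903),   # machine2, second VNC server
}


def forward(external_port: int):
    """Return the internal destination for an incoming connection, or None."""
    return FORWARDING_RULES.get(external_port)


print(forward(5902))   # ('192.168.0.11', 5902)
print(forward(8080))   # None: no rule, so the connection is not forwarded
```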

The Application Layer

The Application layer contains a large number of protocols specific to certain services.

DNS

DNS is the Domain Name System. It is a globally distributed database of domain names and their associated IP addresses. Every time that you type a domain name into your browser's location bar, or send an email, the DNS service is used to look up the IP address. This is all transparent to users.

DNS entries can include both IPv4 and IPv6 addresses; there can be multiple entries for each, providing primary and backup server addresses for the domain name.

DNS is an absolutely essential part of how the Internet works.
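You can see the lookup that normally happens behind the scenes by asking the operating system's resolver directly. This is a minimal sketch using Python's standard socket.getaddrinfo; the helper name resolve is my own.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the distinct IPv4 and IPv6 addresses found for a host name."""
    results = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# "localhost" resolves without needing an Internet connection.
print(resolve("localhost"))
```

Calling resolve with a public domain name will typically return one or more IPv4 addresses, plus IPv6 addresses if the domain has them and your system is set up to look them up; the exact results depend on your network.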

HTTP and HTTPS

Virtually all Internet users are at least aware of HTTP and HTTPS, used for browsing the World Wide Web.

When you use a browser to view a web-site, it opens a TCP connection to either port 80 (for insecure browsing) or port 443 (for secure browsing), and communicates with the service at the other end of the connection (for example, an Apache or nginx web-server) using the HTTP or HTTPS protocol.

When browsing with HTTPS, the client and server exchange encryption keys, to ensure that the data sent in both directions is secure from snooping. This is all done automatically, hidden from the user.

HTTP and HTTPS requests are sent to a server address looked up using the DNS service, but include the requested domain name. This is necessary because most web-servers serve multiple web-sites, and the server needs to know which one you want to browse. You may have noticed, therefore, that typing in the IP address in your browser's location bar doesn't yield the same result as typing in the domain name, and often simply produces an error page.
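In HTTP/1.1 the domain name travels in a header line called "Host". The sketch below builds a minimal request by hand to show where it goes; the function name build_get_request is invented for the example, and a real browser sends many more headers.

```python
def build_get_request(domain: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request that names the wanted web-site."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {domain}",        # tells a shared server which site is wanted
        "Connection: close",
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("example.com").decode("ascii"))
```

Two requests sent to the same IP address but with different Host lines can therefore be answered by two completely different web-sites.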

NFS

NFS is the file-sharing protocol used on most Linux file-server systems (although Linux systems can also share files using Samba/Microsoft-DS).

NFS uses port number 2049.

NFS has some performance advantages over Samba, but NFSv3 has poor security and is only suitable for file-sharing on a local network (inaccessible from the Internet); NFSv4 has strong security and can be used where the file-server is more widely accessible.

Samba and Microsoft-DS

SMB (Server Message Block) is a protocol for sharing not only files, but also printers and serial ports between network devices. It is how Windows computers share files; Microsoft refers to it as Microsoft-DS (Directory Services). Samba is the open source implementation of SMB used on Linux and Unix systems, and the two names are often used interchangeably.

Ubuntu Linux computers now use Samba as their default file-sharing service, and if you want to use NFS, you will first need to install the software.

SMB/Microsoft-DS uses port 445.

Printing Services

CUPS (formerly Common Unix Printing System) is a printing system for sharing printers over a network, based on IPP (Internet Printing Protocol). IPP is not limited to Unix print-servers: it is supported on Linux, macOS and Windows systems, and by network printers (standalone printers directly attached to a network rather than to a server).

CUPS uses port 631 (the standard IPP port).

FTP

FTP (File Transfer Protocol) is for transferring files to and from another networked computer. Although the first version was launched in 1971, it is still in widespread use. For example, it is the standard method of uploading content to remote web-servers.

Originally, FTP was used via a command-line client interface; nowadays there are many FTP client programs with a graphical user interface (GUI); my favourite is FileZilla.

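FTP client programs support several security modes (such as FTPS, which adds TLS encryption, and SFTP, which runs over SSH). When one of these modes is used, transfers are safe from snooping; plain FTP sends passwords and data unencrypted, and should be avoided on untrusted networks. FTP is very fast.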

FTP uses port 20 for data transfer (in active mode), and port 21 for control/commands. It also uses a configurable (on the FTP server) range of ports (e.g. 55000 to 55209) for data transfer. Therefore, for FTP to work, you need to have ports 20, 21 and the configurable port range open on the server. In PASV mode, the most commonly used mode, the FTP server tells the FTP client on which ports to open data channel connections, and the client opens the number that it needs in that range; this gets around the problems that would otherwise be caused by NAT routers and firewalls on the client's local network.
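In PASV mode the server announces each data port in a reply with code 227, giving the address and port as six numbers separated by commas. The sketch below decodes such a reply; the helper name parse_pasv_reply and the sample reply text are illustrative (the exact wording of the message varies between servers, but the six numbers in brackets are standard).

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract the data-channel address and port from an FTP 227 reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    host = f"{h1}.{h2}.{h3}.{h4}"
    port = p1 * 256 + p2          # the port is sent as two bytes, high byte first
    return host, port


print(parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,214,216)"))
# ('192.0.2.10', 55000)
```

FTP client programs do this decoding for you; it is shown here only to make clear why the whole port range has to be open on the server.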

SSH

SSH stands for Secure SHell. It is used for secure logins to remote Linux and Unix machines, and also for file transfers using SCP (WinSCP is a GUI-based SCP client for Windows machines) and SFTP (which most FTP client programs offer as one of their security modes). SSH uses port 22.

SSH gives you a command-line interface to a remote computer. You can invoke it with the "-X" option (e.g. "ssh -X <user-name>@<server-address-or-name>"), which means that when you open a program on the remote device, the window opens on your local desktop (using the X11 window protocol). This is very useful, and much more flexible than working with a remote desktop system like RDP or VNC.

SSH clients are available for all operating systems; SSH servers are available for most operating systems.

SSH is very secure, and is widely used to access remote servers over the Internet.

X11

X11 is a window protocol: the basis of virtually all desktop and GUI functionality on Linux and Unix systems. It allows you to display and control a GUI application's windows on either your local desktop or a remote desktop.

Pretty much the only disadvantage of X11 is the bandwidth that it uses. Because window display is at the bitmap level, rendering a window requires a fair amount of data to be transferred, which can lead to high latency when using applications remotely.

RDP

RDP is Microsoft's Remote Desktop Protocol. If you have remotely accessed a Windows machine from another Windows machine, you have used this protocol.

RDP is much faster and more efficient than X11 and VNC, because window borders and controls (and some other window-content) are rendered as predefined graphical objects, rather than using bitmaps, reducing the amount of data that needs to be transmitted, thus reducing latency.

The functionality of RDP is more limited than X11: you can only display a whole desktop, not individual application windows.

Citrix have a tool (XenApp) for remote access to individual application windows, using a proprietary protocol (ICA), giving users more flexible functionality similar to X11.

RDP operates over port number 3389, so to enable RDP sessions to any of the machines on your home or office network, you will need to ensure that this port is opened on any intervening firewalls.

Windows machines (desktops and laptops) are usually limited to two simultaneous sessions, or sometimes to one session. This means that if there are already sessions running up to the limit, you will not be able to connect. Also, if there is a session already running for the user you wish to use, the new RDP connection will usually take over that session, and the existing session will be disconnected.

VNC

VNC (Virtual Network Computing) provides remote graphical desktop access and sharing for X11 based desktops, similarly to RDP.

VNC uses a range of port numbers starting at 5900. On any given machine, each user's VNC server has to be configured to listen on a different port (e.g. port 5900 for user1, and port 5901 for user2). If you want to have VNC access from the Internet to multiple destination servers on your network, via IPv4 (where you usually only have one externally accessible IP address for all devices on your network), you will need to use different ports for each user on each device (e.g. 5900 for user1 and 5901 for user2, on machine1, and 5902 for user1 and 5903 for user3, on machine2).

One of the nice things about VNC is that you can configure a different desktop system for each user. For example, on my web-servers, I have the Gnome desktop as the main desktop for direct logins, and LXDE (a lightweight X11 desktop environment) for VNC sessions.

VNC is inherently multi-user (i.e. there are no session number limits). As long as you have the necessary per-user VNC server processes running, you can use as many simultaneous sessions as you want.

Routing Information Exchange

Routers exchange information about how to reach destination servers, using a variety of Routing Information Exchange protocols. Examples include RIP (Routing Information Protocol), BGP (Border Gateway Protocol) and EGP (Exterior Gateway Protocol).

These information exchanges are used to build and maintain routing tables, needed to determine via which intermediate addresses (routers and gateways) to send traffic. This is all automatic and invisible to ordinary users.
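How a routing table is used can be sketched with a few lines of Python. A router normally picks the most specific matching entry (the one with the longest prefix). The table below is entirely hypothetical, the addresses are examples only, and the sketch handles IPv4 destinations only.

```python
import ipaddress

# A hypothetical routing table: destination network -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "203.0.113.1",       # default route
    ipaddress.ip_network("192.168.0.0/24"): "direct",        # local network
    ipaddress.ip_network("10.0.0.0/8"): "192.168.0.254",
    ipaddress.ip_network("10.1.0.0/16"): "192.168.0.253",
}


def next_hop(destination: str) -> str:
    """Pick the matching route with the longest prefix (the most specific one)."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if address in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(next_hop("10.1.2.3"))        # 192.168.0.253
print(next_hop("10.9.9.9"))        # 192.168.0.254
print(next_hop("198.51.100.7"))    # 203.0.113.1
```

The default route (0.0.0.0/0) matches every IPv4 address, so it is used only when nothing more specific applies.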

Email Services

There are four common protocols used for email, although there are others.

SMTP (Simple Mail Transfer Protocol) is used for sending and relaying emails. If you write an email in Outlook, it is usually sent to the SMTP server of your email service provider (e.g. Gmail or Yahoo), which then relays it to the SMTP server of the recipient's email service provider. Email clients normally use port number 587 (or 465, for connections that are encrypted from the start) to submit emails to their provider, and email servers use port number 25 for relaying between providers.

POP3 (version 3 of the Post Office Protocol) is used for retrieving incoming emails from your inbox at your email service provider. You can configure your POP3 client (e.g. Outlook or Thunderbird) to either copy (i.e. leave a copy of the emails on the server) or move the email onto local storage in your email client. POP3 uses port number 110, and port number 995 for secure (using TLS encryption) email download.

IMAP (Internet Message Access Protocol) is a protocol used to view emails on your email service provider. It allows you to create email folders, and to move or copy emails to those folders (i.e. not only your inbox), and thus to leave your emails on the server. IMAP usually operates on port number 143, and port number 993 for SSL (secure). IMAP works by synchronisation of remote (on the email server) and local (on your email client) storage, meaning that you are able to view a local copy (excluding any new emails since you last connected) even if you are offline.

Exchange ActiveSync (EAS) is an email synchronisation protocol used by email clients (e.g. Outlook) for connecting to Microsoft Exchange servers. It handles emails, contacts, calendars and to-do lists. Because it works by synchronisation, there is a mirror copy of all content on the server and your client(s). EAS runs over HTTPS, and therefore uses port number 443.

Network Management Protocols

There are two common network management protocols (actually, protocols for managing devices attached to networks): SNMP (Simple Network Management Protocol) and CMIP (Common Management Information Protocol).

These protocols can be used to monitor and control devices attached to networks: not only routers and switches, but also faxes, printers and storage devices.

CMIP is a much more feature rich protocol than SNMP. Although it is an OSI protocol, it is also commonly used on TCP/IP networks.