|
==Phrack Inc.==
|
||
|
|
||
|
Volume Three, Issue 29, File #3 of 12
|
||
|
|
||
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>                                                            <>
<>           Introduction to the Internet Protocols           <>
<>           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~           <>
<>        Chapter Nine Of The Future Transcendent Saga        <>
<>                                                            <>
<>                   Part Two of Two Files                    <>
<>                                                            <>
<>               Presented by Knight Lightning                <>
<>                     September 27, 1989                     <>
<>                                                            <>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
|
||
|
|
||
|
|
||
|
Prologue - Part Two
|
||
|
~~~~~~~~
|
||
|
A great deal of the material in this file comes from "Introduction to the
|
||
|
Internet Protocols" by Charles L. Hedrick of Rutgers University. That material
|
||
|
is copyrighted and is used in this file by permission. The passage of time and
changes in the wide area networks have made it necessary for some details of
the file to be updated and in some cases reworded for better understanding by our
|
||
|
readers. Also, Unix is a trademark of AT&T Technologies, Inc. -- Again, just
|
||
|
thought I'd let you know.
|
||
|
|
||
|
Table of Contents - Part Two
|
||
|
~~~~~~~~~~~~~~~~~
|
||
|
* Introduction - Part Two
|
||
|
* Well-Known Sockets And The Applications Layer
|
||
|
* Protocols Other Than TCP: UDP and ICMP
|
||
|
* Keeping Track Of Names And Information: The Domain System
|
||
|
* Routing
|
||
|
* Details About Internet Addresses: Subnets And Broadcasting
|
||
|
* Datagram Fragmentation And Reassembly
|
||
|
* Ethernet Encapsulation: ARP
|
||
|
* Getting More Information
|
||
|
|
||
|
|
||
|
Introduction - Part Two
|
||
|
~~~~~~~~~~~~
|
||
|
This article is a brief introduction to TCP/IP, followed by suggestions on
|
||
|
what to read for more information. This is not intended to be a complete
|
||
|
description, but it can give you a reasonable idea of the capabilities of the
|
||
|
protocols. However, if you need to know any details of the technology, you
|
||
|
will want to read the standards yourself.
|
||
|
|
||
|
Throughout this file, you will find references to the standards, in the form of
|
||
|
"RFC" (Request For Comments) or "IEN" (Internet Engineering Notes) numbers --
|
||
|
these are document numbers. The final section (Getting More Information)
|
||
|
explains how you can get copies of those standards.
|
||
|
|
||
|
|
||
|
Well-Known Sockets And The Applications Layer
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
In part one of this series, I described how a stream of data is broken up into
|
||
|
datagrams, sent to another computer, and put back together. However something
|
||
|
more is needed in order to accomplish anything useful. There has to be a way
|
||
|
for you to open a connection to a specified computer, log into it, tell it what
|
||
|
file you want, and control the transmission of the file. (If you have a
|
||
|
different application in mind, e.g. computer mail, some analogous protocol is
|
||
|
needed.) This is done by "application protocols." The application protocols
|
||
|
run "on top" of TCP/IP. That is, when they want to send a message, they give
|
||
|
the message to TCP. TCP makes sure it gets delivered to the other end.
|
||
|
Because TCP and IP take care of all the networking details, the applications
|
||
|
protocols can treat a network connection as if it were a simple byte stream,
|
||
|
like a terminal or phone line.
|
||
|
|
||
|
Before going into more details about applications programs, we have to describe
|
||
|
how you find an application. Suppose you want to send a file to a computer
|
||
|
whose Internet address is 128.6.4.7. To start the process, you need more than
|
||
|
just the Internet address. You have to connect to the FTP server at the other
|
||
|
end. In general, network programs are specialized for a specific set of tasks.
|
||
|
Most systems have separate programs to handle file transfers, remote terminal
|
||
|
logins, mail, etc. When you connect to 128.6.4.7, you have to specify that you
|
||
|
want to talk to the FTP server. This is done by having "well-known sockets"
|
||
|
for each server. Recall that TCP uses port numbers to keep track of individual
|
||
|
conversations. User programs normally use more or less random port numbers.
|
||
|
However specific port numbers are assigned to the programs that sit waiting for
|
||
|
requests. For example, if you want to send a file, you will start a program
|
||
|
called "ftp." It will open a connection using some random number, say 1234,
|
||
|
for the port number on its end. However it will specify port number 21 for the
|
||
|
other end. This is the official port number for the FTP server. Note that
|
||
|
there are two different programs involved. You run ftp on your side. This is
|
||
|
a program designed to accept commands from your terminal and pass them on to
|
||
|
the other end. The program that you talk to on the other machine is the FTP
|
||
|
server. It is designed to accept commands from the network connection, rather
|
||
|
than an interactive terminal. There is no need for your program to use a
|
||
|
well-known socket number for itself. Nobody is trying to find it. However the
|
||
|
servers have to have well-known numbers, so that people can open connections to
|
||
|
them and start sending them commands. The official port numbers for each
|
||
|
program are given in "Assigned Numbers."
|
||
|
|
||
|
Note that a connection is actually described by a set of 4 numbers: The
|
||
|
Internet address at each end, and the TCP port number at each end. Every
|
||
|
datagram has all four of those numbers in it. (The Internet addresses are in
|
||
|
the IP header, and the TCP port numbers are in the TCP header.) In order to
|
||
|
keep things straight, no two connections can have the same set of numbers.
|
||
|
However it is enough for any one number to be different. For example, it is
|
||
|
perfectly possible for two different users on a machine to be sending files to
|
||
|
the same other machine. This could result in connections with the following
|
||
|
parameters:
|
||
|
|
||
|
                    Internet addresses          TCP ports
    connection 1    128.6.4.194, 128.6.4.7      1234, 21
    connection 2    128.6.4.194, 128.6.4.7      1235, 21
|
||
|
|
||
|
Since the same machines are involved, the Internet addresses are the same.
|
||
|
Since they are both doing file transfers, one end of the connection involves
|
||
|
the well-known port number for FTP. The only thing that differs is the port
|
||
|
number for the program that the users are running. That's enough of a
|
||
|
difference. Generally, at least one end of the connection asks the network
|
||
|
software to assign it a port number that is guaranteed to be unique. Normally,
|
||
|
it's the user's end, since the server has to use a well-known number.
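
The bookkeeping just described can be sketched in a few lines of Python. This is only a toy model, not real network software: the ConnectionTable class and its open method are made-up names, and the starting port of 1234 is taken from the example above rather than from any real system. Port 21 is the FTP port given in the text.

```python
import itertools

FTP_PORT = 21  # well-known port for the FTP server, as given in the text


class ConnectionTable:
    """Toy model of how network software keeps connections distinct."""

    def __init__(self, first_port=1234):
        # Hand out local port numbers one after another so each is unique.
        self._next_port = itertools.count(first_port)
        self.connections = set()

    def open(self, local_addr, remote_addr, remote_port):
        """Record a new connection and return its set of four numbers."""
        local_port = next(self._next_port)
        key = (local_addr, local_port, remote_addr, remote_port)
        if key in self.connections:
            raise ValueError("two connections cannot share all four numbers")
        self.connections.add(key)
        return key


table = ConnectionTable()
conn1 = table.open("128.6.4.194", "128.6.4.7", FTP_PORT)
conn2 = table.open("128.6.4.194", "128.6.4.7", FTP_PORT)
# conn1 and conn2 differ only in the local port (1234 versus 1235).
```

Only the user's end needs a freshly assigned number; the server's end stays fixed at 21 so that clients can find it.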
|
||
|
|
||
|
Now that we know how to open connections, let's get back to the applications
|
||
|
programs. As mentioned earlier, once TCP has opened a connection, we have
|
||
|
something that might as well be a simple wire. All the hard parts are handled
|
||
|
by TCP and IP. However we still need some agreement as to what we send over
|
||
|
this connection. In effect this is simply an agreement on what set of commands
|
||
|
the application will understand, and the format in which they are to be sent.
|
||
|
Generally, what is sent is a combination of commands and data. They use
|
||
|
context to differentiate. For example, the mail protocol works like this:
|
||
|
Your mail program opens a connection to the mail server at the other end. Your
|
||
|
program gives it your machine's name, the sender of the message, and the
|
||
|
recipients you want it sent to. It then sends a command saying that it is
|
||
|
starting the message. At that point, the other end stops treating what it sees
|
||
|
as commands, and starts accepting the message. Your end then starts sending
|
||
|
the text of the message. At the end of the message, a special mark is sent (a
|
||
|
dot in the first column). After that, both ends understand that your program
|
||
|
is again sending commands. This is the simplest way to do things, and the one
|
||
|
that most applications use.
|
||
|
|
||
|
File transfer is somewhat more complex. The file transfer protocol involves
|
||
|
two different connections. It starts out just like mail. The user's program
|
||
|
sends commands like "log me in as this user," "here is my password," "send me
|
||
|
the file with this name." However once the command to send data is sent, a
|
||
|
second connection is opened for the data itself. It would certainly be
|
||
|
possible to send the data on the same connection, as mail does. However file
|
||
|
transfers often take a long time. The designers of the file transfer protocol
|
||
|
wanted to allow the user to continue issuing commands while the transfer is
|
||
|
going on. For example, the user might make an inquiry, or he might abort the
|
||
|
transfer. Thus the designers felt it was best to use a separate connection for
|
||
|
the data and leave the original command connection for commands. (It is also
|
||
|
possible to open command connections to two different computers, and tell them
|
||
|
to send a file from one to the other. In that case, the data couldn't go over
|
||
|
the command connection.)
|
||
|
|
||
|
Remote terminal connections use another mechanism still. For remote logins,
|
||
|
there is just one connection. It normally sends data. When it is necessary to
|
||
|
send a command (e.g. to set the terminal type or to change some mode), a
|
||
|
special character is used to indicate that the next character is a command. If
|
||
|
the user happens to type that special character as data, two of them are sent.
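
The doubling rule can be sketched as follows. In the remote login protocol (RFC 854) the special character is the octet 255, called IAC ("interpret as command"). The function names here are made up for illustration, and the sketch handles only a stream of pure data, with no real commands mixed in.

```python
IAC = 0xFF  # Telnet "interpret as command" octet (RFC 854)


def escape_data(data: bytes) -> bytes:
    """Double every IAC octet so the receiver treats it as data."""
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))


def unescape_data(data: bytes) -> bytes:
    """Undo escape_data; assumes the stream carries no real commands."""
    return data.replace(bytes([IAC, IAC]), bytes([IAC]))


sent = escape_data(b"abc\xffdef")   # the single 0xFF goes out as two
received = unescape_data(sent)      # and comes back as one
```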
|
||
|
|
||
|
I am not going to describe the application protocols in detail in this file.
|
||
|
It is better to read the RFCs yourself. However there are a couple of common
|
||
|
conventions used by applications that will be described here. First, the
|
||
|
common network representation: TCP/IP is intended to be usable on any
|
||
|
computer. Unfortunately, not all computers agree on how data is represented.
|
||
|
|
||
|
There are differences in character codes (ASCII vs. EBCDIC), in end of line
|
||
|
conventions (carriage return, line feed, or a representation using counts), and
|
||
|
in whether terminals expect characters to be sent individually or a line at a
|
||
|
time. In order to allow computers of different kinds to communicate, each
|
||
|
applications protocol defines a standard representation. Note that TCP and IP
|
||
|
do not care about the representation. TCP simply sends octets. However the
|
||
|
programs at both ends have to agree on how the octets are to be interpreted.
|
||
|
|
||
|
The RFC for each application specifies the standard representation for that
|
||
|
application. Normally it is "net ASCII." This uses ASCII characters, with end
|
||
|
of line denoted by a carriage return followed by a line feed. For remote
|
||
|
login, there is also a definition of a "standard terminal," which turns out to
|
||
|
be a half-duplex terminal with echoing happening on the local machine. Most
|
||
|
applications also make provisions for the two computers to agree on other
|
||
|
representations that they may find more convenient. For example, PDP-10's have
|
||
|
36-bit words. There is a way that two PDP-10's can agree to send a 36-bit
|
||
|
binary file. Similarly, two systems that prefer full-duplex terminal
|
||
|
conversations can agree on that. However each application has a standard
|
||
|
representation, which every machine must support.
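
As a small illustration of the standard representation, here is a sketch of converting local text to net ASCII in Python. The function name to_net_ascii is made up, and the sketch assumes the local text uses ordinary line endings (line feed, carriage return, or both).

```python
def to_net_ascii(text: str) -> bytes:
    """Convert local text to net ASCII: ASCII octets, each line ending in CR LF."""
    lines = text.splitlines()  # accepts LF, CR, or CR LF line endings
    # encode("ascii") raises UnicodeEncodeError if the text is not pure ASCII
    return "".join(line + "\r\n" for line in lines).encode("ascii")


wire = to_net_ascii("Subject: Anniversary\n\nHappy Anniversary!\n")
```

Whatever convention the local system uses, the octets handed to TCP always look the same, so the program at the other end does not need to know what kind of machine sent them.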
|
||
|
|
||
|
So that you might get a better idea of what is involved in the application
|
||
|
protocols, here is an imaginary example of SMTP (the Simple Mail Transfer
Protocol). Assume that a computer called FTS.PHRACK.EDU wants to send the
|
||
|
following message.
|
||
|
|
||
|
        Date: Fri, 17 Nov 89 15:42:06 EDT
        From: knight@fts.phrack.edu
        To: taran@msp.phrack.edu
        Subject: Anniversary

        Four years is quite a long time to be around.  Happy Anniversary!
|
||
|
|
||
|
Note that the format of the message itself is described by an Internet standard
|
||
|
(RFC 822). The standard specifies the fact that the message must be
|
||
|
transmitted as net ASCII (i.e. it must be ASCII, with carriage return/linefeed
|
||
|
to delimit lines). It also describes the general structure, as a group of
|
||
|
header lines, then a blank line, and then the body of the message. Finally, it
|
||
|
describes the syntax of the header lines in detail. Generally they consist of
|
||
|
a keyword and then a value.
|
||
|
|
||
|
Note that the addressee is indicated as TARAN@MSP.PHRACK.EDU. Initially,
|
||
|
addresses were simply "person at machine." Today's standards are much more
|
||
|
flexible. There are now provisions for systems to handle other systems' mail.
|
||
|
This can allow automatic forwarding on behalf of computers not connected to the
|
||
|
Internet. It can be used to direct mail for a number of systems to one central
|
||
|
mail server. Indeed there is no requirement that an actual computer by the
|
||
|
name of FTS.PHRACK.EDU even exist (and it doesn't). The name servers could be
|
||
|
set up so that you mail to department names, and each department's mail is
|
||
|
routed automatically to an appropriate computer. It is also possible that the
|
||
|
part before the @ is something other than a user name. It is possible for
|
||
|
programs to be set up to process mail. There are also provisions to handle
|
||
|
mailing lists, and generic names such as "postmaster" or "operator."
|
||
|
|
||
|
The way the message is to be sent to another system is described by RFCs 821
|
||
|
and 974. The program that is going to do the sending makes several queries of
the name server to determine where to route the message. The first query is to
find out which machines handle mail for the name MSP.PHRACK.EDU (the machine
named in the recipient's address). In this case, the server replies that
MSP.PHRACK.EDU handles its own mail. The program then asks for the address of
MSP.PHRACK.EDU, which for the sake of this example is 269.517.724.5. Then the
mail program opens a TCP connection
|
||
|
to port 25 on 269.517.724.5. Port 25 is the well-known socket used for
|
||
|
receiving mail. Once this connection is established, the mail program starts
|
||
|
sending commands. Here is a typical conversation. Each line is labelled as to
|
||
|
whether it is from FTS or MSP. Note that FTS initiated the connection:
|
||
|
|
||
|
        MSP    220 MSP.PHRACK.EDU SMTP Service at 17 Nov 89 09:35:24 EDT
        FTS    HELO fts.phrack.edu
        MSP    250 MSP.PHRACK.EDU - Hello, FTS.PHRACK.EDU
        FTS    MAIL From:<knight@fts.phrack.edu>
        MSP    250 MAIL accepted
        FTS    RCPT To:<taran@msp.phrack.edu>
        MSP    250 Recipient accepted
        FTS    DATA
        MSP    354 Start mail input; end with <CRLF>.<CRLF>
        FTS    Date: Fri, 17 Nov 89 15:42:06 EDT
        FTS    From: knight@fts.phrack.edu
        FTS    To: taran@msp.phrack.edu
        FTS    Subject: Anniversary
        FTS
        FTS    Four years is quite a long time to be around.  Happy Anniversary!
        FTS    .
        MSP    250 OK
        FTS    QUIT
        MSP    221 MSP.PHRACK.EDU Service closing transmission channel
|
||
|
|
||
|
The commands all use normal text. This is typical of the Internet standards.
|
||
|
Many of the protocols use standard ASCII commands. This makes it easy to watch
|
||
|
what is going on and to diagnose problems. The mail program keeps a log of
|
||
|
each conversation so if something goes wrong, the log file can simply be mailed
|
||
|
to the postmaster. Since it is normal text, he can see what was going on. It
|
||
|
also allows a human to interact directly with the mail server, for testing.
|
||
|
|
||
|
The responses all begin with numbers. This is also typical of Internet
|
||
|
protocols. The allowable responses are defined in the protocol. The numbers
|
||
|
allow the user program to respond unambiguously. The rest of the response is
|
||
|
text, which is normally for use by any human who may be watching or looking at
|
||
|
a log. It has no effect on the operation of the programs. The commands
|
||
|
themselves simply allow the mail program on one end to tell the mail server the
|
||
|
information it needs to know in order to deliver the message. In this case,
|
||
|
the mail server could get the information by looking at the message itself.
|
||
|
|
||
|
Every session must begin with a HELO, which gives the name of the system that
|
||
|
initiated the connection. Then the sender and recipients are specified. There
|
||
|
can be more than one RCPT command, if there are several recipients. Finally
|
||
|
the data itself is sent. Note that the text of the message is terminated by a
|
||
|
line containing just a period, but if such a line appears in the message, the
|
||
|
period is doubled. After the message is accepted, the sender can send another
|
||
|
message, or terminate the session as in the example above.
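
The period-doubling rule is easy to get wrong, so here is a small sketch of it in Python. The function names are made up for illustration. Note one assumption: the sketch follows the rule as RFC 821 states it, which is that any line beginning with a period gets one extra period on the way out, and the receiver strips one leading period from any line that is not the terminator.

```python
def encode_message(lines):
    """Prepare message lines for sending after the DATA command."""
    out = []
    for line in lines:
        if line.startswith("."):
            line = "." + line        # double the leading period
        out.append(line)
    out.append(".")                  # a line with just a period ends the text
    return "\r\n".join(out) + "\r\n"


def decode_message(wire):
    """Recover the original lines on the receiving side."""
    lines = []
    for line in wire.split("\r\n"):
        if line == ".":
            break                    # end of message text
        if line.startswith("."):
            line = line[1:]          # strip the extra period
        lines.append(line)
    return lines


body = ["Four years is quite a long time.", ".", "Happy Anniversary!"]
wire = encode_message(body)
```

Because a lone period inside the message goes out as two periods, the receiver can never mistake it for the end of the text.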
|
||
|
|
||
|
Generally, there is a pattern to the response numbers. The protocol defines
|
||
|
the specific set of responses that can be sent as answers to any given command.
|
||
|
However programs that don't want to analyze them in detail can just look at the
|
||
|
first digit. In general, responses that begin with a 2 indicate success.
|
||
|
Those that begin with 3 indicate that some further action is needed, as shown
|
||
|
above. 4 and 5 indicate errors. 4 is a "temporary" error, such as a disk
|
||
|
filling. The message should be saved, and tried again later. 5 is a permanent
|
||
|
error, such as a non-existent recipient. The message should be returned to the
|
||
|
sender with an error message.
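
A program that only looks at the first digit might be sketched like this. The function name and the returned strings are made up for illustration; only the meaning of the digits comes from the text above.

```python
def classify_reply(line):
    """Classify a reply line such as '250 OK' by its first digit."""
    code = int(line[:3])             # replies begin with a 3-digit number
    first_digit = code // 100
    if first_digit == 2:
        return "success"
    if first_digit == 3:
        return "further action needed"
    if first_digit == 4:
        return "temporary error: save the message and try again later"
    if first_digit == 5:
        return "permanent error: return the message to the sender"
    return "unknown"


status = classify_reply("354 Start mail input; end with <CRLF>.<CRLF>")
```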
|
||
|
|
||
|
For more details about the protocols mentioned in this section, see RFCs
|
||
|
821/822 for mail, RFC 959 for file transfer, and RFCs 854/855 for remote
|
||
|
logins. For the well-known port numbers, see the current edition of Assigned
|
||
|
Numbers, and possibly RFC 814.
|
||
|
|
||
|
|
||
|
Protocols Other Than TCP: UDP and ICMP
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
Thus far only connections that use TCP have been described. Remember that TCP
|
||
|
is responsible for breaking up messages into datagrams, and reassembling them
|
||
|
properly. However in many applications, there are messages that will always
|
||
|
fit in a single datagram. An example is name lookup. When a user attempts to
|
||
|
make a connection to another system, he will generally specify the system by
|
||
|
name, rather than Internet address. His system has to translate that name to
|
||
|
an address before it can do anything. Generally, only a few systems have the
|
||
|
database used to translate names to addresses. So the user's system will want
|
||
|
to send a query to one of the systems that has the database.
|
||
|
|
||
|
This query is going to be very short. It will certainly fit in one datagram.
|
||
|
So will the answer. Thus it seems silly to use TCP. Of course TCP does more
|
||
|
than just break things up into datagrams. It also makes sure that the data
|
||
|
arrives, resending datagrams where necessary. But for a question that fits in
|
||
|
a single datagram, all of the complexity of TCP is not needed. If there is not
|
||
|
an answer after a few seconds, you can just ask again. For applications like
|
||
|
this, there are alternatives to TCP.
|
||
|
|
||
|
The most common alternative is UDP ("user datagram protocol"). UDP is designed
|
||
|
for applications where you don't need to put sequences of datagrams together.
|
||
|
It fits into the system much like TCP. There is a UDP header. The network
|
||
|
software puts the UDP header on the front of your data, just as it would put a
|
||
|
TCP header on the front of your data. Then UDP sends the data to IP, which
|
||
|
adds the IP header, putting UDP's protocol number in the protocol field instead
|
||
|
of TCP's protocol number.
|
||
|
|
||
|
UDP doesn't do as much as TCP does. It does not split data into multiple
|
||
|
datagrams and it does not keep track of what it has sent so it can resend if
|
||
|
necessary. About all that UDP provides is port numbers so that several
|
||
|
programs can use UDP at once. UDP port numbers are used just like TCP port
|
||
|
numbers. There are well-known port numbers for servers that use UDP.
|
||
|
|
||
|
The UDP header is shorter than a TCP header. It still has source and
|
||
|
destination port numbers, and a checksum, but that's about it. UDP is used by
|
||
|
the protocols that handle name lookups (see IEN 116, RFC 882, and RFC 883) and
|
||
|
a number of similar protocols.
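
To show how little there is to a UDP header, here is a sketch that builds one in Python. Per RFC 768 the header is four 16-bit fields: source port, destination port, length (header plus data, in octets), and checksum. The function name is made up, and the checksum is simply left at zero, which UDP allows as meaning "no checksum computed." Port 53 is the well-known port for the domain name server.

```python
import struct


def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Return the 8-octet UDP header for the given payload."""
    length = 8 + len(payload)        # the header itself is 8 octets
    # "!" selects network byte order; each "H" is an unsigned 16-bit field
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


header = build_udp_header(1234, 53, b"query")
datagram = header + b"query"         # what UDP would hand to IP
```

There are no sequence numbers or acknowledgments in the header, which is why the application has to do its own asking again if no answer arrives.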
|
||
|
|
||
|
Another alternative protocol is ICMP ("Internet control message protocol").
|
||
|
ICMP is used for error messages, and other messages intended for the TCP/IP
|
||
|
software itself, rather than any particular user program. For example, if you
|
||
|
attempt to connect to a host, your system may get back an ICMP message saying
|
||
|
"host unreachable." ICMP can also be used to find out some information about
|
||
|
the network. See RFC 792 for details of ICMP.
|
||
|
|
||
|
ICMP is similar to UDP, in that it handles messages that fit in one datagram.
|
||
|
However it is even simpler than UDP. It does not even have port numbers in its
|
||
|
header. Since all ICMP messages are interpreted by the network software
|
||
|
itself, no port numbers are needed to say where an ICMP message is supposed to
|
||
|
go.
|
||
|
|
||
|
|
||
|
Keeping Track Of Names And Information: The Domain System
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
As we indicated earlier, the network software generally needs a 32-bit Internet
|
||
|
address in order to open a connection or send a datagram. However users prefer
|
||
|
to deal with computer names rather than numbers. Thus there is a database that
|
||
|
allows the software to look up a name and find the corresponding number.
|
||
|
|
||
|
When the Internet was small, this was easy. Each system would have a file that
|
||
|
listed all of the other systems, giving both their name and number. There are
|
||
|
now too many computers for this approach to be practical. Thus these files
|
||
|
have been replaced by a set of name servers that keep track of host names and
|
||
|
the corresponding Internet addresses. (In fact these servers are somewhat more
|
||
|
general than that. This is just one kind of information stored in the domain
|
||
|
system.) A set of interlocking servers is used rather than a single central
one.
|
||
|
|
||
|
There are now so many different institutions connected to the Internet that it
|
||
|
would be impractical for them to notify a central authority whenever they
|
||
|
installed or moved a computer. Thus naming authority is delegated to
|
||
|
individual institutions. The name servers form a tree, corresponding to
|
||
|
institutional structure. The names themselves follow a similar structure. A
|
||
|
typical example is the name BORAX.LCS.MIT.EDU. This is a computer at the
|
||
|
Laboratory for Computer Science (LCS) at MIT. In order to find its Internet
|
||
|
address, you might potentially have to consult 4 different servers.
|
||
|
|
||
|
First, you would ask a central server (called the root) where the EDU server
|
||
|
is. EDU is a server that keeps track of educational institutions. The root
|
||
|
server would give you the names and Internet addresses of several servers for
|
||
|
EDU. You would then ask EDU where the server for MIT is. It would give you
|
||
|
names and Internet addresses of several servers for MIT. Then you would ask
|
||
|
MIT where the server for LCS is, and finally you would ask one of the LCS
|
||
|
servers about BORAX. The final result would be the Internet address for
|
||
|
BORAX.LCS.MIT.EDU. Each of these levels is referred to as a "domain." The
|
||
|
entire name, BORAX.LCS.MIT.EDU, is called a "domain name." (So are the names
|
||
|
of the higher-level domains, such as LCS.MIT.EDU, MIT.EDU, and EDU.)
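
The chain of domains behind a name can be listed with a couple of lines of Python. The function name domain_chain is made up for illustration; it only splits the name and does not talk to any name server.

```python
def domain_chain(name):
    """List the domains a lookup passes through, highest level first."""
    labels = name.split(".")
    # Build EDU, then MIT.EDU, and so on, by taking ever longer tails.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


chain = domain_chain("BORAX.LCS.MIT.EDU")
# chain is ["EDU", "MIT.EDU", "LCS.MIT.EDU", "BORAX.LCS.MIT.EDU"]
```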
|
||
|
|
||
|
Fortunately, you don't really have to go through all of this most of the time.
|
||
|
First of all, the root name servers also happen to be the name servers for the
|
||
|
top-level domains such as EDU. Thus a single query to a root server will get
|
||
|
you to MIT. Second, software generally remembers answers that it got before.
|
||
|
So once we look up a name at LCS.MIT.EDU, our software remembers where to find
|
||
|
servers for LCS.MIT.EDU, MIT.EDU, and EDU. It also remembers the translation
|
||
|
of BORAX.LCS.MIT.EDU. Each of these pieces of information has a "time to live"
|
||
|
associated with it. Typically this is a few days. After that, the information
|
||
|
expires and has to be looked up again. This allows institutions to change
|
||
|
things.
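
The remembering and expiring of answers can be sketched as a small cache. This is only an illustration: the TtlCache class is a made-up name, times are passed in as plain numbers of seconds to keep the behaviour easy to follow, and the stored address is a placeholder rather than the real address of any host.

```python
class TtlCache:
    """Remember answers until their time to live runs out."""

    def __init__(self):
        self._entries = {}           # name -> (answer, expiry time)

    def put(self, name, answer, ttl, now):
        self._entries[name] = (answer, now + ttl)

    def get(self, name, now):
        """Return the remembered answer, or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires = entry
        if now >= expires:
            del self._entries[name]  # expired; must be looked up again
            return None
        return answer


DAY = 24 * 60 * 60
cache = TtlCache()
# "placeholder-address" stands in for whatever the name server returned.
cache.put("BORAX.LCS.MIT.EDU", "placeholder-address", ttl=2 * DAY, now=0)
fresh = cache.get("BORAX.LCS.MIT.EDU", now=1 * DAY)   # still remembered
stale = cache.get("BORAX.LCS.MIT.EDU", now=3 * DAY)   # None: expired
```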
|
||
|
|
||
|
The domain system is not limited to finding out Internet addresses. Each
|
||
|
domain name is a node in a database. The node can have records that define a
|
||
|
number of different properties. Examples are Internet address, computer type,
|
||
|
and a list of services provided by a computer. A program can ask for a
|
||
|
specific piece of information, or all information about a given name. It is
|
||
|
possible for a node in the database to be marked as an "alias" (or nickname)
|
||
|
for another node. It is also possible to use the domain system to store
|
||
|
information about users, mailing lists, or other objects.
|
||
|
|
||
|
There is an Internet standard defining the operation of these databases as well
|
||
|
as the protocols used to make queries of them. Every network utility has to be
|
||
|
able to make such queries since this is now the official way to evaluate host
|
||
|
names. Generally utilities will talk to a server on their own system. This
|
||
|
server will take care of contacting the other servers for them. This keeps
|
||
|
down the amount of code that has to be in each application program.
|
||
|
|
||
|
The domain system is particularly important for handling computer mail. There
|
||
|
are entry types to define what computer handles mail for a given name, to
specify where an individual is to receive mail, and to define mailing lists.
|
||
|
|
||
|
See RFCs 882, 883, and 973 for specifications of the domain system. RFC 974
|
||
|
defines the use of the domain system in sending mail.
|
||
|
|
||
|
Routing
|
||
|
~~~~~~~
|
||
|
The task of finding how to get a datagram to its destination is referred to as
|
||
|
"routing." Many of the details depend upon the particular implementation.
|
||
|
However some general things can be said.
|
||
|
|
||
|
It is necessary to understand the model on which IP is based. IP assumes that
|
||
|
a system is attached to some local network. It is assumed that the system can
|
||
|
send datagrams to any other system on its own network. (In the case of
|
||
|
Ethernet, it simply finds the Ethernet address of the destination system, and
|
||
|
puts the datagram out on the Ethernet.) The problem comes when a system is
|
||
|
asked to send a datagram to a system on a different network. This problem is
|
||
|
handled by gateways.
|
||
|
|
||
|
A gateway is a system that connects a network with one or more other networks.
|
||
|
Gateways are often normal computers that happen to have more than one network
|
||
|
interface. The software on a machine must be set up so that it will forward
|
||
|
datagrams from one network to the other. That is, if a machine on network
|
||
|
128.6.4 sends a datagram to the gateway, and the datagram is addressed to a
|
||
|
machine on network 128.6.3, the gateway will forward the datagram to the
|
||
|
destination. Major communications centers often have gateways that connect a
|
||
|
number of different networks.
|
||
|
|
||
|
Routing in IP is based entirely upon the network number of the destination
|
||
|
address. Each computer has a table of network numbers. For each network
|
||
|
number, a gateway is listed. This is the gateway to be used to get to that
|
||
|
network. The gateway does not have to connect directly to the network; it just
has to be the best place to go to get there.
|
||
|
|
||
|
When a computer wants to send a datagram, it first checks to see if the
|
||
|
destination address is on the system's own local network. If so, the datagram
|
||
|
can be sent directly. Otherwise, the system expects to find an entry for the
|
||
|
network that the destination address is on. The datagram is sent to the
|
||
|
gateway listed in that entry. This table can get quite big. For example, the
|
||
|
Internet now includes several hundred individual networks. Thus various
|
||
|
strategies have been developed to reduce the size of the routing table. One
|
||
|
strategy is to depend upon "default routes." There is often only one gateway
|
||
|
out of a network.
|
||
|
|
||
|
This gateway might connect a local Ethernet to a campus-wide backbone network.
|
||
|
In that case, it is not necessary to have a separate entry for every network
|
||
|
in the world. That gateway is simply defined as a "default." When no specific
|
||
|
route is found for a datagram, the datagram is sent to the default gateway. A
|
||
|
default gateway can even be used when there are several gateways on a network.
|
||
|
There are provisions for gateways to send a message saying "I'm not the best
|
||
|
gateway -- use this one instead." (The message is sent via ICMP. See RFC
|
||
|
792.) Most network software is designed to use these messages to add entries
to its routing tables. Suppose network 128.6.4 has two gateways, 128.6.4.59
|
||
|
and 128.6.4.1. 128.6.4.59 leads to several other internal Rutgers networks.
|
||
|
128.6.4.1 leads indirectly to the NSFnet. Suppose 128.6.4.59 is set as a
|
||
|
default gateway, and there are no other routing table entries. Now what
|
||
|
happens when you need to send a datagram to MIT? MIT is network 18. Since
|
||
|
there is no entry for network 18, the datagram will be sent to the default,
|
||
|
128.6.4.59. This gateway is the wrong one. So it will forward the datagram to
|
||
|
128.6.4.1. It will also send back an error saying in effect: "to get to
|
||
|
network 18, use 128.6.4.1." The software will then add an entry to the routing
|
||
|
table. Any future datagrams to MIT will then go directly to 128.6.4.1. (The
|
||
|
error message is sent using the ICMP protocol. The message type is called
|
||
|
"ICMP redirect.")
|
||
|
|
||
|
Most IP experts recommend that individual computers should not try to keep
|
||
|
track of the entire network. Instead, they should start with default gateways
|
||
|
and let the gateways tell them the routes as just described. However this
|
||
|
doesn't say how the gateways should find out about the routes. The gateways
|
||
|
can't depend upon this strategy. They have to have fairly complete routing
|
||
|
tables. For this, some sort of routing protocol is needed. A routing protocol
|
||
|
is simply a technique for the gateways to find each other and keep up to date
|
||
|
about the best way to get to every network. RFC 1009 contains a review of
|
||
|
gateway design and routing.
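
The host's side of this scheme (a default gateway plus routes learned from redirects) can be sketched in Python using the Rutgers example from earlier in this section. The HostRoutingTable class and its method names are made up for illustration, and network numbers are kept as plain strings rather than being extracted from addresses.

```python
class HostRoutingTable:
    """Routing table for an ordinary host: learned routes plus one default."""

    def __init__(self, local_network, default_gateway):
        self.local_network = local_network
        self.default_gateway = default_gateway
        self.routes = {}             # network number -> gateway address

    def next_hop(self, dest_network):
        """Return the gateway to use, or None when delivery is direct."""
        if dest_network == self.local_network:
            return None              # same network: send it directly
        return self.routes.get(dest_network, self.default_gateway)

    def handle_redirect(self, dest_network, better_gateway):
        """Add the route named in an ICMP redirect message."""
        self.routes[dest_network] = better_gateway


table = HostRoutingTable("128.6.4", default_gateway="128.6.4.59")
first_try = table.next_hop("18")         # no entry for MIT yet: the default
table.handle_redirect("18", "128.6.4.1")
later = table.next_hop("18")             # now goes straight to 128.6.4.1
```

The table starts out empty and only ever grows entries for networks the host actually talks to, which is the point of the strategy.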
|
||
|
|
||
|
|
||
|
Details About Internet Addresses: Subnets And Broadcasting
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
Internet addresses are 32-bit numbers, normally written as 4 octets (in
decimal), e.g. 128.6.4.7. There are actually 3 different types of address.
The problem is that the address has to indicate both the network and the host
within the network. It was felt that eventually there would be lots of
networks. Many of them would be small, but probably 24 bits would be needed to
represent all the IP networks. It was also felt that some very big networks
might need 24 bits to represent all of their hosts. This would seem to lead to
48-bit addresses. But the designers really wanted to use 32-bit addresses. So
they adopted a kludge. The assumption is that most of the networks will be
small. So they set up three different ranges of address.

Addresses beginning with 1 to 126 use only the first octet for the network
number. The other three octets are available for the host number. Thus 24
bits are available for hosts. These numbers are used for large networks, but
there can only be 126 of these. The ARPAnet is one and there are a few large
commercial networks. But few normal organizations get one of these "class A"
addresses.

For normal large organizations, "class B" addresses are used. Class B
addresses use the first two octets for the network number. Thus network
numbers are 128.1 through 191.254. (0 and 255 are avoided for reasons to be
explained below. Addresses beginning with 127 are also avoided because they
are used by some systems for special purposes.) The last two octets are
available for host addresses, giving 16 bits of host address. This allows for
64516 computers (254 x 254, since 0 and 255 are avoided in each octet), which
should be enough for most organizations. Finally, class C addresses use three
octets in the range 192.1.1 to 223.254.254. These allow only 254 hosts on
each network, but there can be lots of these networks. Addresses above 223
are reserved for future use as class D and E (which are currently not
defined).

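As a rough illustration of the three ranges, here is a small Python sketch
that looks at the first octet and splits an address into its network and host
parts. The function names address_class and split_address are made up for this
example, and the sketch ignores subnets entirely.

```python
def address_class(address):
    """Return 'A', 'B' or 'C' based on the first octet, else None."""
    first = int(address.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    return None  # 0, 127 and anything above 223 are not normal addresses


def split_address(address):
    """Split a dotted address into (network, host) strings."""
    network_octets = {"A": 1, "B": 2, "C": 3}.get(address_class(address))
    if network_octets is None:
        raise ValueError("not a class A, B or C address: " + address)
    octets = address.split(".")
    network = ".".join(octets[:network_octets])
    host = ".".join(octets[network_octets:])
    return network, host


print(address_class("128.6.4.7"), split_address("128.6.4.7"))
```

For 128.6.4.7 the first octet falls in the class B range, so the network part
is 128.6 and the remaining two octets identify the host.
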
0 and 255 have special meanings. 0 is reserved for machines that do not know
their address. In certain circumstances it is possible for a machine not to
know the number of the network it is on, or even its own host address. For
example, 0.0.0.23 would be a machine that knew it was host number 23, but
didn't know on what network.

255 is used for "broadcast." A broadcast is a message that you want every
system on the network to see. Broadcasts are used in some situations where you
don't know who to talk to. For example, suppose you need to look up a host
name and get its Internet address. Sometimes you don't know the address of the
nearest name server. In that case, you might send the request as a broadcast.
There are also cases where a number of systems are interested in information.
It is then less expensive to send a single broadcast than to send datagrams
individually to each host that is interested in the information. In order to
send a broadcast, you use an address that is made by using your network
address, with all ones in the part of the address where the host number goes.
For example, if you are on network 128.6.4, you would use 128.6.4.255 for
broadcasts. How this is actually implemented depends upon the medium. It is
not possible to send broadcasts on the ARPAnet, or on point to point lines, but
it is possible on an Ethernet. If you use an Ethernet address with all its
bits on (all ones), every machine on the Ethernet is supposed to look at that
datagram.

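The "all ones in the host part" rule is easy to express with bit operations.
The following Python sketch is an illustration only: the helper names are
invented, and the number of host bits is passed in by the caller because it
depends on how the network is divided (8 host bits is an assumption that
matches the 128.6.4 example).

```python
def to_int(dotted):
    """Convert a dotted address such as '128.6.4.194' to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Convert a 32-bit integer back to dotted form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def broadcast_address(address, host_bits):
    """Set every bit of the host part to one, leaving the network part alone."""
    host_mask = (1 << host_bits) - 1
    return to_dotted(to_int(address) | host_mask)


print(broadcast_address("128.6.4.194", 8))
print(broadcast_address("128.6.4.194", 16))
```

With 8 host bits only the last octet becomes 255. With 16 host bits, as on an
undivided class B network, the last two octets both become 255.
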
Because 0 and 255 are used for unknown and broadcast addresses, normal hosts
should never be given addresses containing 0 or 255. Addresses should never
begin with 0, 127, or any number above 223.


Datagram Fragmentation And Reassembly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TCP/IP is designed for use with many different kinds of networks.
Unfortunately, network designers do not agree about how big packets can be.
Ethernet packets can be 1500 octets long. ARPAnet packets have a maximum of
around 1000 octets. Some very fast networks have much larger packet sizes.
You might think that IP should simply settle on the smallest possible size, but
this would cause serious performance problems. When transferring large files,
big packets are far more efficient than small ones. So it is best to be able
to use the largest packet size possible, but it is also necessary to be able to
handle networks with small limits. There are two provisions for this.

TCP has the ability to "negotiate" about datagram size. When a TCP connection
first opens, both ends can send the maximum datagram size they can handle. The
smaller of these numbers is used for the rest of the connection. This allows
two implementations that can handle big datagrams to use them, but also lets
them talk to implementations that cannot handle them. This does not completely
solve the problem. The most serious problem is that the two ends do not
necessarily know about all of the steps in between. For this reason, there are
provisions to split datagrams up into pieces. This is referred to as
"fragmentation."

The IP header contains fields indicating that a datagram has been split and
enough information to let the pieces be put back together. If a gateway
connects an Ethernet to the ARPAnet, it must be prepared to take 1500-octet
Ethernet packets and split them into pieces that will fit on the ARPAnet.
Furthermore, every host implementation of TCP/IP must be prepared to accept
pieces and put them back together. This is referred to as "reassembly."

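To make fragmentation and reassembly concrete, here is a simplified Python
sketch. It keeps only the two pieces of bookkeeping that matter for putting a
datagram back together: an offset counted in units of 8 octets and a "more
fragments" flag, as in the IP header. Everything else is an assumption made
for illustration: the function names, the tuple layout, and the 980-octet
limit standing in for a network with a maximum of around 1000 octets.

```python
def fragment(payload, max_payload):
    """Split payload into (offset_in_8_octet_units, more_fragments, data)."""
    # Every fragment except the last must carry a multiple of 8 octets.
    chunk = (max_payload // 8) * 8
    if chunk == 0:
        raise ValueError("max_payload must be at least 8 octets")
    if not payload:
        return [(0, False, b"")]
    fragments = []
    for start in range(0, len(payload), chunk):
        data = payload[start:start + chunk]
        more = start + chunk < len(payload)
        fragments.append((start // 8, more, data))
    return fragments


def reassemble(fragments):
    """Put fragments back together, whatever order they arrived in."""
    ordered = sorted(fragments, key=lambda piece: piece[0])
    buffer = bytearray()
    for offset, more, data in ordered:
        if offset * 8 != len(buffer):
            raise ValueError("a fragment is missing or overlaps")
        buffer.extend(data)
    if ordered[-1][1]:
        raise ValueError("the final fragment has not arrived")
    return bytes(buffer)


payload = bytes(i % 256 for i in range(1480))  # data from one Ethernet packet
pieces = fragment(payload, 980)                # assumed limit of the next hop
restored = reassemble(list(reversed(pieces)))  # arrival order does not matter
print(len(pieces), restored == payload)
```

A real implementation also has to match fragments to the right datagram, copy
the header into each piece, and give up if a piece never arrives. RFC 815
discusses reassembly techniques in detail.
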
TCP/IP implementations differ in the approach they take to deciding on datagram
size. It is fairly common for implementations to use 576-byte datagrams
whenever they can't verify that the entire path is able to handle larger
packets. This rather conservative strategy is used because of the number of
implementations with bugs in the code to reassemble fragments. Implementors
often try to avoid ever having fragmentation occur. Different implementors
take different approaches to deciding when it is safe to use large datagrams.
Some use them only for the local network. Others will use them for any network
on the same campus. 576 bytes is a "safe" size which every implementation must
support.

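One possible policy combining the negotiation and the conservative 576-byte
rule might look like the Python sketch below. It is not taken from any
particular implementation: the function name, its parameters, and the choice
to trust large datagrams only on the local network are all assumptions picked
from the options mentioned above.

```python
SAFE_SIZE = 576  # the size every implementation must support


def choose_datagram_size(local_max, remote_max, same_network):
    """Pick a datagram size for a connection.

    local_max and remote_max are the sizes each end says it can handle.
    same_network is True when the other end is on the local network.
    """
    negotiated = min(local_max, remote_max)  # the smaller number wins
    if same_network:
        return negotiated                    # no gateways in between
    return min(negotiated, SAFE_SIZE)        # unknown path, so stay safe


print(choose_datagram_size(1500, 1500, same_network=True))
print(choose_datagram_size(1500, 1500, same_network=False))
```

An implementation that trusts the whole campus would simply widen the test
that decides when the larger size is allowed.
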
Ethernet Encapsulation: ARP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In Part One of Introduction to the Internet Protocols (Phrack Inc., Volume
Three, Issue 28, File #3 of 12) there was a brief description of what IP
datagrams look like on an Ethernet. The description showed the Ethernet header
and checksum, but it left one hole: It did not say how to figure out what
Ethernet address to use when you want to talk to a given Internet address.
There is a separate protocol for this called ARP ("address resolution
protocol"). It is not an IP protocol, since ARP datagrams do not have IP
headers.

Suppose you are on system 128.6.4.194 and you want to connect to system
128.6.4.7. Your system will first verify that 128.6.4.7 is on the same
network, so it can talk directly via Ethernet. Then it will look up 128.6.4.7
in its ARP table to see if it already knows the Ethernet address. If so, it
will stick on an Ethernet header and send the packet. Now suppose this system
is not in the ARP table. There is no way to send the packet because you need
the Ethernet address. So it uses the ARP protocol to send an ARP request.
Essentially an ARP request says "I need the Ethernet address for 128.6.4.7".
Every system listens to ARP requests. When a system sees an ARP request for
itself, it is required to respond. So 128.6.4.7 will see the request and will
respond with an ARP reply saying in effect "128.6.4.7 is 8:0:20:1:56:34". Your
system will save this information in its ARP table so future packets will go
directly.

ARP requests must be sent as "broadcasts." There is no way that an ARP request
can be sent directly to the right system because the whole reason for sending
an ARP request is that you do not know the Ethernet address. So an Ethernet
address of all ones is used, i.e. ff:ff:ff:ff:ff:ff. By convention, every
machine on the Ethernet is required to pay attention to packets with this as an
address. So every system sees every ARP request. They all look to see
whether the request is for their own address. If so, they respond. If not,
they could just ignore it, although some hosts will use ARP requests to update
their knowledge about other hosts on the network, even if the request is not
for them. Packets whose IP address indicates broadcast (e.g. 255.255.255.255
or 128.6.4.255) are also sent with an Ethernet address that is all ones.

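The lookup-then-broadcast logic can be modeled with a short Python sketch.
This is a toy simulation, not a real ARP implementation: no packets are built,
the class and function names are invented, and the sender's Ethernet address
is made up. Only the two IP addresses and the reply 8:0:20:1:56:34 come from
the example above.

```python
class Host:
    def __init__(self, ip, ethernet_address):
        self.ip = ip
        self.ethernet_address = ethernet_address
        self.arp_table = {}  # IP address -> Ethernet address

    def on_arp_request(self, target_ip):
        """Reply with our Ethernet address only if the request is for us."""
        if target_ip == self.ip:
            return self.ethernet_address
        return None


class Ethernet:
    def __init__(self):
        self.hosts = []
        self.requests_sent = 0

    def broadcast_arp_request(self, sender, target_ip):
        """Show the request to every other host and return any reply."""
        self.requests_sent += 1
        for host in self.hosts:
            if host is not sender:
                reply = host.on_arp_request(target_ip)
                if reply is not None:
                    return reply
        return None


def resolve(sender, wire, target_ip):
    """Look in the ARP table first; broadcast a request only on a miss."""
    if target_ip not in sender.arp_table:
        reply = wire.broadcast_arp_request(sender, target_ip)
        if reply is None:
            raise LookupError("no ARP reply for " + target_ip)
        sender.arp_table[target_ip] = reply
    return sender.arp_table[target_ip]


wire = Ethernet()
me = Host("128.6.4.194", "8:0:20:1:aa:bb")  # this Ethernet address is made up
other = Host("128.6.4.7", "8:0:20:1:56:34")
wire.hosts.extend([me, other])
first = resolve(me, wire, "128.6.4.7")   # table miss, so a request goes out
second = resolve(me, wire, "128.6.4.7")  # answered from the ARP table
print(first, second, wire.requests_sent)
```

Only the first lookup puts a request on the wire. The second one is answered
from the table, which is why future packets can go directly.
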
Getting More Information
~~~~~~~~~~~~~~~~~~~~~~~~
This directory contains documents describing the major protocols. There are
hundreds of documents, so I have chosen the ones that seem most important.
Internet standards are called RFCs (Requests for Comments). A proposed
standard is initially issued as a proposal, and given an RFC number. When it
is finally accepted, it is added to Official Internet Protocols, but it is
still referred to by the RFC number. I have also included two IENs (Internet
Engineering Notes). IENs used to be a separate classification for more
informal documents, but this classification no longer exists and RFCs are now
used for all official Internet documents, with a mailing list being used for
more informal reports.

The convention is that whenever an RFC is revised, the revised version gets a
new number. This is fine for most purposes, but it causes problems with two
documents: Assigned Numbers and Official Internet Protocols. These documents
are being revised all the time and the RFC number keeps changing. You will
have to look in rfc-index.txt to find the number of the latest edition. Anyone
who is seriously interested in TCP/IP should read the RFC describing IP (791).
RFC 1009 is also useful as it is a specification for gateways to be used by
NSFnet and it contains an overview of a lot of the TCP/IP technology.

Here is a list of the documents you might want:

rfc-index  List of all RFCs
rfc1012    Somewhat fuller list of all RFCs
rfc1011    Official Protocols. It's useful to scan this to see what tasks
           protocols have been built for. This defines which RFCs are
           actual standards, as opposed to requests for comments.
rfc1010    Assigned Numbers. If you are working with TCP/IP, you will
           probably want a hardcopy of this as a reference. It lists all
           the officially defined well-known ports and lots of other
           things.
rfc1009    NSFnet gateway specifications. A good overview of IP routing
           and gateway technology.
rfc1001/2  NetBIOS: Networking for PCs
rfc973     Update on domains
rfc959     FTP (file transfer)
rfc950     Subnets
rfc937     POP2: Protocol for reading mail on PCs
rfc894     How IP is to be put on Ethernet, see also rfc826
rfc882/3   Domains (the database used to go from host names to Internet
           address and back -- also used to handle UUCP these days). See
           also rfc973
rfc854/5   Telnet - Protocol for remote logins
rfc826     ARP - Protocol for finding out Ethernet addresses
rfc821/2   Mail
rfc814     Names and ports - General concepts behind well-known ports
rfc793     TCP
rfc792     ICMP
rfc791     IP
rfc768     UDP
rip.doc    Details of the most commonly-used routing protocol
ien-116    Old name server (still needed by several kinds of systems)
ien-48     The Catenet model, general description of the philosophy behind
           TCP/IP

The following documents are somewhat more specialized.

rfc813     Window and acknowledgement strategies in TCP
rfc815     Datagram reassembly techniques
rfc816     Fault isolation and resolution techniques
rfc817     Modularity and efficiency in implementation
rfc879     The maximum segment size option in TCP
rfc896     Congestion control
rfc827,888,904,975,985  EGP and related issues

The most important RFCs have been collected into a three-volume set, the DDN
Protocol Handbook. It is available from the DDN Network Information Center at
SRI International. You should be able to get them via anonymous FTP from
SRI-NIC.ARPA. The file names are:

RFCs:
  rfc:rfc-index.txt
  rfc:rfcxxx.txt
IENs:
  ien:ien-index.txt
  ien:ien-xxx.txt

Sites with access to UUCP but not FTP may be able to retrieve them via
UUCP from UUCP host rutgers. The file names would be:

RFCs:
  /topaz/pub/pub/tcp-ip-docs/rfc-index.txt
  /topaz/pub/pub/tcp-ip-docs/rfcxxx.txt
IENs:
  /topaz/pub/pub/tcp-ip-docs/ien-index.txt
  /topaz/pub/pub/tcp-ip-docs/ien-xxx.txt

>--------=====END=====--------<