Node Discovery Protocol
-----------------------

1 Motivation

Within an openMosix cluster, all participating nodes must have a loosely 
synchronized map of all nodes.  In other words, all nodes must be aware of each
other, but their maps need not be consistent at any given instant.

The purpose of the node discovery protocol is to fulfill this requirement.  It 
provides a mechanism to allow existing nodes to recognize new nodes which would
like to join the cluster, and a method to allow new joining nodes to build maps
of existing machines within the cluster.


2 Design

2.1 Messaging

The initial implementation of the node discovery protocol is minimal.  When a 
new node is initialized (when node discovery is activated), the following 
sequence of events occurs.

1. The new node is initialized, which means that its network interface(s) are 
   configured up and openMosix is ready to operate.

2. The new node sends a "join" message to all other openMosix nodes signifying
   its existence.  Receiving nodes can then add the sender to their maps.

   OpenMosix has the concept of interface aliases.  If a host can send
   packets with different source addresses on the same network, and would like
   other nodes on the network to recognize them as the same host, then an
   alias entry mapping one node identifier to two interface addresses can be
   specified.  As part of a join message, up to six alias entries can be
   specified.  Hosts keep track of these addresses in order to know how
   to set their "number of gateway" entries, as well as aliases.

3. Each receiving node can then respond by sending an "acknowledgement"
   broadcast message to all openMosix nodes signifying its existence.  This 
   broadcast helps nodes maintain more accurate maps.  The same alias
   information can be passed with acknowledgements as with joins.

4. [ not implemented or decided ] Before a node becomes unavailable, it can
   send a "leaving" message to all other nodes.  Other nodes can then remove
   the departing node from their maps, along with its aliases and gateway
   entries.
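
The sequence above can be sketched as a small map-update routine.  This is an
illustrative Python sketch, not the daemon's code: the NodeEntry and NodeMap
names, the handle() signature, and the 'l' type used for "leaving" messages
are all assumptions made for the example (only 'j' and 'a' are defined by the
protocol).

```python
from dataclasses import dataclass, field


@dataclass
class NodeEntry:
    """One entry in a node's local map (hypothetical layout)."""
    address: str
    aliases: tuple = ()


@dataclass
class NodeMap:
    """Loosely synchronized map of cluster nodes, keyed by node identifier."""
    nodes: dict = field(default_factory=dict)

    def handle(self, msg_type, node_id, address, aliases=()):
        """Apply one received message; return True if a reply should be sent."""
        if msg_type in ("j", "a"):
            # Joins and acknowledgements both announce a node's existence,
            # so either one may add or refresh a map entry.
            self.nodes[node_id] = NodeEntry(address, tuple(aliases))
            # The sequence above only describes replies to joins.
            return msg_type == "j"
        if msg_type == "l":
            # Hypothetical "leaving" type; step 4 is not implemented or
            # decided, so this branch is purely illustrative.
            self.nodes.pop(node_id, None)
            return False
        raise ValueError(f"unknown message type: {msg_type!r}")
```

Because acknowledgements update the map in the same way as joins, a node that
missed a join can still learn about the sender from a later acknowledgement.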


3 Implementation

3.1 Communication

All nodes in the openMosix cluster will join a multicast group to be used for 
auto-discovery communication.  In many clusters this will effectively be a 
routable broadcast, because all nodes will join the multicast group.

When auto-discovery is activated (the auto-discovery daemon is started), the
daemon sends a "join" message to the multicast group.  Upon receipt, nodes
running the auto-discovery daemon send an "acknowledgement" message to the
multicast group.

Another approach would be for receiving nodes to send a single UDP datagram to
the sending node.  This approach has two disadvantages: (1) it does not
necessarily reduce traffic because of potential ARP requests, and (2) it does
not help correct the maps of other nodes, which may be incomplete due to a
lost datagram.
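
Joining the group and sending to it might look roughly as follows.  This is a
minimal Python sketch under stated assumptions, not the daemon's actual code:
the group address, the port, and the function names are placeholders invented
for the example.

```python
import socket
import struct

# Both values are placeholders chosen for this sketch, not openMosix defaults.
GROUP = "239.255.42.42"
PORT = 24242


def membership_request(group):
    """Build an ip_mreq structure: group address plus INADDR_ANY interface."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))


def open_group_socket(group=GROUP, port=PORT):
    """Return a UDP socket bound to the port and subscribed to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def announce(sock, payload, group=GROUP, port=PORT):
    """Send one datagram (a join or an acknowledgement) to the whole group."""
    sock.sendto(payload, (group, port))
```

A daemon built this way would call open_group_socket() once at startup,
announce() its join, and then answer each received join with another
announce() carrying an acknowledgement.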

3.1.1 Message Structure

The structure of all messages (payload of each datagram) sent by auto-discovery
is as follows:

(This is a future structure; currently the mskX fields are not present.)

 0000 0000 0011 1111 1111 2222 2222 2233 3333 3333 4444 4444 4455 5
 0123 4567 8901 2345 6789 0123 4567 8901 2345 6789 0123 4567 8901 2
+----+----+----+----+----+----+----+----+----+----+----+----+----+-+
|mgcn|src |msk |ifn1|msk1|ifn2|msk2|ifn3|msk3|ifn4|msk4|ifn5|msk5|t|
+----+----+----+----+----+----+----+----+----+----+----+----+----+-+

Definition of fields:

1. mgcn: a magic number to aid in verifying integrity of a message.  If a 
   preset magic number is not the first four bytes of the payload of a packet,
   it is discarded.

2. src: the source address of the message.  This is used instead of the SRC
   in the IP header because it makes routing easier.

3. msk: the netmask for the source address.

4. ifnX: interface alias fields.  Each of these fields is an interface
   address that nodes should consider an alias for the source address of the
   datagram.  If there are no aliases, these fields are set to zero.

5. mskX: The respective netmask for each ifnX.

6. t: a type field describing the type of a message.  Valid types are 'j' for
   a join message and 'a' for an acknowledgement message.
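
The layout above (thirteen four-byte fields followed by a one-byte type, 53
bytes in total) can be encoded and decoded as sketched below.  This is a
Python illustration of the future structure with mskX fields, not the
daemon's code.  The magic number value, the use of network byte order, and
the function names are assumptions made for the example.

```python
import socket
import struct

# Placeholder value; the real magic number is not given in this document.
MAGIC = 0x6F4D5358
# mgcn, then 12 four-byte fields (src, msk, ifn1, msk1, ... ifn5, msk5), then t.
# "!" selects network byte order (an assumption) and disables padding.
LAYOUT = struct.Struct("!I" + "4s" * 12 + "c")
MAX_ALIASES = 5
ZERO = "0.0.0.0"


def pack_message(msg_type, src, mask, aliases=()):
    """Encode one message; `aliases` is a sequence of (address, netmask)."""
    if msg_type not in ("j", "a"):
        raise ValueError("type must be 'j' or 'a'")
    if len(aliases) > MAX_ALIASES:
        raise ValueError("too many aliases")
    # Unused alias slots are set to zero, as the field definitions require.
    pairs = list(aliases) + [(ZERO, ZERO)] * (MAX_ALIASES - len(aliases))
    fields = [src, mask] + [part for pair in pairs for part in pair]
    packed = [socket.inet_aton(f) for f in fields]
    return LAYOUT.pack(MAGIC, *packed, msg_type.encode("ascii"))


def unpack_message(payload):
    """Decode a payload, or return None if it should be discarded."""
    if len(payload) != LAYOUT.size:
        return None
    magic, *addrs, msg_type = LAYOUT.unpack(payload)
    # A wrong magic number or an unknown type means the datagram is dropped.
    if magic != MAGIC or msg_type not in (b"j", b"a"):
        return None
    text = [socket.inet_ntoa(a) for a in addrs]
    pairs = list(zip(text[2::2], text[3::2]))
    aliases = [p for p in pairs if p[0] != ZERO]
    return msg_type.decode("ascii"), text[0], text[1], aliases
```

For example, pack_message("j", "10.0.0.5", "255.255.0.0") yields a 53-byte
join payload, and unpack_message() on those bytes returns the same type,
source address, and netmask with an empty alias list.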
	

3.2 Interaction with openMosix Kernel

The auto-discovery daemon communicates with the openMosix kernel by reading and 
writing to /proc/mosix (/proc/hpc).

[...]

4 Constraints

1. Join messages are not retransmitted.  If one is lost, the system relies on
   multicast acknowledgements to correct missing map entries.

2. Node identifier selection is based on the last two octets of an IPv4 address
   of the first specified interface.  When interfaces are configured with
   certain netmasks (e.g. 0xffff0000), node identifier collisions can occur.

3. There is no routing loop detection.  Using real multicast routing is 
   recommended in complex networks.
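
To illustrate constraint 2, the sketch below derives an identifier from the
last two octets of an address.  It is a Python illustration only: the
function name and the way the two octets are combined (third octet as the
high byte) are assumptions, not taken from the openMosix source.

```python
import ipaddress


def node_id(address):
    """Derive an identifier from the last two octets of an IPv4 address."""
    octets = ipaddress.IPv4Address(address).packed
    # Assumption: the third octet is the high byte, the fourth the low byte.
    return (octets[2] << 8) | octets[3]
```

Under this assumption, 10.1.0.5 and 10.2.0.5 both map to identifier 5, so two
hosts whose addresses differ only in the first two octets collide.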

5 Future

5.1 New /proc Interface 

Below is a proposal of what a new /proc interface for autodiscovery might
look like.  I've started on it a bit, but it's up to the openMosix team if
they actually want it. 

The existing /proc interface was not designed for dynamic node addition and
removal.  For example, in order to add a single node to the kernel's map, an
entire array of structures must be written, and these structures match 
structures within the kernel.

A cleaner abstraction and interface between auto-discovery and /proc should be
established.  For example:

1. /proc/hpc/autodiscovery/add:  A user application can write one or more 
   new node entries to this interface notifying the kernel to add them to
   its map.

2. /proc/hpc/autodiscovery/remove: A user application can write one or more 
   node removal entries to this interface notifying the kernel to remove them
   from its map.

3. /proc/hpc/autodiscovery/list: A user application can cat this interface to
   display nodes which reside in the kernel's map.

Each of these interfaces would use ASCII to communicate.
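
A user application talking to such an interface might look like the Python
sketch below.  The proposal does not define an entry format, so the
one-line-per-node "identifier address" format used here is purely
hypothetical, as are the function names.

```python
def format_entries(entries):
    """Render (node_id, address) pairs as one ASCII line per node."""
    return "".join(f"{node_id} {address}\n" for node_id, address in entries)


def parse_entries(text):
    """Inverse of format_entries, e.g. for reading the proposed list file."""
    result = []
    for line in text.splitlines():
        if line.strip():
            node_id, address = line.split()
            result.append((int(node_id), address))
    return result


def submit(path, entries):
    """Write entries to an interface file such as the proposed add file."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(format_entries(entries))
```

With a format like this, adding a single node would be one short write to the
add file rather than a write of an entire array of kernel structures.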