Auto-Discovery Daemon
---------------------

1. Introduction

The auto-discovery daemon is a user-level program that allows multiple
hosts on a single network (for now) to become aware of each other. When a
host running auto-discovery becomes aware of another host's existence, it
can inform the openMosix kernel, which can then add that host to its
internal map of existing machines.

Currently the daemon is compiled to run in alpha mode, which means that it
does not interact with or affect the openMosix kernel. Normally it would
perform I/O on /proc/hpc/admin/mospe and /proc/hpc/admin/config. Instead,
it writes to two /tmp files, /tmp/om_mospe and /tmp/om_config, in much the
same way as it would write to the /proc files. These /tmp files are binary.

A small utility called showmap is included. Running showmap displays the
values that would have been written to /proc in a human-readable format;
for example, it shows the "mosix.map" that would have been passed to the
kernel. This is done so the daemon can be tested in a few environments
before it is used in a potentially harmful way. (When not compiled in
ALPHA mode, this utility is similar to running "setpe -r".)

2. Requirements

Your kernel must be configured to support IP multicast (the
CONFIG_IP_MULTICAST kernel option). This is probably enabled by default.

3. Building and Running

    % make clean
    % make
    % ./omdiscd -n

The "-n" or "--nodaemon" option causes omdiscd to run in the foreground,
sending messages and debugging output to standard error. Without this
option, all output goes to syslog, which can be confusing during testing.

By default, the daemon lets the kernel choose the interface used for
multicast communication. A specific interface can be selected by adding
the "-i <interface>" option.

On my cluster, "ariette", I run (at least) three copies for testing.
The gateway "ariette" specifies the "-i" option because it has multiple
interfaces, only one of which resides in the cluster:

    ariette# ./omdiscd -n -i eth1,eth0
    node1# ./omdiscd -n
    node2# ./omdiscd -n
    [...]

node1 and node2 are on the same network as eth1 on ariette. Specifying two
interfaces (and no multicast TTL) means that auto-discovery will route
messages between the network connected to eth1 on ariette and the network
connected to eth0 on ariette. In addition, node1 and node2 will configure
their maps with one of ariette's interfaces as an alias entry.

To run showmap:

    % ./showmap

Some tests are provided to validate internals of the daemon. Building for
testing produces different binaries, so a "make clean" is necessary first.
To build and run the tests:

    % make clean
    % make -f Makefile.test
    % ./test

To run the daemon in live mode, remove (or macro out) "#define ALPHA" from
openmosix.c and showmap.c, then run "make clean" and "make".

4. Limitations

4.1 Node-id Generation

Node identifiers are generated by taking the last two octets of a
machine's IP address. The obvious problem with this is potential node-id
collisions: two hosts whose addresses differ only in the first two octets
(for example, 10.0.1.5 and 10.1.1.5) receive the same node id. I do not
expect this to arise in most clusters.

4.2 Routing

When auto-discovery routes messages between networks, there is no
routing-loop detection. In fact, the route that auto-discovery messages
take may differ from the route taken by traffic between the openMosix
nodes. For anything but a very simple network, real multicast routing
(e.g. mrouted) is recommended.

5. Command Line Options

--interface or -i <interface>[,<interface>...]
    The interface option specifies between one and six interfaces to be
    used in the openMosix cluster, given as a comma-separated list. Each
    listed interface will send and receive multicast notifications, unless
    the "-m" or "--multicast-ttl" option is specified.
    In that case, only the first interface listed will send and receive
    messages, and the others will be configured as openMosix aliases.
    When this option is not specified, auto-discovery lets the kernel
    select a default interface. This option is not necessary when a host
    has only one configured interface.

--nodaemon or -n
    The nodaemon option causes auto-discovery to run in the foreground
    rather than as a daemon. All output goes to standard error instead of
    to syslog.

--multicast-ttl or -m <ttl>
    When this option is specified, the given value is used as the
    time-to-live (TTL) for multicast. This option assumes that multicast
    routing is configured on the gateways connecting clusters. When it is
    specified, auto-discovery configured with multiple interfaces will
    send and receive notifications on only one interface.

--help or -h
    Displays basic usage.

6. Debug Messages

main() calls log_set_debug() to enable debug messages for various
features; see log.h for the list of parameters.

Full debug messaging can be enabled on a running daemon by sending it a
SIGUSR1 signal. Each SIGUSR1 toggles the daemon between the compile-time
default and full messaging.

7. TODO List

Below are things that still need to be done to the daemon.

To do:
  + net.c/openmosix.c: add basic gateway discovery.
  + general: add the ability for nodes to leave the cluster.
  + general: write a man page.
  + event.c: add event handling code if/when finally needed.
  + general: add a pid file.
  + sys.c: add an omdiscd.pid file to /var/...
  + general: clean up command line options (add --route, etc.).

Performance/Optimizations:
  + net.c: if more than one message is waiting to be received (meaning
    multiple nodes will join), place the messages in a queue and process
    them in a batch.

Things to decide:
  + Should multiple join messages be sent? Perhaps one per hour?
  + Should nodes be removed from the kernel if they send a leave message?
  + IPv6 support here and in openMosix.