
Re: some bugs in netboot



Hello all,

David G Andersen wrote:
> 
> Thanks for the note, Bernard.  I'll take a bit of time today and whack on
> netboot to get rid of some of the problems you've pointed out here, and
> send out a patch to the list in a while.

Please don't spend too much time on this. I have already done most of the
work, but never took the time to turn it into a patch that can be integrated
back into the regular oskit distribution. It seems that has to be done now.
If I haven't submitted the patch by, let's say, Thursday, please remind me,
because I will probably have forgotten that I should take care of it.

I even have a DNS lookup function (ugly, but working). I'm not sure whether
I'll include it in my first patch, because it currently relies on the DNS
server address being passed via a kernel environment variable instead of
using the DNS server value from the DHCP/BOOTP reply.

By the way: thanks a lot for the netboot metakernel. It has already saved me
literally weeks of time, and my project is only just beginning. The only
thing I miss is that it currently cannot load the debugging symbols
included in an ELF executable. Adding that should mostly be brain-transfer
work, since AFAIK one of the loaders included in the oskit already supports
it. In general, merging similar or identical loader code would be a good
thing (tm) to do.

> Lo and behold, Bernard Cassagne once said:
> >
> > I have been using netboot for some time now, and I found it immensely useful.
> > However, in the process of using it, I have found some bugs, here they are :
> >
> > 1/ first, something minuscule : main is declared void instead of int, not
> >    really a bug, but gcc complains (and a compilation without warnings makes
> >    the user happy :-)
> >
> > 2/ in file main.c, the function build_cmdline() has the declaration
> >    char *toks[strlen(input)];
> >    this is incorrect since input can be NULL (it will be every time the command
> >    is simply a <progname>: no booting option, no argument to main)

I think the first two are already fixed in the 990402 snapshot which can be
downloaded from the oskit ftp server.
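
For illustration only, a minimal sketch of the kind of guard that avoids
the problem in 2/. It is not taken from the netboot source or from the
990402 snapshot; the name split_args and its signature are hypothetical.
It checks for NULL before touching the string and writes into a
caller-supplied array, so no array is ever sized with strlen(input):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not the oskit code: split `input` in place on
 * spaces and tabs, storing at most `max` token pointers in `toks`.
 * Returns the number of tokens found; a NULL input yields 0.
 */
static size_t split_args(char *input, char **toks, size_t max)
{
    size_t n = 0;

    if (input == NULL)      /* strlen(NULL) would be undefined behaviour */
        return 0;

    while (n < max) {
        input += strspn(input, " \t");   /* skip leading blanks */
        if (*input == '\0')
            break;                       /* nothing left to tokenize */
        toks[n++] = input;               /* token starts here */
        input += strcspn(input, " \t");  /* advance to the end of the token */
        if (*input != '\0')
            *input++ = '\0';             /* terminate it and step past */
    }
    return n;
}
```

A caller would pass a fixed-size array, for example char *toks[16], and
treat a return value of 0 as "no arguments".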
> >
> > 3/ in file main.c, in the function main(), the variable input is initialized
> >    on entry :
> >    char *input = buf;
> >    It should be initialized at every iteration in the loop beginning with
> >    the label reprompt: since input is moved to the beginning of the <progname>
> >
> >    I know the bug will show up in very rare events (in the case of plenty of
> >    erroneous commands beginning with plenty of white spaces) but it is there.
> >

This is fixed indirectly in my private version because I rewrote some parts
of the input handling to make it easier to expand.
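
The idea behind the fix for 3/ can be sketched as follows. This is only an
assumption about the general shape, not the rewritten input handling
mentioned above; command_start is a hypothetical helper. The point is that
the working pointer is derived from the start of the buffer on every
prompt, so it can never drift further into the buffer after a failed
command:

```c
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the first non-blank
 * character of the command stored in `buf`. Because the result is
 * always computed from the buffer base, leading blanks from an
 * earlier command have no lasting effect.
 */
static char *command_start(char *buf)
{
    return buf + strspn(buf, " \t");
}

/*
 * Intended use inside a prompt loop (read_line is assumed, not real):
 *
 *   reprompt:
 *       read_line(buf, sizeof buf);
 *       input = command_start(buf);    recomputed on every iteration
 */
```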

> > 4/ now something more serious.
> >    netboot does not answer ARP requests : it sends ARP requests and relies on
> >    the fact that the server will update its arp table with the information
> >    provided with the request of the client.
> >
> >    This scheme will not work in the following sequence of events :
> >
> >    - the user types a command <machine>:/<dir>/<file>
> >       if <machine> is ok, but <file> is not, the client will issue an
> >       ARP request, and will remember that it knows the ethernet address of the
> >       server. Since <file> is erroneous, the command results in an error.
> >
> >    - it takes some time for the user to understand why it is wrong
> >       (typo, file not in the right place etc ...)
> >       during this time, the arp entry IN THE SERVER expires.
> >       [ my server is a linux box and arp entries expire in a matter of a
> >         few seconds ]
> >
> >    - now, the user types the right command and it does not work !
> >      Explanation : the client knows the ethernet address of the server and
> >      does not issue an ARP request. But the server will issue an ARP
> >      request and the client will never answer.
> >
> >    - at this moment, netboot is in a completely useless state : right
> >      commands are not serviced.
> >
> >    The problem shows up only in the case of a wrong command followed by a
> >    right command. This is so because after every right command, netboot
> >    completely re-initialises itself.

[ description of a crude but working workaround omitted ]

Adding ARP replies turned out to be really easy once I found out how the
network code works.
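
To make the mechanism concrete, here is a minimal sketch of building an
ARP reply from a received request for Ethernet and IPv4, following the
packet layout in RFC 826. It is not the code from the patch discussed
here and does not use any oskit interface; arp_make_reply and the offset
names are hypothetical, and the packet is handled as a raw 28-byte buffer
to stay independent of struct packing and byte order:

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets inside a 28-byte Ethernet/IPv4 ARP packet (RFC 826). */
enum { ARP_LEN = 28, ARP_OP = 6, ARP_SHA = 8, ARP_SPA = 14,
       ARP_THA = 18, ARP_TPA = 24 };

/*
 * Hypothetical sketch: if `req` is an ARP request asking for `my_ip`,
 * fill `reply` (ARP_LEN bytes) with the answer and return 1.
 * Otherwise leave `reply` untouched and return 0.
 */
static int arp_make_reply(const uint8_t *req, const uint8_t my_mac[6],
                          const uint8_t my_ip[4], uint8_t *reply)
{
    /* hardware type 1 (Ethernet), protocol 0x0800 (IPv4), lengths 6 and 4 */
    static const uint8_t hdr[6] = { 0x00, 0x01, 0x08, 0x00, 6, 4 };

    if (memcmp(req, hdr, sizeof hdr) != 0)
        return 0;                               /* not Ethernet/IPv4 ARP */
    if (req[ARP_OP] != 0 || req[ARP_OP + 1] != 1)
        return 0;                               /* opcode is not "request" */
    if (memcmp(req + ARP_TPA, my_ip, 4) != 0)
        return 0;                               /* someone else is asked */

    memcpy(reply, hdr, sizeof hdr);
    reply[ARP_OP] = 0;
    reply[ARP_OP + 1] = 2;                      /* opcode 2: reply */
    memcpy(reply + ARP_SHA, my_mac, 6);         /* sender: this machine */
    memcpy(reply + ARP_SPA, my_ip, 4);
    memcpy(reply + ARP_THA, req + ARP_SHA, 6);  /* target: the requester */
    memcpy(reply + ARP_TPA, req + ARP_SPA, 4);
    return 1;
}
```

The reply would then be sent in an Ethernet frame with type 0x0806,
addressed to the requester's hardware address. How frames are received
and transmitted depends on the network code in use and is left out here.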

Greetings
Klaus

-- 
Klaus Espenlaub                      Email:  espenlaub@informatik.uni-ulm.de
Universitaet Ulm                     Phone:  +49 731 50-24178
Abteilung Rechnerstrukturen          Fax:    +49 731 50-24182
D-89069 Ulm                          Office: Building O27, Room 316

