
BUG (?,fluke) on iret to user mode



[Reminder: this may be a bug in my understanding!]

Since I couldn't find anything like a "return to userland" within OSKit, I
decided to have a look at the Fluke code to see how that worked. I believe I
have identified a bug in Fluke's trap.S, and I would be interested to know
if/why I am mistaken.

The problem lies in the IRET at line 542 in trap.S. The issue is that the
IRET can fault if pageframe(EIP) is invalid or has inadequate
permissions. This manifests either as a general protection fault or a page
fault (I don't recall which) that the processor reports as a *kernel* page
fault incurred by the IRET instruction. If you ask me, it's a messed up way
of reporting such things, but there it is.

The workaround used in EROS is to have a label on the iret instruction and
recognize this special case in the GP fault handler. In the case of a fault
in the IRET instruction, the processor spills an exception frame all over
the stack that we are currently returning on. In EROS, it actually dumps
this state into the task control block. Fortunately, the stuff it overwrites
(at least in the stack frame layout that EROS uses) is by this point
recoverable by performing a PUSHA (this is part of why our stack frame is
arranged the way that it is). The GP handler and the page fault handlers
recognize the special case of a fault on the IRET instruction and patch up
the frame to make it look like this fault was incurred by the instruction
pointed to by EIP in user land. They then simply transfer control back into
the kernel as though the exception had come from user land in the first
place.
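
To make the shape of that fixup concrete, here is a rough sketch in C. It
is not the EROS code (which is assembly and works on the real stack
frame); the struct layouts, field names, and the fixup_iret_fault()
function are all made up for illustration. It models only the decision:
if the fault came from kernel mode and the faulting EIP is the labelled
IRET, rewrite the saved frame so it describes the user-mode instruction
the IRET was returning to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of a fault as seen by the GP/page fault handler. */
struct fault_frame {
    uint32_t error_code;
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;   /* not pushed by the CPU for a same-privilege fault */
    uint32_t ss;    /* likewise; filled in by the fixup below */
};

/* Hypothetical copy of the user state the IRET was about to restore. */
struct user_state {
    uint32_t eip, cs, eflags, esp, ss;
};

/* The low two bits of CS give the privilege level; 0 means kernel. */
static bool fault_from_kernel(const struct fault_frame *f)
{
    return (f->cs & 3u) == 0;
}

/*
 * If the fault was taken on the labelled IRET, patch the frame so it
 * looks as if the user instruction at u->eip had faulted.
 * Returns true if the frame was patched, false if it was left alone.
 */
bool fixup_iret_fault(struct fault_frame *f, const struct user_state *u,
                      uint32_t iret_label)
{
    if (!fault_from_kernel(f) || f->eip != iret_label)
        return false;   /* an ordinary fault: handle it normally */

    f->eip    = u->eip;
    f->cs     = u->cs;
    f->eflags = u->eflags;
    f->esp    = u->esp;
    f->ss     = u->ss;
    return true;
}
```

After a successful fixup the handler can continue down the ordinary
user-fault path, which is the point of the exercise: nothing downstream
needs to know the fault was actually taken in the kernel.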

There are a variety of other ways you can go south in the return code that
the Fluke 0.5 code doesn't seem to handle, mostly having to do with a
debugger setting the segment registers to contain invalid selectors.
Fortunately, these can be caught in the register setting code.  Since I'm
sure that both Bryan and Mike read the processor manual at least as closely
as I did, I'm wondering: how is the page fault on IRET prevented?
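
On the selector point, one simple way to catch bad values in the register
setting code is to accept only selectors the kernel itself handed out.
The sketch below assumes a flat model with exactly one user code selector
and one user data selector; USER_CS, USER_DS, and the function names are
hypothetical, and a kernel that gives processes their own descriptors
would have to check the descriptor tables instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-mode selectors (RPL 3) for a flat memory model. */
#define USER_CS 0x1Bu
#define USER_DS 0x23u

/* Bits 0-1 of a selector are the requested privilege level. */
#define SEL_RPL_MASK 0x3u

/* A null selector has index 0 in the GDT; only the RPL bits may be set. */
static bool selector_is_null(uint16_t sel)
{
    return (sel & ~SEL_RPL_MASK) == 0;
}

/* CS must be the user code segment; a null CS cannot be loaded. */
bool user_code_selector_ok(uint16_t sel)
{
    return sel == USER_CS;
}

/* SS must be the user data segment; a null SS faults when loaded. */
bool user_stack_selector_ok(uint16_t sel)
{
    return sel == USER_DS;
}

/* DS/ES/FS/GS may be null (loading is harmless) or the user data segment. */
bool user_data_selector_ok(uint16_t sel)
{
    return sel == USER_DS || selector_is_null(sel);
}
```

Rejecting everything else at the point where a debugger writes the
registers means the return path never sees a selector it cannot load.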

As an aside, the EROS trap handler didn't handle this originally either. I
learned about it the hard way when running the kernel under artificially
starved memory. Sure enough the cleaner nailed the active user page, and
there we sat with little GPF-flavored bits on the floor... :-)


Jonathan

