Hi, I replaced LBAcache 09jun2004 with a new 17jun2004 version.
This should hopefully fix the (XMS) errors in the new 16-way
(instead of fully) associative 6/2004 LBAcache family. Please
test.

Changes: better error messages / checks in xmscopy.asm, and a fix for
a stupid binsel2.asm bug which allowed the 16-way assoc code to
search for / allocate cache in up to (surprise, surprise) 16
cache elements (usually 8k each) beyond the end of the cache.
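
To illustrate the kind of bug this is (NOT the actual binsel2.asm code,
which is assembly and works differently in detail), here is a minimal
Python sketch. It assumes a hypothetical layout in which every cache
element maps to a window of 16 consecutive table slots; the function
names and the mapping are made up for the example. If the window start
is not restricted, the window can reach past the end of the table:

```python
WAYS = 16  # slots searched per element in 16-way associative mode


def window_unchecked(element, total_slots):
    """Hypothetical buggy mapping: the window may start at any slot."""
    start = element % total_slots
    return range(start, start + WAYS)


def window_checked(element, total_slots):
    """Hypothetical fixed mapping: pick one of total_slots // WAYS aligned sets."""
    sets = total_slots // WAYS  # assumes total_slots >= WAYS
    start = (element % sets) * WAYS
    return range(start, start + WAYS)


def overrun(window, total_slots):
    """Count the slots of a window that lie beyond the end of the table."""
    return sum(1 for slot in window if slot >= total_slots)
```

With the checked mapping every window stays inside the table, because
the last set starts exactly WAYS slots before the end.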

http://www.coli.uni-sb.de/~eric/stuff/soft/lbacache-17jun2004.zip

Some statistics (Tom's RAREAD on my MVP3 board / K6-2 500 MHz CPU with 100 MHz
SDRAM; UMBPCI-UMBs are excluded from all memory (L1/L2) caches, alas)...

The change from fully associative mode (any place in the cache can be used
for any element) to 16-way associative mode (only 16 slots can be used for
any particular element - the sector number determines the search range)
means FASTER search in the table but a WORSE cache hit percentage: Sectors
have to be removed from the cache more often, because the cache is less
flexible in 16-way associative mode when it comes to allocating new slots.
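
The trade-off described above can be shown with a small Python model.
This is only a sketch under assumptions: LRU replacement, a simple
"element number modulo number of sets" mapping, and the class names are
all hypothetical and not taken from the LBAcache sources.

```python
from collections import OrderedDict


class LruSet:
    """One group of slots with least-recently-used replacement (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()

    def access(self, key):
        """Return True on a cache hit, False on a miss (and store the key)."""
        if key in self.slots:
            self.slots.move_to_end(key)  # mark as most recently used
            return True
        if len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)  # evict the least recently used
        self.slots[key] = None
        return False


class Cache:
    """ways == total_slots models fully associative, ways == 16 models 16-way."""

    def __init__(self, total_slots, ways):
        self.sets = [LruSet(ways) for _ in range(total_slots // ways)]

    def access(self, element):
        # The element number alone selects the set that may hold it.
        return self.sets[element % len(self.sets)].access(element)


def hit_ratio(cache, trace):
    """Fraction of accesses in trace that were cache hits."""
    return sum(cache.access(e) for e in trace) / len(trace)


# Worst case for 16-way: 17 elements that all land in the same set, read 10 times.
trace = [i * 4 for i in range(17)] * 10
fully = hit_ratio(Cache(64, 64), trace)
assoc16 = hit_ratio(Cache(64, 16), trace)
```

In this deliberately nasty trace the fully associative model keeps all
17 elements (64 slots are plenty), while the 16-way model keeps evicting
the element it needs next. Real access patterns are far less extreme,
which is why measuring with real programs matters.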

If all accesses are cache MISSES (a linear read of huge amounts of data will
usually result in cache misses): Speed gain from 1 to 4 MB/s in UMBPCI UMB,
3.5/5.5 to 8.x MB/s otherwise (3.5 with FDXXMS, 5.5 with DREMM386, both
with the fully associative cache, 8.x with the 16-way associative cache and
FreeDOS EMM386). Sorry, but I did not re-test the new cache with FDXXMS /
DREMM386, nor did I re-test the old cache with FreeDOS HIMEM / EMM386. You
may want to do that yourself...

If all accesses are cache HITS (I used the RAREAD "cached" test modes):
Speed gain for LBAcache in UMBPCI-UMB: 4->16 MB/s. For LBAcache in fast
normal RAM: fully assoc + FDXXMS 8.x MB/s, fully assoc + DREMM386 50 MB/s
(!?), 16-way assoc plus FreeDOS HIMEM / EMM386 50-58 MB/s.

Some more systematic tests would be a nice idea, I guess.

Note that neither "all HITS" nor "all MISSES" is a realistic measure.
You should test with some real-life program and check 1. the SPEED (use
RUNTIME or a stopwatch) and 2. the CACHE HIT RATIO (use LBACACHE STAT
and, if needed, also the ZERO and SYNC commands of LBACACHE).

It may sound cool to have 100-300% faster cache miss processing and
0%-300% faster cache hit processing, but if the cache hit percentage drops
too much because of the 16-way-assoc (as opposed to fully-assoc) behaviour,
you will get WORSE overall speed.
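
A rough way to see this is to mix the "all HITS" and "all MISSES" speeds
by hit ratio. The Python sketch below assumes that each measured MB/s
figure is the complete throughput for that kind of access, so that the
time per MB is just the weighted sum of both cases. The hit ratios are
made up for illustration, and the speeds are only loosely based on the
numbers in this mail (which were measured with different memory
managers, so this is not a real comparison).

```python
def effective_speed(hit_ratio, hit_mb_s, miss_mb_s):
    """Overall MB/s if hit_ratio of the data is served at hit_mb_s, the rest at miss_mb_s."""
    seconds_per_mb = hit_ratio / hit_mb_s + (1.0 - hit_ratio) / miss_mb_s
    return 1.0 / seconds_per_mb


# Hypothetical: fully assoc cache, slower misses but 95% hits
old = effective_speed(0.95, hit_mb_s=50.0, miss_mb_s=5.5)

# Hypothetical: 16-way assoc cache, faster misses but only 80% hits
new = effective_speed(0.80, hit_mb_s=50.0, miss_mb_s=8.0)

print("old: %.1f MB/s, new: %.1f MB/s" % (old, new))
```

With these made-up hit ratios the "faster" cache ends up slower overall,
because misses dominate the total time as soon as they become frequent.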

Happy testing! Maybe I could make associativity a command line option?
Note that the speed differences will be largest on slow CPU / RAM with
bigger caches (you will have guessed that). On fast systems with small
caches, the cache hit percentage drop might outweigh the gains. If this
happens, try using a bigger cache to raise the cache hit percentage
again.

[Summary of stats: fully 5/2004 -> 16way-assoc 6/2004 cache speeds MBy/s:
Cache MISS: umbpci 1->4 fdxxms 3.5->? dremm386 5.5->? himem/emm386 ?->8.x
Cache HIT:  umbpci 4->12 fdxxms 8.x->? dremm386 50->? himem/emm386 ?->50-58 MBy/s
...]

Eric