Master thesis in data compression
(Generation and compression of endgame tables in chess
with fast random access using OBDDs)
As my
master thesis I'm working on endgame tables in chess. My
objective is to maximise the number of endgame combinations that
can be stored in RAM, while still being able
to access the tables fast enough to be used during full depth
searches.
My original goal was to use data mining methods to improve the
strength of an AI chess player, but I didn't have any concrete
ideas, and after implementing a functional engine my interest
shifted to endgame tables and how to compress them (motivated by
having just completed the course
data compression).
2005.02.15: The thesis has just been handed in
(available from here).
2005.03.15: The defence, where I was also examined in the course
Data
Compression.
My supervisor is
Peter Bro Miltersen
Before I started working on the endgame tables, I implemented a
decent chess engine (an alpha-beta search with hash tables,
quiescence search, iterative deepening). Also a lot of work was put
in making functionality for e.g. loading games (in pgn or fen format).
I started doing some experiments with the nontrivial KBBK endgame (king
and 2 bishops (one on black square and one on white) versus king).
This is explained
here, together with some basic introduction to endgame tables
in chess.
This experiment showed that the KBBK endgame table of size
10*32*32*64*2 (1310720) bytes could be compressed down to 181353
bytes using gzip.
However, for this purpose it is not an option to decompress an
entire file, just for accessing one single value. Hence the
compressed data format should allow for fast retrieval of a single
value. That's why I have focused on compression based on ordered
binary decision diagrams which should allow decompression of a
single entry approximately as fast as a cpu engine evaluates a position.
Results
- The probe speed varies from 0.5 to 1.7 micro second per
lookup (on a 3 GHz P4). This is about as fast as a state-of-the-art computer
program searches one position.
- All 2-4 men endgames with distance to mate values are
compressed to about than 30 MB - or about the same as in Nalimov's
format.
-
All 2-4 men endgames with win/draw/loss values are compressed to
about 1 MB. This is interesting as it indicates that all 2-5 men
endgames will have a size of only 250 MB (or 125 MB if they are
only used for one side to move). Also, some programs use special code
for e.g. recognising stale mates in the KQQK endgame and hence avoid
the need for this endgame table, but when it is compressible to
less that 10 Kbytes it can be justified to include it (KQQK is
uncompressed 423 Kbytes if 2 bits used per entry).
-
Only a few 5 men endgames have been compressed so far. Further
optimisations are needed to make the algorithms feasible for all 5
men endgames.
-
A separate program for indexing the compressed endgame tables has
not yet been made. There is still a few things I would like to
examine before resting firmly on the precise format.
The thesis
Presentation at the defence
(in danish, 20 pages pdf, 587.998 bytes)
Presentation in the course Strategic
Game Playing (in danish)
Links:
Rules, formats, definitions
Legality of positions of simple chess endgames.
Wikipedia:
Chess,
Rules of chess,
Chess terminology,
Computer Chess,
Computer Olympiad.
Specification of
Forsyth-Edwards Notation for a chess position
Binary Decision Diagrams
Dynamic reordering by sifting (slides)
Endgames:
online search in
Nalimov's 3-5 men endgames
(relevant
frame).
Download Nalimov's
Tablebases (Often too many users)
Download Nalimov's
Tablebases (Only 3-5 men endgames available)
Space-efficient indexing of chess endgame tables (gzipped .ps)
DarkThought. Contains most (all?) of the book "Scalable Search
in Computer Chess".
An Approach to Data Compression in Coding Chess Positions.
How many bits are needed to represent a typical chess position?
Chess programming:
In best first order:
Using bitboards
XBoard (a graphic interface used by many chess engines -
including mine).
Chesslib (here I have downloaded a file containing 2.000.000+ games)
Chess programs:
See Tim Mann's Chess
Pages for the programs below:
Beowulf project. C source code available.
Other stuff:
Chinook, a checkers program
Arimaa, an even greater programming challenge.
Data mining:
Backgammon, neural net
Programming
Threads, concurrency, etc. in C
Using and Porting the GNU Compiler Collection (GCC)
Links to more links
Articles about computer chess