>From version 0.0 to 0.2
-----------------------

* Algorithm:

Solve-State(state, prev_states, ret)
    if (state == empty_state)
        Push(ret, state)
        return SOLVED
    for each move possible on state
        new_state <- state
        apply(new_state, move)
        if (new_state in prev_states)
            ; Do nothing
        else
            add new_state to prev_states
            if (Solve-State(new_state, prev_states, ret) == SOLVED)
                Push(ret, state)
                return SOLVED
    return UNSOLVED

Freecell-Solver(init_state)
    if (Solve-State(init_state, Null, ret) == SOLVED)
        print "State was solved"
        while (state <- Pop(ret))
            print state
    else
        print "I could not solve this board";

Basically a DFS scan.

* Initial programming in perl:
------------------------------

** State was represnted as a perl5 data-structure.

** State was duplicated by serializing it and unserializing it.

** @prev_states contained the states unsorted.

** Only a small amount of the tests were implmented before the conversion to
   C.

** Running the program from the command-line gave it a very slow output.

** Conclusion: perl is slow, if you want your application to be as fast
    as possible, code it in C from the first stage.


* Converting to C:

** Wrote the functions card_user2perl card_perl2user etc. in C (show the card.c module)

** For representing a board used the following scheme
    - include some code of state.h
    - Note that non-vacant cards in stacks have to be set to the
        empty card so the states can be memcmp'ed.

** I decided to use memcmp for comparing states. It's faster than a stack by
    stack, card by card comparison of states.

** For getting the state information I wrote some macros:
    stack_card_card_num, stack_card_deck() etc.
    I figured out that they will be used a lot in solve_for_state
    so writing them as functions will consume a lot of CPU.
    Thus, they were written as macros.

** In order to avoid a case where two states which differ only with their
    order of the stacks and/or freecells were generated, the states were
    "canonized":
    - The stacks were qsorted according to their bottommost card.
    - The freecells were qsorted according to their card.

    That way, the possibility is avoided.

    - Downside: the order of the stacks and freecells is not preserved.

    Show the code of canonize_state, card_compare, stack_compare

** The prev_states were stored in a sorted array, with an unsorted
   sort-margin.
    - bsearch() was used to search for the current state inside
        the sorted array.
    - lfind() was used to search for it inside the sort margin.
    - A state which had not been previously encountered was placed at
        the end of the sort margin.
    - Once the sort margin grew beyond PREV_STATES_SORT_MARGIN the
        array was qsorted(), and the length of the sort margin was
        set to 0.
    - When the array exceeds its allocated size it is
        realloc()ed.

    - PREV_STATES_GROW_BY was defined as 256 while
        PREV_STATES_SORT_MARGIN was defined as 16. Both could be
        defined to a different value by the user.


** The various tests conducted:
    (Enumerate the tests)

** Two tests are special: moving a freecell or stack card to the foundations.
    Moving a card to the founds where both of the founds of the other colour
    are higher than its card number plus 2. In that case if the child-state is
    not solvable neither is the original state.

** Note that the entire checking and adding code for new states was
implemented inside the tests code. (not in a separate function).
    - Show the code of freecell.c

* Testing the final code:

** I generated 1000 test boards. (unfortunately not in a reproducible manner)
and ran the function of them.

** Most of the boards ran fine. Some took too long to solve (several dozen of
minutes).

** Out of which, some were terminated in the middle.

** Some boards were reported by the program as unsolvable. (as I discovered
later, they were all solvable).

** I placed Freecell Solver version 0.2 on a web-site, and posted an
announcement for it on Freshmeat.

>From version 0.2 to 0.4
-----------------------

* Changes I had in mind:

** The states are quite large in memory size. Reduce their size by using bytes
    for their representation.

** Put the "check and add state" code in a separate function, so duplicate code
will be avoided.

* Implementing COMPACT_STATES:

** Describe:
    - card is a char - low half is card num upper half is deck
    - deck is a char
    - stack counter is a char

** Because of the macros I used to access and modify the state's data, I was
   able to rewrite the macros as COMPACT_STATES ones and have most of the work
   done for me.

** I decided to include a compile-time directive to choose between the
    COMPACT_STATES and the FAST_STATES.

** state_as_string required a thorough modification because it did not use the
    macros in the first place.

** I noticed that COMPACT_STATES were much faster than the so called
    FAST_STATES. As some of you remember I posted a message to linux-il
    asking why.

    http://plasma-gate.weizmann.ac.il/Linux/maillists/00/04/msg00356.html

    The answers I received were:

    1. Less memory means less cache misses, thus faster running.
    2. Operations on chars don't take more time than operations on 16-bit
        or 32-bit integers.
    3. The Pentium cache is 8-bits wide.

    To me, #1 seems the most likely.

** Thus, I decided to rename FAST_STATES as DEBUG_STATES because
    they are easier to debug using gdb and friends.

** Conclusion: use as little bit-space as you need, and use a lot of
    typedefs and macros to manage them.

* Implementing check_and_add_state as a separate function:

** I took a look at the code and realized that the two distinct tests are
    going to cause me problems.

** I implemented the function check_and_add_state(), which:
    1. Accepts a pointer to the new state.
    2. Checks if it is already present in prev_states.
    3. If so, returns STATE_ALREADY_EXISTS
    4. If not, calls solve_for_state on it.
    5. If the inner solve_for_state succeeeds returns STATE_WAS_SOLVED
        to  solve_for_state.
    6. If it didn't return STATE_IS_NOT_SOLVEABLE to solve_for_state.

    - Show the code of check_and_add_state.

** I then coded four macros:
    1. sfs_check_state_init
    2. sfs_check_state_finish
    3. sfs_check_state_begin
    4. sfs_check_state_end

    All the tests excepts the two NSFOS (not solvable for original state)
    used sfs_check_state_begin and sfs_check_state_end.

    The two NSFOS states used just sfs_check_state_init and
    sfs_check_state_finish.

    - Show the code of the macros.

** At the end, the routines at the end of the tests, only had the code that
    requires to perform the moves on the new state. All the other code was
    written inside the macros and the function check_and_add_state.

** Conclusion:
    Avoid duplicate code, use functions and macros to encapsulate it.

* Switching from qsort to merge:

** In my "Introduction to Data Structures and Algorithms" class I learned
    that Quick Sort has a minimal complexity of n*log(n) and a maximal
    complexity of n^2. Plus, when the array is sorted maximal complexity
    is achieved.

    In FCS, the array of prev_states is mostly sorted, except for the sort
    margin at the end. Thus, it will rougly take Omega(n^2) to sort it.

** Since the lion's share of the array is already sorted, we can use a merge
    function (which I also studied in class) to sort it. Merge works on two
    sorted arrays, so we need to sort the sort margin before that, or keep it
    sorted too.

    Merge takes O(n) to perform.

** I decided to keep the sort margin sorted, and place it separately from the
    main prev_states array. That way, I can allocate an extra place at the
    end of the main array for the merging.

** By performing the merge from the end I was able to merge without allocating
    a new array and then copying it to the original one.

** I figured that since the main array will usually be considerably larger
    than the sort margin, then an item-by-item check will be wasteful. Thus, I
    decided to base it on binary searches.

** The bsearch() function was inadequate because it doesn't the place
    in which an unfound item should be inserted. Thus, I had to write one of
    my own.
    - show the code of SFO_bsearch

** Based on SFO_bsearch I wrote SFO_merge_large_and_small_sorted_arrays.
    - Show the code.

** SFO_bsearch and SFO_merge_large_and_small_sorted_arrays were originally
    written in perl, and then translated to C. This time, writing them in perl
    first was a good idea, because it reduced the development time. Remember
    that the running time for the testing was not that critical.

** SFO_bsearch also allowed me to add a new state at the appropriate place in
    the sort margin.

** I rewrote check_and_add_state to use the merge.
    - Show the code in freecell.c

** Benchmarks:
    For 50 test boards:
    - FCS 0.2 takes 1 minute and 11 seconds to solve them.
    - FCS 0.4.1 takes 14 seconds
    A factor of 5 improvement.


* Implementing INDIRECT_STATE_STORAGE

** Motivation: I noticed that when I ran FCS on Windows NT 4.0 Workstation,
    with DEBUG_STATES, I ran out of stack space. The reason: solve_for_state
    accepts the entire state data structure as a function argument, and the
    DFS depth of that board was considerable.

    Therefore, I decided to implement a scheme in which only pointers to the
    states will be passed to solve_for_state.

** I decided to replace prev_states with indirect_prev_states which would
    be an array of pointers to states.

** Because of the algorithm, only the last allocated state needs to be
    released.

** At first, I tried to implement the actual states, to which the
    pointers will point as a stack, that will be realloced. Then I realized,
    that because of the realloc() call, the physical location of the states
    will change.

** In Linux, at least on the kernel level, the memory is allocated in
    powers of 2. Thus, it would be a waste if every state were allocated
    separately. I decided to allocate the states in chunks that fit a 64KB
    boundary. I called these chunks packs.

** I wrote the state_ia (state indirect allocation) routines.
    - show the fcs_isa.c file.

** Note that there is a compile-time option to compile with either the
    DIRECT_STATES_STORAGE mechanism or the INDIRECT_STATE_STORAGE.

** check_and_add_state and the sfs_check_state_ macros were adapted to
    allocate the states indirectly.

** Some macros make sure that "state", "ptr_state", "new_state" and
    "ptr_new_state" are all available in the code scope.

** Benchmarks:
    For 100 test boards on a Pentium 16 with 48 MB of RAM:

    Debug Direct: 44 minutes and 37 seconds
    Debug Indirect: 8 minutes and 10 seconds
    Compact Direct: 7 minutes and 7 seconds
    Compact Indirect: 1 minutes and 54 seconds

    As you can see, the indirect state management improves the running time
    by a big factor in both cases.

** Conclusions:
    - When passing structs as arguments to functions always do that by a
        a pointer.
    - Never sort, search in, etc with a direct vector of structs. Instead
        use an array of pointers.

* Usability features:

** modified initial_user_state_to_c so it will also accept states in the
    middle of play:
    1. With stacks of variable length.
    2. With filled freecells.
    3. With non-initial decks.

    - Possibly show the source, although it's just an ugly C parsing
        code.

** Wrote a function check_state_validity that checks that all cards are
    present in an initial board once and only once.

    - When I used the program I noticed that I had many mistakes when
        trying to manually input boards.

* I placed Freecell Solver 0.4 on its web-site and posted an announcement
    for it on Freshmeat.

>From version 0.4 to 0.6
------------------------

* I randomized a 1000 board and noticed that FCS could not solve all of them.
    I decided to investigate and after patching GNOME Freecell so it will
    input a board from a file, I realized that I forgot to add a test that
    will move a card to a parent card on the same stack.

** I added the extra move and afterwards it managed to solve all of them.

* Moving the global variables to an instance strucutre.

** FCS made extensive use of global variables. But I wanted it also to be
available as a library sooner or later. Now, if only one instance of the
library can be run we are set. But what about multi-threaded applications?

** I decided to create an instance structure that will be passed as the first
    parameters to solve_for_state() and check_and_add_state(). That way, more
    than one simultaenous instances can be run at any given time.

** I moved prev_states, indirect_prev_states, num_prev_states, etc. to the
    instance variable. Gradually, I eventually moved all the variables their.

** I created functions to initialize and terminate an instance. (show the code
    of freecell.c)

** Conclusion: Global variables are wrong because they do not go well with
multi-threading. In C++, use objects. In ANSI C, create an instance struct
that will hold all the "global" variables of the function, and create
functions to instantiate it.

* Remembering the stack and freecell order.

** Because Freecell Solver "canonizes" the order of the stacks and freecells
    they are not preserved throught the solution that is displayed to the
    user. (You can give an example).

** This is a big usability flaw. I wanted to correct it.

** The reason for the canonization is that I wanted that states with a
    different ordering of the stacks or freecells will not be generated.

** If I place an ordering integer inside each stack and freecell, I won't be
    able to memcmp() two states.

** But what if I put them outside the main state sturcture?

** Solution: I created a structure called state_with_locations_t that:
    1. Contain the state as its first element. (So its pointers can still
        be memcmp()ed)
    2. Contain 8 integers that specify the order of the stacks.
    3. Contain 4 integers that specify the order of the freecells.

** I had to modify canonize_state() that it will swap the order specifiers
    as well as the stacks or freecells themselves. I could not do that
    with C's qsort() so I wrote my own insertion-sort routine. (Show the
    code of canonize_state() in state.h)

** I also had to modify the code itself to access the s element of the
    state_with_locations stucture. I did some of it using macros.

** I also had to modify the state_ia routines to manage state_with_locations
    instead of regular states.

** state_as_string was modified to make use of the new locations.

** Eventually, the order of the states was preserved.

** Conculsion: The user is always right. If you make some software design
    considerations with possible implications for the user, always make
    sure the user does not notice.

* Adding the fcs_ prefix.

** Despite moving the global variables to the instance struct, FCS still
    have a lot of functions. But names like state_as_string may might
    as well be names of user's routines. Thus, FCS may interfere with the
    main program of the user.

** Solution: Add a "fcs_" or "freecell_solver" prefixes to the names of
    the global functions. That way, they won't collide with names of
    other functions, in case Freecell Solver is compiled as a library.

** Conclusion: Avoid namespace collision. In C++ use the namespaces facility,
    In C append prefixes to the functions you create.

* Each test in a separate function.

** To enumerate the different moves possible on each state, solve_for_state
    performs several tests. Those tests include:

    1. Moving a top card to the foundations.
    2. Moving a card to a parent card.
    3. Moving a non-top card to the foundations.
    4. Moving a sequence of cards to a different parent.
    5. Emptying a stack into the freecells.

** Those tests can be done in any order whatsoever. It's hard to know in
    advance what will be the order which will solve a state more quickly.

** Solution: Let the user specify a test order from the command line.

** Implementation: The code of solve_for_state was split into 10 test
    functions and one main function. Each test function implemented one of
    the tests and called check_and_add_state().

** The solve_for_state() function holds ten pointers to test functions, and
    executes them one by one according to a test order which is specified in
    the instance struct.

** The instance can also specify the number of tests to conduct, thus enabling
    the user to omit some of the tests.

** Results:
    - For every board there's usually a test order that can solve it very
        fast. For other orders it usually get stuck and doesn't finish
        very quickly.

    - For example, running FCS 0.6.2 with a limit of 150000 boards with the
        following test orders:

        0123456789:
        -----------

        Duration: 19:25 minutes
        Number of unsolved boards: 19

        0124567839:
        -----------

        Duration: 14:27 minutes

        Number of unsolved boards: 12

        0123467:
        --------

        Duration: 2:50 minutes
        Number of unsolved boards: 5 (some of them after a complete scan)


** Conclusion: If several things can be done with any given order, make sure
    they are implemented as separate routines. You may wish to play with the
    order and see which one gives the optimal results.


* Interface separation and code cleaning.

** Like I said, I wanted FCS to be also available as a library.

** I placed the initialization, solution and de-allocation routines in their
    own functions, instead of inside main.

** The following functions were created:
    - freecell_solver_alloc_instance
    - freecell_solver_free_instance
    - freecell_solver_init_instance
    - freecell_solver_free_instance