>From version 0.0 to 0.2 ----------------------- * Algorithm: Solve-State(state, prev_states, ret) if (state == empty_state) Push(ret, state) return SOLVED for each move possible on state new_state <- state apply(new_state, move) if (new_state in prev_states) ; Do nothing else add new_state to prev_states if (Solve-State(new_state, prev_states, ret) == SOLVED) Push(ret, state) return SOLVED return UNSOLVED Freecell-Solver(init_state) if (Solve-State(init_state, Null, ret) == SOLVED) print "State was solved" while (state <- Pop(ret)) print state else print "I could not solve this board"; Basically a DFS scan. * Initial programming in perl: ------------------------------ ** State was represnted as a perl5 data-structure. ** State was duplicated by serializing it and unserializing it. ** @prev_states contained the states unsorted. ** Only a small amount of the tests were implmented before the conversion to C. ** Running the program from the command-line gave it a very slow output. ** Conclusion: perl is slow, if you want your application to be as fast as possible, code it in C from the first stage. * Converting to C: ** Wrote the functions card_user2perl card_perl2user etc. in C (show the card.c module) ** For representing a board used the following scheme - include some code of state.h - Note that non-vacant cards in stacks have to be set to the empty card so the states can be memcmp'ed. ** I decided to use memcmp for comparing states. It's faster than a stack by stack, card by card comparison of states. ** For getting the state information I wrote some macros: stack_card_card_num, stack_card_deck() etc. I figured out that they will be used a lot in solve_for_state so writing them as functions will consume a lot of CPU. Thus, they were written as macros. ** In order to avoid a case where two states which differ only with their order of the stacks and/or freecells were generated, the states were "canonized": - The stacks were qsorted according to their bottommost card. - The freecells were qsorted according to their card. That way, the possibility is avoided. - Downside: the order of the stacks and freecells is not preserved. Show the code of canonize_state, card_compare, stack_compare ** The prev_states were stored in a sorted array, with an unsorted sort-margin. - bsearch() was used to search for the current state inside the sorted array. - lfind() was used to search for it inside the sort margin. - A state which had not been previously encountered was placed at the end of the sort margin. - Once the sort margin grew beyond PREV_STATES_SORT_MARGIN the array was qsorted(), and the length of the sort margin was set to 0. - When the array exceeds its allocated size it is realloc()ed. - PREV_STATES_GROW_BY was defined as 256 while PREV_STATES_SORT_MARGIN was defined as 16. Both could be defined to a different value by the user. ** The various tests conducted: (Enumerate the tests) ** Two tests are special: moving a freecell or stack card to the foundations. Moving a card to the founds where both of the founds of the other colour are higher than its card number plus 2. In that case if the child-state is not solvable neither is the original state. ** Note that the entire checking and adding code for new states was implemented inside the tests code. (not in a separate function). - Show the code of freecell.c * Testing the final code: ** I generated 1000 test boards. (unfortunately not in a reproducible manner) and ran the function of them. ** Most of the boards ran fine. Some took too long to solve (several dozen of minutes). ** Out of which, some were terminated in the middle. ** Some boards were reported by the program as unsolvable. (as I discovered later, they were all solvable). ** I placed Freecell Solver version 0.2 on a web-site, and posted an announcement for it on Freshmeat. >From version 0.2 to 0.4 ----------------------- * Changes I had in mind: ** The states are quite large in memory size. Reduce their size by using bytes for their representation. ** Put the "check and add state" code in a separate function, so duplicate code will be avoided. * Implementing COMPACT_STATES: ** Describe: - card is a char - low half is card num upper half is deck - deck is a char - stack counter is a char ** Because of the macros I used to access and modify the state's data, I was able to rewrite the macros as COMPACT_STATES ones and have most of the work done for me. ** I decided to include a compile-time directive to choose between the COMPACT_STATES and the FAST_STATES. ** state_as_string required a thorough modification because it did not use the macros in the first place. ** I noticed that COMPACT_STATES were much faster than the so called FAST_STATES. As some of you remember I posted a message to linux-il asking why. http://plasma-gate.weizmann.ac.il/Linux/maillists/00/04/msg00356.html The answers I received were: 1. Less memory means less cache misses, thus faster running. 2. Operations on chars don't take more time than operations on 16-bit or 32-bit integers. 3. The Pentium cache is 8-bits wide. To me, #1 seems the most likely. ** Thus, I decided to rename FAST_STATES as DEBUG_STATES because they are easier to debug using gdb and friends. ** Conclusion: use as little bit-space as you need, and use a lot of typedefs and macros to manage them. * Implementing check_and_add_state as a separate function: ** I took a look at the code and realized that the two distinct tests are going to cause me problems. ** I implemented the function check_and_add_state(), which: 1. Accepts a pointer to the new state. 2. Checks if it is already present in prev_states. 3. If so, returns STATE_ALREADY_EXISTS 4. If not, calls solve_for_state on it. 5. If the inner solve_for_state succeeeds returns STATE_WAS_SOLVED to solve_for_state. 6. If it didn't return STATE_IS_NOT_SOLVEABLE to solve_for_state. - Show the code of check_and_add_state. ** I then coded four macros: 1. sfs_check_state_init 2. sfs_check_state_finish 3. sfs_check_state_begin 4. sfs_check_state_end All the tests excepts the two NSFOS (not solvable for original state) used sfs_check_state_begin and sfs_check_state_end. The two NSFOS states used just sfs_check_state_init and sfs_check_state_finish. - Show the code of the macros. ** At the end, the routines at the end of the tests, only had the code that requires to perform the moves on the new state. All the other code was written inside the macros and the function check_and_add_state. ** Conclusion: Avoid duplicate code, use functions and macros to encapsulate it. * Switching from qsort to merge: ** In my "Introduction to Data Structures and Algorithms" class I learned that Quick Sort has a minimal complexity of n*log(n) and a maximal complexity of n^2. Plus, when the array is sorted maximal complexity is achieved. In FCS, the array of prev_states is mostly sorted, except for the sort margin at the end. Thus, it will rougly take Omega(n^2) to sort it. ** Since the lion's share of the array is already sorted, we can use a merge function (which I also studied in class) to sort it. Merge works on two sorted arrays, so we need to sort the sort margin before that, or keep it sorted too. Merge takes O(n) to perform. ** I decided to keep the sort margin sorted, and place it separately from the main prev_states array. That way, I can allocate an extra place at the end of the main array for the merging. ** By performing the merge from the end I was able to merge without allocating a new array and then copying it to the original one. ** I figured that since the main array will usually be considerably larger than the sort margin, then an item-by-item check will be wasteful. Thus, I decided to base it on binary searches. ** The bsearch() function was inadequate because it doesn't the place in which an unfound item should be inserted. Thus, I had to write one of my own. - show the code of SFO_bsearch ** Based on SFO_bsearch I wrote SFO_merge_large_and_small_sorted_arrays. - Show the code. ** SFO_bsearch and SFO_merge_large_and_small_sorted_arrays were originally written in perl, and then translated to C. This time, writing them in perl first was a good idea, because it reduced the development time. Remember that the running time for the testing was not that critical. ** SFO_bsearch also allowed me to add a new state at the appropriate place in the sort margin. ** I rewrote check_and_add_state to use the merge. - Show the code in freecell.c ** Benchmarks: For 50 test boards: - FCS 0.2 takes 1 minute and 11 seconds to solve them. - FCS 0.4.1 takes 14 seconds A factor of 5 improvement. * Implementing INDIRECT_STATE_STORAGE ** Motivation: I noticed that when I ran FCS on Windows NT 4.0 Workstation, with DEBUG_STATES, I ran out of stack space. The reason: solve_for_state accepts the entire state data structure as a function argument, and the DFS depth of that board was considerable. Therefore, I decided to implement a scheme in which only pointers to the states will be passed to solve_for_state. ** I decided to replace prev_states with indirect_prev_states which would be an array of pointers to states. ** Because of the algorithm, only the last allocated state needs to be released. ** At first, I tried to implement the actual states, to which the pointers will point as a stack, that will be realloced. Then I realized, that because of the realloc() call, the physical location of the states will change. ** In Linux, at least on the kernel level, the memory is allocated in powers of 2. Thus, it would be a waste if every state were allocated separately. I decided to allocate the states in chunks that fit a 64KB boundary. I called these chunks packs. ** I wrote the state_ia (state indirect allocation) routines. - show the fcs_isa.c file. ** Note that there is a compile-time option to compile with either the DIRECT_STATES_STORAGE mechanism or the INDIRECT_STATE_STORAGE. ** check_and_add_state and the sfs_check_state_ macros were adapted to allocate the states indirectly. ** Some macros make sure that "state", "ptr_state", "new_state" and "ptr_new_state" are all available in the code scope. ** Benchmarks: For 100 test boards on a Pentium 16 with 48 MB of RAM: Debug Direct: 44 minutes and 37 seconds Debug Indirect: 8 minutes and 10 seconds Compact Direct: 7 minutes and 7 seconds Compact Indirect: 1 minutes and 54 seconds As you can see, the indirect state management improves the running time by a big factor in both cases. ** Conclusions: - When passing structs as arguments to functions always do that by a a pointer. - Never sort, search in, etc with a direct vector of structs. Instead use an array of pointers. * Usability features: ** modified initial_user_state_to_c so it will also accept states in the middle of play: 1. With stacks of variable length. 2. With filled freecells. 3. With non-initial decks. - Possibly show the source, although it's just an ugly C parsing code. ** Wrote a function check_state_validity that checks that all cards are present in an initial board once and only once. - When I used the program I noticed that I had many mistakes when trying to manually input boards. * I placed Freecell Solver 0.4 on its web-site and posted an announcement for it on Freshmeat. >From version 0.4 to 0.6 ------------------------ * I randomized a 1000 board and noticed that FCS could not solve all of them. I decided to investigate and after patching GNOME Freecell so it will input a board from a file, I realized that I forgot to add a test that will move a card to a parent card on the same stack. ** I added the extra move and afterwards it managed to solve all of them. * Moving the global variables to an instance strucutre. ** FCS made extensive use of global variables. But I wanted it also to be available as a library sooner or later. Now, if only one instance of the library can be run we are set. But what about multi-threaded applications? ** I decided to create an instance structure that will be passed as the first parameters to solve_for_state() and check_and_add_state(). That way, more than one simultaenous instances can be run at any given time. ** I moved prev_states, indirect_prev_states, num_prev_states, etc. to the instance variable. Gradually, I eventually moved all the variables their. ** I created functions to initialize and terminate an instance. (show the code of freecell.c) ** Conclusion: Global variables are wrong because they do not go well with multi-threading. In C++, use objects. In ANSI C, create an instance struct that will hold all the "global" variables of the function, and create functions to instantiate it. * Remembering the stack and freecell order. ** Because Freecell Solver "canonizes" the order of the stacks and freecells they are not preserved throught the solution that is displayed to the user. (You can give an example). ** This is a big usability flaw. I wanted to correct it. ** The reason for the canonization is that I wanted that states with a different ordering of the stacks or freecells will not be generated. ** If I place an ordering integer inside each stack and freecell, I won't be able to memcmp() two states. ** But what if I put them outside the main state sturcture? ** Solution: I created a structure called state_with_locations_t that: 1. Contain the state as its first element. (So its pointers can still be memcmp()ed) 2. Contain 8 integers that specify the order of the stacks. 3. Contain 4 integers that specify the order of the freecells. ** I had to modify canonize_state() that it will swap the order specifiers as well as the stacks or freecells themselves. I could not do that with C's qsort() so I wrote my own insertion-sort routine. (Show the code of canonize_state() in state.h) ** I also had to modify the code itself to access the s element of the state_with_locations stucture. I did some of it using macros. ** I also had to modify the state_ia routines to manage state_with_locations instead of regular states. ** state_as_string was modified to make use of the new locations. ** Eventually, the order of the states was preserved. ** Conculsion: The user is always right. If you make some software design considerations with possible implications for the user, always make sure the user does not notice. * Adding the fcs_ prefix. ** Despite moving the global variables to the instance struct, FCS still have a lot of functions. But names like state_as_string may might as well be names of user's routines. Thus, FCS may interfere with the main program of the user. ** Solution: Add a "fcs_" or "freecell_solver" prefixes to the names of the global functions. That way, they won't collide with names of other functions, in case Freecell Solver is compiled as a library. ** Conclusion: Avoid namespace collision. In C++ use the namespaces facility, In C append prefixes to the functions you create. * Each test in a separate function. ** To enumerate the different moves possible on each state, solve_for_state performs several tests. Those tests include: 1. Moving a top card to the foundations. 2. Moving a card to a parent card. 3. Moving a non-top card to the foundations. 4. Moving a sequence of cards to a different parent. 5. Emptying a stack into the freecells. ** Those tests can be done in any order whatsoever. It's hard to know in advance what will be the order which will solve a state more quickly. ** Solution: Let the user specify a test order from the command line. ** Implementation: The code of solve_for_state was split into 10 test functions and one main function. Each test function implemented one of the tests and called check_and_add_state(). ** The solve_for_state() function holds ten pointers to test functions, and executes them one by one according to a test order which is specified in the instance struct. ** The instance can also specify the number of tests to conduct, thus enabling the user to omit some of the tests. ** Results: - For every board there's usually a test order that can solve it very fast. For other orders it usually get stuck and doesn't finish very quickly. - For example, running FCS 0.6.2 with a limit of 150000 boards with the following test orders: 0123456789: ----------- Duration: 19:25 minutes Number of unsolved boards: 19 0124567839: ----------- Duration: 14:27 minutes Number of unsolved boards: 12 0123467: -------- Duration: 2:50 minutes Number of unsolved boards: 5 (some of them after a complete scan) ** Conclusion: If several things can be done with any given order, make sure they are implemented as separate routines. You may wish to play with the order and see which one gives the optimal results. * Interface separation and code cleaning. ** Like I said, I wanted FCS to be also available as a library. ** I placed the initialization, solution and de-allocation routines in their own functions, instead of inside main. ** The following functions were created: - freecell_solver_alloc_instance - freecell_solver_free_instance - freecell_solver_init_instance - freecell_solver_free_instance