This is the second and last post about my solution to the Traveling Santa Kaggle competition.
If we have a small set of cities, rewiring the connections of the two paths between them to find the optimal configuration is a straightforward idea: just try all possibilities!
In theory this approach is at least as powerful as k-opt. It is very flexible, as different heuristics can be used to select the set of cities to rewire, and compared to the other methods I used in the competition it allows optimizing both paths at once.
Finally, the method is easy to generalize to work when the set of cities includes path endpoints.
The obvious problem with trying all combinations is the time it takes: For N cities, the time complexity for each of the two paths may be O(N!).
In practice, this method was feasible for up to about 20 cities in my implementation, and that number could likely be improved with a few more tricks.
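As a minimal sketch of the brute-force idea (names are illustrative, not the competition code), assuming Euclidean distances and a single path segment that enters the selected set at a fixed city and leaves at another:

```python
from itertools import permutations
from math import dist

def best_rewiring(coords, first, middle, last):
    """Try every ordering of the interior cities and keep the shortest.

    coords : dict city -> (x, y)
    first, last : entry and exit cities (their positions in the path are fixed)
    middle : cities whose visiting order we are free to permute
    """
    def length(order):
        pts = [coords[c] for c in (first, *order, last)]
        return sum(dist(a, b) for a, b in zip(pts, pts[1:]))

    return min(permutations(middle), key=length)
```

For len(middle) = N this evaluates N! orderings, which is exactly why the method only works for small sets without further pruning.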
Imagine we select a set of cities C. They could be anywhere, but one obvious heuristic is to choose cities close to each other. In the picture there are 9 cities inside a square area. The square around them is to make it clear which edges exit the area, but it’s not used in practice: All that matters is whether the neighbors of a city are also in the set C or outside.
I chose to only rewire edges between pairs of cities in C. Another approach could have been to rewire all edges adjacent to at least one city in C.
We optimize one path at a time. After the algorithm produces a rewiring for path 1, all possible rewirings for path 2 are evaluated, excluding edges that were already taken by path 1. Because of the exhaustive nature of the search, all possibilities for both paths will be evaluated, regardless of which path is looked at first.
The numbers in the picture are the order in which the cities are found in the current path.
Walking along the path from the start, city 1 is the first city in C that we encounter. From here the path moves to city 2, also in C, then to a city outside C. From there, the path may meander among multiple cities not in C, until it re-enters C at city 3.
Cities can be classified as inside, incoming or outgoing depending on which of their neighbors are in C. The first city is always incoming and the last always outgoing.
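This classification could be sketched as follows; the extra "isolated" label for cities whose neighbors are all outside C (like city 9 in the figure) is my own naming for illustration:

```python
def classify(path, C):
    """Label each city of `path` that belongs to the set C.

    'inside'   : both path neighbors are in C
    'incoming' : only the next city along the path is in C
    'outgoing' : only the previous city along the path is in C
    'isolated' : neither neighbor is in C (illustrative extra label)
    """
    C = set(C)
    labels = {}
    for i, city in enumerate(path):
        if city not in C:
            continue
        prev_in = i > 0 and path[i - 1] in C
        next_in = i + 1 < len(path) and path[i + 1] in C
        if prev_in and next_in:
            labels[city] = 'inside'
        elif next_in:
            labels[city] = 'incoming'
        elif prev_in:
            labels[city] = 'outgoing'
        else:
            labels[city] = 'isolated'
    return labels
```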
In this example, N=9 but the complexity is much less than 9! for one path multiplied by 9! for the other path (that number would be 131,681,894,400 and it would take too long to explore that many possibilities).
First of all, we can exclude city 9, as both of its edges go to cities outside C and we can’t rewire either of them. City 9 can still be considered while rewiring the other path, if in that path at least one of its edges is a connection inside C.
We are changing edges, not cities, so for N cities there are at most N − 1 edges (this corresponds to the case where all cities except the first and last are of the inside type). Once we decide how to rewire the first N − 2 edges, the last edge is forced: It can only go to the last city, as the path needs to exit C from there. The first edge always starts from the first city and the last edge always ends in the last one.
When the path exits and then re-enters C at a city pair (cities 2-3 and 6-7 in the figure), there are no edges internal to C between the pair, so the only way these city pairs contribute to the algorithm complexity is by multiplying the total number of combinations by 2: Considering for example the 2-3 pair in the figure, either the path exits C at 2 and re-enters at 3, or it exits at 3 and re-enters at 2 (this corresponds to inverting that segment of the path).
Due to these considerations, the number of possible edge rewiring permutations for these 9 cities is only 96 (= 4! × 2²), significantly less than 9! (= 362,880). The second path may be more or less complex, but on average even fewer combinations will be allowed, because the edges already used by path 1 are off-limits.
In k-opt, a rule based on triangle inequalities avoids considering most of the possibilities and greatly speeds up the algorithm. To keep the computation time reasonable as N increases, a similar rule is needed here. The one I implemented estimates the remaining length of the edges not yet chosen by multiplying their number by the minimum possible edge length. If this minimum remaining length, added to the length so far, exceeds the total length of the original edges, we can skip enumerating the combinations for the remaining edges, because none of them can result in an improvement.

This simple rule prunes early the suboptimal paths that keep jumping between far-away cities, and it speeds up the algorithm by an order of magnitude. In hindsight, more precise bounds could have been used instead of just the minimum edge length: even if computing them is more expensive, the cost would likely be repaid by pruning the search tree earlier. Another small code optimization was pre-computing a table of distances between all possible pairs of cities.
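A sketch of this pruning rule, reusing the illustrative setup from before: before extending a partial rewiring, the remaining length is bounded optimistically by the number of edges still to place times the shortest possible edge, and the branch is abandoned if even that bound cannot beat the best solution found so far.

```python
from math import dist

def search(coords, first, middle, last, budget):
    """Best interior order whose total length stays below `budget`, or None."""
    # Optimistic per-edge bound: no edge can be shorter than this.
    min_edge = min(dist(coords[a], coords[b])
                   for a in coords for b in coords if a != b)
    best = {'order': None, 'len': budget}

    def extend(order, remaining, so_far):
        edges_left = len(remaining) + 1      # remaining cities + final hop to `last`
        if so_far + edges_left * min_edge >= best['len']:
            return                           # no completion can beat the best: prune
        if not remaining:
            total = so_far + dist(coords[order[-1]], coords[last])
            if total < best['len']:
                best['order'], best['len'] = order, total
            return
        for c in remaining:
            extend(order + [c], remaining - {c},
                   so_far + dist(coords[order[-1]], coords[c]))

    extend([first], set(middle), 0.0)
    return best['order']
```

Passing the length of the original edges as `budget` reproduces the rule from the text: only strict improvements survive.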
Multithreading
To further speed up the algorithm, I used a worker thread for each processor core. The main thread would create a queue and fill it with all cities, randomly shuffled. Each worker thread ran in a loop (at low priority, to avoid slowing down the main thread and to keep the machine responsive), extracting cities from the queue and trying to rewire their neighborhood. If an improvement was found, it would send the corresponding edge permutation to the main thread.

Only the main thread could apply a permutation to the solution (an operation that is very quick compared to looking for the best permutation), to keep the solution consistent. Occasionally a permutation could no longer be applied due to a recent change in the solution that involved some of the same cities; in this case, the main thread would send the city back to the worker thread for re-evaluation. Finally, the main thread would also occasionally run the other optimizations like 2-opt to help improvements propagate faster across the cities, rebalance the paths to have the same length, and keep saving the best solution to a file.
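The worker/main-thread split can be sketched with Python's standard thread-safe queues; here the expensive rewiring search is replaced by a placeholder `find_improvement` callable so that only the control flow is shown:

```python
import threading
import queue

def run_workers(cities, n_workers, find_improvement):
    """Fan cities out to workers; collect improving moves for the main thread."""
    work, results = queue.Queue(), queue.Queue()
    for city in cities:
        work.put(city)

    def worker():
        while True:
            try:
                city = work.get_nowait()
            except queue.Empty:
                return                       # queue drained, worker exits
            move = find_improvement(city)    # expensive search in the real code
            if move is not None:
                results.put(move)            # only the main thread applies moves

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    moves = []
    while not results.empty():
        moves.append(results.get())
    return moves
```

In the real setup the main thread loops alongside the workers, applying moves as they arrive and re-queuing cities whose moves became stale; this sketch only shows the queue plumbing.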
Optimizing the sum of path lengths
Looking at the edge permutations found by the algorithm, a very common case was that the length of one path increased by L and the length of the other decreased by exactly the same amount L.
This is useful because these moves allow rebalancing the lengths of the paths. The goal can then be changed to minimizing the sum of path lengths: even if the length of one of the paths increases, we can later use rebalancing moves to spread the overall improvement to both paths in equal measure.
To this end, the worker threads also added all the rebalancing moves they found to a sorted balanced tree. When the main thread needed to rebalance the paths because one of them was longer than the other by L, it could look into this tree for a rebalancing move that decreased its length by almost exactly L/2 (and increased the other path’s length by L/2).
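The lookup can be sketched with a sorted list and `bisect` standing in for the balanced tree, assuming each move is stored as a `(delta, move)` pair where `delta` is how much the longer path shrinks:

```python
import bisect

def closest_rebalancing_move(moves, target):
    """Return the (delta, move) pair whose delta is closest to `target` (= L/2).

    `moves` must be sorted by delta.
    """
    deltas = [d for d, _ in moves]
    i = bisect.bisect_left(deltas, target)
    # The best candidate is either just below or just at the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(moves)]
    return min((moves[j] for j in candidates),
               key=lambda m: abs(m[0] - target))
```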
To improve the sum of path lengths, the exploration of edge permutations needs to allow one of the two paths to increase in length, but what greatly improved the execution speed was exactly the opposite, i.e. the heuristic that disallows length increases. To solve this, the algorithm runs twice. The first time, path 1 is not allowed to increase in length; if its length decreases by L, path 2 is allowed to increase in length, but only up to L. Once this search completes, it is repeated with the order of the two paths inverted, so that path 2 is now the one that cannot increase in length. Overall, all possibilities are explored, but the slowdown is only a factor of 2 instead of giving back the order-of-magnitude speed-up provided by the length limits.
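The two-pass scheme boils down to a few lines of driver code; `optimize(path, max_increase)` here is an assumed routine (not the actual implementation) that searches rewirings for one path under a length-increase budget and returns the signed length change it achieved:

```python
def optimize_sum(path1, path2, optimize):
    """Run the search twice, swapping which path is forbidden to grow."""
    for a, b in ((path1, path2), (path2, path1)):
        delta_a = optimize(a, max_increase=0.0)  # first path may only shrink
        if delta_a < 0:
            # The other path may grow, but by less than the first one shrank,
            # so the sum of path lengths still decreases.
            optimize(b, max_increase=-delta_a)
```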
Choosing the set of cities
The first way I tried to come up with sets of cities C to rewire was to recursively split the area into rectangles until each rectangle contained N cities. Using N = 15 or N = 16 resulted in rapid improvements in the solution score (for example, in a test with N = 15 it took 4 hours and 4 minutes to reduce the length of the initial solution from 6,842,237 to 6,799,832 running on a single core). Edges between the rectangles were not rewired, so later I moved to a more expensive search, looking at the neighborhood of each of the 150,000 cities (with N = 20, it would take about 24 hours to explore 150,000 neighborhoods on a 4-core machine with hyper-threading).

I tried various ways of defining these “neighborhoods”, each further improving the score. Some examples for choosing the neighbors:
- Geometric neighbors according to various shapes (circles, squares, rectangles of different shapes and orientations).
- Focus on the endpoints with a deeper search.
- Look at the longest edges, selecting cities that were neighbors of the two cities that define the edge.
- Neighbors according to the Delaunay triangulation graph (breadth-first search on the graph). This gave different results than geometric neighbors, especially in areas with non-homogeneous city densities.
- Cities preceding and following the given city along the two paths.
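The Delaunay-based option above amounts to a breadth-first search on the triangulation graph. A sketch, where the adjacency `graph` (city -> neighboring cities) is assumed to come from a Delaunay triangulation computed elsewhere:

```python
from collections import deque

def bfs_neighborhood(graph, start, n):
    """Collect up to `n` cities around `start` in breadth-first order."""
    seen = {start}
    order = [start]
    frontier = deque([start])
    while frontier and len(order) < n:
        for nb in graph[frontier.popleft()]:
            if nb not in seen:
                seen.add(nb)
                order.append(nb)
                frontier.append(nb)
                if len(order) == n:
                    break
    return order
```

Unlike a fixed geometric shape, this adapts to the local city density: sparse areas yield geographically larger neighborhoods.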
Using fixed groups of N cities is not very efficient. Most groups don’t contain enough path edges to allow exploring meaningful permutations. Another problem is that there are degenerate cases with many path edges in the group, most of comparable length: the pruning heuristic becomes ineffective, and a significant amount of time is spent enumerating permutations of just a few of these city sets.
To avoid both problems, I moved to a dynamic number of cities for each rewiring problem: start with a fixed number of cities (typically 17-19, and up to 21), then adjust it, increasing the number of cities if there were too few path edges, and canceling computations that took more than 1 or 2 minutes to try them again with a reduced number of cities.
Double bridge move
Concerned that the rewiring code was not exploring distant groups of cities, I implemented a simple optimization that could cover at least a few of those cases, sometimes known as a “double bridge”. The idea is to identify a 2-opt move that improves the solution but was discarded because it would split the path into two parts, a shorter path and a circuit. If a second 2-opt move, allowed or disallowed, is found between the circuit and the path, it joins the two pieces back together into a single path that may be shorter than the initial one.
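The combined effect of the two 2-opt moves is the classic double-bridge reconnection, which swaps two interior segments in a single jump that no sequence of improving 2-opt moves could reach. A sketch with illustrative cut indices:

```python
def double_bridge(path, i, j, k):
    """Reconnect path as A + C + B + D, where A = path[:i], B = path[i:j],
    C = path[j:k], D = path[k:], with 0 < i < j < k < len(path)."""
    return path[:i] + path[j:k] + path[i:j] + path[k:]
```

The move always yields a valid single path, so it can be accepted or rejected purely on the length difference of the four edges it replaces.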
By this time the solution was already quite optimized and the improvement from adding this optimization was small.