Sunday, April 28, 2013

Rewiring Santa

This is the second and last post about my solution to the Traveling Santa Kaggle competition.

If we have a small set of cities, rewiring the connections of the two paths between them to find the optimal configuration is a straightforward idea: just try all possibilities!

In theory this tool is at least as powerful as k-opt, it’s very flexible as different heuristics could be used to select the set of cities to rewire, and compared to the other methods I’ve used in the competition it allows optimizing both paths at once.
Finally, the method is easy to generalize to work when the set of cities includes path endpoints.

The obvious problem with trying all combinations is the time it takes: For N cities, the time complexity for each of the two paths may be O(N!).

In practice, this method was feasible for up to about 20 cities in my implementation, and the limit could likely be raised with a few more tricks.

Imagine we select a set of cities C. They could be anywhere, but one obvious heuristic is to choose cities close to each other. In the picture there are 9 cities inside a square area. The square around them is to make it clear which edges exit the area, but it’s not used in practice: All that matters is whether the neighbors of a city are also in the set C or outside.

[Figure: a set of 9 cities selected inside a square area, with the path edges entering and leaving it]

I chose to only rewire edges between pairs of cities in C. Another approach could have been to rewire all edges adjacent to at least one city in C.

We optimize one path at a time. After the algorithm produces a rewiring for path 1, all possible rewirings for path 2 are evaluated, excluding edges already taken by path 1. Because of the exhaustive nature of the search, all possibilities for both paths will be evaluated, regardless of which path is looked at first.

The numbers in the picture are the order in which the cities are found in the current path.
Walking along the path from the start, city 1 is the first city in C that we encounter. From here the path moves to city 2, also in C, then to a city outside C. From there, the path may meander among multiple cities not in C, until it re-enters C at city 3.

Cities can be classified as inside, incoming or outgoing depending on which of their neighbors are in C. The first city is always incoming and the last always outgoing.

In this example, N=9 but the complexity is much less than 9! for one path multiplied by 9! for the other path (that number would be 131,681,894,400 and it would take too long to explore that many possibilities).

First of all, we can exclude city 9, as both of its edges go to cities outside C and we can’t rewire either of them. City 9 can still be considered while rewiring the other path if, in that path, at least one of its edges is a connection inside C.

We are changing edges, not cities, so for N cities there are at most N – 1 edges (this corresponds to the case where all cities except the first and last ones are of the inside type). Once we decide how to rewire the first N-2 edges, the last edge is forced: It can only go to the last city as the path needs to continue to exit C from there. The first edge will always start from the first city and the last edge always end in the last one.

When the path exits and then re-enters C at a city pair (cities 2-3 and 6-7 in the figure) there are no edges internal to C between the pair, so the only way these city pairs contribute to the algorithm complexity is by multiplying the total number of combinations by 2: Considering for example the 2-3 pair in the figure, either the path exits C at 2 and re-enters at 3, or it exits at 3 and re-enters at 2 (this corresponds to inverting that segment of the path).

Due to these considerations, the number of all possible edge rewiring permutations for these 9 cities is only 96 cases (= 4! * 2²), significantly less than 9! (= 362,880). The second path may be more or less complex, but on average even fewer combinations will be allowed, because the edges already used by path 1 are off-limits.

In k-opt, a rule based on the triangle inequality is used to avoid considering most of the possibilities and greatly speed up the algorithm. To keep the computation time reasonable as N increases, a similar rule is needed. The one I implemented consists of estimating the remaining length of the edges not yet chosen by multiplying their number by the minimum possible edge distance. If this minimum remaining length added to the length so far is longer than the total length of the original edges, we can skip enumerating all combinations for the remaining edges, because none of them can result in an improvement. This simple rule prunes, early on, suboptimal partial rewirings that keep jumping between faraway cities, and it speeds up the algorithm by an order of magnitude. In hindsight, more precise bounds could have been used instead of just the minimum edge length: even if they are more expensive to compute, the cost would likely be repaid by pruning the search tree earlier. Another small code optimization was pre-computing a table of distances between all pairs of cities.
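As an illustration of this pruning rule, here is a minimal sketch of the bound check; the names (Rewirer, worthExtending) are illustrative, not the original code:

#include <vector>

struct Rewirer
{
    std::vector<std::vector<double>> dist; // pre-computed pairwise distances in C
    double minEdge;                        // minimum possible edge length in C
    double originalLength;                 // total length of the original edges

    // Returns true if a partial rewiring with `lengthSoFar` of edges already
    // chosen and `edgesLeft` edges still to place can possibly beat the
    // original edges; otherwise the whole subtree can be skipped.
    bool worthExtending(double lengthSoFar, int edgesLeft) const
    {
        double lowerBound = lengthSoFar + edgesLeft * minEdge;
        return lowerBound < originalLength;
    }
};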

Multithreading

To further speed up the algorithm, I used a worker thread for each processor core. The main thread would create a queue and fill it with all cities, randomly shuffled. Each worker thread ran in a loop (at low priority, to avoid slowing down the main thread and to keep the machine responsive), extracting cities from the queue and trying to rewire their neighborhood. If an improvement was found, it would send the corresponding edge permutation to the main thread. Only the main thread could apply the permutation to the solution (an operation that is very quick compared to looking for the best permutation), to keep the solution consistent. Occasionally a permutation could no longer be applied due to a recent change in the solution involving some of the same cities; in this case, the main thread would send the city back to the worker thread for re-evaluation. Finally, the main thread would also occasionally run the other optimizations like 2-opt to help improvements propagate faster across the cities, rebalance the paths to have the same length, and keep saving the best solution to a file.
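A rough sketch of this structure, assuming a simple mutex-protected queue; all names are illustrative, and the real implementation also handled thread priorities, re-queuing of stale permutations and the periodic 2-opt and rebalancing passes:

#include <algorithm>
#include <mutex>
#include <queue>
#include <random>
#include <thread>
#include <vector>

struct Permutation { /* an improving edge rewiring found by a worker */ };

std::mutex workMutex, resultMutex;
std::queue<int> workQueue;          // city ids, shuffled by the main thread
std::vector<Permutation> results;   // improvements waiting to be applied

void workerLoop()
{
    for (;;)
    {
        int city;
        {
            std::lock_guard<std::mutex> lock(workMutex);
            if (workQueue.empty())
                return;             // simplification: the real loop kept waiting
            city = workQueue.front();
            workQueue.pop();
        }
        // ... try to rewire the neighborhood of `city` (omitted); if an
        // improvement is found, lock resultMutex and push it onto `results`.
    }
}

int main()
{
    std::vector<int> cities(150000);
    for (int i = 0; i < (int)cities.size(); i++)
        cities[i] = i;
    std::shuffle(cities.begin(), cities.end(), std::mt19937(42));
    for (int c : cities)
        workQueue.push(c);

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < std::thread::hardware_concurrency(); i++)
        workers.emplace_back(workerLoop);

    // The main thread would poll `results` here, apply the permutations that
    // are still valid, and re-queue the cities of the ones that became stale.
    for (std::thread& t : workers)
        t.join();
}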

Optimizing the sum of path lengths

Looking at the edge permutations found by the algorithm, a very common case was that the length of one path increased by L and the length of the other decreased by exactly the same amount L.

This is useful because these moves make it possible to rebalance the lengths of the paths. The goal can then be changed to minimizing the sum of the path lengths: even if the length of one of the paths increases, we can later use rebalancing moves to spread the overall improvement across both paths in equal measure.

To this end, the worker threads also added all rebalancing moves they found to a sorted balanced tree. When the main thread needed to rebalance the paths because one of the two was longer than the other by L, it could look into this tree to find a rebalancing move that decreased its length by almost exactly L/2 (and increased the other path’s length by L/2).
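A minimal sketch of this lookup, assuming the moves are keyed by how much length they transfer from one path to the other (names are illustrative):

#include <iterator>
#include <map>

struct RebalanceMove { /* edge permutation that shifts length between the paths */ };

// Sorted balanced tree of moves, keyed by the length they transfer.
std::multimap<double, RebalanceMove> movesByTransfer;

// Returns the move whose transfer is closest to `target` (typically L / 2).
const RebalanceMove* findClosestMove(double target)
{
    if (movesByTransfer.empty())
        return nullptr;
    auto it = movesByTransfer.lower_bound(target);  // first key >= target
    if (it == movesByTransfer.end())
        return &std::prev(it)->second;
    if (it == movesByTransfer.begin())
        return &it->second;
    auto prev = std::prev(it);
    return (target - prev->first < it->first - target) ? &prev->second : &it->second;
}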

To improve the sum of the path lengths, the exploration of edge permutations needs to allow one of the two paths to increase in length, but what had greatly improved the execution speed was exactly the opposite, i.e. the heuristic based on disallowing length increases. To solve this, the algorithm runs twice. The first time, path 1 is not allowed to increase in length. If the length of path 1 decreases by L, path 2 is then allowed to increase in length while being explored, but only up to L. Once this search completes, it is tried again with the order of the two paths inverted, so that path 2 is now the one that cannot increase in length. Overall, all possibilities are explored, but the slowdown is only a factor of 2, instead of giving back the order-of-magnitude speedup that comes from limiting the path lengths.
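Schematically, the two passes can be organized as in the sketch below, where searchPath is only a stub standing in for the exhaustive enumeration over one path and returns the length decrease found (negative if that path grew within its budget):

#include <algorithm>

// Stub: the real code enumerates all rewirings of the given path, allowing its
// length to grow by at most `maxIncrease`, and returns the best length decrease.
double searchPath(int path, double maxIncrease)
{
    (void)path; (void)maxIncrease;
    return 0.0;
}

// One rewiring attempt on a set of cities: in each pass one path may not grow
// at all and the other may grow only by the improvement found on the first,
// so the sum of the two lengths can never increase.
double rewireSet()
{
    double bestTotalGain = 0.0;
    for (int first = 0; first < 2; first++)
    {
        int second = 1 - first;
        double gain1 = searchPath(first, 0.0);    // this path may only shrink
        double gain2 = searchPath(second, gain1); // may grow by at most gain1
        bestTotalGain = std::max(bestTotalGain, gain1 + gain2);
    }
    return bestTotalGain;
}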

Choosing the set of cities

The first way I tried to come up with sets of cities C to rewire was to recursively split the area into rectangles until each rectangle contained N cities. Using N = 15 or N = 16 resulted in rapid improvements in the solution score (for example, in a test with N = 15 it took 4 hours and 4 minutes to reduce the length of the initial solution from 6,842,237 to 6,799,832 running on a single core). Edges between the rectangles were not rewired, so later I moved to a more expensive search, looking at the neighborhood of each of the 150,000 cities (with N = 20, it would take about 24 hours to explore 150,000 neighborhoods on a 4-core machine with hyper-threading). I tried various ways of defining “neighborhoods”, each further improving the score. Some examples for choosing the neighbors:

  • Geometric neighbors according to various shapes (circles, squares, rectangles of different shapes and orientations).
  • Focus on the endpoints with a deeper search.
  • Look at the longest edges, selecting cities that are neighbors of the two cities that define the edge.
  • Neighbors according to the Delaunay triangulation graph (breadth-first search on the graph). This gave different results from geometric neighbors, especially in areas with non-homogeneous city densities.
  • Cities preceding and following the given city along the two paths.

Using fixed groups of N cities is not too efficient. Most groups don’t contain enough edges in the path to allow exploration of meaningful permutations. Another problem is that there are degenerate cases where there are lots of path edges in the group and most edges have comparable lengths: the pruning heuristic becomes ineffective in this case and a significant amount of time is spent enumerating permutations of just a few of these city sets.

To avoid both problems, I moved to a dynamic number of cities for each rewiring problem: start with a fixed number of cities (typically 17-19 and up to 21), then adjust it, increasing the number of cities if there were too few path edges, and canceling computations that took more than 1 or 2 minutes so they could be retried with a reduced number of cities.

Double bridge move

Concerned that the rewiring code was not exploring distant groups of cities, I implemented a simple optimization that could cover at least a few of those cases, sometimes known as a “double bridge”. The idea is to identify a 2-opt move that improves the solution but was discarded because it would split the path into two parts, a shorter path and a circuit. If a second 2-opt move, allowed or disallowed, is found between the circuit and the path, it rejoins the two pieces, resulting in a single path that may be shorter than the initial one.
By this time the solution was already quite optimized and the improvement from adding this optimization was small.

Monday, February 4, 2013

Views from the Dark Worlds

I’m deleting the gigabytes of uncompressed visualizations I produced during the Observing Dark Worlds competition on Kaggle.com to help understand the problem, and compressing the few images I’ll keep.

The competition consisted in finding the centers of dark matter halos by detecting how they affected the ellipticities of the galaxies between us and the halos. Observed galaxies have random ellipticities when no halo is present, and this additive noise complicates the problem.

Here is a typical picture, and one of the simplest visualizations, showing the amount of tangential ellipticity for every position in sky 58:

[Figure: amount of tangential ellipticity at every position in sky 58]

Other images show how well a particular model fits for each pixel of a high resolution image. This means that the optimizer has to evaluate the model multiple times for each pixel and the tangential and cross-tangential ellipticities of all galaxies also have to be re-computed for each pixel (720,000,000 evaluations for a 1200 x 1200 image and a sky with 500 galaxies).

This was quite slow initially, with most of the time spent evaluating the trigonometric functions used to compute the two ellipticity components. Changing the number representation from double to float gave a small improvement, but it was still slow. There are two related functions involved:

cos(2·atan(x)) and sin(2·atan(x))

This seemed like a good excuse to try out Mathematica’s MiniMaxApproximation function in the NumericalMath`Approximations` package.

After some experiments, I ended up with a formula approximating these functions within about 0.001 by using a 4th degree polynomial and the following code:

// 4th degree minimax polynomial approximating cos(2*atan(x)) for x in [0, 1].
double Cos2ArcTanApprox(double x)
{
    return 0.9989279587792748 + x * (0.06351399783942091 + x * (- 2.6025942588287068 + x * (1.9881034919631744 - x * 0.44697660682523027)));
}

// Computes cos(2*phi) and sin(2*phi), where phi is the angle of the galaxy
// relative to the halo, without calling any trigonometric function.
void SinCos2ArcTan(double xHalo, double yHalo, GalaxyInfo& info, double *pCos, double *pSin)
{
    double dx = info.x - xHalo;
    double dy = info.y - yHalo;

    double adx = fabs(dx);
    double ady = fabs(dy);
    double c;
    if (adx > ady)
    {
        // |angle| below 45 degrees: use the approximation directly.
        c = Cos2ArcTanApprox(ady / adx);
    }
    else
    {
        // |angle| above 45 degrees: cos(2*atan(t)) = -cos(2*atan(1/t)).
        c = -Cos2ArcTanApprox(adx / ady);
    }

    // Recover the sine from the cosine; its sign follows the sign of dx * dy.
    double s = sqrt(1 - c * c);
    if (dx * dy < 0)
        s = -s;

    *pCos = c;
    *pSin = s;
}

This method was twice as fast as calling sin(), cos() and atan(), and together with multithreading it made it possible to produce fresh visualizations of the ideas I explored, for all skies, in a reasonable amount of time.
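As a quick sanity check of the code above, one can compare it against the exact trigonometric expressions; the positions below are made up, and GalaxyInfo is assumed to have the x and y fields used by SinCos2ArcTan:

#include <cmath>
#include <cstdio>

int main()
{
    GalaxyInfo g;
    g.x = 1500.0;                         // made-up galaxy position
    g.y = 2300.0;
    double xHalo = 900.0, yHalo = 3100.0; // made-up halo position

    double c, s;
    SinCos2ArcTan(xHalo, yHalo, g, &c, &s);

    double phi = atan2(g.y - yHalo, g.x - xHalo);
    printf("cos(2*phi): exact %.6f  approx %.6f\n", cos(2 * phi), c);
    printf("sin(2*phi): exact %.6f  approx %.6f\n", sin(2 * phi), s);
    return 0;
}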

To find the positions of the halos, I tried a number of variants of the following method:

  • Use a model where the halo adds to the tangential ellipticity of a galaxy by the following amount, which depends on their mutual distance d:
    [Formula image: the halo’s tangential-ellipticity contribution as a function of d, with parameters A and B]
    The model has two parameters A and B (halo intensity and scale) and I also got good results by using a single parameter, i.e. by using a formula inferred from the training data that roughly linked the values of A and B.
  • Optionally model strong lensing effects, where the tangential ellipticity of a galaxy very close to a halo may be strongly negative rather than strongly positive.
  • Find the best values for the (x, y, A, B) parameters for a halo by starting with fixed reasonable values and optimizing with the Nelder-Mead method.
    The function to maximize is the likelihood of the parameters, assuming that halos have an additive effect on tangential ellipticities and that, when there are no halos, the ellipticities follow a random Gaussian distribution with zero mean and a standard deviation of 0.21 (measurable in the single-halo training skies); a sketch of this likelihood follows the list.
    Both tangential and cross-tangential ellipticities are used.
    To avoid local minima close to isolated galaxies with strong ellipticity, start the process multiple times from each of the intersections of a 10 x 10 grid.
  • After a halo position, intensity and scale are found, subtract its effects from the ellipticities of all galaxies and iterate to find more halos.
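Here is the sketch of the likelihood mentioned in the list, assuming Gaussian noise with sigma = 0.21 on both ellipticity components; all names are illustrative, haloTangential is only a placeholder (not the actual decay formula from the figure above), and the tangential and cross components are assumed to be pre-computed with respect to the candidate halo position:

#include <algorithm>
#include <vector>

struct GalaxyView
{
    double dist;    // distance from the candidate halo position
    double eTang;   // tangential ellipticity w.r.t. the candidate position
    double eCross;  // cross-tangential ellipticity w.r.t. the candidate position
};

// Purely illustrative placeholder for the two-parameter decay model
// (A = intensity, B = scale); the real formula is the one in the figure.
double haloTangential(double d, double A, double B)
{
    return A / std::max(d, B);
}

// Log-likelihood (up to an additive constant) of halo parameters A and B:
// tangential residuals are taken against the model, cross residuals against 0.
double logLikelihood(const std::vector<GalaxyView>& galaxies, double A, double B)
{
    const double sigma = 0.21;  // measured on the single-halo training skies
    double sum = 0.0;
    for (const GalaxyView& g : galaxies)
    {
        double rTang = g.eTang - haloTangential(g.dist, A, B);
        double rCross = g.eCross;
        sum += rTang * rTang + rCross * rCross;
    }
    return -sum / (2.0 * sigma * sigma);
}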

The next picture shows the likelihood function for sky 107. A strong halo is visible in the left image. After halo subtraction and re-scaling, a second weaker halo becomes visible (right image):

[Figure: likelihood function for sky 107, before (left) and after (right) subtracting the first halo]

I tried different models for the decay of the halo effect, but multiple functions gave similar results and noise made it hard to compare them, so I just chose the simplest formula that gave good results (the one shown above). In any case, for the single halo skies the exact function didn’t seem to matter much, provided that it was not too different from the function above.

When subtracting the halos, the ability to correctly identify more halos was affected by how closely the formula matched the actual (unknown) model. To avoid having to consider a wide variety of formulas, I created a model that can approximate many curves: take the logarithm of both the ellipticities and the distances, and find the 3rd degree polynomial that best fits the transformed tangential ellipticities of all galaxies.

When looking for (x, y), the simple model with just two parameters (A and B) was used; with the polynomial, the optimizer ended up overfitting. The polynomial fit was only done after fixing the halo coordinates. With four model parameters instead of two, it could more closely approximate a variety of possible models, hopefully improving the quality of the data after halo subtraction.

The position found for the first halo is slightly skewed by the presence of other halos, but attempts to simultaneously optimize the positions of all the halos resulted in overfitting in my experiments.

Trying an elliptical symmetry for the halo effects also resulted in overfitting, although it might have been a good starting point for detecting cases where two halos are close to each other.

Ending on a lighter note, here are two more visualizations that go in a mildly more artistic direction.
The first one shows the contribution of each galaxy to the tangential ellipticity (each circle is centered on a galaxy):

[Figure: per-galaxy contributions to the tangential ellipticity]

The visualization for a failed model inspired my six-year-old daughter to contribute:

[Figure: my daughter’s contribution]

Sunday, January 20, 2013

Traveling Santa

Kaggle.com hosted a fun coding competition called Traveling Santa.

The goal is to solve a given traveling salesman (TSP) instance with 150,000 cities and some additional rules:

  • The solution is a path rather than a cycle (a cycle being the more common formulation of TSP).
  • The problem is to find not just one but two paths, and they must have no edge in common. The measure to minimize is the length of the longer of the two paths.

The cities are provided as coordinates in the 2D plane.

[Figure: plot of the 150,000 cities]

When plotted, the 150,000 points display a drawing of Santa (originally by Thomas Nast).

Results

Here is the first path from my final submission, which scored 6,547,910 and placed 5th in the competition:

[Figure: the first path of my final submission]

You may notice the path crossing itself. This would be a non-optimal TSP solution, but in this problem having to accommodate two paths makes everything a bit more complex and even optimal (pairs of) paths can contain crossings.

The winning team posted a solution of length 6,526,972. They also solved a relaxed problem that should give a lower bound for the optimal solution and got a value of 6,525,446. This means my solution is from 0.32% to 0.34% worse than optimal. With a margin of just 0.02%, it’s amazing how close the winning team got to the optimum!

First implementation

As the fun of solving a puzzle and coding are more important for me than trying to get a good placement, I reserved the first couple of days to just brainstorming and developing some intuitive ideas without looking at the literature. I had studied the linear programming formulation of TSP at the university, but decided to try a different approach, especially considering the Euclidean nature of this instance.

2-opt

The first two obvious ideas were to divide the area into smaller squares and use nearest neighbor to build the paths in each of them. I then removed crossings in the paths (where the uncrossing move was not blocked by the other path) and “cut corners”, i.e. cut a city out of a corner of the path and moved it to split a closer nearby edge instead. This gave solutions with a score around 7,000,000, and getting two paths of comparable lengths required much tweaking.

[Figure: corner-cut move]
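For reference, the test behind the uncrossing move is tiny; this is a minimal sketch with illustrative names (in the actual code the move is also skipped when one of the new edges already belongs to the other path):

#include <cmath>

struct Point { double x, y; };

double dist(const Point& p, const Point& q)
{
    return sqrt((p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y));
}

// a-b and c-d are two edges along the same path (in path order); replacing
// them with a-c and b-d reverses the segment between b and c, and is an
// improvement exactly when it shortens the path.
bool twoOptImproves(const Point& a, const Point& b, const Point& c, const Point& d)
{
    return dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d);
}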

Unless the paths have very long edges, these optimizations can only involve cities that are close to each other, so I used a pre-computed table listing for each city the 200 (later increased to 1200) closest cities and their distances. To track which edges were not available because they belonged to the other path, I just used a hash table.
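A minimal sketch of these two supporting structures (names are illustrative): a per-city list of its closest cities, and a hash set of the undirected edges currently used by the other path:

#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

struct Neighbor { int city; double dist; };

// neighbors[c] lists the 200 (later 1200) cities closest to c, sorted by distance.
std::vector<std::vector<Neighbor>> neighbors;

// Key for an undirected edge: order the endpoints so (a, b) and (b, a) match.
uint64_t edgeKey(int a, int b)
{
    if (a > b)
        std::swap(a, b);
    return ((uint64_t)a << 32) | (uint32_t)b;
}

// Edges belonging to the other path, and therefore unavailable.
std::unordered_set<uint64_t> edgesOfOtherPath;

bool edgeIsBlocked(int a, int b)
{
    return edgesOfOtherPath.count(edgeKey(a, b)) != 0;
}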

I then did a bit of online reading, finding I had implemented 2-opt and 2.5-opt. I found a very good TSP tutorial by David S. Johnson and Lyle A. McGeoch at http://www2.research.att.com/~dsj/papers/TSPchapter.pdf. It has clear and complete explanations and a wealth of experimental results. The paper claims that the state of the art is the Lin-Kernighan algorithm (LK) and its variants. This paper also discouraged me from trying a simulated annealing approach (slower than LK in their experiments) and showed a further 2-opt code optimization that allowed my implementation to complete in a fraction of a second.

Final implementation

To generate the initial paths I replaced nearest neighbor with a greedy algorithm where the two paths alternate in choosing the next shortest available edge. Some randomness was also added to the choice.

It may seem that always choosing the shortest edge should lead to a good solution, but the algorithm proceeds by creating many disjoint path segments, and most short edges have to be discarded because they would create forks or cycles. At some point the segments of the same path grow until they block each other, many good choices are also blocked by the other path, and the algorithm ends up adding some very long edges.
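A sketch of the acceptance test such a greedy construction needs (illustrative names): an edge is rejected if either endpoint already has two edges in that path (a fork), if both endpoints are already in the same segment (a premature cycle, detected with a union-find), or if the edge is used by the other path:

#include <numeric>
#include <vector>

struct PathBuilder
{
    std::vector<int> degree;  // current number of path edges at each city
    std::vector<int> parent;  // union-find over the growing path segments

    explicit PathBuilder(int n) : degree(n, 0), parent(n)
    {
        std::iota(parent.begin(), parent.end(), 0);
    }

    int find(int x)
    {
        return parent[x] == x ? x : parent[x] = find(parent[x]);
    }

    // A fork would appear if either endpoint already has two edges.
    bool canTake(int a, int b) const
    {
        return degree[a] < 2 && degree[b] < 2;
    }

    // A premature cycle would appear if both endpoints are in the same segment.
    bool wouldClose(int a, int b)
    {
        return find(a) == find(b);
    }

    void take(int a, int b)  // the caller has already validated the edge
    {
        degree[a]++;
        degree[b]++;
        parent[find(a)] = find(b);
    }
};

An edge (a, b) would then be accepted for a path only if canTake(a, b) is true, wouldClose(a, b) is false and the edge is not already used by the other path, with the two builders taking turns picking the next shortest edge that passes these tests.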

The greedy initial paths have comparable lengths of about 7,645,000 and are nice starting points for local optimizations. Alternating 2-opt on the two paths until no further move was possible improved the solution to a score of about 6,890,000. This animation shows one of the two paths created by the greedy algorithm before and after 2-opt:

[Animation: one of the greedy paths, before and after 2-opt]

Up to this point, the total execution time is 16 seconds (single core, but excluding the long time it may take to pre-compute the table of neighboring cities): 6 seconds for the greedy algorithm and a total of 10 seconds for multiple alternating 2-opt passes.

Adding 2.5-opt increased the run time by 11 minutes but only improved the solution by about 50,000. This pattern continued the following days: each additional optimization required more and more computation time while providing diminishing returns.

To improve my result, I tried random perturbations of the city coordinates, using 2-opt and 2.5-opt to change the paths, then restoring the original coordinates and applying the same optimizations again, in the hope that this would help untangle the two paths. This improved the solution, but was too slow.

I then considered looking closer at areas where 2-opt moves were blocked by the other path, or implementing 3-opt, but in the end I opted for a simpler and more general mechanism: given a set of cities, remove all edges between them and exhaustively try all possible ways of rewiring them.

In the next post, I’ll describe this rewiring mechanism.