After watching a Numberphile video on "awkward primes" I spent a month building an exact solver for a problem that's deceptively simple to state and genuinely hard to certify.
THE PROBLEM
Plot the first N primes as (index, value) points: (1, 2), (2, 3), (3, 5), (4, 7), and so on. What is the minimum number of straight lines needed to pass through every point? Call that minimum f(N).
Finding a good solution isn't hard — many primes happen to land on lines you've already drawn. The hard part is proving no solution with fewer lines exists. That's an NP-complete weighted set cover problem. The candidate lines grow quadratically in N, and the search space explodes exponentially.
The previous state of the art was N=861, certified by Max Alekseyev (GWU) using CPLEX — an industrial MIP solver — running for 282 hours. Extending it further was considered computationally intractable with that approach.
WHY GENERIC MIP SOLVERS HIT A WALL
The problem has deep arithmetic structure that a general-purpose solver is blind to. A line through two prime-indexed points (i, pᵢ) and (j, pⱼ) has slope (pⱼ − pᵢ)/(j − i). Whether a third point (k, pₖ) lies on that line is a divisibility condition: (pₖ − pᵢ)(j − i) = (pⱼ − pᵢ)(k − i). MIP treats this as a numerical constraint; an arithmetic-aware solver can exploit it structurally.
THE APPROACH
The solver is an incremental exact algorithm — it processes primes one at a time and certifies f(N) at each step. Several ideas combine to make it fast:
Bitmask representation. There are 12,162 "heavy lines" — lines passing through 3 or more of the first 1024 primes. Each is stored as a 1024-bit bitmask of which primes it covers. The full working set fits L2-resident, so coverage checks and set operations run at memory bandwidth, not cache-miss cost.
Witness propagation. Before any search, the solver checks for forced lines: if any uncovered prime lies on exactly one remaining candidate line, that line must be in the solution. Propagation chains. About 60% of all N values are certified this way with zero branching — the solution is forced by logical necessity alone.
Lagrangian relaxation for lower bounds. To prune the branch-and-bound tree you need tight lower bounds on the number of lines still needed. The solver uses Lagrangian relaxation of the covering constraints, optimised with projected subgradient descent. This yields bounds sharp enough to prune most branches immediately.
Exclusive Dependency Rule. This is the key branching innovation. If adding a candidate line L to the partial solution would make some other line L′ the only possible cover for a particular prime, then L and L′ are exclusively dependent: choosing L forces L′. The solver doesn't branch — it includes both lines and continues. This collapses large parts of the search tree that would otherwise require explicit exploration.
RESULTS
N=861 (previous record): 22 minutes on a single machine vs. 282 hours with CPLEX — ~750× faster N=1024 (new record): certified in under 40 hours, f(1024)=143 163 new f(N) values certified for the first time 20 new "awkward primes" — primes that provably force an increase in f(N) — confirmed for the first time
Code is MIT-licensed C++, full write-up and a live browser demo at the link above. OEIS: https://oeis.org/A373813
[link] [comments]










English (US) ·