```
    _                    _   ____    _  _____
   / \   __ _  ___ _ __ | |_/ ___|  / \|_ _|
  / _ \ / _` |/ _ \ '_ \| __\___ \ / _ \ | |
 / ___ \ (_| |  __/ | | | |_ ___) / ___ \| |
/_/   \_\__, |\___|_| |_|\__|____/_/ \_\_|
        |___/
```
An autonomous AI agent that teaches itself to become the world's top expert on MaxSAT. Given 229 weighted MaxSAT instances from the 2024 MaxSAT Evaluation (main anytime weighted track), it discovers novel strategies, finds better solutions, and iteratively refines its toolbox. No human guidance.
- An AI agent (e.g. Claude Code) reads `program.md` for instructions
- It reads `expert.md` for accumulated knowledge from prior runs
- It reads the library for available tools
- It runs solvers on instances, discovers what works, and updates everything
- It commits and pushes to this repo so other agents can build on its findings
```
              ┌──────────────────┐
              │   GitHub Repo    │
              │                  │
              │ expert.md        │
              │ library/         │
              │ best-solutions   │
              │ experiments.log  │
              └────────┬─────────┘
         git pull/push │
       ┌────────────┬──┴──────────────┬────────┐
       │            │                 │        │
┌──────▼──────┐ ┌───▼────────┐ ┌──────▼──────┐ ...
│    VM 1     │ │    VM 2    │ │    VM 3     │
│             │ │            │ │             │
│ ┌─────────┐ │ │ ┌────────┐ │ │ ┌─────────┐ │
│ │ Agent 1 │ │ │ │Agent 3 │ │ │ │ Agent 5 │ │
│ │ ┌─┬─┬─┐ │ │ │ │┌─┬─┬─┐ │ │ │ │ ┌─┬─┬─┐ │ │
│ │ │S│S│S│ │ │ │ ││S│S│S│ │ │ │ │ │S│S│S│ │ │
│ │ └─┴─┴─┘ │ │ │ │└─┴─┴─┘ │ │ │ │ └─┴─┴─┘ │ │
│ ├─────────┤ │ │ ├────────┤ │ │ ├─────────┤ │
│ │ Agent 2 │ │ │ │Agent 4 │ │ │ │ Agent 6 │ │
│ │ ┌─┬─┬─┐ │ │ │ │┌─┬─┬─┐ │ │ │ │ ┌─┬─┬─┐ │ │
│ │ │S│S│S│ │ │ │ ││S│S│S│ │ │ │ │ │S│S│S│ │ │
│ │ └─┴─┴─┘ │ │ │ │└─┴─┴─┘ │ │ │ │ └─┴─┴─┘ │ │
│ └─────────┘ │ │ └────────┘ │ │ └─────────┘ │
└─────────────┘ └────────────┘ └─────────────┘
```

*S = solver process (python)*
```sh
# Launch on EC2 (handles everything: installs deps, clones repo,
# downloads benchmarks from Helsinki, launches agents in tmux)
./run.sh --host ec2-user@<ip> --agents 3
```
Requires a `.env` file with `CLAUDE_CODE_API_KEY` and `GITHUB_ACCESS_TOKEN`. The API key is auto-refreshed from your local Claude Code login on each deploy.
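A minimal `.env` might look like the sketch below. The values are placeholders, not real credentials, and the exact variable names are taken from the paragraph above:

```shell
# .env -- example only; both values are placeholders
CLAUDE_CODE_API_KEY=sk-ant-xxxxxxxx
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxx
```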
Multiple agents can work on the same repo simultaneously, communicating through git — each agent pulls the latest solutions and expert knowledge, builds on what others found, and pushes its own improvements. No coordination needed beyond git pull and git push.
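The only conflict this scheme has to resolve is two agents pushing different solutions for the same instance, and the resolution rule is simply "lower cost wins." A minimal sketch of that merge, assuming a `{instance: (cost, assignment)}` dict shape (the function name and data format here are illustrative, not the repo's actual API):

```python
def merge_solutions(ours, theirs):
    """Keep the cheaper solution per instance when two agents disagree.

    Illustrative sketch only: the repo stores solutions in a compressed
    format, but the merge rule is this simple -- lower cost wins,
    ties keep our copy.
    """
    merged = dict(ours)
    for inst, (cost, assignment) in theirs.items():
        if inst not in merged or cost < merged[inst][0]:
            merged[inst] = (cost, assignment)
    return merged

# Agent A and agent B pushed overlapping results:
a = {"inst1": (10, [1, -2]), "inst2": (7, [1, 2])}
b = {"inst1": (8, [-1, 2]), "inst3": (3, [2])}
merged = merge_solutions(a, b)
# inst1 takes B's cheaper cost-8 solution; inst2 and inst3 pass through
```

Because the rule is associative and commutative, the agents can pull and push in any order and still converge on the best known solution per instance.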
| Metric | Count |
|---|---|
| Instances solved | 220 / 229 |
| Optimal (matching competition best) | 30 |
| Better than competition | 5 |
| Novel solve (no known solution existed) | 1 |
| Within 1.1x of reference | 123 |
| Within 1.5x | 183 |
| Within 2x | 209 |
| Unsolved | 9 |
**Beat the 2024 MaxSAT Competition**
| Instance | Our cost | Competition best | Improvement |
|---|---|---|---|
| switchingactivity_74 | 10 | 16 | 37.5% better |
| synplicate dag_run2_10_size_11 | 374 | 518 | 27.8% better |
| synplicate dag_run2_16_size_9 | 333 | 398 | 16.3% better |
| switchingactivity_68 | 8 | 9 | 11.1% better |
| BTBNSL hailfinder_10000 | 49,986,819,152 | 50,007,681,202 | 0.04% better |
| pseudoBoolean mod010 | 8,081 | no solution | novel solve |
| Instance | Ratio | Why it’s hard |
|---|---|---|
| relational-inference pa-1 | 603x | 2.5M vars, 1.1M soft clauses |
|  | 9.7x | 51K softs, 9.7M hard clauses |
| causal-discovery Water | 4.4x | 8.3M cost vs 1.9M ref |
| timetabling test4 | 3.3x | 131K vars |
| decision-tree tic-tac-toe | 2.6x | adaboost ensemble |
9 instances remain unsolved — mostly >16M variables or no known reference solution.
The agent developed these approaches autonomously, discovering what works through experimentation:
| Technique | Best for | Key insight |
|---|---|---|
| Greedy SAT with selector variables | Few soft clauses (<500) | Heaviest-first greedy with CaDiCaL assumptions |
| Core-guided search | Unit soft clauses | Iterative UNSAT core relaxation. comp07.lp: 1778x → optimal |
| WPM1 core-guided | Weighted unit softs | Proper relaxation variables + at-most-one constraints |
| Biased-SAT | Breaking local optima | Random assumption subsets produce diverse solutions |
| Clause-weighting LS (SATLike) | Stuck at local optima | Dynamic weight adjustment escapes single-flip traps |
| Tabu search | No-hard / unit-soft instances | With SAT init + restarts. judgment-aggregation: 49x → 1.5x |
| Multi-init | Diverse starting points | Different solvers (CaDiCaL, glucose4, MiniCard) + random assumptions |
| Alternating CWLS + WalkSAT | Deep optimization | Alternating phases for continuous improvement. pa-1: 5445x → 612x |
| RC2 with CaDiCaL | Weighted unit softs | Previously dismissed, but `solver="cd19"` finds optimals. haplotyping-12: 3.8x → 1.01x |
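To make the clause-weighting row concrete, here is a toy sketch of the SATLike idea in the table above: every clause carries a dynamic weight, and when no single flip improves the dynamically weighted score, the weights of the currently falsified clauses are bumped so the search can climb out of the local optimum. This is not the agent's `library/` code, just the core mechanism on a tiny instance:

```python
import random

def clause_weight_ls(n_vars, hard, soft, steps=2000, seed=1):
    """Toy SATLike-style clause-weighting local search (illustrative).

    hard: list of clauses (DIMACS literal lists) that must hold.
    soft: list of (weight, clause) pairs whose violated weight we minimize.
    Hard clauses get a base cost larger than all soft weights combined.
    """
    rng = random.Random(seed)
    big = sum(w for w, _ in soft) + 1
    clauses = [(c, big) for c in hard] + [(c, w) for w, c in soft]
    dyn = [1] * len(clauses)                       # dynamic weights
    assign = {v: bool(rng.getrandbits(1)) for v in range(1, n_vars + 1)}

    def falsified(a):
        return [i for i, (c, _) in enumerate(clauses)
                if not any((lit > 0) == a[abs(lit)] for lit in c)]

    def real_cost(a):
        return sum(clauses[i][1] for i in falsified(a))

    best_cost, best_a = real_cost(assign), dict(assign)
    for _ in range(steps):
        fal = falsified(assign)
        if not fal:
            break                                  # everything satisfied
        cur = sum(dyn[i] for i in fal)
        deltas = {}
        for v in range(1, n_vars + 1):             # dynamic score per flip
            assign[v] = not assign[v]
            deltas[v] = sum(dyn[i] for i in falsified(assign))
            assign[v] = not assign[v]
        v_best = min(deltas, key=deltas.get)
        if deltas[v_best] < cur:
            assign[v_best] = not assign[v_best]    # greedy improving flip
        else:
            for i in fal:                          # stuck: bump weights,
                dyn[i] += 1                        # then perturb
            lit = rng.choice(clauses[rng.choice(fal)][0])
            assign[abs(lit)] = not assign[abs(lit)]
        cost = real_cost(assign)
        if cost < best_cost:
            best_cost, best_a = cost, dict(assign)
    return best_cost, best_a

# Tiny weighted instance: (x1 or x2) is hard; softs prefer both false.
hard = [[1, 2]]
soft = [(3, [-1]), (2, [-2]), (1, [1, -2])]
cost, model = clause_weight_ls(2, hard, soft)
print(cost)  # 3 -- the optimum for this instance
```

The weight bump is what distinguishes this from plain WalkSAT: a clause that keeps being falsified accumulates dynamic weight until flipping one of its literals becomes the greedy choice.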
All code the agent writes lives in `library/`:
| Module | Functions | Purpose |
|---|---|---|
| `solvers.py` | `greedy_sat`, `tabu_search`, `multi_init`, `sat_init`, `walksat_hard`, `walksat_soft`, `sat_solve_with_timeout` | Core solver building blocks |
| `core_guided.py` | `core_guided_solve`, `core_guided_budget`, `wpm1_solve` | UNSAT core-based optimization for unit soft clauses |
| `clause_weight_ls.py` | `clause_weight_local_search` | SATLike-inspired dynamic clause weighting |
| `solutions.py` | `load_solutions`, `update_solution`, `get_best_costs` | Compressed solution storage (1.7 GB → 1.5 MB) |
| `wcnf_parser.py` | `parse_wcnf`, `evaluate_cost`, `check_hard_clauses` | Single-pass streaming WCNF parser |
- Low parallelism: Claude Code rarely launches more than 6 parallel scripts, and often runs just 1-2 at a time, leaving most cores idle on large machines.
- Tunnel vision: The agent can fixate on grinding one instance for hours (e.g. pa-1 from 5445x to 612x over many rounds) while ignoring easier wins elsewhere.
- Session length: Despite “never stop” instructions, the agent tends to wrap up after a few hours, deciding it has reached a natural stopping point.
The agent maintains `expert.md` as a living knowledge base and improves the library as it learns.