CFR-Edge: Poker GTO Solver

The tournament

After a Jane Street recruiting event, there was a small poker tournament at the end of the night. I got knocked out pretty early and it bothered me in a specific way. Not because I lost, but because I genuinely had no idea what I was supposed to be doing at any point.

Chess felt approachable by comparison. Both players see the whole board. Poker was different in a way I couldn't quite articulate yet. The hidden cards weren't just an inconvenience. They changed the structure of the problem entirely. And I had no framework for reasoning about it.

The thing I kept thinking about

When you can't see the full state of a system, what does it mean to act well? The clean idea of a single best move stops working. You have to think about what your opponent might be holding, what they think you might be holding, and how to mix your actions so you don't just telegraph everything you do. It's a different category of problem than the search and optimization stuff I was used to.

I wanted to understand it properly, not just read a Wikipedia summary of it.

Finding CFR

I went looking for how real poker solvers work and kept running into Counterfactual Regret Minimization. The core idea is pretty intuitive once you sit with it: at every decision point, track how much you regret not having taken each action. Repeat that across thousands of iterations, playing each action in proportion to its accumulated positive regret. Then average the strategies over all iterations; in a two-player zero-sum game it's that average, not the last iterate, that converges toward equilibrium. The math is surprisingly clean for something that actually solves a hard problem.
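The "let the regrets shape the strategy" step is usually implemented as regret matching. Here is a minimal sketch of that one step, not the project's actual code; the function name `regret_matching` and the plain `std::vector<double>` layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Regret matching: play each action in proportion to its positive
// cumulative regret. If no action has positive regret, play uniformly.
// (Hypothetical helper; the name and signature are illustrative.)
std::vector<double> regret_matching(const std::vector<double>& cumulative_regret) {
    assert(!cumulative_regret.empty());
    std::vector<double> strategy(cumulative_regret.size());
    double positive_sum = 0.0;
    for (std::size_t a = 0; a < cumulative_regret.size(); ++a) {
        strategy[a] = std::max(cumulative_regret[a], 0.0);  // ignore negative regret
        positive_sum += strategy[a];
    }
    const double uniform = 1.0 / static_cast<double>(strategy.size());
    for (double& p : strategy) {
        p = positive_sum > 0.0 ? p / positive_sum : uniform;
    }
    return strategy;
}
```

With cumulative regrets of 3, -1, and 1, the negative entry is ignored and the remaining mass is split 3:1 between the first and third actions.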

Reading the theory is one thing. Writing code that does what the theory describes is a much longer project.

The visual shell is just a window into the solver output.

Starting with the smallest possible game

Kuhn Poker has three cards, two players, and only twelve information sets in total, six per player. I picked it on purpose. It's small enough to traverse completely, which means you can verify that regret accumulation is actually doing what you think it is. When the strategies started converging toward equilibrium across iterations, that was the first moment the algorithm felt real rather than theoretical.
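To make "small enough to traverse completely" concrete, here is a compact sketch of vanilla CFR on Kuhn Poker that enumerates all six deals every iteration. It follows the standard textbook formulation and is not the project's implementation; `Node`, `NodeMap`, `cfr`, and `train` are names invented for this example, and the string keys are chosen for readability, not speed.

```cpp
#include <array>
#include <map>
#include <string>

// Accumulators for one information set; action 0 = pass, 1 = bet.
struct Node {
    std::array<double, 2> regret_sum{};
    std::array<double, 2> strategy_sum{};

    // Normalizes two non-negative weights, falling back to uniform.
    static std::array<double, 2> normalize(double w0, double w1) {
        const double total = w0 + w1;
        if (total <= 0.0) return {0.5, 0.5};
        return {w0 / total, w1 / total};
    }
    // Current strategy: regret matching on the positive regrets.
    std::array<double, 2> strategy() const {
        return normalize(regret_sum[0] > 0.0 ? regret_sum[0] : 0.0,
                         regret_sum[1] > 0.0 ? regret_sum[1] : 0.0);
    }
    // Average strategy: the quantity that converges toward equilibrium.
    std::array<double, 2> average_strategy() const {
        return normalize(strategy_sum[0], strategy_sum[1]);
    }
};

using NodeMap = std::map<std::string, Node>;

// Returns the expected utility for the player to act at `history`.
// cards[i] is player i's card (0 = J, 1 = Q, 2 = K); reach[i] is the
// probability that player i's own actions lead to this history.
double cfr(NodeMap& nodes, const std::array<int, 2>& cards,
           const std::string& history, std::array<double, 2> reach) {
    const int player = static_cast<int>(history.size() % 2);
    const int opponent = 1 - player;
    const bool higher = cards[player] > cards[opponent];

    // Terminal histories are "pp", "bp", "bb", "pbp", and "pbb".
    if (history.size() >= 2) {
        const char last = history[history.size() - 1];
        const char prev = history[history.size() - 2];
        if (last == 'p' && prev == 'p') return higher ? 1.0 : -1.0;  // both passed
        if (last == 'p') return 1.0;                  // opponent folded to a bet
        if (prev == 'b') return higher ? 2.0 : -2.0;  // bet was called
    }

    // The key holds only what the acting player sees: own card + betting.
    Node& node = nodes[std::string(1, "JQK"[cards[player]]) + history];
    const std::array<double, 2> sigma = node.strategy();

    std::array<double, 2> util{};
    double node_util = 0.0;
    for (int a = 0; a < 2; ++a) {
        std::array<double, 2> next_reach = reach;
        next_reach[player] *= sigma[a];
        // Zero-sum: the child's value is from the opponent's point of view.
        util[a] = -cfr(nodes, cards, history + (a == 0 ? 'p' : 'b'), next_reach);
        node_util += sigma[a] * util[a];
    }
    for (int a = 0; a < 2; ++a) {
        // Regrets are weighted by the opponent's reach (the counterfactual
        // part); the strategy sum is weighted by the player's own reach.
        node.regret_sum[a] += reach[opponent] * (util[a] - node_util);
        node.strategy_sum[a] += reach[player] * sigma[a];
    }
    return node_util;
}

// Runs full traversals over all six deals. The uniform 1/6 chance
// probability is omitted because it scales every regret equally.
NodeMap train(int iterations) {
    NodeMap nodes;
    for (int t = 0; t < iterations; ++t)
        for (int c0 = 0; c0 < 3; ++c0)
            for (int c1 = 0; c1 < 3; ++c1)
                if (c0 != c1) cfr(nodes, {c0, c1}, "", {1.0, 1.0});
    return nodes;
}
```

A quick sanity check on a game this small: after `train(2000)`, the table should hold exactly twelve information sets, `nodes.at("Kb")` (King facing a bet) should call almost always, and `nodes.at("Jb")` (Jack facing a bet) should fold almost always, since both choices are dominant.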

It also immediately made clear that most of the work was in the bookkeeping, not the math.

When a function becomes a system

Once Kuhn was working, I started caring about the structure. A recursive function is not a solver. I needed game logic separated from traversal, an information-set representation that would hold up as games got bigger, a storage layer for regrets and accumulated strategies that wouldn't become the bottleneck, and an export format I could actually read. The C++ side and the Next.js visualization needed to stay independent so neither would drag the other down when I changed something.

Getting those boundaries right early saved me from a lot of bad refactors later.

The part that took the longest to get right

Information set keys encode exactly what a player can observe: hole cards, community cards, betting history. Nothing else. No peeking at opponent cards, even in simulation. Getting that encoding right is what makes the solver correct. But the key structure also affects memory layout, lookup speed, how well the data fits in cache, and how readable the output is when something goes wrong. A sloppy representation does not just make things slower. It compounds across every single iteration of the algorithm.
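As an illustration of the "only what the player can observe" rule, here is one possible packed key. The bit layout, the `Action` enum, and `make_infoset_key` are all hypothetical and only show the shape of the idea; they are not the encoding the project uses. The sketch also assumes at most eleven actions per history and skips suit isomorphism and board-order canonicalization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 64-bit layout (an assumption for this sketch):
//   bits  0-11 : two hole cards, 6 bits each (card id 0..51), sorted
//   bits 12-41 : up to five board cards, stored as id + 1 so 0 = not dealt
//   bits 42-63 : betting history, 2 bits per action, up to 11 actions
enum class Action : std::uint8_t { Fold = 1, Call = 2, Raise = 3 };  // 0 = none

// Builds a key from the acting player's observations only. Opponent hole
// cards are not a parameter, so they cannot leak into the key by accident.
std::uint64_t make_infoset_key(std::uint8_t hole0, std::uint8_t hole1,
                               const std::vector<std::uint8_t>& board,
                               const std::vector<Action>& history) {
    assert(hole0 < 52 && hole1 < 52);
    assert(board.size() <= 5 && history.size() <= 11);
    // Sort the hole cards so the same hand always maps to the same key.
    if (hole0 > hole1) std::swap(hole0, hole1);

    std::uint64_t key = std::uint64_t{hole0} | (std::uint64_t{hole1} << 6);
    for (std::size_t i = 0; i < board.size(); ++i)
        key |= static_cast<std::uint64_t>(board[i] + 1) << (12 + 6 * i);
    for (std::size_t i = 0; i < history.size(); ++i)
        key |= static_cast<std::uint64_t>(history[i]) << (42 + 2 * i);
    return key;
}
```

A fixed-width integer key like this hashes cheaply and sits contiguously in memory, at the cost of being harder to read in a debugger than a string key. That is the same tradeoff between speed and readable output described above.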

This is where the project stopped feeling like an exercise and started feeling like real software.

Trying different algorithms

Vanilla CFR gave me the baseline and showed me what convergence actually looks like in practice. It's not as smooth as the theory implies. CFR+ clips cumulative regrets at zero, which changed the convergence profile in ways I had to see to believe. DCFR makes the down-weighting of early iterations explicit by discounting accumulated regrets and strategy contributions. Then games got big enough that full traversal stopped being feasible, so I added MCCFR with external sampling. Each variant taught me something different about the tradeoffs between iteration cost, memory pressure, and how fast the strategies actually converge.
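The three full-traversal variants differ mostly in how one cumulative regret entry is updated. Here is a side-by-side sketch with invented function names. The DCFR defaults (alpha = 1.5, beta = 0, gamma = 2) are the values recommended in the DCFR paper. Implementations differ on whether the discount is applied before or after adding the new regret; this sketch discounts after adding, which is an assumption and not necessarily what the project does.

```cpp
#include <algorithm>
#include <cmath>

// Vanilla CFR: accumulate instantaneous regret as-is.
inline double update_vanilla(double cumulative, double instant) {
    return cumulative + instant;
}

// CFR+ (regret matching+): floor the running total at zero, so an action
// that was bad early can recover quickly once it starts doing well.
inline double update_cfr_plus(double cumulative, double instant) {
    return std::max(cumulative + instant, 0.0);
}

// Hypothetical parameter bundle for the DCFR sketch below.
struct DcfrParams {
    double alpha = 1.5;  // discount exponent for positive regrets
    double beta = 0.0;   // discount exponent for negative regrets
    double gamma = 2.0;  // discount exponent for the average strategy
};

// DCFR: after adding the new regret on iteration t (1-based), scale the
// total by t^alpha / (t^alpha + 1) if positive, t^beta / (t^beta + 1) if not.
inline double update_dcfr(double cumulative, double instant, int t,
                          const DcfrParams& p = DcfrParams{}) {
    const double total = cumulative + instant;
    const double w = std::pow(static_cast<double>(t), total > 0.0 ? p.alpha : p.beta);
    return total * (w / (w + 1.0));
}

// DCFR: factor applied to the accumulated strategy sum on iteration t.
inline double dcfr_strategy_discount(int t, const DcfrParams& p = DcfrParams{}) {
    return std::pow(t / (t + 1.0), p.gamma);
}
```

With beta = 0 the negative-regret factor is always 1/2, so accumulated negative regret is halved on every iteration. That lets an action that looked bad early recover much faster than under vanilla CFR.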

Scaling up

Leduc Hold'em has a public card and pushed the exported infoset count to 288. That felt manageable. When I moved to an abstracted heads-up no-limit Hold'em with 33,260 infosets, the difference between slow and unusable became very concrete very fast. Sampling stopped being optional. So did thinking carefully about what I was allocating and how often.
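In external-sampling MCCFR, the traversing player still explores every one of their own actions, but at opponent and chance nodes a single action is drawn from the current distribution. A minimal sketch of that sampling step only, using the standard library's `std::discrete_distribution`; the helper name `sample_action` is an assumption, and a real traversal would likely avoid constructing the distribution object on every call.

```cpp
#include <random>
#include <vector>

// Draws one action index with probability proportional to strategy[a].
// (Hypothetical helper: shows the sampling step, not a full MCCFR loop.)
int sample_action(const std::vector<double>& strategy, std::mt19937& rng) {
    std::discrete_distribution<int> dist(strategy.begin(), strategy.end());
    return dist(rng);
}
```

Because only one opponent branch is followed per node, the cost of an iteration no longer grows with the opponent's branching factor. The price is noisier regret estimates.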

Why I built the visualizer

Log files are a terrible way to debug a solver. You can't see whether a mixed strategy is converging, whether regrets are moving the right direction, or whether something is subtly wrong until it shows up as a bad number at the end. I built the strategy browser, convergence plots, and interactive demo because I needed to actually see what the solver was doing. It started as a debugging tool and turned into the most useful part of the whole project.

[Link: Kuhn convergence]

[Link: Hold'em abstraction]

Where things stand

These numbers are not a claim about anything state-of-the-art. They are a snapshot of what this project has actually shipped and what I can inspect right now. I would rather show something honest than make preliminary work look shinier than it is.

Kuhn DCFR final epsilon: ...
Leduc Hold'em: ...
Hold'em MCCFR: ...
Generated solver bundle: ...

What actually got better

The biggest shift was not any single feature. It was the point where I stopped guessing about what the solver was doing and started being able to check. Cleaner separation between C++ and the frontend meant changes in one place stopped breaking the other. Better export formats meant the strategy data was actually readable. More algorithm variants meant I could compare convergence behavior instead of just trusting the default. Each of those was a small thing, but together they changed how I worked on the project day to day.

What I took away from this

Theory that looks clean on paper gets complicated the moment you have to lay it out in memory. The choice of data structure for an information set matters more than the update rule in practice, because a bad layout shows up in the profiler almost immediately. Small games are genuinely useful for building confidence in a design before you commit it to something bigger. And the visualizer caught bugs that no amount of log-reading would have found. If I were starting over, I would build it first.

How this connects to other things I care about

I have spent time around systems that make decisions quickly under real constraints, at DeepMind, through my incoming work at NVIDIA, and in competitive settings like IMC Prosperity where my team placed Top 25 out of 22,000 teams. The thread connecting all of it is the gap between a correct algorithm and a fast, inspectable, reliable system. That gap is where the interesting engineering problems live, and CFR-Edge sits right in the middle of it.

What comes next

Proper exploitability evaluation against a known Nash strategy. A benchmark harness that gives variance-aware results rather than single-run numbers. Parallel traversal is the most obvious thing the current design is missing, and figuring out the right synchronization model for concurrent regret updates is a problem I have been putting off because it deserves real attention. Allocator-aware infoset storage to reduce pressure during traversal. Strategy diffing across algorithm runs. Tighter invariants around game-state transitions so bugs surface at construction time rather than quietly mid-run.

Why I kept working on it

Getting knocked out of that poker tournament was annoying for about a day. Then it became useful, because it pointed me at a problem I had never thought carefully about. CFR-Edge is what happened when I decided to stop reading about the problem and start building something. It is still in progress. Probably always will be. But it is the project I keep coming back to when I want to actually think, and that is the best thing I can say about any side project.