Blog

Technical writeups on AI/ML engineering and research.

Posts

Implementation notes and lessons from building.

AI/ML Engineering · ~9 min read · April 2026

Published

CodeQ: Teaching an LLM to Debug Code with MCTS and DPO

How to build a self-improving code debugging agent: MCTS exploration, dual-temperature critique, and DPO training on the model's own rollouts, plus the critical refactor that raised the fix rate on DebugBench from 10% to 81.3%. Includes the 81% data duplication discovery, the bf16 NaN fix, and an honest accounting of what DPO did and didn't transfer.


AI/ML Engineering · Coming soon

In Progress

Adding Eyes to Bug Report Triage: Multimodal VLM Fine-Tuning

Building a multimodal bug triage benchmark from scratch: an Eclipse/Mozilla text baseline compared against SevPredict and MASP, then Rico screenshots added to test whether visual context improves severity prediction. The post will include the dataset release on Hugging Face.