Projects

AI/ML engineering, agentic systems, and research software.

AI/ML Engineering

Foundation model fine-tuning and agentic systems.

Flagship

CodeQ — Autonomous Code Debugging Agent

Qwen2.5-Coder-7B · MCTS · DPO · 2× H100

CodeQ is a self-improving code debugging agent inspired by Agent Q (Putta et al., 2024). It uses Monte Carlo Tree Search to systematically explore fix strategies for buggy code, an AI self-critique mechanism with dual-temperature scoring to rank proposed fixes, and Direct Preference Optimization to teach the model to prefer successful fixes over failed ones — all without human intervention.
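The search-then-prefer idea can be sketched in a few lines. This is a minimal illustration, not CodeQ's actual code: the `Node` class, the exploration constant `c=1.4`, and the helper names are assumptions. Child selection uses the standard UCT rule, and the pair builder emits `prompt` / `chosen` / `rejected` records, the layout commonly used for DPO preference datasets.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One candidate fix in the search tree (hypothetical structure)."""
    fix: str
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCT: mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited fixes are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return exploit + explore


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))


def preference_pairs(prompt: str, results: List[Tuple[str, bool]]) -> List[Dict[str, str]]:
    """Pair every passing fix with every failing fix for the same bug."""
    chosen = [fix for fix, passed in results if passed]
    rejected = [fix for fix, passed in results if not passed]
    return [
        {"prompt": prompt, "chosen": good, "rejected": bad}
        for good in chosen
        for bad in rejected
    ]
```

Pairing all passing fixes against all failing ones is one simple choice; a real pipeline might instead pair sibling nodes only, or weight pairs by critique score.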

The system runs across two NVIDIA H100 nodes: Machine A handles MCTS inference in 4-bit quantization (~4–6 GB VRAM), while Machine B runs DPO training with LoRA in bf16 (~30–35 GB VRAM). LoRA adapters are transferred between machines via scp, enabling a pipelined workflow where exploration and training overlap.

Key Results

Metric	Value
Pre-refactor line-edit baseline	~10% (parse failures)
Full rewrite baseline	43.9% (54/123)
MCTS rewrite (base model)	81.3% (100/123)
MCTS rewrite (+ DPO Round 2)	84.0% (42/50)
Improvement from rewrite refactor	~10% → 81.3%
DPO transfer to full_rewrite mode	No transfer

Note: 81% of the DebugBench data was found to be duplicated during preprocessing, and the duplicates were removed.
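A duplication check of this kind can be done by hashing a normalized form of each sample. The sketch below is one plausible approach, not the project's preprocessing script; the `buggy_code` field name and the whitespace-only normalization are assumptions.

```python
import hashlib
from typing import Dict, List


def normalize(code: str) -> str:
    """Drop trailing whitespace and blank lines so trivial variants hash alike."""
    lines = [line.rstrip() for line in code.strip().splitlines()]
    return "\n".join(line for line in lines if line)


def deduplicate(records: List[Dict[str, str]], key: str = "buggy_code") -> List[Dict[str, str]]:
    """Keep the first record for each distinct normalized code string."""
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(normalize(record[key]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```

The duplication rate is then `1 - len(deduplicate(records)) / len(records)`.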

Qwen2.5-Coder-7B-Instruct · MCTS · DPO · LoRA r=32 · bf16 · 4-bit bitsandbytes · HuggingFace TRL · Flash Attention 2 · Docker · 2× NVIDIA H100 94GB · W&B

GitHub → Technical Blog Post →

In Progress

VisionTriage — Multimodal Bug Report Triage

Qwen2.5-VL-7B-Instruct · QLoRA · Eclipse/Mozilla · Rico

VisionTriage fine-tunes Qwen2.5-VL-7B-Instruct with QLoRA to automatically triage software bug reports that include screenshots. The model takes a screenshot of a UI bug plus a text description and outputs structured triage metadata: severity level, affected component, bug type classification, root cause hypothesis, and suggested fix.
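The structured output described above could be validated along these lines. This is a sketch under stated assumptions: the model is assumed to emit a JSON object, the field names are illustrative, and the severity labels follow Bugzilla-style levels rather than a label set confirmed by the project.

```python
import json
from dataclasses import dataclass, fields

# Bugzilla-style severity labels (assumption; the project's label set may differ).
SEVERITIES = ("blocker", "critical", "major", "normal", "minor", "trivial")


@dataclass(frozen=True)
class TriageResult:
    """Hypothetical schema for one triaged bug report."""
    severity: str
    component: str
    bug_type: str
    root_cause: str
    suggested_fix: str


def parse_triage(raw: str) -> TriageResult:
    """Parse model output and reject missing fields or unknown severities."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    names = [f.name for f in fields(TriageResult)]
    missing = [name for name in names if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["severity"] not in SEVERITIES:
        raise ValueError(f"unknown severity: {data['severity']!r}")
    return TriageResult(**{name: str(data[name]) for name in names})
```

Validating at the boundary keeps malformed generations out of downstream metrics instead of silently counting them as wrong predictions.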

The text-only severity prediction baseline is benchmarked against published methods (SevPredict, MASP, BERT-SBR) on the standard Eclipse/Mozilla Defect Tracking Dataset (~215K bug reports). The multimodal extension uses Rico screenshots with programmatic bug injection to demonstrate that adding visual context improves triage accuracy over text-only approaches.

This project complements CodeQ: CodeQ fixes bugs from code, while VisionTriage triages bugs from visual reports before they reach a developer.

Qwen2.5-VL-7B-Instruct · QLoRA · Eclipse/Mozilla Dataset · Rico · Synthetic Bug Injection · HuggingFace TRL · Gradio

Featured

Parallel Multi-Agent Code Generation

DAG-Based Agent Orchestration for Code Synthesis

A DAG-based multi-agent code generation system built with LangGraph and the native Anthropic SDK. An orchestrator agent analyzes coding tasks, builds a dependency graph, and dispatches parallel async coder workers that generate, review, and test code through structured handoffs.
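The dispatch pattern can be illustrated with plain `asyncio`, leaving LangGraph and the Anthropic SDK out entirely. In this sketch, `run_dag` and the `worker` callback are hypothetical names; each wave runs every task whose prerequisites are finished, and the results of those prerequisites are handed to the worker.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Set


async def run_dag(
    deps: Dict[str, Set[str]],
    worker: Callable[[str, Dict[str, Any]], Awaitable[Any]],
) -> Dict[str, Any]:
    """Run tasks in dependency order, executing each ready wave in parallel.

    deps maps a task name to the set of tasks it depends on.
    worker(task, upstream) receives the results of that task's prerequisites.
    """
    results: Dict[str, Any] = {}
    remaining = {task: set(pre) for task, pre in deps.items()}
    while remaining:
        # A task is ready once all of its prerequisites have results.
        ready = [task for task, pre in remaining.items() if pre.issubset(results)]
        if not ready:
            raise ValueError("cycle or missing dependency in task graph")
        outputs = await asyncio.gather(
            *(worker(task, {p: results[p] for p in remaining[task]}) for task in ready)
        )
        for task, output in zip(ready, outputs):
            results[task] = output
            del remaining[task]
    return results
```

Wave-based scheduling is the simplest variant. A scheduler that starts each task the moment its own prerequisites finish would improve utilization when task durations vary widely.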

LangGraph · Anthropic SDK · asyncio · Python

GitHub →

Featured

Self-Evolving Code Generation

LLM-as-Judge · Autonomous Prompt Evolution

An extension of the multi-agent pipeline that adds an LLM-as-Judge evaluator, a failure analyzer, and an autonomous prompt evolver. Together they close the generation loop: the tester agent rewrites its own system prompt based on evaluation feedback, making the pipeline self-improving.
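The control flow of such a loop is small once the model calls are abstracted away. In the sketch below, `generate`, `judge`, and `rewrite` are hypothetical callables standing in for the LLM-backed agents, and the threshold and round limit are illustrative defaults.

```python
from typing import Any, Callable, Dict, List, Tuple


def evolve_prompt(
    prompt: str,
    generate: Callable[[str], str],
    judge: Callable[[str], Tuple[float, str]],
    rewrite: Callable[[str, str], str],
    threshold: float = 0.8,
    max_rounds: int = 5,
) -> Tuple[str, List[Dict[str, Any]]]:
    """Generate, score, and rewrite the prompt until the judge is satisfied."""
    history: List[Dict[str, Any]] = []
    for round_no in range(max_rounds):
        output = generate(prompt)
        score, feedback = judge(output)
        history.append({"round": round_no, "prompt": prompt, "score": score})
        if score >= threshold:
            break  # good enough: keep the current prompt
        prompt = rewrite(prompt, feedback)
    return prompt, history
```

The returned `history` is a list of plain dictionaries, so it can be written to a JSON file to track how the prompt changed across rounds.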

LangGraph · LLM-as-Judge · Prompt Evolution · JSON Tracker · Docker

GitHub →

Foundation

Multi-Agent Code Generation V1

Sequential Pipeline · LangSmith Tracing

Sequential multi-agent pipeline using an Orchestrator → Planner → Coder → Reviewer → Tester architecture. Includes LangSmith tracing for observability and Docker sandboxing for safe code execution. This was the foundation that led to the parallel and self-evolving versions.

LangChain · LangSmith · Docker · Python

GitHub →

Published Research Software

CRAN packages and bioinformatics tools.

Published · CRAN

OptCirClust

Fast Optimal Circular Clustering

CRAN R package implementing a fast optimal clustering algorithm for circular (1-dimensional periodic) data. Uses dynamic programming to find the globally optimal partition, avoiding the local optima that iterative methods such as k-means can get stuck in. Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
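To make the problem concrete, here is a deliberately naive exact solver in Python. It is not the package's fast algorithm: it cuts the circle at every point in turn, unrolls it into a line, and solves each linear problem with a textbook dynamic program, which costs O(k·n³) overall. Function names and the sum-of-squares objective are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def linear_kmeans(xs: Sequence[float], k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Optimal split of sorted xs into k contiguous clusters (min sum of squares)."""
    n = len(xs)
    ps, ps2 = [0.0], [0.0]  # prefix sums of x and x^2
    for x in xs:
        ps.append(ps[-1] + x)
        ps2.append(ps2[-1] + x * x)

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for xs[i:j]."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # best[c][j]: minimum cost of splitting xs[:j] into c clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + sse(i, j)
                if cand < best[c][j]:
                    best[c][j], cut[c][j] = cand, i
    bounds, j = [], n
    for c in range(k, 0, -1):  # backtrack the cluster boundaries
        bounds.append((cut[c][j], j))
        j = cut[c][j]
    return best[k][n], bounds[::-1]


def circular_clusters(points: Sequence[float], k: int, circumference: float):
    """Try every cut point on the circle and keep the cheapest linear solution."""
    pts = sorted(p % circumference for p in points)
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    best_cost, best_clusters = float("inf"), []
    for start in range(n):
        # Unroll the circle so that pts[start] is the leftmost point.
        frame = pts[start:] + [p + circumference for p in pts[:start]]
        cost, bounds = linear_kmeans(frame, k)
        if cost < best_cost - 1e-12:
            best_cost = cost
            best_clusters = [[pts[(start + t) % n] for t in range(i, j)] for i, j in bounds]
    return best_cost, best_clusters
```

For angles in degrees, `circular_clusters([350, 355, 5, 10, 170, 180, 190], 2, 360)` groups the four points around 0° into one cluster even though they straddle the wrap-around, which is exactly the case linear clustering gets wrong.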

R · CRAN · Dynamic Programming

CRAN → GitHub → IEEE TPAMI Paper →

Published · CRAN

CircularSilhouette

Cluster Validation for Circular Data

Published CRAN R package implementing a silhouette-based cluster validation index designed specifically for circular data. Provides a quantitative measure of clustering quality that accounts for the periodic nature of circular measurements. Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
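The core idea can be sketched by plugging an arc-length distance into the classic silhouette formula. This is a generic Python illustration, not the package's exact definition or its algorithm; the function names are assumptions.

```python
from typing import Dict, Hashable, List, Sequence


def circular_distance(a: float, b: float, circumference: float) -> float:
    """Shortest arc length between two positions on the circle."""
    d = abs(a - b) % circumference
    return min(d, circumference - d)


def circular_silhouette(
    points: Sequence[float], labels: Sequence[Hashable], circumference: float
) -> float:
    """Mean silhouette width using arc distance instead of linear distance."""
    clusters: Dict[Hashable, List[float]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(circular_distance(p, q, circumference) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(circular_distance(p, q, circumference) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated arcs; a labeling that splits a cluster at the wrap-around point scores noticeably lower.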

R · CRAN · Statistical Methods

CRAN → GitHub → IEEE/ACM TCBB Paper →

Retrieval & Applied ML

RAG systems and applied machine learning.

Featured

NutriBot — RAG Nutrition Chatbot

Hybrid FAISS + BM25 · Reciprocal Rank Fusion

RAG-based nutrition Q&A chatbot using hybrid FAISS + BM25 retrieval, with the two result lists merged via Reciprocal Rank Fusion. On a 30-question evaluation pipeline, hybrid retrieval improved relevance over the single-method baselines. Includes cost optimization strategies for API usage.
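Reciprocal Rank Fusion itself is only a few lines: each document scores the sum of 1 / (k + rank) over the lists it appears in. The sketch below assumes each retriever returns an ordered list of document IDs; the constant `k=60` is the commonly used default, not a value confirmed for NutriBot, and the function name is illustrative.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because RRF uses ranks rather than raw scores, the FAISS similarity values and BM25 scores never need to be normalized onto a common scale.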

FAISS · BM25 · Reciprocal Rank Fusion · LangChain · Python

GitHub →