Tathagata Debnath
Projects

AI/ML engineering, agentic systems, and research software.

AI/ML Engineering

Foundation model fine-tuning and agentic systems.

Flagship

CodeQ — Autonomous Code Debugging Agent

Qwen2.5-Coder-7B · MCTS · DPO · 2× H100

CodeQ is a self-improving code debugging agent inspired by Agent Q (Putta et al., 2024). It uses Monte Carlo Tree Search to systematically explore fix strategies for buggy code, an AI self-critique mechanism with dual-temperature scoring to rank proposed fixes, and Direct Preference Optimization to teach the model to prefer successful fixes over failed ones — all without human intervention.
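A minimal sketch of two pieces of the loop described above: UCT-style child selection during tree search, and turning test outcomes into chosen/rejected pairs for DPO. The `Node` structure, the exploration constant `c`, and the pairing rule are illustrative assumptions rather than CodeQ's actual code, and the dual-temperature self-critique step is left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate fix in the search tree (hypothetical structure)."""
    fix: str
    visits: int = 0
    total_reward: float = 0.0
    passed_tests: bool = False
    children: list["Node"] = field(default_factory=list)

    @property
    def mean_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


def uct_select(parent: Node, c: float = 1.41) -> Node:
    """Pick the child with the highest UCT score (exploitation + exploration)."""
    def score(child: Node) -> float:
        if child.visits == 0:
            return math.inf  # unvisited children are tried first
        explore = c * math.sqrt(math.log(parent.visits) / child.visits)
        return child.mean_reward + explore
    return max(parent.children, key=score)


def preference_pairs(parent: Node) -> list[tuple[str, str]]:
    """Pair each passing sibling (chosen) with each evaluated failing sibling (rejected)."""
    chosen = [ch.fix for ch in parent.children if ch.visits and ch.passed_tests]
    rejected = [ch.fix for ch in parent.children if ch.visits and not ch.passed_tests]
    return [(good, bad) for good in chosen for bad in rejected]


root = Node(fix="<buggy code>", visits=10)
root.children = [
    Node("fix_a", visits=6, total_reward=5.0, passed_tests=True),
    Node("fix_b", visits=4, total_reward=1.0, passed_tests=False),
    Node("fix_c"),  # not yet explored
]
selected = uct_select(root)     # the unexplored child
pairs = preference_pairs(root)  # one (chosen, rejected) pair
```

Pairs in this shape can be written out as prompt/chosen/rejected records, which is the layout preference-optimization trainers typically expect.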

The system runs across two NVIDIA H100 nodes: Machine A handles MCTS inference in 4-bit quantization (~4–6 GB VRAM), while Machine B runs DPO training with LoRA in bf16 (~30–35 GB VRAM). LoRA adapters are transferred between machines via scp, enabling a pipelined workflow where exploration and training overlap.
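The VRAM figures above can be sanity-checked with back-of-the-envelope arithmetic on weight storage alone. The parameter count below is an assumed round figure for a "7B" model, and the estimate ignores KV cache, activations, and quantization overhead.

```python
# Assumed parameter count for a 7B-class model (illustrative, not measured).
PARAMS = 7.6e9


def weight_gb(params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params * bits_per_param / 8 / 1e9


inference_weights_gb = weight_gb(PARAMS, 4)   # 4-bit quantized weights
training_weights_gb = weight_gb(PARAMS, 16)   # bf16 weights

print(f"4-bit weights: {inference_weights_gb:.1f} GB")
print(f"bf16 weights:  {training_weights_gb:.1f} GB")
```

Under these assumptions the 4-bit weights alone sit just under the quoted ~4–6 GB range, and the bf16 weights account for roughly half of the ~30–35 GB training footprint. The rest is presumably activations, LoRA gradients, and optimizer state.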

Key Results

Metric | Value
Pre-refactor line-edit baseline | ~10% (parse failures)
Full rewrite baseline | 43.9% (54/123)
MCTS rewrite (base model) | 81.3% (100/123)
MCTS rewrite (+ DPO Round 2) | 84.0% (42/50)
Improvement from rewrite refactor | 10% → 81.3%
DPO transfer to full_rewrite mode | No transfer

Note: 81% data duplication was discovered in the DebugBench dataset and fixed during preprocessing.

Qwen2.5-Coder-7B-Instruct · MCTS · DPO · LoRA r=32 · bf16 · 4-bit bitsandbytes · HuggingFace TRL · Flash Attention 2 · Docker · 2× NVIDIA H100 94GB · W&B
GitHub → Technical Blog Post →

In Progress

VisionTriage — Multimodal Bug Report Triage

Qwen2.5-VL-7B-Instruct · QLoRA · Eclipse/Mozilla · Rico

VisionTriage fine-tunes Qwen2.5-VL-7B-Instruct with QLoRA to automatically triage software bug reports that include screenshots. The model takes a screenshot of a UI bug plus a text description and outputs structured triage metadata: severity level, affected component, bug type classification, root cause hypothesis, and suggested fix.
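One way to pin down the structured output described above is a small schema plus a validating parser. The field names, the JSON format, and the Bugzilla-style severity labels are assumptions for illustration; the project's actual output format is not specified here.

```python
import json
from dataclasses import dataclass, fields

# Assumed label set, modeled on Bugzilla-style severities.
SEVERITIES = {"blocker", "critical", "major", "normal", "minor", "trivial"}


@dataclass(frozen=True)
class TriageResult:
    """Hypothetical container for the five triage outputs."""
    severity: str
    component: str
    bug_type: str
    root_cause: str
    suggested_fix: str


def parse_triage(raw: str) -> TriageResult:
    """Parse a model response (assumed to be a JSON object) and validate it."""
    data = json.loads(raw)
    result = TriageResult(**{f.name: data[f.name] for f in fields(TriageResult)})
    if result.severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {result.severity!r}")
    return result


example = json.dumps({
    "severity": "major",
    "component": "UI/Toolbar",
    "bug_type": "layout",
    "root_cause": "Button overlaps label at narrow widths",
    "suggested_fix": "Add a minimum width constraint to the toolbar",
})
triage = parse_triage(example)
```

Rejecting unknown severities at parse time keeps malformed generations out of any downstream accuracy computation.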

The text-only severity prediction baseline is benchmarked against published methods (SevPredict, MASP, BERT-SBR) on the standard Eclipse/Mozilla Defect Tracking Dataset (~215K bug reports). The multimodal extension uses Rico screenshots with programmatic bug injection to demonstrate that adding visual context improves triage accuracy over text-only approaches.

This project connects directly to CodeQ: CodeQ fixes bugs in code, while VisionTriage triages bugs from visual reports before they reach a developer.

Qwen2.5-VL-7B-Instruct · QLoRA · Eclipse/Mozilla Dataset · Rico · Synthetic Bug Injection · HuggingFace TRL · Gradio

Featured

Parallel Multi-Agent Code Generation

DAG-Based Agent Orchestration for Code Synthesis

A DAG-based multi-agent code generation system built with LangGraph and the native Anthropic SDK. An orchestrator agent analyzes coding tasks, builds a dependency graph, and dispatches parallel async coder workers that generate, review, and test code through structured handoffs.
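The dispatch pattern described above can be sketched with plain asyncio: each task waits for its dependencies, and independent tasks run concurrently. The task graph and the `coder_worker` stub are hypothetical, and the sketch uses neither LangGraph nor the Anthropic SDK.

```python
import asyncio

# Hypothetical dependency graph: each task lists the tasks it depends on.
GRAPH = {
    "models": [],
    "utils": [],
    "api": ["models", "utils"],
    "tests": ["api"],
}


async def coder_worker(task: str, log: list[str]) -> str:
    """Stand-in for an LLM-backed coder; a real worker would call the model here."""
    await asyncio.sleep(0.01)
    log.append(task)
    return f"# code for {task}"


async def run_dag(graph: dict[str, list[str]]) -> tuple[dict[str, str], list[str]]:
    log: list[str] = []
    running: dict[str, asyncio.Task] = {}

    async def run(task: str) -> str:
        # Block until every dependency has finished, then do the work.
        await asyncio.gather(*(running[dep] for dep in graph[task]))
        return await coder_worker(task, log)

    # All tasks are scheduled up front; the awaits inside run() enforce ordering.
    for task in graph:
        running[task] = asyncio.create_task(run(task))
    results = await asyncio.gather(*running.values())
    return dict(zip(graph, results)), log


outputs, completion_order = asyncio.run(run_dag(GRAPH))
```

Here "models" and "utils" run in parallel, "api" starts once both finish, and "tests" runs last. The sketch assumes the graph is acyclic; a cycle would deadlock, so a real orchestrator would validate the graph first.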

LangGraph · Anthropic SDK · asyncio · Python
GitHub →

Featured

Self-Evolving Code Generation

LLM-as-Judge · Autonomous Prompt Evolution

Extension of the multi-agent pipeline that adds an LLM-as-Judge evaluator, failure analyzer, and autonomous prompt evolver. The system forms a generation loop where the tester agent rewrites its own system prompt based on evaluation feedback, creating a self-improving code generation pipeline.
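The control flow of such a loop can be sketched with deterministic stand-ins for the LLM calls. Every function below is a hypothetical stub: the real generator, judge, failure analyzer, and prompt evolver would each call a model, and the threshold and round limit are assumed values.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float   # 0.0 to 1.0, as assigned by the judge
    feedback: str  # short failure description used to evolve the prompt


def generate(prompt: str, task: str) -> str:
    """Stub generator: only handles edge cases when the prompt asks for it."""
    return f"solution({task}, edge_cases={'edge cases' in prompt})"


def judge(output: str) -> Verdict:
    """Stub LLM-as-Judge: rewards outputs that handle edge cases."""
    if "edge_cases=True" in output:
        return Verdict(1.0, "ok")
    return Verdict(0.4, "missing edge cases")


def evolve_prompt(prompt: str, verdict: Verdict) -> str:
    """Stub failure analysis plus prompt evolution: fold the feedback into the prompt."""
    if "edge cases" in verdict.feedback:
        return f"{prompt} Always handle edge cases."
    return prompt


def run_loop(prompt: str, task: str, threshold: float = 0.8, max_rounds: int = 3):
    history = []
    for _ in range(max_rounds):
        verdict = judge(generate(prompt, task))
        history.append((prompt, verdict.score))
        if verdict.score >= threshold:
            break
        prompt = evolve_prompt(prompt, verdict)
    return prompt, history


final_prompt, history = run_loop("You write Python functions.", "parse_date")
```

In this toy run the first round scores 0.4, the prompt gains one instruction, and the second round passes. Keeping `history` as plain tuples makes it easy to serialize each round for later inspection.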

LangGraph · LLM-as-Judge · Prompt Evolution · JSON Tracker · Docker
GitHub →

Foundation

Multi-Agent Code Generation V1

Sequential Pipeline · LangSmith Tracing

Sequential multi-agent pipeline using an Orchestrator → Planner → Coder → Reviewer → Tester architecture. Includes LangSmith tracing for observability and Docker sandboxing for safe code execution. This was the foundation that led to the parallel and self-evolving versions.

LangChain · LangSmith · Docker · Python
GitHub →

Published Research Software

CRAN packages and bioinformatics tools.

Published · CRAN

OptCirClust

Fast Optimal Circular Clustering

Published CRAN R package implementing a fast optimal clustering algorithm for circular (1-dimensional periodic) data. Uses dynamic programming to find the globally optimal partition, avoiding local optima issues of iterative methods like k-means. Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
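To illustrate the idea of optimal clustering on a circle by dynamic programming, here is a brute-force sketch in Python (not R, and not the package's algorithm). It assumes the objective is the within-cluster sum of squares over contiguous arcs, solves the linear problem exactly by DP, and tries every rotation of the circle. The published method is described as fast; this version is deliberately naive.

```python
def make_cost(xs):
    """Return cost(i, j): within-cluster sum of squares of xs[i:j], via prefix sums."""
    p1, p2 = [0.0], [0.0]
    for x in xs:
        p1.append(p1[-1] + x)
        p2.append(p2[-1] + x * x)

    def cost(i, j):
        s = p1[j] - p1[i]
        return (p2[j] - p2[i]) - s * s / (j - i)
    return cost


def linear_optimal(xs, k):
    """Optimal k-clustering of sorted xs. Returns (total cost, [(start, end), ...])."""
    n, cost, inf = len(xs), make_cost(xs), float("inf")
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):          # number of clusters used
        for i in range(j, n + 1):      # number of points covered
            for m in range(j - 1, i):  # last cluster is xs[m:i]
                c = dp[j - 1][m] + cost(m, i)
                if c < dp[j][i]:
                    dp[j][i], cut[j][i] = c, m
    bounds, i = [], n
    for j in range(k, 0, -1):          # backtrack the cluster boundaries
        bounds.append((cut[j][i], i))
        i = cut[j][i]
    return dp[k][n], bounds[::-1]


def circular_optimal(points, k, circumference):
    """Try every rotation: unroll the circle at each point and solve linearly."""
    xs = sorted(p % circumference for p in points)
    best_cost, best_clusters = float("inf"), None
    for s in range(len(xs)):
        unrolled = xs[s:] + [x + circumference for x in xs[:s]]
        c, bounds = linear_optimal(unrolled, k)
        if c < best_cost:
            best_cost = c
            best_clusters = [[v % circumference for v in unrolled[a:b]] for a, b in bounds]
    return best_cost, best_clusters


angles = [350, 355, 5, 10, 170, 180, 190]  # degrees on a 360-degree circle
total_cost, clusters = circular_optimal(angles, k=2, circumference=360)
```

On this toy input the cluster {350, 355, 5, 10} wraps across the 0/360 boundary, which a linear method on the raw angles would split.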

R · CRAN · Dynamic Programming
CRAN → GitHub → IEEE TPAMI Paper →

Published · CRAN

CircularSilhouette

Cluster Validation for Circular Data

Published CRAN R package implementing a silhouette-based cluster validation index designed specifically for circular data. Provides a quantitative measure of clustering quality that accounts for the periodic nature of circular measurements. Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
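As an illustration of what "accounting for periodicity" means, the sketch below computes the classic mean silhouette width with a circular distance swapped in. This is an assumed formulation written in Python for illustration; the package's exact definition and algorithm may differ.

```python
def circ_dist(a, b, circumference):
    """Shortest distance between two positions along the circle."""
    d = abs(a - b) % circumference
    return min(d, circumference - d)


def circular_silhouette(points, labels, circumference):
    """Mean silhouette width using circular distance (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(circ_dist(p, q, circumference) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(circ_dist(p, q, circumference) for q in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return sum(widths) / len(widths)


angles = [350, 355, 5, 10, 170, 180, 190]
wrapped = circular_silhouette(angles, [0, 0, 0, 0, 1, 1, 1], 360)  # cluster spans 0/360
split = circular_silhouette(angles, [0, 0, 1, 1, 1, 1, 1], 360)    # cut at the boundary
```

The labeling that keeps 350, 355, 5, and 10 together scores much higher than the one that cuts at the 0/360 boundary, which is the behavior a circular validation index should reward.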

R · CRAN · Statistical Methods
CRAN → GitHub → IEEE/ACM TCBB Paper →

Retrieval & Applied ML

RAG systems and applied machine learning.

Featured

NutriBot — RAG Nutrition Chatbot

Hybrid FAISS + BM25 · Reciprocal Rank Fusion

RAG-based nutrition Q&A chatbot using hybrid FAISS + BM25 retrieval merged via Reciprocal Rank Fusion. A 30-question evaluation pipeline shows hybrid retrieval improving relevance over single-method baselines. Includes cost optimization strategies for API usage.
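Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over the rankings it appears in. A minimal sketch follows, assuming the commonly used constant k = 60 and made-up document IDs; NutriBot's actual constant and retriever outputs are not given here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_b", "doc_a", "doc_c"]   # e.g. order returned by a FAISS search
sparse_hits = ["doc_a", "doc_d", "doc_b"]  # e.g. order returned by BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale, which is the main reason to choose RRF when merging dense and sparse results.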

FAISS · BM25 · Reciprocal Rank Fusion · LangChain · Python
GitHub →