I am a Ph.D. student in Computer Science at Concordia University, advised by Dr. Tse-Hsun (Peter) Chen in the SPEAR Lab. I build LLM-based agents that read, reason about, and modify real software systems, applying them to performance-sensitive configuration tuning, program repair, and regression-test diagnosis. Most of my experiments run on big-data systems such as Apache Spark, where a single misconfigured parameter can change performance by orders of magnitude. My current focus: what would it take to make code-reasoning agents reliable enough to ship in real developer workflows?