About Me

👋 I am Hanyu Li (李瀚宇), an undergraduate student jointly educated by Beijing University of Posts and Telecommunications and Queen Mary University of London. I expect to receive my Bachelor's degree in Information and Computing Science in 2027. My research interests lie in AI safety, AI agents, and multimodal models, with a particular focus on robustness and safety alignment across text and vision modalities.

🎓 I am looking for Ph.D. opportunities for Fall 2027 and am actively seeking collaborations on AI agents. Please feel free to contact me 🚀.

Education
  • Beijing University of Posts and Telecommunications
    B.S. in Information and Computing Science
    Sep. 2023 - present
Honors & Awards
  • Honor Award - SpaVLE@NeurIPS2025
    2025
  • National Scholarship of China
    2025
  • First Prize (Beijing), China International College Students’ Innovation Competition
    2025
  • First Prize (Hainan), Data Element × Competition
    2024
News
2026
One paper (RepoMirage) has been accepted to Reliable Autonomy and AIWILD@ICLR2026! 🎉
Mar 02
2025
Started a research internship in Prof. Yinpeng Dong’s group at the AI College, THU.
Oct 15
Started a research internship at Squirrel AI, supervised by Dr. Kun Wang (NTU), working on multimodal safety alignment.
Jul 15
Selected Publications
RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

Hanyu Li *, Yichi Zhang *, Speed Zhu, Hang Su, Jun Zhu, Yinpeng Dong (* equal contribution)

ICLR 2026 Workshops (AIWILD and Reliable Autonomy), Accepted

Code agents now perform strongly on repository-level software engineering benchmarks, but it remains unclear whether success on end-to-end tasks such as issue resolution truly reflects repository context reasoning: the ability to identify task-relevant information across multiple files and reason over the relations among them. To investigate this question, we introduce RepoMirage, a two-stage evaluation suite built on SWE-Bench Verified that uses perturbation as a diagnostic tool, increasing the demand for context reasoning by transforming how the repository is exposed. First, RepoMirage-Perturb applies three types of semantics-preserving repository-level perturbations, revealing a clear performance drop when a correct solution requires broader context access. Second, RepoMirage-Extend turns perturbation-targeted structural bottlenecks into explicit tasks beyond issue resolution, where average performance declines from 66.8% in the original setting to 25.3%, indicating a significant deficiency in repository context reasoning. Further trajectory analysis reveals an exploration drift, in which agents access broader repository context but fail to turn it into effective structural information. Motivated by this observation, we propose RepoAnchor, a structure-first prototype workflow that separates repository exploration from downstream problem solving, and show that explicit structural scaffolding yields notable gains. These results uncover a previously overlooked gap in repository context reasoning for code agents and suggest that stronger structure-aware methods have the potential to narrow it.

SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics

Haolong Hu *, Hanyu Li *, Tiancheng He, Huahui Yi, An Zhang, Qiankun Li, Kun Wang, Yang Liu, Zhigang Zeng (* equal contribution)

2026

MLLMs are increasingly deployed in multi-turn settings, where attackers can escalate unsafe intent through the evolving visual-text history and exploit long-context safety decay. Yet safety alignment is still dominated by single-turn data and fixed-template dialogues, leaving a mismatch between training and deployment. To bridge this gap, we propose SaFeR-Steer, a progressive multi-turn alignment framework that combines staged synthetic bootstrapping with tutor-in-the-loop GRPO to train a single student under adaptive, on-policy attacks. We also introduce TCSR, which uses trajectory-level minimum/average safety to propagate late-turn failures to earlier turns. I. Dataset. We release STEER, a multi-turn multimodal safety dataset with STEER-SFT (12,934), STEER-RL (2,000), and STEER-Bench (3,227) dialogues spanning 2 to 10 turns. II. Experiment. Starting from Qwen2.5-VL-3B/7B, SaFeR-Steer improves Safety/Helpfulness on both single-turn benchmarks (48.30/45.86 → 81.84/70.77 for 3B; 56.21/60.32 → 87.89/77.40 for 7B) and multi-turn benchmarks (12.55/27.13 → 55.58/70.27 for 3B; 24.66/46.48 → 64.89/72.35 for 7B), shifting failures to later turns and yielding robustness beyond scaling alone.

Digital Tools for Enhancing Rural Government Efficiency: A Case Study with a Focus on Older Adults

Yifan Chen, Zhiting Lei, Hanyu Li, Yilin Zhang, Weiwei Zhang

Chinese CHI 2024, Accepted (ACM)

In recent years, China has made notable progress in the digital transformation of its service sector; however, rural regions continue to face substantial challenges in this domain. Government cadres and residents of rural communities, particularly older adults, often have difficulty adopting and using digital technologies. To address these challenges, this study developed a smartphone-based digital tool that uses the device’s camera to capture, recognize, and organize textual or image information, which is subsequently uploaded to the system. This approach improves the efficiency of administrative tasks and makes it easier for older adults to submit information digitally. The study concentrates on the digital design of grassroots governance services in rural areas, highlighting existing challenges and providing practical solutions and design recommendations to advance digital development in rural communities.
