
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-03-04 | Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection | Yangyang Wei et.al. | 2603.04469 | null |
| 2026-03-03 | Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks | Junjie Chu et.al. | 2603.04459 | null |
| 2026-03-04 | Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions | Neha Nagaraja et.al. | 2603.03637 | null |
| 2026-03-04 | Goal-Driven Risk Assessment for LLM-Powered Systems: A Healthcare Case Study | Neha Nagaraja et.al. | 2603.03633 | null |
| 2026-03-03 | Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use | Aradhye Agarwal et.al. | 2603.03205 | null |
| 2026-03-02 | DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern | Xiaoyi Pang et.al. | 2603.01574 | null |
| 2026-03-01 | Tracking Capabilities for Safer Agents | Martin Odersky et.al. | 2603.00991 | null |
| 2026-02-28 | From Goals to Aspects, Revisited: An NFR Pattern Language for Agentic AI Systems | Yijun Yu et.al. | 2603.00472 | null |
| 2026-02-27 | LiaisonAgent: An Multi-Agent Framework for Autonomous Risk Investigation and Governance | Chuanming Tang et.al. | 2603.00200 | null |
| 2026-02-26 | Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection | Marcus Graves et.al. | 2603.00164 | null |
| 2026-02-27 | SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls | Qianxun Xu et.al. | 2602.23956 | null |
| 2026-02-26 | AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification | Tian Zhang et.al. | 2602.22724 | null |
| 2026-02-25 | Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace | Qianlong Lan et.al. | 2602.22450 | null |
| 2026-02-24 | Analysis of LLMs Against Prompt Injection and Jailbreak Attacks | Piyush Jaiswal et.al. | 2602.22242 | null |
| 2026-02-24 | SoK: Agentic Skills – Beyond Tool Use in LLM Agents | Yanna Jiang et.al. | 2602.20867 | null |
| 2026-02-24 | AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs | Che Wang et.al. | 2602.20720 | null |
| 2026-02-24 | ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction | Che Wang et.al. | 2602.20708 | null |
| 2026-02-25 | Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks | David Schmotz et.al. | 2602.20156 | null |
| 2026-02-23 | The LLMbda Calculus: AI Agents, Conversations, and Information Flow | Zac Garby et.al. | 2602.20064 | null |
| 2026-02-23 | CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents | Lei Ba et.al. | 2602.19547 | null |
| 2026-02-19 | Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models | Manuel Wirth et.al. | 2602.18514 | null |
| 2026-02-18 | The Vulnerability of LLM Rankers to Prompt Injection Attacks | Yu Yin et.al. | 2602.16752 | null |
| 2026-02-19 | Policy Compiler for Secure Agentic Systems | Nils Palumbo et.al. | 2602.16708 | null |
| 2026-02-15 | SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement | Xiaojun Jia et.al. | 2602.14211 | null |
| 2026-02-15 | When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift | Max Fomin et.al. | 2602.14161 | null |
| 2026-02-21 | AlignSentinel: Alignment-Aware Detection of Prompt Injection Attacks | Yuqi Jia et.al. | 2602.13597 | null |
| 2026-02-13 | OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage | Akshat Naik et.al. | 2602.13477 | null |
| 2026-02-12 | Sparse Autoencoders are Capable LLM Jailbreak Mitigators | Yannick Assogba et.al. | 2602.12418 | null |
| 2026-02-11 | Optimizing Agent Planning for Security and Autonomy | Aashish Kolluri et.al. | 2602.11416 | null |
| 2026-02-11 | Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection | J Alex Corll et.al. | 2602.11247 | null |
| 2026-02-13 | Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System | Zhenhua Zou et.al. | 2602.10915 | null |
| 2026-02-11 | When Skills Lie: Hidden-Comment Injection in LLM Agents | Qianli Wang et.al. | 2602.10498 | null |
| 2026-02-11 | Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI | Mohan Rajagopalan et.al. | 2602.10481 | null |
| 2026-02-11 | The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis | Peiran Wang et.al. | 2602.10453 | null |
| 2026-02-10 | Autonomous Action Runtime Management(AARM):A System Specification for Securing AI-Driven Actions at Runtime | Herman Errico et.al. | 2602.09433 | null |
| 2026-02-09 | MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks | Georgios Syros et.al. | 2602.09222 | null |
| 2026-02-09 | When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents | Yuting Ning et.al. | 2602.08995 | null |
| 2026-02-08 | Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation | Shayan Ali Hassan et.al. | 2602.08062 | null |
| 2026-02-08 | CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution | Minbeom Kim et.al. | 2602.07918 | null |
| 2026-02-07 | AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management | Ruoyao Wen et.al. | 2602.07398 | null |
| 2026-02-07 | When the Model Said ‘No Comment’, We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified | Gautam Siddharth Kashyap et.al. | 2602.07381 | null |
| 2026-02-06 | Extended to Reality: Prompt Injection in 3D Environments | Zhuoheng Li et.al. | 2602.07104 | null |
| 2026-02-06 | TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking | Sung-Hoon Yoon et.al. | 2602.06440 | null |
| 2026-02-06 | MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs | Junhyeok Lee et.al. | 2602.06268 | null |
| 2026-02-05 | Learning to Inject: Automated Prompt Injection via Reinforcement Learning | Xin Chen et.al. | 2602.05746 | null |
| 2026-02-05 | Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection | Takashi Koide et.al. | 2602.05484 | null |
| 2026-02-04 | Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks | Jafar Isbarov et.al. | 2602.05066 | null |
| 2026-02-04 | How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks | Yanshu Wang et.al. | 2602.04294 | null |
| 2026-02-03 | WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents | Xilong Wang et.al. | 2602.03792 | null |
| 2026-02-06 | AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security System | Hao Li et.al. | 2602.03117 | null |
| 2026-02-03 | The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers | Blake Bullwinkel et.al. | 2602.03085 | null |
| 2026-02-02 | Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework | Alsharif Abuadbba et.al. | 2602.01942 | null |
| 2026-02-02 | RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse | Mingrui Liu et.al. | 2602.01795 | null |
| 2026-02-02 | Provable Defense Framework for LLM Jailbreaks via Noise-Augumented Alignment | Zehua Cheng et.al. | 2602.01587 | null |
| 2026-02-01 | Context Dependence and Reliability in Autoregressive Language Models | Poushali Sengupta et.al. | 2602.01378 | null |
| 2026-02-01 | SMCP: Secure Model Context Protocol | Xinyi Hou et.al. | 2602.01129 | null |
| 2026-01-31 | Bypassing Prompt Injection Detectors through Evasive Injections | Md Jahedur Rahman et.al. | 2602.00750 | null |

(<a href=#updated-on-20260309>back to top</a>)

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-03-06 | Mind the Gap: Pitfalls of LLM Alignment with Asian Public Opinion | Hari Shankar et.al. | 2603.06264 | null |
| 2026-03-06 | Evaluating LLM Alignment With Human Trust Models | Anushka Debnath et.al. | 2603.05839 | null |
| 2026-03-05 | VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment | Jiawei Chen et.al. | 2603.04822 | null |
| 2026-03-04 | When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG | Junchen Li et.al. | 2603.03919 | null |
| 2026-03-03 | A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities | Faiz Ghifari Haznitrama et.al. | 2603.02540 | null |
| 2026-03-03 | RubricBench: Aligning Model-Generated Rubrics with Human Standards | Qiyuan Zhang et.al. | 2603.01562 | null |
| 2026-02-25 | Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual | Yining Li et.al. | 2602.22146 | null |
| 2026-02-24 | Oracle-Robust Online Alignment for Large Language Models | Zimeng Li et.al. | 2602.20457 | null |
| 2026-02-23 | IR $^3$ : Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking | Mohammad Beigi et.al. | 2602.19416 | null |
| 2026-02-26 | Soft Sequence Policy Optimization | Svetlana Glazyrina et.al. | 2602.19327 | null |
| 2026-02-23 | ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment | Hongjue Zhao et.al. | 2602.17560 | null |
| 2026-02-19 | Fail-Closed Alignment for Large Language Models | Zachary Coalson et.al. | 2602.16977 | null |
| 2026-02-18 | References Improve LLM Alignment in Non-Verifiable Domains | Kejian Shi et.al. | 2602.16802 | null |
| 2026-02-18 | Intra-Fairness Dynamics: The Bias Spillover Effect in Targeted LLM Alignment | Eva Paraschou et.al. | 2602.16438 | null |
| 2026-02-17 | Discovering Implicit Large Language Model Alignment Objectives | Edward Chen et.al. | 2602.15338 | null |
| 2026-02-15 | Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning | Zhi Zhang et.al. | 2602.14338 | null |
| 2026-02-14 | Elo-Evolve: A Co-evolutionary Framework for Language Model Alignment | Jing Zhao et.al. | 2602.13575 | null |
| 2026-02-14 | Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning | Yanbo Wang et.al. | 2602.13562 | null |
| 2026-02-12 | How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics | Yurong Chen et.al. | 2602.12180 | null |
| 2026-02-12 | Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment | Jiajun Chen et.al. | 2602.12134 | null |
| 2026-02-11 | Evaluating Alignment of Behavioral Dispositions in LLMs | Amir Taubenfeld et.al. | 2602.11328 | null |
| 2026-02-08 | Fairness Aware Reward Optimization | Ching Lam Choi et.al. | 2602.07799 | null |
| 2026-02-07 | Training-Driven Representational Geometry Modularization Predicts Brain Alignment in Language Models | Yixuan Liu et.al. | 2602.07539 | null |
| 2026-02-09 | f-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment | Rajdeep Haldar et.al. | 2602.05946 | null |
| 2026-02-10 | Learning Where It Matters: Geometric Anchoring for Robust Preference Alignment | Youngjae Cho et.al. | 2602.04909 | null |
| 2026-02-04 | Multi-scale hypergraph meets LLMs: Aligning large language models for time series analysis | Zongjiang Shang et.al. | 2602.04369 | null |
| 2026-02-04 | From Helpfulness to Toxic Proactivity: Diagnosing Behavioral Misalignment in LLM Agents | Xinyue Wang et.al. | 2602.04197 | null |
| 2026-02-11 | Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding | Zihao Jing et.al. | 2602.02742 | null |
| 2026-02-09 | Reward-free Alignment for Conflicting Objectives | Peter L. Chen et.al. | 2602.02495 | null |
| 2026-02-02 | Nearly Optimal Active Preference Learning and Its Application to LLM Alignment | Yao Zhao et.al. | 2602.01581 | null |
| 2026-01-29 | Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice? | Ala N. Tak et.al. | 2601.22329 | null |
| 2026-01-26 | One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment | Hongru Cai et.al. | 2601.18731 | null |
| 2026-01-26 | From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation | Yuxin Jiang et.al. | 2601.18533 | null |
| 2026-01-24 | Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment | Tiejin Chen et.al. | 2601.17329 | null |
| 2026-01-20 | CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks | Jiayu Lin et.al. | 2601.13669 | null |

(<a href=#updated-on-20260309>back to top</a>)

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-03-04 | Scalable Join Inference for Large Context Graphs | Shivani Tripathi et.al. | 2603.04176 | null |
| 2026-03-02 | Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study | Emmanuel Aboah Boateng et.al. | 2603.01486 | null |
| 2026-02-03 | Personalization Increases Affective Alignment but Has Role-Dependent Effects on Epistemic Independence in LLMs | Sean W. Kelley et.al. | 2603.00024 | null |
| 2026-02-23 | What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance | William Watson et.al. | 2602.20300 | null |
| 2026-02-15 | Detecting LLM Hallucinations via Embedding Cluster Geometry: A Three-Type Taxonomy with Measurable Signatures | Matic Korun et.al. | 2602.14259 | null |
| 2026-02-12 | Differentiable Modal Logic for Multi-Agent Diagnosis, Orchestration and Communication | Antonin Sulc et.al. | 2602.12083 | null |
| 2026-01-18 | Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering | Nathan Mao et.al. | 2602.11167 | null |
| 2026-02-05 | Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions | Léo Labat et.al. | 2602.05932 | null |
| 2026-02-03 | Data Verification is the Future of Quantum Computing Copilots | Junhao Song et.al. | 2602.04072 | null |
| 2026-02-03 | RAGTurk: Best Practices for Retrieval Augmented Generation in Turkish | Süha Kağan Köse et.al. | 2602.03652 | null |
| 2026-02-04 | Statsformer: Validated Ensemble Learning with LLM-Derived Semantic Priors | Erica Zhang et.al. | 2601.21410 | null |
| 2026-01-29 | GeoRC: A Benchmark for Geolocation Reasoning Chains | Mohit Talreja et.al. | 2601.21278 | null |
| 2026-01-26 | HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs | Xinyue Zeng et.al. | 2601.18753 | null |
| 2026-01-23 | Do LLM hallucination detectors suffer from low-resource effect? | Debtanu Datta et.al. | 2601.16766 | null |
| 2026-01-20 | IGAA: Intent-Driven General Agentic AI for Edge Services Scheduling using Generative Meta Learning | Yan Sun et.al. | 2601.13702 | null |
| 2026-01-17 | Acting Flatterers via LLMs Sycophancy: Combating Clickbait with LLMs Opposing-Stance Reasoning | Chaowei Zhang et.al. | 2601.12019 | null |
| 2026-01-20 | AI Sycophancy: How Users Flag and Respond | Kazi Noshin et.al. | 2601.10467 | null |
| 2026-01-12 | Automating API Documentation from Crowdsourced Knowledge | Bonan Kou et.al. | 2601.08036 | null |

(<a href=#updated-on-20260309>back to top</a>)

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-02-15 | Evaluating LLMs in Finance Requires Explicit Bias Consideration | Yaxuan Kong et.al. | 2602.14233 | null |
| 2026-02-11 | Training-Induced Bias Toward LLM-Generated Content in Dense Retrieval | William Xion et.al. | 2602.10833 | null |
| 2026-01-26 | PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation | Abhishek Divekar et.al. | 2601.18777 | null |
| 2026-01-28 | Common to Whom? Regional Cultural Commonsense and LLM Bias in India | Sangmitra Madhusudan et.al. | 2601.15550 | null |
| 2026-01-08 | Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection | Zhiwei Liu et.al. | 2601.05403 | null |
| 2025-12-18 | From Personalization to Prejudice: Bias and Discrimination in Memory-Enhanced AI Agents for Recruitment | Himanshu Gharat et.al. | 2512.16532 | null |
| 2025-12-16 | PerProb: Indirectly Evaluating Memorization in Large Language Models | Yihan Liao et.al. | 2512.14600 | null |
| 2025-11-24 | A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models | Zhen Tao et.al. | 2511.21758 | null |
| 2025-10-31 | EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs | Ali Satvaty et.al. | 2511.00192 | null |
| 2025-10-27 | Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation | Kaveh Eskandari Miandoab et.al. | 2510.23921 | null |
| 2025-10-21 | Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency | Svetlana Maslenkova et.al. | 2510.18556 | null |
| 2025-10-12 | Therapeutic AI and the Hidden Risks of Over-Disclosure: An Embedded AI-Literacy Framework for Mental Health Privacy | Soraya S. Anvari et.al. | 2510.10805 | null |

(<a href=#updated-on-20260309>back to top</a>)