Research Feed

AI Research Papers

80 papers tracked across 1 category

Last 7 Days

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

Shanshan Zhong, Yi Lu, Jingjie Ning, Yibing Wan, Lihan Feng, Yuyi Ao +4 more

Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how to learn them automatically and effectively remains unclear. We introduce SkillLearnBench, the first benchmark for evaluating continual skill learning methods, comprising 20 verified, skill-dependent tasks across 15 sub-domains derived from a real-world skill taxonomy, evaluated at three levels: skill quality, execution trajectory, and task out

cs.LG, Apr 21, 2026

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

Yusuf Çelebi, Yağız Asker, Özay Ezerceli, Mahmoud ElHussieni +3 more

Fine-tuning Large Language Models (LLMs) remains structurally uncertain despite parameter-efficient methods such as Low-Rank Adaptation (LoRA), as the layer-specific roles of internal representations are poorly understood, leading to heuristic decisions about where adaptation should be applied. We model the evolution of hidden states as a high-dimensional geometric trajectory and propose using the Ramer-Douglas-Peucker (RDP) algorithm, a parameter-free and training-free polygon simplification me
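For readers unfamiliar with the Ramer-Douglas-Peucker algorithm named in the abstract above, here is a minimal sketch of its classic form, generalized from 2D polylines to points of any dimension. It is not the paper's implementation. The classic algorithm takes a tolerance `epsilon`; how the paper sets or removes that tolerance is not recoverable from the truncated abstract. The function names (`rdp_indices`, `_distance_to_line`) and the toy trajectory are hypothetical, and treating each point as a per-layer hidden state is an assumption made only for illustration.

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b (any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_sq = sum(x * x for x in ab)
    if ab_sq == 0.0:
        # a and b coincide, so fall back to the plain distance from a to p.
        return math.sqrt(sum(x * x for x in ap))
    # Project ap onto ab, then measure what is left over.
    t = sum(x * y for x, y in zip(ap, ab)) / ab_sq
    residual = [x - t * y for x, y in zip(ap, ab)]
    return math.sqrt(sum(x * x for x in residual))


def rdp_indices(points, epsilon):
    """Return the sorted indices of the points kept by classic RDP.

    points  : sequence of equal-length coordinate tuples (an ordered trajectory)
    epsilon : distance tolerance of the classic algorithm (an assumption here)
    """
    if len(points) < 3:
        return list(range(len(points)))
    keep = {0, len(points) - 1}          # endpoints are always kept
    stack = [(0, len(points) - 1)]       # iterative form avoids deep recursion
    while stack:
        lo, hi = stack.pop()
        best_d, best_i = 0.0, None
        for i in range(lo + 1, hi):
            d = _distance_to_line(points[i], points[lo], points[hi])
            if d > best_d:
                best_d, best_i = d, i
        # Keep the farthest point only if it deviates by more than epsilon,
        # then simplify the two halves on either side of it.
        if best_i is not None and best_d > epsilon:
            keep.add(best_i)
            stack.append((lo, best_i))
            stack.append((best_i, hi))
    return sorted(keep)


if __name__ == "__main__":
    # Toy 3D trajectory with one sharp bend at index 3 (illustrative data only).
    trajectory = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 2, 0), (4, 0, 0), (5, 0, 0)]
    print(rdp_indices(trajectory, epsilon=1.5))
```

If each point stood for the hidden state at one layer, the returned indices would mark the layers where the trajectory bends most sharply. Whether the paper selects adaptation layers this way is an assumption, not something the excerpt confirms.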

AI Research Papers

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

Micro Language Models Enable Instant Responses

CityRAG: Stepping Into a City via Spatially-Grounded Video Generation

What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search

Accurate and scalable exchange-correlation with deep learning

ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

TEMPO: Scaling Test-time Training for Large Reasoning Models

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation

LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction

PlayCoder: Making LLM-Generated GUI Code Playable

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

Evaluation-driven Scaling for Scientific Discovery

AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation

Mitigating Multimodal Hallucination via Phase-wise Self-reward

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

Dual-View Training for Instruction-Following Information Retrieval

MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge

River-LLM: Large Language Model Seamless Exit Based on KV Share

MARCO: Navigating the Unseen Space of Semantic Correspondence

On the Reliability of Computer Use Agents

MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

Speculative Decoding for Autoregressive Video Generation

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

UniMesh: Unifying 3D Mesh Understanding and Generation

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

When Background Matters: Breaking Medical Vision Language Models by Transferable Attack

The Continuity Layer: Why Intelligence Needs an Architecture for What It Carries Forward

Understanding and Enforcing Weight Disentanglement in Task Arithmetic

The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs

Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

KWBench: Measuring Unprompted Problem Recognition in Knowledge Work

VoxMind: An End-to-End Agentic Spoken Dialogue System

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

AgentSPEX: An Agent SPecification and EXecution Language

Forge-UGC: FX optimization and register-graph engine for universal graph compiler

Predicting integers from continuous parameters

Significance and Stability Analysis of Gene-Environment Interaction using RGxEStat

SPRITE: From Static Mockups to Engine-Ready Game UI

A Statistical Analysis of Wasserstein Autoencoders for Intrinsically Low-dimensional Data

Online Information Acquisition: Hiring Multiple Agents

MINDE: Mutual Information Neural Diffusion Estimation

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Hybrid Directional Graph Neural Network for Molecules

CrossLoco: Human Motion Driven Control of Legged Robots via Guided Unsupervised Reinforcement Learning

Leveraging Optimization for Adaptive Attacks on Image Watermarks

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Towards image compression with perfect realism at ultra-low bitrates

Compressed Context Memory for Online Language Model Interaction

Dual-Encoders for Extreme Multi-label Classification

Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL

Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies

Decodable and Sample Invariant Continuous Object Encoder

Graphical Multioutput Gaussian Process with Attention

Generalization in diffusion models arises from geometry-adaptive harmonic representations

Geometry-Aware Projective Mapping for Unbounded Neural Radiance Fields

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Multi-Scale Representations by Varying Window Attention for Semantic Segmentation

Guess & Sketch: Language Model Guided Transpilation

Implicit Neural Representations and the Algebra of Complex Wavelets

S2AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

Graph Transformers on EHRs: Better Representation Improves Downstream Performance

Conformal Prediction via Regression-as-Classification

Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning

Distinguished In Uniform: Self-Attention Vs. Virtual Nodes

Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness

Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models

Unlocking the Power of Representations in Long-term Novelty-based Exploration