ICRA 2026

Pseudocode-Guided Structured Reasoning for Automating Reliable Inference in Vision-Language Models

Weicong Ni, Tianbao Jiang, Linlin Wang

AI summary

PStar adaptively selects structured pseudocode reasoning paths based on question difficulty, reducing hallucinations and outperforming GPT-4V on POPE and MMStar without additional training.
Vision-Language Models; Hallucination Mitigation; Structured Reasoning; Pseudocode Planning; Robotic Automation; Difficulty-Aware Inference

Problem

Vision-Language Models suffer from hallucinations and rigid reasoning strategies that fail to adapt to varying task complexities, hindering their safe deployment in robotic automation.

Approach

The authors introduce PStar, a training-free framework that quantifies question complexity using a Difficulty Feature Vector and dynamically retrieves optimal pseudocode reasoning paths via A* search and hybrid similarity matching.
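As a rough illustration of the retrieval step described above, the sketch below matches a new question against a small library of seed entries by mixing two signals: cosine similarity between difficulty vectors and token overlap between question texts. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-dimensional difficulty vectors, the `alpha` weight, the Jaccard text measure, and the step names (`LOCATE`, `RELATE`, and so on).

```python
import math
from dataclasses import dataclass


@dataclass
class SeedEntry:
    """One hypothetical library entry: a seed question, its difficulty vector, its path."""
    question: str
    dfv: list[float]
    path: list[str]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(q1: str, q2: str) -> float:
    """Token-set overlap between two questions (a stand-in for a text similarity)."""
    s1, s2 = set(q1.lower().split()), set(q2.lower().split())
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def hybrid_similarity(question: str, dfv: list[float], entry: SeedEntry,
                      alpha: float = 0.5) -> float:
    """Weighted mix of difficulty similarity and text similarity (alpha is assumed)."""
    return alpha * cosine(dfv, entry.dfv) + (1 - alpha) * jaccard(question, entry.question)


def retrieve_path(question: str, dfv: list[float], library: list[SeedEntry]) -> list[str]:
    """Return the reasoning path of the most similar seed entry."""
    best = max(library, key=lambda e: hybrid_similarity(question, dfv, e))
    return best.path


# Toy library with made-up difficulty vectors and step names.
LIBRARY = [
    SeedEntry("is there a dog in the image", [0.1, 0.0, 0.2],
              ["LOCATE", "VERIFY", "ANSWER"]),
    SeedEntry("how many people are left of the red car", [0.8, 0.9, 0.6],
              ["LOCATE", "RELATE", "COUNT", "VERIFY", "ANSWER"]),
]

print(retrieve_path("is there a cat in the image", [0.2, 0.0, 0.1], LIBRARY))
```

In this toy setup an easy existence question retrieves the short three-step path, while a counting question with a higher-difficulty vector would retrieve the longer one. How PStar actually computes its Difficulty Feature Vector and combines similarity signals is not specified in this summary.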

Key results

  • Reduces hallucination rates across multiple multimodal benchmarks
  • Achieves state-of-the-art scores of 87.1% on POPE and 68.0% on MMStar
  • Outperforms GPT-4V and larger proprietary models without additional training
  • Demonstrates high data efficiency using only 500 seed examples for path generation

Why it matters

Enables safer, more reliable deployment of vision-language models in real-world robotic systems by providing a lightweight, training-free mechanism to prevent catastrophic reasoning errors.

Abstract

Vision-Language Models (VLMs) are becoming the cornerstone of high-level reasoning for robotic automation, enabling robots to parse natural language commands and perceive their environments. However, their susceptibility to hallucinations introduces critical failures in decision-making, posing significant safety and reliability risks in physical deployments. This challenge is exacerbated by the open-ended nature of real-world tasks, where questions vary vastly in difficulty and modality, demanding robust and adaptable reasoning strategies. To tackle this, we propose the Pseudocode-guided Structured Reasoning framework (PStar), which adaptively selects structured pseudocode reasoning paths to help VLMs perform flexible and step-by-step reasoning. We first design a set of abstract reasoning functions and formulate a structured pseudocode library to represent modular reasoning strategies. Crucially, we design a Difficulty Feature Vector (DFV) that allows the model to assess question complexity and adaptively choose appropriate reasoning strategies, enhancing robustness and interpretability. Extensive experiments demonstrate that PStar significantly reduces hallucination rates, achieving state-of-the-art scores of 87.1% on POPE and 68.0% on MMStar, outperforming even GPT-4V. By providing a validated mechanism to reduce visual-language errors, PStar offers a critical step toward deploying more trustworthy and deterministic VLMs for real-world automated systems, where such errors can lead to catastrophic outcomes.
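To make the idea of composing abstract reasoning functions into a structured path more concrete, here is a minimal A* sketch over a toy function set (the summary above attributes path generation to A* search). The function names, their preconditions and effects, the costs, and the heuristic are all hypothetical choices for illustration; the abstract does not list the actual functions or the search formulation.

```python
import heapq
from itertools import count

# Hypothetical reasoning functions: name -> (preconditions, effects, cost).
# A state is the set of facts established so far.
FUNCTIONS = {
    "PARSE_QUESTION": (set(), {"parsed"}, 1.0),
    "LOCATE_OBJECTS": ({"parsed"}, {"objects"}, 2.0),
    "VERIFY_EVIDENCE": ({"objects"}, {"verified"}, 1.0),
    "ANSWER_DIRECT": ({"parsed"}, {"answer"}, 5.0),   # ungrounded shortcut, costed high
    "ANSWER_GROUNDED": ({"verified"}, {"answer"}, 1.0),
}


def heuristic(state, goal):
    """Admissible estimate: at least one more step (minimum cost 1.0) unless done."""
    return 0.0 if goal in state else 1.0


def astar_path(start=frozenset(), goal="answer"):
    """Return (list of function names, total cost) for the cheapest path to the goal."""
    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(heuristic(start, goal), next(tie), 0.0, start, [])]
    best_cost = {start: 0.0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if goal in state:
            return path, g
        if g > best_cost.get(state, float("inf")):
            continue  # stale queue entry; a cheaper route to this state was found
        for name, (pre, eff, cost) in FUNCTIONS.items():
            if not pre <= state or eff <= state:
                continue  # not applicable yet, or it would add nothing new
            nxt = frozenset(state | eff)
            new_g = g + cost
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + heuristic(nxt, goal), next(tie), new_g, nxt, path + [name]),
                )
    return None, float("inf")


path, cost = astar_path()
print(path, cost)
```

With these made-up costs, the search prefers the four-step grounded path (total cost 5.0) over the two-step shortcut through `ANSWER_DIRECT` (total cost 6.0). A real system would need costs that reflect hallucination risk or question difficulty, which this sketch only gestures at.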

Index terms

Multi-Modal Perception for HRI; Hybrid Logical/Dynamical Planning and Verification; Acceptability and Trust
