Hierarchical Visual Working Memory: Reducing Natural Scene Parsing from NP-Complete to Polynomial-Time Complexity


Abstract

Parsing natural scenes into constituent objects by binding multiple visual features is, in its most general formulation, combinatorially intractable. Here we present a theoretical and simulation study showing how a biologically grounded, hierarchical visual working memory (VWM) architecture can render the effective computational cost tractable. Central to our framework is the notion that VWM stores \emph{attentional samples} formed via feedforward processing and top-down recurrent refinement (Selective Tuning framework). We implement VWM units as Cohen–Grossberg (leaky-competitive) neurons and prove four interlocking results: (i) a structural complexity bound that yields $O(N)$ neurons and connections under pyramidal reduction and attenuated feedback; (ii) a contraction-based dynamic convergence guarantee giving feedforward stabilization in $O(L\log(1/\varepsilon))$ time; (iii) an interference-limited SNR capacity bound that quantifies how $p$-lattice clarity and cross-talk constrain resolvable memory samples; and (iv) a task-driven retrieval complexity bound that reduces scene-parsing cost to polynomial time under chunking and hierarchical pooling. Simulations calibrated to biologically plausible parameters validate the analytic bounds and expose a capacity landscape governed primarily by clarity and cross-talk. The model accounts for classic VWM phenomena (limited effective capacity despite large codebooks) and makes testable predictions about how attentional depth, receptive-field overlap, and neuromodulatory gain affect capacity and speed. These results bridge computational complexity theory and neurobiology, suggesting principled mechanisms by which the brain attains rapid, robust scene parsing.
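As an illustration of the contraction-based convergence result summarized above, the following minimal sketch simulates a single leaky-competitive recurrent layer of the general Cohen–Grossberg flavor and checks that the time to reach tolerance $\varepsilon$ grows only like $\log(1/\varepsilon)$. The specific dynamics ($\dot{x} = -x + W\tanh(x) + u$), the spectral-norm scaling, and all parameter values are illustrative assumptions, not the paper's actual implementation or calibration.

```python
# Minimal sketch (assumed form, not the paper's code): a leaky-competitive layer
# with dynamics dx/dt = -x + W*tanh(x) + u. Scaling W so that ||W||_2 < 1 makes
# the flow a contraction, so the state converges geometrically to its fixed point
# and reaching tolerance eps takes O(log(1/eps)) time, matching the form of the
# feedforward stabilization bound quoted in the abstract.
import numpy as np

rng = np.random.default_rng(0)
n = 64                                # illustrative number of VWM units in one layer
W = rng.normal(size=(n, n)) / np.sqrt(n)
W *= 0.8 / np.linalg.norm(W, 2)       # enforce spectral norm < 1 (contraction condition)
u = rng.normal(size=n)                # constant feedforward drive

def step(x, dt=0.05):
    """One Euler step of dx/dt = -x + W tanh(x) + u."""
    return x + dt * (-x + W @ np.tanh(x) + u)

# Locate the fixed point by running the dynamics long enough for the residual
# error to be far below the tolerances tested below.
x_star = np.zeros(n)
for _ in range(20000):
    x_star = step(x_star)

# Measure steps-to-tolerance from the resting state; under contraction these
# should grow roughly linearly in log(1/eps).
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    x, t = np.zeros(n), 0
    while np.linalg.norm(x - x_star) > eps:
        x, t = step(x), t + 1
    print(f"eps={eps:.0e}  steps={t}")
```

Running the loop over decreasing tolerances makes the geometric-convergence claim concrete: each tenfold tightening of $\varepsilon$ adds a roughly constant number of steps, which is the per-layer behavior underlying the stated $O(L\log(1/\varepsilon))$ feedforward stabilization time.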
