DreamVu Publishes PRISM: A Multi-View Retail Video Dataset for Embodied AI Research

April 07, 2026 at 09:58 AM EDT

270,000-sample dataset covering spatial, physical, and embodied action reasoning reduces error rates by 66.6% on 20 capability probes; 100K open subset and fine-tuned model weights released on Hugging Face

DreamVu today released PRISM, a 270,000-sample multi-view video dataset collected across five real supermarkets for training and evaluating vision-language models on embodied AI tasks. According to the accompanying paper, fine-tuning on PRISM reduces the average error rate across 20 capability probes by 66.6% and cuts embodied reasoning errors by a factor of five relative to general-purpose baselines.
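To make the two headline figures concrete, the arithmetic can be sketched as follows. This is an illustration only: the 30% baseline error rate is a hypothetical value that does not come from the paper, and the function names are invented for this example.

```python
def error_after_reduction(baseline_error: float, relative_reduction: float) -> float:
    """Return the error rate that remains after a relative reduction.

    Both arguments are fractions in [0, 1]; a 66.6% reduction is 0.666.
    """
    if not 0.0 <= baseline_error <= 1.0:
        raise ValueError("baseline_error must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_error * (1.0 - relative_reduction)


def reduction_from_factor(factor: float) -> float:
    """Convert 'errors cut by a factor of N' into a relative reduction."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return 1.0 - 1.0 / factor


if __name__ == "__main__":
    # Hypothetical baseline error rate, chosen only for illustration.
    baseline = 0.30
    remaining = error_after_reduction(baseline, 0.666)
    print(f"Remaining error at a 66.6% reduction: {remaining:.4f}")
    print(f"Factor-of-five cut as a relative reduction: {reduction_from_factor(5.0):.1%}")
```

Under that assumed baseline, a 66.6% reduction leaves roughly one third of the original errors, while a factor-of-five cut corresponds to an 80% reduction.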

Existing training datasets typically address spatial, physical, or action-level reasoning in isolation. PRISM covers all three in a single deployment domain, captured from both egocentric (worker-worn) and exocentric (wide-angle 360° overhead) cameras. Annotations were generated using LLM-produced chain-of-thought reasoning; the paper finds this format produces larger accuracy gains than template-based labeling, particularly on spatial and causal tasks. Fourteen of the 20 capability probes are not covered by any prior publicly available AI training corpus, and DreamVu describes PRISM as the first dataset to cover all three reasoning dimensions simultaneously in a real deployment environment.

A data-scaling analysis shows that training on 60% of the corpus (162,000 samples) achieves 87.7% average accuracy, within 1.2 percentage points of the full-dataset ceiling, so strong results are attainable without training on the full corpus. Mixing egocentric and exocentric data improves cross-view performance without degrading egocentric task accuracy; the two camera perspectives are complementary rather than competitive.
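The data-scaling figures can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this release; the function names are illustrative, and the ceiling it computes is an upper bound implied by the phrase "within 1.2 percentage points", not a value reported here.

```python
FULL_CORPUS_SAMPLES = 270_000  # full PRISM corpus size, as stated in the release


def subset_size(percent_of_corpus: int) -> int:
    """Number of samples in a subset given as a whole-number percentage."""
    if not 0 <= percent_of_corpus <= 100:
        raise ValueError("percent_of_corpus must be between 0 and 100")
    return FULL_CORPUS_SAMPLES * percent_of_corpus // 100


def ceiling_upper_bound(subset_accuracy_pct: float, max_gap_pct_points: float) -> float:
    """Highest full-dataset accuracy consistent with the stated maximum gap."""
    return subset_accuracy_pct + max_gap_pct_points


if __name__ == "__main__":
    print("Samples at 60% of the corpus:", subset_size(60))
    bound = ceiling_upper_bound(87.7, 1.2)
    print(f"Implied upper bound on full-dataset accuracy: {bound:.1f}%")
    # Share of the full corpus represented by the 100,000-sample open subset.
    print(f"Open subset share: {100_000 / FULL_CORPUS_SAMPLES:.1%}")
```

The first calculation reproduces the 162,000-sample figure. The second shows that the full-dataset ceiling can be no higher than about 88.9% if the 60% subset is within 1.2 points of it.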

“The core finding is that domain-specific fine-tuning on data covering spatial, physical, and action reasoning together produces gains that general-corpus scaling does not. We’re releasing the dataset and model weights so the research community can build on it.”
— Rajat Aggarwal, Co-Founder and CEO, DreamVu

The paper is available at dreamvu.ai/prism (arXiv forthcoming). The 100,000-sample open subset and the fine-tuned model weights (Cosmos-Reason2-2B-Retail-Grocery-EgoExo) are on Hugging Face at huggingface.co/datasets/DreamVu/PRISM-100K. The full 270,000-sample corpus is available under a commercial license; contact sales@dreamvu.ai.

About DreamVu: DreamVu is a physical AI data infrastructure company. Its proprietary ALIA 360° omnidirectional camera system and multi-view capture infrastructure are used to build training datasets for embodied AI systems in retail, logistics, healthcare, and industrial environments. DreamVu is headquartered in Philadelphia, PA, with R&D in Hyderabad, India, and is a member of the NVIDIA Inception program.

View source version on businesswire.com: https://www.businesswire.com/news/home/20260407986545/en/

Contacts

Media Contact: Sanju Pillai, sanju@dreamvu.ai
