VLA Model Architectures

Vision-Language-Action (VLA) models are a new class of foundation models for robotics that take visual observations and natural-language instructions as input and produce robot actions as output.

Key Architectures

  • End-to-end learning approaches
  • Modular architectures with specialized components
  • Transformer-based architectures
  • Multimodal fusion techniques
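
To make the fusion idea in the list above concrete, here is a minimal sketch in plain Python of one possible design: vision and language tokens are concatenated into a single sequence (early fusion), pooled with a single attention query, and mapped to an action vector by a linear head. Everything here is an assumption made for illustration: the class name `ToyVLAPolicy`, the helper functions, the token dimension of 8, the 7-dimensional action, and the random (untrained) weights. It is not taken from any particular VLA model.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyVLAPolicy:
    """Hypothetical toy policy: fuse vision and language tokens, emit an action."""

    def __init__(self, dim=8, action_dim=7, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Pooling query and action head would be learned; here they are random.
        self.query = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.action_head = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(action_dim)
        ]

    def fuse(self, vision_tokens, language_tokens):
        """Early fusion: concatenate both sequences, then attention-pool them."""
        tokens = vision_tokens + language_tokens
        scale = math.sqrt(self.dim)
        # Scaled dot-product score of the pooling query against every token.
        scores = [
            sum(q * t for q, t in zip(self.query, token)) / scale for token in tokens
        ]
        weights = softmax(scores)
        # Weighted sum of tokens gives one fused feature vector of size dim.
        return [
            sum(w * token[i] for w, token in zip(weights, tokens))
            for i in range(self.dim)
        ]

    def act(self, vision_tokens, language_tokens):
        """Map the fused feature vector to an action with a linear head."""
        return matvec(self.action_head, self.fuse(vision_tokens, language_tokens))


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-ins for encoder outputs: 4 image-patch tokens and 3 word tokens.
    vision = [[rng.random() for _ in range(8)] for _ in range(4)]
    language = [[rng.random() for _ in range(8)] for _ in range(3)]
    action = ToyVLAPolicy().act(vision, language)
    print(len(action))  # 7, one value per assumed action dimension
```

In a modular design, the random token lists would be replaced by the outputs of separate vision and language encoders. In an end-to-end design, the encoders, fusion step, and action head would be trained jointly.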