---
language: en
license: mit
tags:
- multimodal
- vision-language
- captioning
---
Multimodal Caption Model
A vision-language model that generates textual descriptions (captions) of visual inputs.