Papers - Image - Understanding
- Veagle: Advancements in Multimodal Representation Learning (arXiv:2403.08773)
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality (arXiv:2304.14178)
- Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs (arXiv:2403.12596)
- LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images (arXiv:2403.11703)
- GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering (arXiv:1902.09506)
- MyVLM: Personalizing VLMs for User-Specific Queries (arXiv:2403.14599)
- Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling (arXiv:2403.14551)
- Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models (arXiv:2309.01674)
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (arXiv:2404.07973)
- RegionGPT: Towards Region Understanding Vision Language Model (arXiv:2403.02330)
- TextSquare: Scaling up Text-Centric Visual Instruction Tuning (arXiv:2404.12803)
- Pegasus-v1 Technical Report (arXiv:2404.14687)