AI & ML interests

Non-profit Research Team for Multimodal Large Language Models

Jiaqi-hkust posted an update 3 days ago
🛰️ Introducing Awesome-Remote-Sensing-Agents: The Largest Curated Collection of Intelligent Remote Sensing Agents

We are excited to share our new repository, Awesome-Remote-Sensing-Agents: a comprehensive, community-driven collection of 100+ papers at the intersection of remote sensing and intelligent agents (LLMs, VLMs, multi-agent systems, etc.).

🔗 GitHub Repository: https://github.com/PolyX-Research/Awesome-Remote-Sensing-Agents

Our repository organizes this rapidly growing field into a structured, easy‑to‑navigate resource for researchers, practitioners, and enthusiasts.

📚 What’s Inside?
We’ve carefully curated papers across 6 key application domains:
🌿 Ecological Monitoring – forest fires, biodiversity, climate science
🚨 Emergency Response – flood mapping, wildfire tracking, disaster geolocalization
⛏️ Geological Exploration – mineral mapping, lithological recognition, geologic reasoning
🌊 Marine Supervision – ocean science, autonomous surface vehicles
🌾 Precision Agriculture – crop disease detection, land use simulation
🏙️ Urban Governance – change detection, urban planning, embodied navigation

🤝 Join the Community!
We warmly welcome contributions to keep this list up‑to‑date:
📝 Add missing papers via Pull Request
🏷️ Propose new or refined categories
🔗 Report broken links or outdated entries
💬 Discuss via GitHub Issues or contact the authors
Jiaqi-hkust posted an update 3 months ago
We have open-sourced Robust-R1 (AAAI 2026 Oral), a new paradigm for anti-degradation and robustness enhancement in multimodal large language models.

Multimodal Large Language Models struggle to maintain reliable performance under extreme real-world visual degradations, which limits their practical robustness. Existing robust MLLMs predominantly rely on implicit training or adaptation focused solely on visual-encoder generalization, and they suffer from limited interpretability and isolated optimization.

To overcome these limitations, we propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates:
(i) supervised fine-tuning to build degradation-aware reasoning foundations,
(ii) reward-driven alignment for accurately perceiving degradation parameters, and
(iii) dynamic reasoning depth scaling adapted to degradation intensity.

To support this approach, we introduce a specialized 11K dataset featuring realistic degradations synthesized across four critical real-world visual processing stages, each annotated with structured chains connecting degradation parameters, perceptual influence, a pristine semantic reasoning chain, and a conclusion. Comprehensive evaluations demonstrate state-of-the-art robustness: Robust-R1 outperforms all general and robust baselines on the real-world degradation benchmark R-Bench, while maintaining superior anti-degradation performance under multi-intensity adversarial degradations on MMMB, MMStar, and RealWorldQA.
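To make the "dynamic reasoning depth scaling" idea concrete, here is a minimal, hypothetical sketch (the function name, thresholds, and token budgets are our own illustrative assumptions, not the released implementation): an estimated degradation intensity is mapped to a reasoning-token budget, so a more degraded image receives a longer structured reasoning chain.

```python
def reasoning_budget(degradation_intensity: float,
                     min_tokens: int = 64,
                     max_tokens: int = 512) -> int:
    """Illustrative only: scale the reasoning-chain length with how degraded
    the input image is (0.0 = pristine, 1.0 = severely degraded).
    The actual Robust-R1 scheduling may differ; see the paper and code."""
    intensity = min(max(degradation_intensity, 0.0), 1.0)  # clamp to [0, 1]
    return int(min_tokens + intensity * (max_tokens - min_tokens))

# A mildly blurred image gets a short chain; a heavily degraded one a long chain.
print(reasoning_budget(0.1))  # -> 108
print(reasoning_budget(0.9))  # -> 467
```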

We have made our paper, code, data, model weights, and demo fully open-source:
Paper: Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding (2512.17532) (please help us by upvoting)
GitHub code: https://github.com/jqtangust/Robust-R1 (please help us by starring)
HF model: https://huggingface.co/Jiaqi-hkust/Robust-R1
HF data: Jiaqi-hkust/Robust-R1
HF Space: Jiaqi-hkust/Robust-R1

We sincerely invite everyone to give it a try.
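If you would like to grab the released checkpoint locally, here is a minimal sketch using the standard huggingface_hub download API; the matching inference entry point lives in the GitHub repository, so only the download step is shown here.

```python
from huggingface_hub import snapshot_download

# Download the Robust-R1 checkpoint from the Hugging Face Hub;
# follow the GitHub README for the corresponding inference script.
local_dir = snapshot_download(repo_id="Jiaqi-hkust/Robust-R1")
print("Checkpoint files downloaded to:", local_dir)
```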

Jiaqi-hkust posted an update about 1 year ago
We have open-sourced Hawk (NeurIPS 2024) 🎉, one of the pioneering frameworks for open-world video anomaly understanding.

Despite continuous technological advances in video anomaly detection, existing systems remain limited in scene-level semantic understanding and user interaction, which makes it challenging to identify complex anomalous scenes. In addition, the scarcity of datasets restricts the applicability of these systems in open-world scenarios.

To tackle these challenges, we developed Hawk, an open-world video understanding and anomaly detection framework. Hawk significantly enhances anomaly recognition by identifying differences in motion information between anomalous and normal videos. We introduce an auxiliary consistency loss to strengthen the model's focus on the motion modality and to establish a supervisory relationship between motion and language representations. Furthermore, we annotated over 8,000 anomalous videos with language descriptions and created 8,000 question-answer pairs to support effective training in diverse open-world scenarios.
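As a rough illustration of what a motion-language consistency loss can look like, here is a toy PyTorch sketch of our own (the actual Hawk loss and feature extractors are defined in the paper and repository): it penalizes misalignment between pooled motion features and the corresponding language embedding.

```python
import torch
import torch.nn.functional as F

def consistency_loss(motion_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Toy cosine-alignment loss between motion and language embeddings.
    motion_emb, text_emb: (batch, dim) tensors from the respective encoders.
    Hawk's actual auxiliary loss may be formulated differently."""
    motion_emb = F.normalize(motion_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return (1.0 - (motion_emb * text_emb).sum(dim=-1)).mean()

# Example with random features of matching dimensionality
loss = consistency_loss(torch.randn(4, 256), torch.randn(4, 256))
```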

Experimental results demonstrate that Hawk surpasses existing video understanding frameworks in video description generation and question-answering tasks.

We warmly invite everyone to try it out!
- Hugging Face Demo: Jiaqi-hkust/hawk
- Hugging Face Model: Jiaqi-hkust/hawk
- Hugging Face Dataset: Jiaqi-hkust/hawk
- GitHub Code: https://github.com/jqtangust/hawk

We look forward to your feedback and participation! 👏
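To explore the released resources locally, a minimal sketch with huggingface_hub is shown below; it assumes the dataset repo is fetched as raw files, with the expected layout documented in the GitHub README.

```python
from huggingface_hub import snapshot_download

# Fetch the Hawk annotation/data files and the model weights from the Hub.
data_dir = snapshot_download(repo_id="Jiaqi-hkust/hawk", repo_type="dataset")
model_dir = snapshot_download(repo_id="Jiaqi-hkust/hawk")  # model repo of the same name
print("Dataset files:", data_dir)
print("Model files:", model_dir)
```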