img1

GeoCAD-LLM: CAD Sequence Generation via Multimodal LLMs with Equivariant Geometric Features🛠️

GeoCAD-LLM_4B🛠️

  • Base_model: Qwen3-4B-Instruct
  • Max sequence length: 8,192
  • Epochs: 2
  • Learning rate: 1e-4
  • Batch size: 128
  • This model is specialized for text-to-CAD generation, but it also supports multimodal input (point cloud plus text).

GeoCAD-LLM Contributions🔥

img2

  • State-of-the-art Performance🏆 on Text2CAD datasets (see the tables below).
  • Multimodal CAD Generation🌐: Both text-to-CAD and pc-text-to-CAD.
  • GeoCAD-LLM directly generates CAD vector sequences as natural language.
  • Novel Two-Stage Training Pipeline🧭: Stage 1 trains semantic geometry alignment, and stage 2 trains fine-grained geometry. In particular, we directly leverage E(3)-equivariant features for geometry-consistent supervision, which keeps geometric features consistent regardless of input orientation.
  • Point Cloud Dropout (PCD)🧶: PCD mitigates over-reliance on geometric inputs and improves multimodal generalization, which makes it a critical training technique for multimodal CAD generation.
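
The Point Cloud Dropout idea above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that PCD drops the whole point cloud for a training sample with some probability, so that the model also sees text-only examples. The names `point_cloud_dropout`, `build_training_sample`, and `drop_prob` are hypothetical, and the model card does not state the dropout rate or granularity.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def point_cloud_dropout(
    point_cloud: Optional[Sequence[Point]],
    drop_prob: float,
    rng: random.Random,
) -> Optional[List[Point]]:
    """Hypothetical PCD: return None with probability drop_prob, else the points.

    Assumption: the entire point cloud is dropped per sample (not per point).
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    if point_cloud is None:
        # Text-only sample: nothing to drop.
        return None
    if rng.random() < drop_prob:
        # Simulate a text-only sample so the model cannot rely on geometry alone.
        return None
    return list(point_cloud)


def build_training_sample(
    text: str,
    point_cloud: Optional[Sequence[Point]],
    drop_prob: float,
    rng: random.Random,
) -> Dict[str, object]:
    """Assemble one (hypothetical) training sample after applying PCD."""
    kept = point_cloud_dropout(point_cloud, drop_prob, rng)
    return {
        "text": text,
        "point_cloud": kept,
        "has_point_cloud": kept is not None,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    sample = build_training_sample("A rectangular plate.", cloud, 0.5, rng)
    print(sample["has_point_cloud"])
```

With `drop_prob=0.0` every sample keeps its point cloud (pure pc-text-to-CAD training), and with `drop_prob=1.0` every sample becomes text-only. Values in between mix the two modalities.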

Performance (text-to-CAD & pc-text-to-CAD)🔥

img3

Qualitative Results

Please check our paper and supplementary materials.🤗

Bibtex🤗

(TODO)