Most likely the repo_id format is wrong, or the repo is gated… Also, since many GGUF models aren’t gated, it’s often faster to go to those directly.
A) What’s available right now
- Mistral 7B: public. Use mistralai/Mistral-7B-v0.3 (base) or mistralai/Mistral-7B-Instruct-v0.3 (chat). (Hugging Face)
- “LLaMA 3B”: this is Llama-3.2-3B from Meta. It is gated. Use meta-llama/Llama-3.2-3B or meta-llama/Llama-3.2-3B-Instruct. (Hugging Face)
Why you saw 404s:
- Wrong or incomplete repo ID (IDs are owner/repo, case-sensitive). (Hugging Face Forums)
- Repo is private or gated and you’re not approved yet. HF returns 404 in that case (see the check below). (Hugging Face)
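A minimal Python sketch for that check, assuming huggingface_hub is installed (the repo IDs are the ones from section A; on older huggingface_hub versions these errors are imported from huggingface_hub.utils instead):
from huggingface_hub import HfApi
from huggingface_hub.errors import GatedRepoError, RepositoryNotFoundError

def check_repo(repo_id: str) -> None:
    # Reports whether a repo is reachable, gated, or missing/private.
    try:
        info = HfApi().model_info(repo_id)
        print(f"{repo_id}: reachable, gated={info.gated}")
    except GatedRepoError:
        print(f"{repo_id}: exists but gated, request access on the model page")
    except RepositoryNotFoundError:
        print(f"{repo_id}: 404, check owner/repo spelling and casing, or it is private")

check_repo("mistralai/Mistral-7B-Instruct-v0.3")  # public
check_repo("meta-llama/Llama-3.2-3B-Instruct")    # gated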
B) How to request access for gated models (e.g., Llama 3.2)
- Open the model page while logged in and click Request/Accept access. (Hugging Face)
- Gated access is controlled on the Hub; owners can require acceptance and review. (Hugging Face) You can verify that access went through with the snippet below.
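A sketch of that verification, assuming you are already logged in (pulling the small config.json is enough to exercise the gate):
from huggingface_hub import hf_hub_download
from huggingface_hub.errors import GatedRepoError

# Succeeding here means the gate is open for your account.
try:
    hf_hub_download("meta-llama/Llama-3.2-3B-Instruct", "config.json")
    print("access granted")
except GatedRepoError:
    print("license not accepted yet, or the request is still pending")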
C) Minimal commands that work
Authenticate and verify:
# login
hf auth login # or: huggingface-cli login
# verify
hf auth whoami # or: huggingface-cli whoami
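If you prefer to authenticate from Python rather than the CLI, a minimal sketch (HF_TOKEN is an environment variable you set yourself with a token from your Hugging Face account settings):
import os
from huggingface_hub import login, whoami

login(token=os.environ["HF_TOKEN"])  # skips the interactive prompt
print(whoami()["name"])              # same check as hf auth whoami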
Download models:
# Public Mistral 7B Instruct
hf download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ./mistral-7b-instruct
# Public Mistral 7B base
hf download mistralai/Mistral-7B-v0.3 --local-dir ./mistral-7b-base
# Gated Llama 3.2 3B (accept license on the repo page first)
hf download meta-llama/Llama-3.2-3B-Instruct --local-dir ./llama-3.2-3b-instruct
Python alternative:
from huggingface_hub import snapshot_download
snapshot_download("mistralai/Mistral-7B-Instruct-v0.3", local_dir="./mistral-7b-instruct")
D) CPU-only? Use GGUF + llama.cpp
This avoids big PyTorch installs and runs fully on CPU.
Install llama.cpp and run a GGUF directly from HF:
# build
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build && cmake --build build --config Release
# run a GGUF in one line (example: Llama-3.2-3B Instruct, quantized)
./build/bin/llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M -p "Hello"
You can point -hf at any GGUF repo, optionally adding a :quant tag such as :Q4_K_M. (GitHub)
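If you'd rather stay in Python than call the llama-cli binary, the community llama-cpp-python bindings can pull the same GGUFs from the Hub; a sketch, assuming pip install llama-cpp-python and that the filename glob matches a quant that exists in the repo:
from llama_cpp import Llama

# Fetches the matching GGUF from the Hub, then runs entirely on CPU.
llm = Llama.from_pretrained(
    repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",
    filename="*Q4_K_M.gguf",  # glob for the quant you want
    n_ctx=4096,
)
out = llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}])
print(out["choices"][0]["message"]["content"])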
E) Good, small, public CPU models (ready today)
- Phi-3 Mini 4K Instruct (3.8B). Also has an official GGUF repo. (Hugging Face)
- Qwen2.5-3B-Instruct (3B). Strong small model. (Hugging Face)
- SmolLM3-3B. Fully open 3B; ONNX and GGUF variants exist from the community. (Hugging Face)
- TinyLlama-1.1B-Chat GGUF, for very low-resource CPU (single-file download example below). (Hugging Face)
- Zephyr-7B-beta GGUF, if you want a 7B chat model and can tolerate slower CPU speed. (Hugging Face)
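For the GGUF options above you can also fetch just one quantized file instead of a whole repo; a sketch with TinyLlama (the repo ID is a common community upload and the exact filename may differ, so check the repo's file list):
from huggingface_hub import hf_hub_download

# Downloads a single quantized file; pass the returned path to llama-cli via -m.
path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
print(path)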
F) Quick 404 checklist
- Always use the exact owner/repo ID, with correct casing. (Hugging Face Forums)
- Open the repo page in your browser to confirm visibility or gating. (Hugging Face)
- Stay authenticated before downloading (hf auth whoami). (Hugging Face)
- If it's truly private, owners must grant you access; otherwise you'll keep seeing 404. (Hugging Face Forums)
G) Note on Python 3.13
Python 3.13 support is improving, but some stacks still lag. PyTorch now lists 3.13 on its install page, but dependency issues have come up; if you hit install errors, fall back to Python 3.12. (PyTorch)
Short “do this now” script (Ubuntu, CPU)
# 1) Auth
pipx install huggingface_hub || pip install -U huggingface_hub
hf auth login && hf auth whoami
# 2) Download public Mistral 7B Instruct
hf download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ./mistral-7b-instruct
# 3) If you want Llama-3.2-3B, accept access in browser, then:
hf download meta-llama/Llama-3.2-3B-Instruct --local-dir ./llama-3.2-3b-instruct
# 4) CPU inference via llama.cpp + GGUF
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build && cmake --build build --config Release
./build/bin/llama-cli -hf microsoft/Phi-3-mini-4k-instruct-gguf:Q4_K_M -p "Hello"
Curated references
- Model cards: Mistral-7B-v0.3, Mistral-7B-Instruct-v0.3, Llama-3.2-3B. Useful to verify availability and licenses. (Hugging Face)
- HF docs: hf download, downloads API, gated-model rules. Practical CLI usage and why 404 appears for private/gated repos. (Hugging Face)
- llama.cpp: repo and GGUF-on-HF usage page. CPU-first, simple runs with -hf. (GitHub)
- Alternative small models: Phi-3 Mini 4K, Qwen2.5-3B, SmolLM3-3B, TinyLlama, Zephyr-7B. Good CPU options. (Hugging Face)