Meta Llama 3.1 405B, 70B & 8B are here: multilingual, 128K context, and tool use + agents! Competitive with, and in places beating, GPT-4o & Claude 3.5 Sonnet: unequivocally the best open LLM out there! 🔥

Bonus: it comes with a more permissive license that allows training other LLMs on its high-quality outputs 🐐

Some important facts:
> Multilingual: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai
> MMLU: 405B (85.2), 70B (79.3) & 8B (66.7)
> Trained on 15 trillion tokens + 25M synthetically generated outputs
> Pre-training cut-off date of December 2023
> Same architecture as Llama 3, with GQA
> Used a massive 39.3 million GPU hours (16K H100s for the 405B)
> 128K context ⚡
> Excels at code-output tasks, too!
> Also released: Prompt Guard, a BERT-based classifier to detect jailbreaks, malicious code, etc.
> Llama Guard 8B w/ 128K context for securing prompts across a series of topics

How much GPU VRAM do you need to run these?
405B: 810 GB in fp16/bf16, 405 GB in fp8/int8, 203 GB in int4
70B: 140 GB in fp16/bf16, 70 GB in fp8/int8, 35 GB in int4
8B: 16 GB in fp16/bf16, 8 GB in fp8/int8 & 4 GB in int4

In addition, we provide a series of quants ready to deploy: AWQ, Bitsandbytes, and GPTQ. These let you run the 405B on as little as 4x A100 (80 GB) through TGI or vLLM. 🔥

And it gets better: we also provide unlimited access for HF Pro users via our deployed Inference Endpoints!

Want to learn more? We wrote a detailed blog post on it 🦙

Kudos to AI at Meta for believing in open source and science! It has been fun collaborating! 🤗
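The VRAM figures above follow directly from parameter count times bytes per parameter. A back-of-the-envelope sketch (the function name is mine, not from the post; real deployments also need headroom for the KV cache, activations, and framework overhead):

```python
# Weights-only VRAM estimate in GB (1 GB taken as 1e9 bytes, so
# billions of params x bytes/param gives GB directly).
# Ignores KV cache, activations, and runtime overhead.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "int4": 0.5}

def weight_vram_gb(params_billions: float, dtype: str) -> float:
    """VRAM needed just to hold the weights at the given precision."""
    return params_billions * BYTES_PER_PARAM[dtype]

for size in (405, 70, 8):
    row = ", ".join(
        f"{weight_vram_gb(size, d):g} GB in {d}" for d in BYTES_PER_PARAM
    )
    print(f"{size}B: {row}")
# 405B: 810 GB in fp16/bf16, 405 GB in fp8/int8, 202.5 GB in int4
```

This is also why 4x A100 (80 GB) works for the 405B: 320 GB total comfortably holds the roughly 203 GB of int4 weights plus serving overhead.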