Hugging Face

About us

The AI community building the future.

Website: https://huggingface.co
Industry: Software Development
Company size: 51-200 employees
Type: Privately Held
Founded: 2016
Specialties: machine learning, natural language processing, and deep learning

Updates

  • Hugging Face reposted this

    Vaibhav Srivastav

    GPU poor @ Hugging Face

    Meta Llama 3.1 405B, 70B & 8B are here - multilingual, with 128K context, tool use & agents! Competitive with or beats GPT-4o & Claude 3.5 Sonnet - unequivocally the best open LLM out there! 🔥 Bonus: it comes with a more permissive license, which allows one to train other LLMs on its high-quality outputs 🐐

    Some important facts:
    > Multilingual - English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai
    > MMLU - 405B (85.2), 70B (79.3) & 8B (66.7)
    > Trained on 15 trillion tokens + 25M synthetically generated outputs
    > Pre-training cut-off date of December 2023
    > Same architecture as Llama 3, with GQA
    > Used a massive 39.3 million GPU hours (16K H100s for 405B)
    > 128K context ⚡
    > Excels at code output tasks, too!
    > Releases Prompt Guard - a BERT-based classifier to detect jailbreaks, malicious code, etc.
    > Releases Llama Guard 8B w/ 128K context for securing prompts across a series of topics

    How much GPU VRAM do you need to run these?
    405B - 810 GB in fp/bf16, 405 GB in fp8/int8, 203 GB in int4
    70B - 140 GB in fp/bf16, 70 GB in fp8/int8, 35 GB in int4
    8B - 16 GB in fp/bf16, 8 GB in fp8/int8 & 4 GB in int4

    In addition, we provide a series of quants ready to deploy: AWQ, Bitsandbytes, and GPTQ. These allow you to run 405B on as little as 4x A100 (80 GB) through TGI or vLLM. 🔥

    Wait, it gets better: we also provide unlimited access for HF Pro users via our deployed Inference Endpoint! Want to learn more? We wrote a detailed blog post on it 🦙

    Kudos to AI at Meta for believing in open source and science! It has been fun collaborating! 🤗
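
    As a hedged illustration of the quantized deployment options mentioned above, the sketch below loads the 8B Instruct checkpoint in int4 with transformers and bitsandbytes. The model ID is the public meta-llama repo; the VRAM assumption (~6 GB free), package setup, and generation settings are illustrative and not part of the original post.

    ```python
    # Minimal sketch: Llama 3.1 8B Instruct in 4-bit via transformers + bitsandbytes.
    # Assumes access to the gated repo and transformers/accelerate/bitsandbytes installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

    # int4 quantization config, roughly matching the "~4 GB in int4" figure above
    bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # spread layers across available devices
    )

    # Chat-template prompting, then a short generation
    messages = [{"role": "user", "content": "Give me one fun fact about llamas."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
    ```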

  • Hugging Face reposted this

    Gradio

    24,288 followers

    🚀 Meta unveils Llama 3.1: This release changes everything! 🤯 A bullet-point summary 👇 of all the technical details of Meta's Llama 3.1 release that you need to know:

    • Llama 3.1 comes in three sizes: 8B, 70B, and 405B parameters
    • All models support a context length of 128K tokens
    • New licensing terms allow using model outputs to improve other LLMs
    • Models trained on over 15 trillion tokens
    • Instruct models trained on publicly available instruction datasets and over 25M synthetically generated examples
    • Models are multilingual and support 8 languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
    • Six new open LLM models released:
      - Meta-Llama-3.1-8B (base)
      - Meta-Llama-3.1-8B-Instruct (fine-tuned)
      - Meta-Llama-3.1-70B (base)
      - Meta-Llama-3.1-70B-Instruct (fine-tuned)
      - Meta-Llama-3.1-405B (base)
      - Meta-Llama-3.1-405B-Instruct (fine-tuned)
    • Two additional models released:
      - Llama Guard 3: for classifying LLM inputs and responses
      - Prompt Guard: a 279M-parameter BERT-based classifier for detecting prompt injection and jailbreaking
    • Uses Grouped-Query Attention (GQA) for efficient representation
    • Instruct models are fine-tuned for tool calling, with two built-in tools (search, mathematical reasoning with Wolfram Alpha)
    • Supports four conversation roles: system, user, assistant, and ipython (for tool-call outputs)
    • Custom tool calling supported via JSON function calling
    • Official FP8-quantized version of Llama 3.1 405B available
    • AWQ and GPTQ quantized variants in INT4 also available
    • Approximate memory requirements:
      - 8B model: 16 GB (FP16), 8 GB (FP8), 4 GB (INT4)
      - 70B model: 140 GB (FP16), 70 GB (FP8), 35 GB (INT4)
      - 405B model: 810 GB (FP16), 405 GB (FP8), 203 GB (INT4)
    • KV cache memory requirements (in FP16) for 128K tokens:
      - 8B model: 15.62 GB
      - 70B model: 39.06 GB
      - 405B model: 123.05 GB

    Read everything about the Meta Llama 3.1 release in the detailed report on the Hugging Face Blog: https://lnkd.in/g9yTBFnv
    Meta Llama 3.1 model collection on Hugging Face: https://lnkd.in/g_bVRpmp
    A Gradio demo for Meta Llama 3.1 8B is hosted on Hugging Face Spaces: https://lnkd.in/gKD6BYiW

    Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

    huggingface.co
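
    As a back-of-the-envelope check on the KV-cache figures in the post above, here is a small sketch that computes KV-cache size from layer count, KV-head count, head dimension, and sequence length. The 8B configuration used (32 layers, 8 KV heads, head dim 128) is an assumption based on the public Llama 3 architecture, not something stated in the post.

    ```python
    # Sketch: rough KV-cache memory estimate for a transformer with grouped-query attention.
    # kv_cache_bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_element
    def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                     seq_len: int, bytes_per_element: int = 2) -> float:
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_element / 2**30

    # Assumed Llama 3.1 8B config: 32 layers, 8 KV heads, head_dim 128, FP16 cache, 128K tokens.
    print(f"{kv_cache_gib(layers=32, kv_heads=8, head_dim=128, seq_len=128_000):.2f} GiB")  # ~15.62
    ```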

  • Hugging Face reposted this

    Philipp Schmid

    Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

    Llama 405B is here, and it comes with more than expected! 🚨 Meta Llama 3.1 comes in 3 sizes - 8B, 70B, and 405B - and speaks 8 languages! 🌍 Llama 3.1 405B matches or beats OpenAI's GPT-4o across many text benchmarks.

    What's new and improved in 3.1 ✨:
    🧮 8B, 70B & 405B versions as Instruct and Base with 128K context
    🌐 Multilingual, supporting 8 languages, including English, German, French, and more
    🔠 Trained on >15T tokens & fine-tuned on 25M human and synthetic samples
    📃 Commercial-friendly license that allows using model outputs to improve other LLMs
    ⚖️ Quantized versions in FP8, AWQ, and GPTQ for efficient inference
    🚀 Llama 3.1 405B matches or beats GPT-4o on many benchmarks
    🧑🏻💻 8B & 70B improved at coding and instruction following, by up to 12%
    ⚒️ Supports tool use and function calling
    🤖 Llama 3.1 405B available on the Hugging Face Inference API and in HuggingChat
    🤗 Available on @huggingface
    🔜 1-click deployments on Hugging Face, Amazon SageMaker, and Google Cloud

    Blog: https://lnkd.in/eiRsPgDj
    Model collection: https://lnkd.in/ehpTfzMq

    Big kudos to Meta for releasing Llama 3.1, including 405B. This will help everyone adopt AI more easily and move faster. ❤️
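
    Since the post mentions Llama 3.1 405B being served through the Hugging Face Inference API and HuggingChat, here is a minimal sketch of calling it with huggingface_hub's InferenceClient. The token handling and generation parameters are illustrative assumptions, and availability of the 405B endpoint depends on your account and plan.

    ```python
    # Sketch: chat with Llama 3.1 through the Hugging Face Inference API.
    # Assumes HF_TOKEN is set in the environment and huggingface_hub is installed.
    import os
    from huggingface_hub import InferenceClient

    client = InferenceClient(
        model="meta-llama/Meta-Llama-3.1-405B-Instruct",
        token=os.environ["HF_TOKEN"],
    )

    response = client.chat_completion(
        messages=[{"role": "user", "content": "Summarize what's new in Llama 3.1 in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)
    ```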

  • Hugging Face reposted this

    Sayak Paul

    ML @ Hugging Face 🤗

    Establishing strong automated reporting mechanisms is important for sustaining a good open-source project. On the 🧨 Diffusers team, we do this by reporting:
    🟢 Status of our nightly test suite
    🟢 Status of the nightly Docker builds (if any)
    🟢 Mirroring status of the community pipelines
    🟢 Bi-weekly benchmarking

    The nightly test suite helps us discover any nasty bug relatively quickly. The benchmarking suite helps us dig into any regression if we see a slowdown in the numbers. All of these are reported to specific Slack channels with specific members to reduce noise. Check out the workflows for more details: https://lnkd.in/g69zkwK2
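
    The linked workflows are the authoritative reference for how the Diffusers team actually does this; purely as a hypothetical sketch of the general pattern (run the nightly suite, post a summary to a Slack channel via an incoming webhook), something like the following works. SLACK_WEBHOOK_URL, the pytest invocation, and the message format are all illustrative assumptions.

    ```python
    # Hypothetical sketch: post a nightly test-suite summary to Slack via an incoming webhook.
    # This is not the Diffusers team's actual reporting code.
    import os
    import subprocess
    import requests

    # Run the test suite and capture the overall result.
    result = subprocess.run(["pytest", "tests/", "-q"], capture_output=True, text=True)
    status = "✅ passed" if result.returncode == 0 else "❌ failed"
    summary = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else "no output"

    # Send a short report to the channel behind the webhook.
    payload = {"text": f"Nightly test suite {status} - {summary}"}
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json=payload, timeout=30)
    ```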

  • Hugging Face reposted this

    Ahsen Khaliq

    ML @ Hugging Face

    Apple presents LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

    Paper page: https://buff.ly/4d9En9A

    The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first token. Consequently, the prefilling stage may become a bottleneck in the generation process. An open question remains whether all prompt tokens are essential for generating the first token. To answer this, we introduce a novel method, LazyLLM, that selectively computes the KV for tokens important for the next token prediction in both the prefilling and decoding stages. Contrary to static pruning approaches that prune the prompt at once, LazyLLM allows language models to dynamically select different subsets of tokens from the context in different generation steps, even though they might be pruned in previous steps. Extensive experiments on standard datasets across various tasks demonstrate that LazyLLM is a generic method that can be seamlessly integrated with existing language models to significantly accelerate the generation without fine-tuning. For instance, in the multi-document question-answering task, LazyLLM accelerates the prefilling stage of the Llama 2 7B model by 2.34x while maintaining accuracy.
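
    The paper's actual method is considerably more involved (pruning is progressive across layers, and pruned tokens can be revived later); the snippet below is only a toy illustration of the core idea described above - keep the prompt tokens the current query attends to most strongly and defer KV computation for the rest.

    ```python
    # Toy illustration of dynamic token pruning (not the LazyLLM implementation).
    import numpy as np

    def select_tokens(attn_to_prompt: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
        """attn_to_prompt: attention weights from the current query to each prompt token."""
        k = max(1, int(len(attn_to_prompt) * keep_ratio))
        # Indices of the top-k most-attended prompt tokens, kept in their original order.
        return np.sort(np.argsort(attn_to_prompt)[-k:])

    rng = np.random.default_rng(0)
    scores = rng.random(12)                       # stand-in attention weights for a 12-token prompt
    print(select_tokens(scores, keep_ratio=0.5))  # keep 6 of 12 tokens for KV computation this step
    ```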

  • Hugging Face reposted this

    Merve Noyan

    open-sourceress at 🤗 | Google Developer Expert in Machine Learning, MSc Candidate in Data Science

    PSA 🗣️ We kept shipping in June - here are some non-exhaustive Hugging Face Hub updates! See the deck below for how they look, and keep reading 🤗

    📑 Datasets:
    - We've added new filters on modality, size, and format
    - Easily check how to load dataset repositories in other formats (datasets, pandas, and croissant)
    - You can now sort dataset repositories by number of elements and preview the number of elements in a dataset

    🤝 Community:
    - You can now open discussions at any organization (for anything that's not related to the models or datasets they share)
    - If you already have more than one paper on Hugging Face Papers, you can now submit a paper

    📚 Tasks (a documentation project for everyone to start building with machine learning 📖, at /tasks):
    - We now have a task page for vision language models
    - We have completely renewed the Feature Extraction task page to cover retrieval, reranking, RAG & co
    - We have updated a ton of task pages with new models, datasets, and more
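
    For the Datasets bullet about loading dataset repositories in other formats, here is a tiny hedged sketch using the datasets library; the repo ID is only an example, and the pandas conversion is just one of the supported paths.

    ```python
    # Sketch: load a Hub dataset repo and hand it to pandas.
    from datasets import load_dataset

    # Example repo ID - swap in the dataset you actually need.
    ds = load_dataset("stanfordnlp/imdb", split="train")
    print(ds[0])

    # Convert to a pandas DataFrame if you prefer that workflow.
    df = ds.to_pandas()
    print(df.head())
    ```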

  • Hugging Face reposted this

    Ahsen Khaliq

    ML @ Hugging Face

    Shape of Motion: 4D Reconstruction from a Single Video

    Paper page: https://buff.ly/3S9Zroj

    Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. We tackle the under-constrained nature of the problem with two key insights: First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases. Each point's motion is expressed as a linear combination of these bases, facilitating soft decomposition of the scene into multiple rigidly-moving groups. Second, we utilize a comprehensive set of data-driven priors, including monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals, resulting in a globally consistent representation of the dynamic scene. Experiments show that our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
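
    This is not the paper's code; purely as a toy numeric sketch of what "each point's motion expressed as a linear combination of SE(3) motion bases" looks like, the snippet below moves a 3D point with a weighted blend of a few rigid transforms (blended naively in matrix space for illustration).

    ```python
    # Toy sketch: blend rigid (SE(3)) basis transforms with per-point weights.
    import numpy as np

    def apply_blended_bases(point, bases, weights):
        """point: (3,), bases: list of 4x4 rigid transforms, weights: (K,) summing to 1."""
        p_h = np.append(point, 1.0)                           # homogeneous coordinates
        blended = sum(w * T for w, T in zip(weights, bases))  # naive blend in matrix space
        return (blended @ p_h)[:3]

    identity = np.eye(4)
    shift_x = np.eye(4)
    shift_x[0, 3] = 1.0  # basis that translates +1 along x
    point = np.zeros(3)
    print(apply_blended_bases(point, [identity, shift_x], np.array([0.3, 0.7])))  # -> [0.7 0. 0.]
    ```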

  • Hugging Face reposted this

    Gradio

    24,288 followers

    🔥 Mistral NeMo, a 12B LLM launched a few hours ago, has the community abuzz! 👇 Understand nuances about the model's capabilities that were not covered in the release blog. We also cover the community's initial reactions.

    - Trained jointly by Mistral AI and NVIDIA. Mistral NeMo has 50% more parameters (12B) compared to Llama 3 (8B).
    - Open-source model with an Apache 2.0 license 🎉
    - The 12B parameter count puts it between smaller models (7-8B) and larger ones (30-70B), potentially offering a good balance of performance and resource requirements.
    - Trained with "quantization awareness," allowing for FP8 inference without performance loss. This approach appears forward-thinking, potentially allowing for better performance when quantized compared to models without quantization-aware training.
    - VRAM required to run: the model would need about 12 GB of VRAM at 8-bit precision, or 6 GB at 4-bit precision (not counting context).
    - Can potentially run on consumer GPUs with 16 GB VRAM, and possibly on 12 GB cards with quantization. The model seems to be designed to fit on NVIDIA L40S, GeForce RTX 4090, or RTX 4500 GPUs. 🤔
    - 128K context window available. However, using the full context size could significantly increase memory requirements and might not be practical for all use cases.
    - This large context window (128K) is rare among models of this size, which makes Mistral NeMo potentially valuable for tasks requiring long-range understanding.
    - Mistral NeMo is a joint release with NVIDIA: the model was trained on 3,072 H100 (80 GB) GPUs - clearly significant computational resources.
    - Multilingual: trained on multiple languages. Benchmarks for non-English languages look particularly strong.
    - Tokenizer: new Tekken tokenizer based on tiktoken (by OpenAI), which uses byte-pair encoding.
    - llama.cpp compatibility: not yet out of the box; however, a PR is in motion and might take a couple of days. This might delay widespread adoption of the model.
    - Released the same day as GPT-4o mini 😉 - we are excited to see how these two will compete on the LMSYS leaderboard (a @Gradio-built leaderboard and Arena)!
    - Fine-tuning: per the community, Mistral NeMo seems to be more suitable for fine-tuning than Llama 3.
    - Temperature: the model reportedly (per HN/Reddit comments) requires lower temperature settings (around 0.3) compared to previous Mistral models, which might affect its behavior in various applications. Useful to know if you were planning a drop-in replacement for Mistral models.
    - Potential: could be particularly useful for tasks like coding assistance, creative writing, and role-playing.
    - Base and Instruct models on Hugging Face:
      1. Mistral-Nemo-Instruct-2407: https://lnkd.in/ek6DHuZD
      2. Mistral-Nemo-Base-2407: https://lnkd.in/gRdzezbr

    Gradio chatbot demo on 🤗 Spaces: https://lnkd.in/g_fUTTF6

    mistralai/Mistral-Nemo-Instruct-2407 · Hugging Face

    huggingface.co
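
    To sanity-check the VRAM bullets in the post above, here is a tiny sketch estimating weight-only memory from parameter count and precision; it deliberately ignores activations, KV cache, and runtime overhead, which is why real usage is higher.

    ```python
    # Sketch: rough weight-only memory estimate for an LLM checkpoint.
    def weight_gib(params_billions: float, bits_per_param: float) -> float:
        return params_billions * 1e9 * bits_per_param / 8 / 2**30

    for bits in (16, 8, 4):
        print(f"12B @ {bits}-bit: ~{weight_gib(12, bits):.1f} GiB")
    # ~22.4 GiB at 16-bit, ~11.2 GiB at 8-bit, ~5.6 GiB at 4-bit (weights only)
    ```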

  • Hugging Face reposted this

    Merve Noyan

    open-sourceress at 🤗 | Google Developer Expert in Machine Learning, MSc Candidate in Data Science

    Chameleon 🦎 by Meta is now available in Hugging Face transformers 😍 A vision language model that comes in 7B and 34B sizes 🤩 But what makes this model so special? Demo and more in the comments - keep reading ⥥

    Chameleon is a unique model: it attempts to scale early fusion 🤨 But what is early fusion? Modern vision language models use a vision encoder with a projection layer to project image embeddings so they can be fed, alongside the prompt, to a text decoder (LLM). Early fusion, on the other hand, attempts to fuse all features together (image patches and text) by using an image tokenizer; all tokens are projected into a shared space, which enables seamless generation 😏

    The authors also introduced architectural improvements (QK norm and revised placement of layer norms) for scalable and stable training, and they were able to increase the token count (5x tokens compared to Llama 3, which is a must with early fusion IMO).

    This model is an any-to-any model thanks to early fusion: it can take image and text input and output image and text, but image generation is disabled to prevent malicious use. One can also do text-only prompting; the authors note the model catches up with larger LLMs (like Mixtral 8x7B or the larger Llama-2 70B). The same holds for image-pair prompting against larger VLMs like IDEFICS2-80B (see the paper for the benchmarks).

    Thanks for reading!
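
    Since Chameleon support is in transformers, here is a minimal hedged sketch of image + text prompting with it; the checkpoint ID, image URL, and generation settings are illustrative, and the classes assume a recent transformers release.

    ```python
    # Sketch: prompt Chameleon with an image and a question via transformers.
    import torch
    import requests
    from PIL import Image
    from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

    model_id = "facebook/chameleon-7b"
    processor = ChameleonProcessor.from_pretrained(model_id)
    model = ChameleonForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Example image; the <image> placeholder marks where image tokens are inserted.
    image = Image.open(requests.get("https://picsum.photos/512", stream=True).raw)
    prompt = "What do you see in this image?<image>"

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
    output = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(output[0], skip_special_tokens=True))
    ```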

  • Hugging Face reposted this

    Rick Lamers

    AI Researcher/Engineer

    Hugging Face has made it seriously easy to deploy Gradio apps through Spaces - I'd recommend everyone give it a shot!
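
    For anyone who wants to try what Rick describes, here is a minimal sketch of a Gradio app.py; pushing a file like this (plus a requirements.txt listing gradio) to a new Space is typically all a basic deployment needs. The echo function is just a placeholder for your own model call.

    ```python
    # Minimal Gradio app that can be uploaded to a Hugging Face Space as app.py.
    import gradio as gr

    def echo(message: str) -> str:
        # Placeholder logic - swap in a model or pipeline call here.
        return f"You said: {message}"

    demo = gr.Interface(fn=echo, inputs="text", outputs="text", title="Demo Space")

    if __name__ == "__main__":
        demo.launch()
    ```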

Funding

Hugging Face: 7 total rounds
Last round: Series D
See more info on Crunchbase