Baseten

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models or take advantage of optimized serving for your own models. Our horizontally scalable services take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best-in-class doesn't mean breaking the bank: with our scale-to-zero feature, you can run your models on the best infrastructure without running up costs.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering


Updates

  • Baseten

    We’re thrilled to introduce Chains, a framework for building multi-component AI workflows on Baseten! ⛓️ 🎉

    Chains lets you build complex workflows as modular services in simple Python code—with optimal scaling for each component. Read our announcement blog to learn more: https://lnkd.in/eHsqG4yV

    After working with AI builders at companies like Patreon, Descript, and many others, we saw a growing need to expand AI infrastructure and model deployments to multi-component workflows. Our customers found that:

    🫠 They were often writing messy scripts to coordinate inference across many models
    🫠 They were paying too much for hardware by not separating CPU workloads from GPU workloads
    🫠 They couldn’t quickly test locally, which drastically slowed down development

    Other solutions either rely on DAGs or use bidirectional API calls to make multi-model inference possible. These approaches are slow, inefficient, and expensive at scale. They also fail to enable heterogeneous GPU/CPU resourcing across models and code, leading to overprovisioning and unnecessary compute costs.

    We built Chains to deliver reliable, high-performance inference for workflows that use multiple models or processing steps. Using Chains, you can:

    ✅ Assemble distinct computational steps (or models) into a holistic workflow
    ✅ Allocate and scale resources independently for each component
    ✅ View critical performance metrics across your entire Chain

    Chains is a game-changer for anyone using or building compound AI systems. We’ve seen processing times halve and GPU utilization improve 6x. With built-in type checking, blazing-fast deployments, and simplified pipeline orchestration, Chains is our latest step in enhancing the capabilities and efficiency of AI infrastructure! 🚀

    Try Chains today with $30 in free credits and tell us what you think! https://lnkd.in/ecjknaZM

    Introducing Baseten Chains

    baseten.co
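
    A minimal sketch of what a two-step Chain can look like, based on the quickstart pattern in Baseten’s public Chains docs (the class names and workflow here are illustrative; check the docs for the current API):

    ```python
    # Sketch of a Chains workflow: a CPU-only preprocessing step feeding an
    # entrypoint chainlet. Each chainlet deploys as its own service and
    # scales independently.
    import truss_chains as chains


    class Preprocess(chains.ChainletBase):
        """CPU-only step; scales separately from any GPU components."""

        def run_remote(self, text: str) -> str:
            return text.strip().lower()


    @chains.mark_entrypoint
    class Pipeline(chains.ChainletBase):
        def __init__(self, preprocess=chains.depends(Preprocess)) -> None:
            self._preprocess = preprocess

        def run_remote(self, text: str) -> str:
            cleaned = self._preprocess.run_remote(text)
            return f"processed: {cleaned}"
    ```

    Per the docs, deploying with `truss chains push` gives each chainlet its own autoscaling resources, which is how CPU and GPU steps avoid sharing (and overpaying for) the same hardware.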

  • Baseten

    A10 or A100: which GPU should you use? 🤔 🆚 Philip Kiely compares the two: https://lnkd.in/gJgTGEUP NVIDIA’s A10 and A100 GPUs power all kinds of model inference: from LLMs to audio transcription to image generation. 🖼️ A100s are a clear winner for certain demanding ML inference tasks—but you can also leverage multiple A10s in a single instance to save on cost, while meeting the needs of many workloads. 🧠
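
    As an illustration of the sizing math behind that choice (ours, not from the linked post): an A10 has 24 GB of VRAM, while A100s come in 40 GB and 80 GB variants, and fp16/bf16 weights take roughly 2 bytes per parameter.

    ```python
    # Back-of-the-envelope check: do a model's fp16 weights fit on one GPU?
    # Ignores KV cache and activation memory, which need real headroom in practice.
    A10_VRAM_GB = 24  # A100s: 40 or 80 GB

    def weights_gb(n_params_billions: float, bytes_per_param: float = 2.0) -> float:
        """Approximate weight memory in GB for fp16/bf16 parameters."""
        return n_params_billions * bytes_per_param

    for name, params_b in [("7B", 7), ("13B", 13), ("70B", 70)]:
        gb = weights_gb(params_b)
        verdict = "fits on one A10" if gb < A10_VRAM_GB else "needs multiple A10s or an A100"
        print(f"{name}: ~{gb:.0f} GB of weights -> {verdict}")
    ```

    By this rough math, a 7B model fits on a single A10, while a 13B model needs either two A10s (tensor parallelism) or a single A100, which is exactly the kind of tradeoff the post walks through.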

  • Baseten

    📌 Pinning ML model revisions and open-source Python packages is often a best practice. It can:

    ✅ Make your model's performance more reliable
    ✅ Guard against different failure modes
    ✅ Help secure against malicious code injection

    That said, pinning model revisions isn't always necessary—it depends on your use case, and it can have some disadvantages, too. 👀 Check out Philip Kiely's post to learn when pinning model versions is recommended—and when it's not: https://lnkd.in/eEF2XCNw
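
    As a quick illustration of the practice (our example, not from the post): Hugging Face's transformers library lets you pin a model download to a specific repo commit with the revision argument.

    ```python
    # Pin a model download to a specific repo revision so upstream changes
    # can't silently alter what you serve. The commit hash below is a
    # placeholder; use the revision you actually validated.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
    REVISION = "<commit-sha-you-tested>"  # placeholder

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, revision=REVISION)

    # The same idea applies to packages: pin exact versions in requirements.txt,
    # e.g. a line like `transformers==4.43.3` rather than an open-ended range.
    ```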

  • Baseten

    We’ve heard it from ComfyUI users time and again: our ComfyUI integration is best-in-class! 🏆 Now we’ve made it even better. With our new "build commands" feature, you can easily run custom nodes and model checkpoints with ComfyUI on powerful GPUs. 💪🏻 🚀 Check out Het Trivedi and Rachel Rapp's post to see how: https://lnkd.in/ejDJMv7Q

    In case you didn't know: you can (and always could) launch ComfyUI with Truss as a callable, shareable API endpoint. Now your models spin up even faster. We’re proud to give users the full power of ComfyUI while keeping it shareable and blazing fast. If you try it out, let us know how it goes—or show us what you build! 🎉
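
    For a sense of what this looks like, here's a sketch of a Truss config using build commands to bake custom nodes and a checkpoint into the image at build time. The field names follow the Truss docs as we understand them, and the custom-node repo and checkpoint URL are placeholders:

    ```yaml
    # Sketch of a Truss config.yaml for ComfyUI with build commands
    # (field names per our reading of the Truss docs; verify against them).
    build_commands:
      - git clone https://github.com/comfyanonymous/ComfyUI.git
      # Placeholder custom node and checkpoint; substitute your own:
      - git clone https://github.com/your-org/your-custom-node.git ComfyUI/custom_nodes/your-custom-node
      - wget -q -O ComfyUI/models/checkpoints/model.safetensors https://example.com/your-checkpoint.safetensors
    resources:
      accelerator: A10G
      use_gpu: true
    ```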

  • Baseten

    Prompt: write a stand-up comedy routine about being an LLM

    Llama 405B: I don't have feelings, unless you count the existential dread of knowing I'll be replaced by a newer model in six months.

  • Baseten

    In his letter on open source AI, Mark Zuckerberg listed reasons why developers need open-source models, including:

    1. We need to control our own destiny and not get locked into a closed vendor.
    2. We need to protect our data.
    3. We need a model that is efficient and affordable to run.

    At Baseten, we agree that every engineering team should be able to choose any vendor, keep their data safe, and run models affordably.

    🧘 Control
    Every model deployed on Baseten uses Truss, our open-source model packaging library. Truss is agnostic to inference optimizers and serving engines, so you can use open-source tools like vLLM and TensorRT-LLM to package your model as a Docker container that can be deployed anywhere. Here's an implementation of Llama 3.1 405B with vLLM in less than 100 lines of Python: https://lnkd.in/gVBwssde

    🔐 Privacy
    With a shared endpoint, your prompts and responses are processed by a third party alongside every other user's data. Baseten offers dedicated deployments for open-source and custom models. On top of SOC 2 Type II certification and HIPAA compliance, we offer self-hosted model deployments so you can run models like Llama from the comfort and security of your own VPC.

    💰 Cost
    Baseten charges per minute of GPU use, and with available commit discounts we're a highly cost-competitive platform for large-scale deployments. With autoscaling dedicated deployments, you pay a cost that you control, that you can decrease with iterative optimization work, and that is driven by fundamental prices for compute and storage, not VC subsidies or loss-leading market-capture plays.

    Build with customizable, private, affordable inference (a minimal vLLM sketch follows below):
    - Deploy Llama 3.1 8B: https://lnkd.in/gXQqhNzj
    - Deploy Llama 3.1 70B: https://lnkd.in/gVbPyCuA
    - Contact us for Llama 3.1 405B: DM or support@baseten.co
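
    For a flavor of how simple the open-source serving path can be, here's a minimal sketch using the vLLM library directly (the linked example packages this with Truss; we show the 8B model because 405B requires a multi-GPU node):

    ```python
    # Minimal vLLM offline-inference sketch for Llama 3.1 8B Instruct.
    # Assumes you've accepted the Llama license on Hugging Face and have a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=256)

    outputs = llm.generate(
        ["Why do engineering teams prefer open-weight models?"], params
    )
    print(outputs[0].outputs[0].text)
    ```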

  • Baseten

    It's a beautiful day for open source AI. For years, we've seen the gap narrow between open and proprietary models. Today: Gap. Closed. Congrats to the entire team at AI at Meta! Want to deploy Llama 3.1? We'll be posting optimized engines throughout the day, links in comments!

    AI at Meta

    Starting today, open source is leading the way. Introducing Llama 3.1: our most capable models yet.

    Today we're releasing a collection of new models including our long-awaited 405B. Llama 3.1 delivers stronger reasoning, a larger 128K context window & improved support for 8 languages including English — among other improvements. Details in the full announcement ➡️ https://go.fb.me/hvuqhb Download the models ➡️ https://go.fb.me/11ffl7

    We evaluated performance on 150+ benchmark datasets spanning a range of languages, in addition to extensive human evaluations in real-world scenarios. Trained on >16K NVIDIA H100 GPUs, Llama 3.1 405B is the industry-leading open source foundation model and delivers state-of-the-art capabilities that rival the best closed source models in general knowledge, steerability, math, tool use and multilingual translation.

    We've also updated our license to allow developers to use the outputs from Llama models, including the 405B, to improve other models for the first time. We're excited about how synthetic data generation and model distillation workflows with Llama will help advance the state of AI.

    As Mark Zuckerberg shared this morning, we strongly believe that open source will ensure more people around the world have access to the benefits and opportunities of AI, which is why we continue to take steps on the path for open source AI to become the industry standard. With these releases we're setting the stage for unprecedented new opportunities, and we can't wait to see the innovation our newest Llama models will unlock across all levels of the AI community.

  • Baseten

    If you're doing LLM inference with TensorRT-LLM, you can now swap LoRAs on Baseten! 🚀 😎 💪 This means you can serve thousands of fine-tuned LLM variants from a single GPU—especially important for builders who fine-tune models per user.

    💡 Check out Philip Kiely's video to learn more: https://lnkd.in/eEaMtqC6
    💡 Or read how on our blog: https://lnkd.in/eGJZGZFd

    Between the cost and the model management overhead, serving thousands of fine-tuned models individually is infeasible. With LoRA swapping, you don't need to: you can use a different fine-tuned LoRA per request in a batch without any meaningful effect on latency. 🧠
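
    The post describes LoRA swapping with TensorRT-LLM on Baseten, but the per-request LoRA pattern is easy to see in open-source tooling too. Here's a sketch using vLLM's multi-LoRA support (a different stack than the one in the post; the adapter name and path are placeholders):

    ```python
    # Per-request LoRA selection: the base model's weights stay resident on
    # the GPU while each request names the adapter it wants.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", enable_lora=True)
    params = SamplingParams(max_tokens=128)

    out = llm.generate(
        ["Summarize this support ticket for user 42:"],
        params,
        lora_request=LoRARequest("user-42-adapter", 1, "/path/to/user42-lora"),  # placeholders
    )
    print(out[0].outputs[0].text)
    ```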
