Baseten

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models or take advantage of optimized serving for your own models. Our horizontally scalable services take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best-in-class doesn't mean breaking the bank: with our scale-to-zero feature, you can run your models on the best infrastructure without running up costs.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering


Updates

  • Baseten

    We’re thrilled to introduce Chains, a framework for building multi-component AI workflows on Baseten! ⛓️ 🎉

    Chains lets you build complex workflows as modular services in simple Python code—with optimal scaling for each component. Read our announcement blog to learn more: https://lnkd.in/eHsqG4yV

    After working with AI builders at companies like Patreon, Descript, and many others, we saw a growing need to expand AI infrastructure and model deployments to multi-component workflows. Our customers found that:

    🫠 They were often writing messy scripts to coordinate inference across many models
    🫠 They were paying too much for hardware by not separating CPU workloads from GPU workloads
    🫠 They couldn’t quickly test locally, which drastically slowed down development

    Other solutions either rely on DAGs or use bidirectional API calls to make multi-model inference possible. These approaches are slow, inefficient, and expensive at scale. They also fail to enable heterogeneous GPU/CPU resourcing across models and code, leading to overprovisioning and unnecessary compute costs.

    We built Chains to deliver reliable, high-performance inference for workflows that use multiple models or processing steps. Using Chains, you can:

    ✅ Assemble distinct computational steps (or models) into a holistic workflow
    ✅ Allocate and scale resources independently for each component
    ✅ View critical performance metrics across your entire Chain

    Chains is a game-changer for anyone using or building compound AI systems. We’ve seen processing times halve and GPU utilization improve 6x. With built-in type checking, blazing-fast deployments, and simplified pipeline orchestration, Chains is our latest step in enhancing the capabilities and efficiency of AI infrastructure! 🚀

    Try Chains today with $30 in free credits and tell us what you think! https://lnkd.in/ecjknaZM

    Introducing Baseten Chains

    baseten.co
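
    A minimal sketch of what a two-step Chain can look like, based on the quickstart pattern in Baseten’s public Chains docs (the class names and workflow here are illustrative; check the docs for the current API):

    ```python
    # Sketch of a Chains workflow: a CPU-only preprocessing step feeding an
    # entrypoint chainlet. Each chainlet deploys as its own service and
    # scales independently.
    import truss_chains as chains


    class Preprocess(chains.ChainletBase):
        """CPU-only step; scales separately from any GPU components."""

        def run_remote(self, text: str) -> str:
            return text.strip().lower()


    @chains.mark_entrypoint
    class Pipeline(chains.ChainletBase):
        def __init__(self, preprocess=chains.depends(Preprocess)) -> None:
            self._preprocess = preprocess

        def run_remote(self, text: str) -> str:
            cleaned = self._preprocess.run_remote(text)
            return f"processed: {cleaned}"
    ```

    Per the docs, deploying with `truss chains push` gives each chainlet its own autoscaling resources, which is how CPU and GPU steps avoid sharing (and overpaying for) the same hardware.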

  • Baseten

    A10 or A100: which GPU should you use? 🤔 🆚 Philip Kiely compares the two: https://lnkd.in/gJgTGEUP NVIDIA’s A10 and A100 GPUs power all kinds of model inference: from LLMs to audio transcription to image generation. 🖼️ A100s are a clear winner for certain demanding ML inference tasks—but you can also leverage multiple A10s in a single instance to save on cost, while meeting the needs of many workloads. 🧠
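
    As an illustration of the sizing math behind that choice (ours, not from the linked post): an A10 has 24 GB of VRAM, while A100s come in 40 GB and 80 GB variants, and fp16/bf16 weights take roughly 2 bytes per parameter.

    ```python
    # Back-of-the-envelope check: do a model's fp16 weights fit on one GPU?
    # Ignores KV cache and activation memory, which need real headroom in practice.
    A10_VRAM_GB = 24  # A100s: 40 or 80 GB

    def weights_gb(n_params_billions: float, bytes_per_param: float = 2.0) -> float:
        """Approximate weight memory in GB for fp16/bf16 parameters."""
        return n_params_billions * bytes_per_param

    for name, params_b in [("7B", 7), ("13B", 13), ("70B", 70)]:
        gb = weights_gb(params_b)
        verdict = "fits on one A10" if gb < A10_VRAM_GB else "needs multiple A10s or an A100"
        print(f"{name}: ~{gb:.0f} GB of weights -> {verdict}")
    ```

    By this rough math, a 7B model fits on a single A10, while a 13B model needs either two A10s (tensor parallelism) or a single A100, which is exactly the kind of tradeoff the post walks through.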

  • Baseten

    📌 Pinning ML model revisions and open-source Python packages is often a best practice. It can:

    ✅ Make your model's performance more reliable
    ✅ Guard against different failure modes
    ✅ Help secure against malicious code injection

    That said, pinning model revisions isn't always necessary—it depends on your use case, and it can have some disadvantages, too. 👀 Check out Philip Kiely's post to learn when pinning model versions is recommended—and when it's not: https://lnkd.in/eEF2XCNw
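
    As a quick illustration of the practice (our example, not from the post): Hugging Face's transformers library lets you pin a model download to a specific repo commit with the revision argument.

    ```python
    # Pin a model download to a specific repo revision so upstream changes
    # can't silently alter what you serve. The commit hash below is a
    # placeholder; use the revision you actually validated.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
    REVISION = "<commit-sha-you-tested>"  # placeholder

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, revision=REVISION)

    # The same idea applies to packages: pin exact versions in requirements.txt,
    # e.g. a line like `transformers==4.43.3` rather than an open-ended range.
    ```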

  • Baseten

    We’ve heard it from ComfyUI users time and again: our ComfyUI integration is best-in-class! 🏆 Now we’ve made it even better. With our new "build commands" feature, you can easily run custom nodes and model checkpoints with ComfyUI on powerful GPUs. 💪🏻 🚀 Check out Het Trivedi and Rachel Rapp's post to see how: https://lnkd.in/ejDJMv7Q

    In case you didn't know: you can (and always could) launch ComfyUI with Truss as a callable, shareable API endpoint. Now your models spin up even faster. We’re proud to give users the full power of ComfyUI while keeping it shareable and blazing fast. If you try it out, let us know how it goes—or show us what you build! 🎉
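
    For a sense of what this looks like, here's a sketch of a Truss config using build commands to bake custom nodes and a checkpoint into the image at build time. The field names follow the Truss docs as we understand them, and the custom-node repo and checkpoint URL are placeholders:

    ```yaml
    # Sketch of a Truss config.yaml for ComfyUI with build commands
    # (field names per our reading of the Truss docs; verify against them).
    build_commands:
      - git clone https://github.com/comfyanonymous/ComfyUI.git
      # Placeholder custom node and checkpoint; substitute your own:
      - git clone https://github.com/your-org/your-custom-node.git ComfyUI/custom_nodes/your-custom-node
      - wget -q -O ComfyUI/models/checkpoints/model.safetensors https://example.com/your-checkpoint.safetensors
    resources:
      accelerator: A10G
      use_gpu: true
    ```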

  • Baseten

    Prompt: write a stand-up comedy routine about being an LLM

    Llama 405B: I don't have feelings, unless you count the existential dread of knowing I'll be replaced by a newer model in six months.

  • Baseten

    In his letter on open source AI, Mark Zuckerberg listed reasons why developers need open-source models, including:

    1. We need to control our own destiny and not get locked into a closed vendor.
    2. We need to protect our data.
    3. We need a model that is efficient and affordable to run.

    At Baseten, we agree that every engineering team should be able to choose any vendor, keep their data safe, and run models affordably.

    🧘 Control
    Every model deployed on Baseten uses Truss, our open-source model packaging library. Truss is agnostic to inference optimizers and serving engines, so you can use open-source tools like vLLM and TensorRT-LLM to package your model as a Docker container that can be deployed anywhere. Here's an implementation of Llama 3.1 405B with vLLM in less than 100 lines of Python: https://lnkd.in/gVBwssde

    🔐 Privacy
    With a shared endpoint, your prompts and responses are processed by a third party alongside every other user's data. Baseten offers dedicated deployments for open-source and custom models. On top of SOC 2 Type II certification and HIPAA compliance, we offer self-hosted model deployments so you can run models like Llama from the comfort and security of your own VPC.

    💰 Cost
    Baseten charges per minute of GPU use, and with available commit discounts we're a highly cost-competitive platform for large-scale deployments. With autoscaling dedicated deployments, you pay a cost that you control, that you can decrease with iterative optimization work, and that is driven by fundamental prices for compute and storage, not VC subsidies or loss-leading market-capture plays.

    Build with customizable, private, affordable inference (a minimal vLLM sketch follows below):
    - Deploy Llama 3.1 8B: https://lnkd.in/gXQqhNzj
    - Deploy Llama 3.1 70B: https://lnkd.in/gVbPyCuA
    - Contact us for Llama 3.1 405B: DM or support@baseten.co
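
    For a flavor of how simple the open-source serving path can be, here's a minimal sketch using the vLLM library directly (the linked example packages this with Truss; we show the 8B model because 405B requires a multi-GPU node):

    ```python
    # Minimal vLLM offline-inference sketch for Llama 3.1 8B Instruct.
    # Assumes you've accepted the Llama license on Hugging Face and have a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=256)

    outputs = llm.generate(
        ["Why do engineering teams prefer open-weight models?"], params
    )
    print(outputs[0].outputs[0].text)
    ```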

  • Baseten

    It's a beautiful day for open source AI. For years, we've seen the gap narrow between open and proprietary models. Today: Gap. Closed. Congrats to the entire team at AI at Meta! Want to deploy Llama 3.1? We'll be posting optimized engines throughout the day, links in comments!

    AI at Meta

    Starting today, open source is leading the way. Introducing Llama 3.1: our most capable models yet.

    Today we're releasing a collection of new models including our long-awaited 405B. Llama 3.1 delivers stronger reasoning, a larger 128K context window & improved support for 8 languages including English — among other improvements. Details in the full announcement ➡️ https://go.fb.me/hvuqhb Download the models ➡️ https://go.fb.me/11ffl7

    We evaluated performance on 150+ benchmark datasets spanning a range of languages, in addition to extensive human evaluations in real-world scenarios. Trained on >16K NVIDIA H100 GPUs, Llama 3.1 405B is the industry-leading open source foundation model and delivers state-of-the-art capabilities that rival the best closed source models in general knowledge, steerability, math, tool use and multilingual translation.

    We've also updated our license to allow developers to use the outputs from Llama models, including the 405B, to improve other models for the first time. We're excited about how synthetic data generation and model distillation workflows with Llama will help advance the state of AI.

    As Mark Zuckerberg shared this morning, we strongly believe that open source will ensure more people around the world have access to the benefits and opportunities of AI, which is why we continue to take steps on the path for open source AI to become the industry standard. With these releases we're setting the stage for unprecedented new opportunities, and we can't wait to see the innovation our newest Llama models will unlock across all levels of the AI community.

  • Baseten

    If you're doing LLM inference with TensorRT-LLM, you can now swap LoRAs on Baseten! 🚀 😎 💪 This means you can serve thousands of fine-tuned LLM variants from a single GPU—especially important for builders who fine-tune models per user.

    💡 Check out Philip Kiely's video to learn more: https://lnkd.in/eEaMtqC6
    💡 Or read how on our blog: https://lnkd.in/eGJZGZFd

    Between the cost and the model management overhead, serving thousands of fine-tuned models individually is infeasible. With LoRA swapping, you don't need to: you can use a different fine-tuned LoRA per request in a batch without any meaningful effect on latency. 🧠
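
    The post describes LoRA swapping with TensorRT-LLM on Baseten, but the per-request LoRA pattern is easy to see in open-source tooling too. Here's a sketch using vLLM's multi-LoRA support (a different stack than the one in the post; the adapter name and path are placeholders):

    ```python
    # Per-request LoRA selection: the base model's weights stay resident on
    # the GPU while each request names the adapter it wants.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", enable_lora=True)
    params = SamplingParams(max_tokens=128)

    out = llm.generate(
        ["Summarize this support ticket for user 42:"],
        params,
        lora_request=LoRARequest("user-42-adapter", 1, "/path/to/user42-lora"),  # placeholders
    )
    print(out[0].outputs[0].text)
    ```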
