Qubitium

Pinned

  1. ModelCloud/GPTQModel (Public)

    An easy-to-use LLM quantization and inference toolkit based on GPTQ algorithm (weight-only quantization).

    Python · 32 stars · 9 forks

  2. sgl-project/sglang (Public)

    SGLang is yet another fast serving framework for large language models and vision language models.

    Python · 3.1k stars · 199 forks

  3. vllm-project/vllm (Public)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 23.7k stars · 3.4k forks

  4. AutoGPTQ/AutoGPTQ (Public)

    An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

    Python · 4.2k stars · 432 forks

  5. flashinfer-ai/flashinfer (Public)

    FlashInfer: Kernel Library for LLM Serving

    Cuda · 856 stars · 80 forks

  6. Dao-AILab/flash-attention (Public)

    Fast and memory-efficient exact attention

    Python · 12.6k stars · 1.1k forks