
Is fp8 quantization gemm supported? #82

Closed
sleepwalker2017 opened this issue Jul 9, 2024 · 2 comments

Comments

@sleepwalker2017

Case 1: both inputs A and B are in FP8, and the output is FP16.

Case 2: a fused kernel, where input A is FP16 with a float32 A scale and B is FP8; the kernel quantizes A to FP8 and then invokes the FP8 GEMM to produce FP16 output.

Are these supported? If yes, is there any benchmark? Thank you!

@LeiWang1999
Contributor

@sleepwalker2017 thanks for your attention! Currently, we do not support FP8 GEMM with scaling. Since FP8 GEMM typically has no zero point, rescaling can instead be applied as an external kernel that adjusts the output. If you wish to perform plain FP8 GEMM, please refer to https://github.com/microsoft/BitBLAS/blob/main/testing/python/operators/test_general_matmul_fp8.py.

You can also apply scaling on the input side by directly editing https://github.com/microsoft/BitBLAS/blob/main/bitblas/ops/impl/matmul_dequantize_impl.py.
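For reference, the external-rescaling flow described above (quantize A with a per-tensor float32 scale, run the FP8 GEMM, rescale the output back to FP16) can be sketched in NumPy. This is a rough emulation, not BitBLAS's actual kernels: the `to_e4m3` rounding helper and the per-tensor `s_a` scale are illustrative assumptions, and real hardware FP8 GEMMs accumulate in higher precision much like the float32 matmul here.

```python
import numpy as np

def to_e4m3(x):
    # Rough emulation of e4m3 FP8 rounding: clamp to the e4m3 range
    # (+/-448) and round the mantissa to 1 implicit + 3 explicit bits.
    # Subnormals are ignored, which is fine for a sketch.
    x = np.clip(np.asarray(x, dtype=np.float32), -448.0, 448.0)
    m, e = np.frexp(x)                 # x = m * 2**e, |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0      # quantize mantissa to 4 bits
    return np.ldexp(m, e)

# A in FP16, B pretending to be pre-quantized FP8 weights.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float16)
B = to_e4m3(rng.standard_normal((8, 3)))

# 1. Quantize A into the FP8 range with a per-tensor float32 scale.
s_a = np.float32(448.0) / np.abs(A).max().astype(np.float32)
A_fp8 = to_e4m3(A.astype(np.float32) * s_a)

# 2. "FP8 GEMM": multiply the FP8-rounded operands, accumulate in FP32.
acc = A_fp8 @ B

# 3. External rescaling kernel: undo the input scale, emit FP16 output.
C = (acc / s_a).astype(np.float16)

# C should closely track the unquantized product A @ B.
ref = A.astype(np.float32) @ B
max_err = np.abs(C.astype(np.float32) - ref).max()
```

Because FP8 has no zero point, step 3 is a pure elementwise scale, which is why it can live outside the GEMM kernel as the reply suggests.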

@sleepwalker2017
Author


Thank you for the quick reply! I'll try that.
