[Build] Problems when use onnxruntime #21264

Closed
paulocoutinhox opened this issue Jul 5, 2024 · 3 comments
Labels
ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider more info needed issues that cannot be triaged until more information is submitted by the original user

Comments

@paulocoutinhox

Describe the issue

When I try to use onnxruntime with CUDA in Docker, I get the errors below.

Urgency

No response

Target platform

Linux x64

Build script

Error / output

2024-07-05 17:20:42.719797292 [E:onnxruntime:Default, provider_bridge_ort.cc:1731 TryGetProviderInfo_TensorRT] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_tensorrt.so with error: libcublas.so.11: cannot open shared object file: No such file or directory

*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:456 void onnxruntime::python::RegisterTensorRTPluginsAsCustomOps(onnxruntime::python::PySessionOptions&, const ProviderOptions&) Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.


2024-07-05 17:20:42.999570148 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./checkpoints/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
2024-07-05 17:20:43.679676761 [E:onnxruntime:Default, provider_bridge_ort.cc:1731 TryGetProviderInfo_TensorRT] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_tensorrt.so with error: libcublas.so.11: cannot open shared object file: No such file or directory

*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:456 void onnxruntime::python::RegisterTensorRTPluginsAsCustomOps(onnxruntime::python::PySessionOptions&, const ProviderOptions&) Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.


2024-07-05 17:20:43.695770115 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
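
Each failure above names both the provider library that could not be loaded and the shared object it was looking for. A minimal sketch for pulling those pairs out of a captured log, assuming the message wording shown in this issue (other onnxruntime versions may phrase it differently). The function name `missing_dependencies` is hypothetical and not part of onnxruntime:

```python
import re

# Matches the "Failed to load library X with error: Y: cannot open shared object file"
# wording seen in the log above. This format is an assumption taken from this issue only.
_LOAD_FAILURE = re.compile(
    r"Failed to load library (?P<provider>\S+) with error: "
    r"(?P<missing>\S+): cannot open shared object file"
)


def missing_dependencies(log_text):
    """Return sorted, de-duplicated (provider_library, missing_shared_object) pairs."""
    pairs = {
        (match.group("provider"), match.group("missing"))
        for match in _LOAD_FAILURE.finditer(log_text)
    }
    return sorted(pairs)


if __name__ == "__main__":
    # Two lines condensed from the error output in this issue.
    sample = (
        "FAIL : Failed to load library libonnxruntime_providers_tensorrt.so "
        "with error: libcublas.so.11: cannot open shared object file: "
        "No such file or directory\n"
        "FAIL : Failed to load library libonnxruntime_providers_cuda.so "
        "with error: libcublasLt.so.11: cannot open shared object file: "
        "No such file or directory\n"
    )
    for provider, missing in missing_dependencies(sample):
        print(f"{provider} needs {missing}")
```

Both missing names end in `.so.11`, which suggests the installed package expects CUDA 11 libraries.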

Visual Studio Version

No response

GCC / Compiler Version

No response

@paulocoutinhox paulocoutinhox added the build build issues; typically submitted using template label Jul 5, 2024
@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider labels Jul 5, 2024
@snnn snnn added more info needed issues that cannot be triaged until more information is submitted by the original user and removed build build issues; typically submitted using template labels Jul 8, 2024
@snnn
Member

snnn commented Jul 8, 2024

Could you please be more specific about which Docker image you are using? We no longer publish Docker images.

@dianyo

dianyo commented Jul 12, 2024

Hi @snnn
I encountered the same issue when using this Docker image: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-23-12.html

I installed onnxruntime-gpu using pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/

I think it is related to the library versions the wheel is linked against?

Here's the ldd output:

ldd libonnxruntime_providers_cuda.so
        linux-vdso.so.1 (0x00007ffe92257000)
        libcublasLt.so.11 => not found
        libcublas.so.11 => not found
        libcudnn.so.8 => /lib/x86_64-linux-gnu/libcudnn.so.8 (0x00007f3d45200000)
        libcurand.so.10 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 (0x00007f3d3ec00000)
        libcufft.so.10 => not found
        libcudart.so.11.0 => not found
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3d4556d000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3d45568000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3d45563000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3d3e9d4000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3d4547a000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3d4545a000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3d3e7ac000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3d68769000)
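
The unresolved entries can be picked out of this listing automatically. A rough sketch, assuming the usual `name => target` layout that ldd prints on Linux. The helper names `unresolved_libraries` and `run_ldd` are made up for illustration:

```python
import subprocess


def unresolved_libraries(ldd_output):
    """Return the names that ldd reports as "not found", in order of appearance."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


def run_ldd(path):
    """Run ldd on a shared object and return its text output (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # A few lines copied from the listing above.
    sample = (
        "        libcublasLt.so.11 => not found\n"
        "        libcublas.so.11 => not found\n"
        "        libcudnn.so.8 => /lib/x86_64-linux-gnu/libcudnn.so.8 (0x00007f3d45200000)\n"
        "        libcufft.so.10 => not found\n"
        "        libcudart.so.11.0 => not found\n"
    )
    for name in unresolved_libraries(sample):
        print("missing:", name)
```

On a real system, `unresolved_libraries(run_ldd("libonnxruntime_providers_cuda.so"))` would give the same list without copying the output by hand.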

And here are two of the missing libraries; the container only has newer versions of them:

ldconfig -p | grep libcublasLt
       libcublasLt.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12
       libcublasLt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so
ldconfig -p | grep libcufft.so
        libcufft.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.11
        libcufft.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so
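
The same comparison can be scripted: given a soname the wheel needs and the text of `ldconfig -p`, report which versions of that library the loader cache actually holds. This is only a sketch under the assumption that `ldconfig -p` output looks like the lines above. `available_sonames` and `explain_mismatch` are hypothetical helper names:

```python
import re


def available_sonames(ldconfig_output, base_name):
    """List versioned sonames of base_name (e.g. "libcublasLt") in `ldconfig -p` text."""
    pattern = re.compile(
        r"^\s*(" + re.escape(base_name) + r"\.so\.[0-9.]+)\s", re.MULTILINE
    )
    return sorted(set(pattern.findall(ldconfig_output)))


def explain_mismatch(required_soname, ldconfig_output):
    """Describe whether required_soname is cached, or which other versions are."""
    base_name = required_soname.split(".so")[0]
    found = available_sonames(ldconfig_output, base_name)
    if required_soname in found:
        return f"{required_soname} is available"
    if found:
        return f"{required_soname} is missing; the loader cache only has {', '.join(found)}"
    return f"{required_soname} is missing and no other version of {base_name} is cached"


if __name__ == "__main__":
    # Lines copied from the ldconfig output above.
    cache = (
        "       libcublasLt.so.12 (libc6,x86-64) => "
        "/usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12\n"
        "       libcublasLt.so (libc6,x86-64) => "
        "/usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so\n"
    )
    print(explain_mismatch("libcublasLt.so.11", cache))
```

A major-version mismatch like this one cannot be fixed by adjusting LD_LIBRARY_PATH alone, since the required `.so.11` file is not present anywhere in the cache.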
@snnn
Member

snnn commented Jul 12, 2024

Duplicate of #20944.

@snnn snnn closed this as completed Jul 12, 2024