
[C#] Enable copying of GPU OrtValue to CPU #21244

Open
guigzzz opened this issue Jul 3, 2024 · 2 comments
Labels
api:CSharp (issues related to the C# API)

Comments


guigzzz commented Jul 3, 2024

Describe the issue

Hey guys,

I can't seem to figure out an easy way to copy an OrtValue that's been allocated on the GPU back to the CPU.

OrtValue has a really convenient GetTensorDataAsSpan API, but it seems to just wrap the raw pointer in a span, which obviously won't work when the pointer refers to memory on the GPU.

The Python API has a convenient copy_outputs_to_cpu function, which is exactly what I need.

Can we have the same thing added to the .NET API?
Either the GetTensorDataAsSpan API could be updated to do the copying automatically, or a new CopyOutputsToCpu API could be added to the IOBinding class, similar to Python.
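
For illustration, a rough sketch of how the second option might look from the caller's side, using the binding from the repro below; CopyOutputsToCpu is a hypothetical method that does not exist in the current C# package:

// Hypothetical API sketch only: CopyOutputsToCpu is not part of the current
// C# package; it would mirror Python's io_binding.copy_outputs_to_cpu().
using var cpuOutputs = binding.CopyOutputsToCpu();
foreach (var cpuValue in cpuOutputs)
{
    // After the device-to-host copy the data lives in host memory,
    // so reading it through a span is safe.
    var span = cpuValue.GetTensorDataAsSpan<float>();
    Console.Out.WriteLine($"Got {span[0]}");
}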

To reproduce

using Microsoft.ML.OnnxRuntime;
using System.Linq;

var session = new InferenceSession("model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider());

var binding = session.CreateIoBinding();

var alloc = new OrtMemoryInfo(OrtMemoryInfo.allocatorCUDA,
    OrtAllocatorType.DeviceAllocator, 0, OrtMemType.Default);
// var alloc = OrtMemoryInfo.DefaultInstance;

binding.BindOutputToDevice("output", alloc);

var input = new float[4];
var inputValue = OrtValue.CreateTensorValueFromMemory(
    OrtMemoryInfo.DefaultInstance, input.AsMemory(), new long[] { 4 });
binding.BindInput("input", inputValue);

session.RunWithBinding(new RunOptions(), binding);

var output = binding.GetOutputValues().ToArray().First();

// Crashes here: the OrtValue's data pointer refers to CUDA device memory,
// so wrapping it in a span and reading it from the CPU faults.
var outputSpan = output.GetTensorDataAsSpan<float>();
Console.Out.WriteLine($"Got {outputSpan[0]}");

Fails with:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Program.<Main>$(System.String[])

Urgency

Not urgent, feature request.

Platform

Linux

OS Version

Ubuntu 20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.2

github-actions bot added the ep:CUDA (issues related to the CUDA execution provider) label on Jul 3, 2024
guigzzz changed the title to [C#] Enable copying of GPU OrtValue to CPU on Jul 3, 2024
xadupre added the api:CSharp (issues related to the C# API) label on Jul 5, 2024
yuslepukhin (Member) commented

IOBinding is deprecated.

Unless otherwise instructed, the output OrtValues are created and copied to CPU memory at the end of inferencing.

https://onnxruntime.ai/docs/tutorials/csharp/basic_csharp.html
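
For reference, a minimal sketch of that default path, assuming the same model and tensor names as the repro above; with the OrtValue-based Run overload the outputs are materialised in CPU memory, so reading them through a span works:

using Microsoft.ML.OnnxRuntime;
using System.Linq;

using var session = new InferenceSession(
    "model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider());

var input = new float[4];
using var inputValue = OrtValue.CreateTensorValueFromMemory(
    OrtMemoryInfo.DefaultInstance, input.AsMemory(), new long[] { 4 });

using var runOptions = new RunOptions();
// No IOBinding: the output OrtValues returned by Run live in host memory.
using var outputs = session.Run(
    runOptions,
    new[] { "input" }, new[] { inputValue },
    new[] { "output" });

// Safe here, because the output tensor is host-resident.
var span = outputs.First().GetTensorDataAsSpan<float>();
Console.Out.WriteLine($"Got {span[0]}");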

guigzzz (Author) commented Jul 10, 2024

Can you elaborate on 'IOBinding is deprecated'? This is news to me.
How else are we supposed to efficiently reuse output OrtValues?
If it truly is deprecated, then the documentation should be updated to reflect that.

The 'unless otherwise instructed' part is the crucial bit here. My output tensors are allocated on the GPU and I only sometimes want to copy them back to the host (it's a time-series model, so the outputs feed back into the inputs, and this is more efficient if everything stays on the GPU), but currently I can't. The pattern I'm after is roughly the loop sketched below.
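
A rough sketch of that loop, reusing only the APIs from the repro above and assuming the model's "output" has the same shape and type as its "input" (disposal of intermediate values omitted for brevity):

// Rough sketch of the GPU-resident feedback loop: nothing crosses to the
// host between iterations. The missing piece is copying a chosen output
// back to the CPU on demand, which is what this issue asks for.
using Microsoft.ML.OnnxRuntime;
using System.Linq;

using var session = new InferenceSession(
    "model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider());
using var binding = session.CreateIoBinding();
using var runOptions = new RunOptions();

var cudaMemInfo = new OrtMemoryInfo(OrtMemoryInfo.allocatorCUDA,
    OrtAllocatorType.DeviceAllocator, 0, OrtMemType.Default);

// Step 0: host-side seed input, device-side output.
var seed = new float[4];
using var seedValue = OrtValue.CreateTensorValueFromMemory(
    OrtMemoryInfo.DefaultInstance, seed.AsMemory(), new long[] { 4 });
binding.BindInput("input", seedValue);
binding.BindOutputToDevice("output", cudaMemInfo);

for (var step = 0; step < 10; step++)
{
    session.RunWithBinding(runOptions, binding);

    // The output OrtValue stays on the GPU; rebind it as the next input.
    var deviceOutput = binding.GetOutputValues().First();
    binding.BindInput("input", deviceOutput);
}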

sophies927 removed the ep:CUDA (issues related to the CUDA execution provider) label on Jul 11, 2024