NN clustering: VRAM memory leak fix + (u)int -> (u)int32_t#14272
NN clustering: VRAM memory leak fix + (u)int -> (u)int32_t#14272davidrohr merged 7 commits intoAliceO2Group:devfrom
Conversation
|
REQUEST FOR PRODUCTION RELEASES: This will add The following labels are available |
Please consider the following formatting changes to AliceO2Group#14272
|
Error while checking build/O2/fullCI_slc9 for b0e8923 at 2025-05-14 12:44: Full log here. |
|
Error while checking build/O2/fullCI_slc9 for afbdade at 2025-05-15 01:54: Full log here. |
|
@drohr Can be merged from my side |
Common/ML/include/ML/OrtInterface.h
Outdated
| // ORT variables -> need to be hidden as pImpl | ||
| struct OrtVariables; | ||
| OrtVariables* mPImplOrt; | ||
| std::shared_ptr<OrtVariables> mPImplOrt = nullptr; |
There was a problem hiding this comment.
Why do you use a shared_ptr and no unique_ptr?
There was a problem hiding this comment.
Because I can't assign a nullptr in the header file since OrtVariables is not known there. It will throw an error for invalid size of incomplete type.
There was a problem hiding this comment.
You have to declare the constructor and destructor in the header, but instanciate it in the cxx.
I.e. only OrtInterface() in the header, and
OrtInterface::OrtInterface() = default; in the cxx.
Then you can use unique_ptr.
There was a problem hiding this comment.
Is there any reason why we need a unique_ptr? Especially for the OrtAllocator (which gets created only for one environment but used in multiple), shared_ptr would probably be preferable. Unique_ptr would also require a deleter, which will make the code more cumbersome...
|
But a unique_ptr is simpler than a shared_ptr. shared_ptr is basically a unique_ptr with refcounting. So if you don't need particularly a shared_ptr, you should use a unique_ptr.
And what kind of deleter do you need?
|
An example. OrtInterface.h: OrtInterface.cxx: ''' // General purpose |
|
I would do it like in the patch below. You are basically using a shared_ptr as a unique_ptr, since your pointer is never really shared. The reason it works for you with a shared ptr is that for a shared ptr the constructor / destructor do not need sizeof(...), while for unique_ptr they do. Thus, with a unique_ptr the constructors must not be in the header. My patch moves them to the cxx. And then, your setEnv / getEnv functions certainly cause memory corruption, irrespective of whether they use shared or unique ptr. With get(...), you are getting a raw C pointer, and then you are setting that to as the env of another OrtVariables struct, thus you have 2 unique_ptr (or in your case 2 shared_ptr, which do not know about each other), both owning the same object, which will lead to use after free memory corruption and to double delete. |
This PR fixes memory leaks and changes (u)int -> (u)int32_t in the GPU kernel code