Support AD on CUDA #376

Hi developers.

I recently noticed that the new version of TensorKit supports running on CUDA, and I got a massive speedup when I tried it. It's wonderful!
However, when combining GPU-backed TensorMaps with AD (Zygote), the backward pass fails: several pullback functions rely on **scalar indexing**, which effectively assumes CPU arrays.

```julia
using TensorKit
using Zygote
using CUDA
using cuTENSOR
using Adapt


V = Rep[U₁](1//2 => 2, -1//2 => 2)
A = randn(ComplexF64, V ← V)
A_gpu = adapt(CuArray, A)

# Forward works
D, U = eigh_trunc((A_gpu + A_gpu') / 2; trunc = truncrank(2))

@show D

# Backward fails
Zygote.gradient(A_gpu) do a
    D, U = eigh_trunc((a + a') / 2; trunc = truncrank(2))
    return real(tr(D))
end
```
Essentially the same issue occurs in many of the tensor-manipulation functions, such as flip() and twist().
Scalar indexing is probably the best way to go on the CPU, with the help of Strided.jl, so perhaps we need to implement special pullbacks in the CUDA extension? That would be annoying...
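
To illustrate what I mean by a GPU-friendly pullback, here is a minimal sketch using ChainRulesCore. It is not TensorKit's actual code: `scale_by_signs` is a hypothetical stand-in for an operation like twist() that multiplies entries by fixed phase factors, and I am assuming the factors themselves do not need a gradient. The point is only that the pullback is written with broadcasting instead of an element-by-element loop, so it should work for any `AbstractArray` backend without scalar indexing.

```julia
using ChainRulesCore

# Hypothetical stand-in for a twist-like operation (not a TensorKit function):
# multiply every entry of `x` by a fixed factor from `signs`.
scale_by_signs(x::AbstractArray, signs::AbstractArray) = x .* signs

function ChainRulesCore.rrule(::typeof(scale_by_signs), x::AbstractArray, signs::AbstractArray)
    y = scale_by_signs(x, signs)
    function scale_by_signs_pullback(Δy)
        # Broadcasted pullback: no x[i] style access, so no scalar indexing.
        Δx = unthunk(Δy) .* conj.(signs)
        # Assumption: `signs` is treated as a constant, so it gets no tangent.
        return NoTangent(), Δx, NoTangent()
    end
    return y, scale_by_signs_pullback
end
```

Since Zygote picks up ChainRules `rrule`s, a rule written this way should be usable from `Zygote.gradient` as is. Whether the real pullbacks in TensorKit can be expressed like this block by block is exactly the open question.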
