Hi developers.
I recently noticed that the new version of TensorKit supports running on CUDA, and I got a massive speedup when I tried it. It's wonderful!
However, when combining GPU-backed TensorMaps with AD (Zygote), the backward pass fails: several pullback functions rely on scalar indexing, which implicitly assumes CPU arrays.
using TensorKit
using Zygote
using CUDA
using cuTENSOR
using Adapt
V = Rep[U₁](1//2 => 2, -1//2 => 2)
A = randn(ComplexF64, V ← V)
A_gpu = adapt(CuArray, A)
# Forward works
D, U = eigh_trunc((A_gpu + A_gpu') / 2; trunc = truncrank(2))
@show D
# Backward fails
Zygote.gradient(A_gpu) do a
    D, U = eigh_trunc((a + a') / 2; trunc = truncrank(2))
    return real(tr(D))
end
Essentially the same issue occurs in many of the tensor manipulation functions, such as flip() and twist().
Scalar indexing may well be the best approach on the CPU, with the help of Strided.jl, so perhaps we need to implement dedicated pullbacks in the CUDA extension? That would be annoying...
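To illustrate the distinction I mean, here is a minimal sketch using only CUDA.jl. The function names negate_scalar! and negate_broadcast! are made up for this example and are not TensorKit internals; the real pullbacks are more involved, so this only shows the general pattern, not a proposed fix.

```julia
using CUDA

# Make scalar indexing on GPU arrays throw instead of only warning
CUDA.allowscalar(false)

x = CUDA.randn(Float64, 4)

# Scalar-indexing style: fine for a CPU Array, but not allowed on a CuArray
# (hypothetical helper, for illustration only)
function negate_scalar!(y)
    for i in eachindex(y)
        y[i] = -y[i]
    end
    return y
end

# Array-level style: runs as a GPU kernel and also works on CPU arrays
# (hypothetical helper, for illustration only)
function negate_broadcast!(y)
    y .= .-y
    return y
end

negate_broadcast!(x)   # works on the GPU

try
    negate_scalar!(x)  # hits the scalar-indexing check
catch err
    @show typeof(err)
end
```

If the affected pullbacks could be written in the second style, one code path might serve both CPU and GPU, though I don't know whether that is feasible without losing the Strided.jl performance on the CPU.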