
More tweaks #375

Open
kshyatt wants to merge 1 commit into main from ksh/cuda_tweaks

Conversation

Member

@kshyatt kshyatt commented Feb 18, 2026

Needed to get more MPSKit examples working

@@ -0,0 +1,28 @@
function TensorKit._copyto!(A::StridedView{TA, 1, <:CuArray{TA}}, B::StridedView{TB, 2, <:CuArray{TB}}) where {TA, TB}
Member

Does this make sense to include, and should this not simply fall back to the default copyto!? This really is just a performance optimization to avoid a bunch of the overhead of Strided.jl, but I would be surprised if building the index arrays like this really gives an improvement over a regular strided copyto!.

I think this entire thing should boil down to the following (up to some off-by-one errors), which is not obvious; I should have added a comment/fallback definition:

A[A.offset:stride(A, 1):end] .= B.op.(view(B, div(B.offset, stride(B, 2)):stride(B, 1):size(B, 1), 1:stride(B, 2):size(B, 2)))
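For intuition, a minimal CPU sketch of the same idea, with plain Base arrays standing in for the CuArrays: the whole copy is a single broadcast over views, so no element is touched one at a time, which is exactly what triggers scalar-indexing errors on the GPU.

```julia
# CPU stand-in for the strided-view copy: write a 2-D source into every
# second slot of a 1-D destination with one broadcast assignment.
A = zeros(Int, 12)                # destination buffer, written with stride 2
B = reshape(collect(1:6), 2, 3)   # 2-D source, column-major
@views A[1:2:12] .= vec(B)        # one array-level op, no per-element indexing
```

On device arrays the same pattern lowers to a single kernel launch rather than six scalar reads and writes.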

Member Author

It seems to be necessary to avoid scalar indexing sadness 🤷. Happy to use the fallback, though!

return CuTensorMap{T, S, N₁, N₂}(CuArray{T}(t.data), space(t))
end

#=function TensorKit.TensorMap{T, S₁, N₁, N₂, A}(
Member

leftover?

Member Author

😰

end

function TensorKit.blocktype(::Type{<:CuTensorMap{T, S}}) where {T, S}
return SubArray{T, 1, CuVector{T, CUDA.DeviceMemory}, Tuple{UnitRange{Int}}, true}
Member

I somehow had expected the blocktype to be CuMatrix, with the way that CUDA handles views. If this isn't the case, can we force it to be?

Member Author

Actually it wanted it to be a ReshapedArray of this SubArray 😱. Really painful. I can swap this to just being a CuMatrix.
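A minimal CPU illustration of where that type comes from (a plain Vector stands in for the CuVector): taking a contiguous view of the flat storage and reshaping it to the block's dimensions yields a ReshapedArray wrapping a SubArray, and materializing it is what would turn it into a plain (Cu)Matrix.

```julia
v = collect(1.0:12.0)             # flat storage, stand-in for the CuVector
b = reshape(view(v, 1:6), 2, 3)   # block as reshape-of-view:
                                  # Base.ReshapedArray wrapping a SubArray
M = Matrix(b)                     # materialized: a plain dense matrix
```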

Comment on lines 168 to +171
TensorKit.promote_storage_rule(::Type{CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} =
CuArray{T, N, CUDA.default_memory}
TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{CuArray{T, N}}) where {T, N} =
CuArray{T, N, CUDA.default_memory}
Member

Suggested change
TensorKit.promote_storage_rule(::Type{CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} =
CuArray{T, N, CUDA.default_memory}
TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{CuArray{T, N}}) where {T, N} =
CuArray{T, N, CUDA.default_memory}
TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} =
CuArray{T, N, CUDA.default_memory}

I should have written the rules in such a way that they are symmetric, so we shouldn't have to define both directions. However, I do think both sides need <: to account for the third type parameter being there, which I also missed in the last PR.
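A toy model of the dispatch point, with a hypothetical MyArr standing in for CuArray and a third memory-style parameter: MyArr{T, N} with the trailing parameter omitted is itself abstract, so one symmetric rule with <: on both sides matches whether or not the arguments are fully parameterized.

```julia
abstract type Mem end
struct DevMem <: Mem end
struct MyArr{T, N, M <: Mem} end   # third parameter models CUDA's memory kind

# one symmetric rule with <: on both sides covers all parameter combinations
rule(::Type{<:MyArr{T, N}}, ::Type{<:MyArr{T, N}}) where {T, N} =
    MyArr{T, N, DevMem}

rule(MyArr{Float64, 2, DevMem}, MyArr{Float64, 2})  # matches in either order
```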

@@ -102,9 +117,21 @@ function TensorKit.scalar(t::CuTensorMap{T, S, 0, 0}) where {T, S}
end

function Base.convert(
Member

I'm again a bit confused by the necessity of this function: is this not the same definition as the regular TensorMap one?

end
end

function Base.convert(
Member

Same comment here

α::Number, β::Number, backend::AbstractBackend...
)
) where {T, S}
tsrc_map = TensorMapWithStorage{scalartype(tdst), storagetype(tdst)}(undef, (tsrc.V2 ⊗ tsrc.V1) ← (tsrc.V1 ⊗ tsrc.V2))
Member

Suggested change
tsrc_map = TensorMapWithStorage{scalartype(tdst), storagetype(tdst)}(undef, (tsrc.V2 ⊗ tsrc.V1) ← (tsrc.V1 ⊗ tsrc.V2))
tsrc_map = similar(tdst, storagetype(tdst), space(tsrc))

This might be a little cleaner and not rely on quite so many internals.
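For reference, a CPU sketch of the `similar` pattern the suggestion leans on (plain Base arrays; in the PR the extra argument would be a storage type rather than an eltype): `similar` allocates a fresh destination with the requested type and shape without going through inner constructors.

```julia
src = rand(3, 4)
dst = similar(src, Float32, (4, 3))   # new buffer: chosen eltype and shape
(size(dst), eltype(dst))              # ((4, 3), Float32)
```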


const _GenericTransformerData{T, N} = Tuple{
Matrix{T},
DenseMatrix{T},
Member

I think this change makes the types below abstractly typed; do we need this?

Member Author

Yes, in order to allow device-side matrices to get passed in. Otherwise you get attempts to multiply CuMatrix * Matrix outside of constructors.

Member

Ok, but in that case we would really have to make that an additional type parameter in the GenericTreeTransformer struct: these were introduced to hyper-specialize and get maximal efficiency, so I don't think we can eat a type instability here.
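A minimal sketch of the specialization concern, with hypothetical struct names rather than the real GenericTreeTransformer: an abstractly-typed field forces dynamic dispatch on every access, while lifting the matrix type into a parameter keeps the field concrete for any chosen storage.

```julia
struct LooseTransformer            # field type is abstract: DenseMatrix is not concrete
    data::DenseMatrix{Float64}
end

struct TightTransformer{M <: DenseMatrix{Float64}}  # parameter restores concreteness
    data::M
end

isconcretetype(fieldtype(LooseTransformer, :data))                    # false
isconcretetype(fieldtype(TightTransformer{Matrix{Float64}}, :data))   # true
```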

Member Author

OK, but it would have been helpful to have a comment or something noting that this was why they were there.
