Conversation
3bed38d to 8665c4a
| @@ -0,0 +1,28 @@ |
| function TensorKit._copyto!(A::StridedView{TA, 1, <:CuArray{TA}}, B::StridedView{TB, 2, <:CuArray{TB}}) where {TA, TB} |
Does this make sense to include, and should this not simply fall back to the default copyto!?
This really is just a performance optimization to avoid a bunch of the overhead of Strided.jl, but I would be surprised if building the index arrays like this really gives an improvement over a regular strided copyto!.
I think this entire thing should boil down to the following, which is not obvious, so I should have added a comment/fallback definition (up to some off-by-one errors, though):
```julia
A[A.offset:stride(A, 1):end] .= B.op.(view(B, div(B.offset, stride(B, 2)):stride(B, 1):size(B, 1), 1:stride(B, 2):size(B, 2)))
```
It seems to be necessary to avoid scalar indexing sadness 🤷 . Happy to use the fallback, though!
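For context, a minimal sketch of the scalar-indexing concern (my illustration, not PR code; it assumes CUDA.jl's default of disallowing scalar indexing outside interactive use): copying element by element on device arrays errors out, while a single broadcast assignment stays on the device.

```julia
using CUDA

# Assumption: scalar indexing is disallowed (CUDA.jl's default in non-interactive code).
A = CUDA.zeros(Float32, 4)
B = CUDA.rand(Float32, 2, 2)

# A[1] = B[1, 1]   # elementwise access on device arrays -> scalar indexing error
A .= vec(B)        # one broadcast kernel, no scalar indexing
```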
| return CuTensorMap{T, S, N₁, N₂}(CuArray{T}(t.data), space(t)) |
| end |

| #=function TensorKit.TensorMap{T, S₁, N₁, N₂, A}( |
| end |

| function TensorKit.blocktype(::Type{<:CuTensorMap{T, S}}) where {T, S} |
| return SubArray{T, 1, CuVector{T, CUDA.DeviceMemory}, Tuple{UnitRange{Int}}, true} |
Given the way CUDA handles views, I had somehow expected the blocktype to be CuMatrix. If that isn't the case, can we force it to be?
Actually it wanted it to be a ReshapedArray of this SubArray 😱. Really painful. I can swap this to just being a CuMatrix.
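For the record, a small sketch of how that wrapper type shows up (my illustration, not PR code; the exact type parameters depend on the CUDA.jl version): reshaping a contiguous view of a CuVector goes through the generic Base fallback, which wraps the SubArray in a `Base.ReshapedArray` rather than producing a CuMatrix.

```julia
using CUDA

v = CUDA.rand(Float32, 12)
blk = reshape(view(v, 1:6), 2, 3)
typeof(blk)  # Base.ReshapedArray{Float32, 2, SubArray{Float32, 1, CuVector{…}, …}, …}
```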
| TensorKit.promote_storage_rule(::Type{CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} = |
| CuArray{T, N, CUDA.default_memory} |
| TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{CuArray{T, N}}) where {T, N} = |
| CuArray{T, N, CUDA.default_memory} |
| TensorKit.promote_storage_rule(::Type{CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} = |
| CuArray{T, N, CUDA.default_memory} |
| TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{CuArray{T, N}}) where {T, N} = |
| CuArray{T, N, CUDA.default_memory} |
| TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} = |
| CuArray{T, N, CUDA.default_memory} |
I should have written the rules so that they are symmetric; then we wouldn't have to define both directions. However, I do think both sides need `<:` to account for the third type parameter being there, which I also missed in the last PR.
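To spell out the `<:` point (illustration only, not PR code; the memory parameter name is as in current CUDA.jl): a concrete CuArray carries a third, memory-type parameter, so it is a strict subtype of `CuArray{T, N}` rather than equal to it, and a signature without `<:` would not match it.

```julia
using CUDA

x = CUDA.zeros(Float32, 3)
typeof(x)                          # CuArray{Float32, 1, CUDA.DeviceMemory}
typeof(x) == CuArray{Float32, 1}   # false: the memory parameter is still free
typeof(x) <: CuArray{Float32, 1}   # true, so the method signature needs <:
```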
| @@ -102,9 +117,21 @@ function TensorKit.scalar(t::CuTensorMap{T, S, 0, 0}) where {T, S} |
| end |

| function Base.convert( |
I'm again a bit confused by the necessity of this function; is it not the same definition as the regular TensorMap one?
| end |
| end |

| function Base.convert( |
| α::Number, β::Number, backend::AbstractBackend... |
| ) |
| ) where {T, S} |
| tsrc_map = TensorMapWithStorage{scalartype(tdst), storagetype(tdst)}(undef, (tsrc.V2 ⊗ tsrc.V1) ← (tsrc.V1 ⊗ tsrc.V2)) |
| tsrc_map = TensorMapWithStorage{scalartype(tdst), storagetype(tdst)}(undef, (tsrc.V2 ⊗ tsrc.V1) ← (tsrc.V1 ⊗ tsrc.V2)) |
| tsrc_map = similar(tdst, storagetype(tdst), space(tsrc)) |
This might be a little cleaner and would rely less on internals.
|
|
| const _GenericTransformerData{T, N} = Tuple{ |
| Matrix{T}, |
| DenseMatrix{T}, |
I think this change makes the types below abstractly typed; do we need this?
Yes, in order to allow device-side matrices to be passed in. Otherwise you get attempts to multiply `CuMatrix * Matrix` outside of constructors.
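A quick sketch of that failure mode (my example, not PR code; it assumes scalar indexing is disallowed): mixing a host Matrix with a CuMatrix falls through to the generic matmul, which indexes the device array elementwise.

```julia
using CUDA

h = rand(Float32, 4, 4)
d = CUDA.rand(Float32, 4, 4)

# h * d          # generic fallback -> scalar indexing on the CuArray, so it errors
cu(h) * d        # move the host matrix to the device first, then multiply with CUBLAS
```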
Ok, but in that case we would really have to make that an additional type parameter in the GenericTreeTransformer struct; these were introduced to hyper-specialize and get maximal efficiency, so I don't think we can eat a type instability here.
OK, it would have been helpful to have a comment or something noting that this was why they were there.
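To make the specialization concern concrete, a hypothetical sketch (these are not TensorKit's actual struct definitions) of the difference between an abstractly typed field and an extra type parameter:

```julia
# Hypothetical structs for illustration only; TensorKit's GenericTreeTransformer
# has a different layout.
struct AbstractlyTypedData{T}
    data::DenseMatrix{T}            # abstract field: every access is type-unstable
end

struct ParameterizedData{T, M <: DenseMatrix{T}}
    data::M                         # concrete once M is known, Matrix or CuMatrix alike
end
```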
Needed to get more MPSKit examples working