More tweaks #375
base: main
Changes from all commits
@@ -0,0 +1,28 @@

```julia
function TensorKit._copyto!(A::StridedView{TA, 1, <:CuArray{TA}}, B::StridedView{TB, 2, <:CuArray{TB}}) where {TA, TB}
    length(A) == length(B) || throw(DimensionMismatch(lazy"length of A ($(length(A))) does not match length of B ($(length(B)))"))
    Adata = parent(A)
    Astr = stride(A, 1)
    IA = A.offset
    Bdata = parent(B)
    Bstr = strides(B)
    IB_1 = B.offset
    # build index arrays on the host, so that the copy below is a single
    # vectorized gather/scatter instead of per-element (scalar) GPU indexing
    IAs = Int[]
    IBs = Int[]
    @inbounds for _ in axes(B, 2)
        IB = IB_1
        for _ in axes(B, 1)
            IA += Astr
            push!(IAs, IA)
            IB += Bstr[1]
            push!(IBs, IB)
        end
        IB_1 += Bstr[2]
    end
    Adata[IAs] .= Bdata[IBs]
    return A
end
```

@@ -7,6 +7,17 @@ function CuTensorMap(t::TensorMap{T, S, N₁, N₂, A}) where {T, S, N₁, N₂, A}

```julia
    return CuTensorMap{T, S, N₁, N₂}(CuArray{T}(t.data), space(t))
end
```

```julia
#=function TensorKit.TensorMap{T, S₁, N₁, N₂, A}(
        ::UndefInitializer, space::TensorMapSpace{S₂, N₁, N₂}
    ) where {T, S₁, S₂ <: TensorKit.ElementarySpace, N₁, N₂, A <: CuVector{T}}
    d = TensorKit.fusionblockstructure(space).totaldim
    data = A(undef, d)
    if !isbitstype(T)
        zerovector!(data)
    end
    return TensorKit.TensorMap{T, S₂, A}(data, space)
end=#
```

> **Member:** leftover?
>
> **Member (author):** 😰

```julia
# project_symmetric! doesn't yet work for GPU types, so do this on the host, then copy
function TensorKit.project_symmetric_and_check(::Type{T}, ::Type{A}, data::AbstractArray, V::TensorMapSpace; tol = sqrt(eps(real(float(eltype(data)))))) where {T, A <: CuVector{T}}
    h_t = TensorKit.TensorMapWithStorage{T, Vector{T}}(undef, V)
```

@@ -17,6 +28,10 @@ function TensorKit.project_symmetric_and_check(::Type{T}, ::Type{A}, data::Abstr

```julia
    return TensorKit.TensorMapWithStorage{T, A}(A(h_t.data), V)
end
```

```julia
function TensorKit.blocktype(::Type{<:CuTensorMap{T, S}}) where {T, S}
    return SubArray{T, 1, CuVector{T, CUDA.DeviceMemory}, Tuple{UnitRange{Int}}, true}
end
```

> **Member:** I somehow had expected the blocktype to be …
>
> **Member (author):** Actually it wanted it to be …
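
For orientation, a hedged illustration of what that declared block type corresponds to — the sizes are made up, and this assumes blocks are contiguous unit-range views into the underlying data vector:

```julia
using CUDA

data = CUDA.zeros(Float64, 16)   # stand-in for a CuTensorMap's underlying data
b = @view data[1:4]              # a "block": a contiguous slice of that vector
# typeof(b) === SubArray{Float64, 1, CuVector{Float64, CUDA.DeviceMemory}, Tuple{UnitRange{Int}}, true}
```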

```julia
for (fname, felt) in ((:zeros, :zero), (:ones, :one))
    @eval begin
        function CUDA.$fname(
```

@@ -102,9 +117,21 @@ function TensorKit.scalar(t::CuTensorMap{T, S, 0, 0}) where {T, S}

```diff
 end

 function Base.convert(
-        TT::Type{CuTensorMap{T, S, N₁, N₂}},
-        t::AbstractTensorMap{<:Any, S, N₁, N₂}
-    ) where {T, S, N₁, N₂}
+        TT::Type{TensorMap{T, S, N₁, N₂, A}},
+        t::TensorMap{T, S, N₁, N₂, AA}
+    ) where {T, S, N₁, N₂, A <: CuArray{T}, AA}
     if typeof(t) === TT
         return t
     else
         tnew = TT(undef, space(t))
         return copy!(tnew, t)
     end
 end
```

> **Member:** I'm again a bit confused by the necessity of this function, is that not the same definition as the regular …
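
A hedged usage sketch of the new method — the spaces and element type are made up, and it assumes `rand` accepts a `TensorMapSpace`, as in recent TensorKit versions:

```julia
using TensorKit, CUDA

t_host = rand(Float64, ℂ^2 ⊗ ℂ^2 ← ℂ^2)   # host TensorMap backed by Vector{Float64}
TT = TensorMap{Float64, ComplexSpace, 2, 1, CuVector{Float64, CUDA.DeviceMemory}}
t_gpu = convert(TT, t_host)               # hits the else branch: TT(undef, space(t_host)), then copy!
```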

```julia
function Base.convert(
        TT::Type{TensorMap{T, S, N₁, N₂, A}},
        t::AdjointTensorMap
    ) where {T, S, N₁, N₂, A <: CuArray{T}}
    if typeof(t) === TT
        return t
    else
```

> **Member:** Same comment here.

@@ -140,6 +167,8 @@ end

```julia
TensorKit.promote_storage_rule(::Type{CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} =
    CuArray{T, N, CUDA.default_memory}
TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{CuArray{T, N}}) where {T, N} =
    CuArray{T, N, CUDA.default_memory}
```

> **Member** (on lines +168 to +171, with a suggested change): I should have written the rules in such a way that it is symmetric, so we shouldn't have to define both directions. However, I do think both sides need …
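
A minimal sketch of the symmetric variant the reviewer describes — this is an assumption about the intended fix, not code from the PR: with `<:` on both argument types, a single rule matches either argument order and any memory type.

```julia
# Hypothetical symmetric rule (assumed, not part of this diff): subtyping on
# both sides lets one definition cover both directions.
TensorKit.promote_storage_rule(::Type{<:CuArray{T, N}}, ::Type{<:CuArray{T, N}}) where {T, N} =
    CuArray{T, N, CUDA.default_memory}
```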

```julia
# CuTensorMap exponentiation:
```

@@ -168,3 +197,21 @@ for f in (:sqrt, :log, :asin, :acos, :acosh, :atanh, :acoth)

```julia
        return tf
    end
end
```

```julia
function TensorKit._add_general_kernel_nonthreaded!(
        tdst::CuTensorMap, tsrc::CuTensorMap, p, transformer::TensorKit.GenericTreeTransformer, α, β, backend...
    )
    # preallocate buffers
    buffers = TensorKit.allocate_buffers(tdst, tsrc, transformer)
    for subtransformer in transformer.data
        # Special case without intermediate buffers whenever there is only a single block
        if length(subtransformer[1]) == 1
            TensorKit._add_transform_single!(tdst, tsrc, p, subtransformer, α, β, backend...)
        else
            cu_subtransformer = tuple(CUDA.adapt(CuArray, subtransformer[1]), subtransformer[2:end]...)
            TensorKit._add_transform_multi!(tdst, tsrc, p, cu_subtransformer, buffers, α, β, backend...)
        end
    end
    return nothing
end
```
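
The `CUDA.adapt(CuArray, …)` call above moves the host-side transformer matrix to the device before the multi-block path runs. A minimal illustration of that Adapt.jl pattern, with a made-up matrix size:

```julia
using CUDA, Adapt

M = rand(4, 4)          # host Matrix, like the coefficient matrix in subtransformer[1]
dM = adapt(CuArray, M)  # device copy; inputs that are already CuArrays pass through unchanged
```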

@@ -171,12 +171,15 @@ end

```diff
 has_shared_permute(t::BraidingTensor, ::Index2Tuple) = false
 function add_transform!(
         tdst::AbstractTensorMap,
-        tsrc::BraidingTensor, (p₁, p₂)::Index2Tuple,
+        tsrc::BraidingTensor{T, S},
+        (p₁, p₂)::Index2Tuple,
         fusiontreetransform,
         α::Number, β::Number, backend::AbstractBackend...
-    )
+    ) where {T, S}
+    tsrc_map = TensorMapWithStorage{scalartype(tdst), storagetype(tdst)}(undef, (tsrc.V2 ⊗ tsrc.V1) ← (tsrc.V1 ⊗ tsrc.V2))
+    copy!(tsrc_map, tsrc)
     return add_transform!(
-        tdst, TensorMap(tsrc), (p₁, p₂), fusiontreetransform, α, β,
+        tdst, tsrc_map, (p₁, p₂), fusiontreetransform, α, β,
         backend...
     )
 end
```

> **Member** (on the `tsrc_map` line, with a suggested change): This might be a little cleaner / not use that many "internals".
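
The suggested change itself was not preserved above. Purely as an assumption, one way to avoid the internal `TensorMapWithStorage` constructor would be TensorKit's public `similar`, which also propagates `tdst`'s scalar and storage type:

```julia
# Hypothetical cleanup (an assumption, not the reviewer's actual suggestion):
tsrc_map = similar(tdst, (tsrc.V2 ⊗ tsrc.V1) ← (tsrc.V1 ⊗ tsrc.V2))
copy!(tsrc_map, tsrc)
```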

@@ -46,7 +46,7 @@ function AbelianTreeTransformer(transform, p, Vdst, Vsrc)

```diff
 end

 const _GenericTransformerData{T, N} = Tuple{
-    Matrix{T},
+    DenseMatrix{T},
     Tuple{NTuple{N, Int}, Vector{Tuple{NTuple{N, Int}, Int}}},
     Tuple{NTuple{N, Int}, Vector{Tuple{NTuple{N, Int}, Int}}},
 }
```

> **Member:** I think this change makes the types below abstractly typed, do we need this?
>
> **Member (author):** Yes, in order to allow device-side matrices to get passed in. Otherwise you get attempts to multiply …
>
> **Member:** Ok, but in that case we would really have to make that an additional type parameter in the …
>
> **Member (author):** OK, it would have been helpful to have had a comment or anything that this was why they were there.
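
A hedged sketch of the "additional type parameter" direction mentioned above — the parameter name `M` and its placement are assumptions, not code from this PR:

```julia
# Hypothetical: keep the tuple concretely typed by parameterizing the matrix
# type instead of widening the first slot to an abstract DenseMatrix{T} field.
const _GenericTransformerData{T, N, M <: DenseMatrix{T}} = Tuple{
    M,
    Tuple{NTuple{N, Int}, Vector{Tuple{NTuple{N, Int}, Int}}},
    Tuple{NTuple{N, Int}, Vector{Tuple{NTuple{N, Int}, Int}}},
}
```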

> **Member** (on the `_copyto!` definition in the first file): Does this make sense to include, and should this not simply fall back to the default `copyto!`? This really is just a performance optimization to avoid a bunch of the overhead of Strided.jl, but I would be surprised that building the index arrays like this really gives an improvement over just a regular strided `copyto!`. I think this entire thing should boil down to the following, which is not obvious, and I should have added a comment/fallback definition (up to some off-by-one errors though): …
>
> **Member (author):** It seems to be necessary to avoid scalar indexing sadness 🤷. Happy to use the fallback, though!
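
The code the reviewer refers to ("the following") was not preserved in the thread. As a hedged reconstruction only: a strided fallback could view `A`'s parent through a 2-D stride pattern matching `B` and delegate to a single strided `copyto!`, assuming StridedViews.jl's `StridedView(parent, size, strides, offset)` constructor and its CUDA extension behave as sketched. The function name is made up, and the offset handling may be off by one, as the reviewer anticipates:

```julia
using Strided, CUDA

# Hypothetical fallback, not the reviewer's actual suggestion: one strided
# copyto! instead of explicit index arrays, still avoiding scalar indexing.
function _copyto_fallback!(A::StridedView{TA, 1, <:CuArray{TA}},
                           B::StridedView{TB, 2, <:CuArray{TB}}) where {TA, TB}
    Astr = stride(A, 1)
    # view A's data with B's shape: rows advance by Astr, columns by Astr * size(B, 1)
    A2 = StridedView(parent(A), size(B), (Astr, Astr * size(B, 1)), A.offset)
    return copyto!(A2, B)
end
```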