Conversation
Lol, the assertslow is there for a reason: scalar indexing into a GPUArray is very slow and should not be done!

Ouch! The benchmarks don't include padding. I should have mentioned that. Sorry, my bad. I'll create a separate benchmark with padding.
src/pool.jl
Outdated
| @@ -0,0 +1,44 @@ | |||
| import CUDAnative | |||
src/pool.jl
Outdated
| pool = UInt32(pool) | ||
| stride = UInt32(stride) | ||
| out = similar(b) | ||
| out = out[1:(div(Asize[1] - pool, stride) + 1), 1:(div(Asize[2] - pool, stride) + 1), :, :] |
you could just do similar(b, outsize) no?
Thanks, I was unaware of this. It should be similar(b, outSize...) perhaps. Also, outSize needs to be determined before similar is called.
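For reference, here is a minimal sketch of the suggestion above: compute the output size first, then allocate with similar. The helper name pooled_size is hypothetical (it is not in the PR), a plain Array stands in for the GPU array, and padding is assumed to be zero. As far as I know, Base's similar accepts the dimensions either as a tuple or splatted, so both spellings discussed here should work.

```julia
# Hypothetical helper (not part of the PR): output size of a square
# pooling window over a 4-D (H, W, C, N) array, assuming no padding.
function pooled_size(insize::NTuple{4,Int}, pool::Int, stride::Int)
    h = div(insize[1] - pool, stride) + 1
    w = div(insize[2] - pool, stride) + 1
    return (h, w, insize[3], insize[4])
end

b = rand(Float32, 8, 8, 3, 2)            # plain Array standing in for a GPU array
outSize = pooled_size(size(b), 2, 2)     # must be known before calling similar

out  = similar(b, outSize...)            # splatted dimensions
out2 = similar(b, outSize)               # tuple form, same result shape
```

This avoids allocating a full-size copy of b and then slicing it, which is what the outdated line did.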
Co-authored-by: SimonDanisch <sdanisch@gmail.com>
Updated. Thank you @SimonDanisch for PR #111 and commit 1e1104e.
39e7783 to
fef2421
Compare
An implementation of maxpool. Here's a sample benchmark (CPU vs. GPU): https://gist.github.com/americast/95358d972647adf5c7ebcde7c58db51f

Tests were failing due to a "getindex is disabled" error. I have made a small change in src/indexing.jl as a workaround.

Thanks.