I had one until a few months ago!
The closest thing I’ve found so far is https://github.com/MPSQUARK/BAVCL, which is built on ILGPU. It seems to have been under slow development by one or two contributors for the past couple of years, but I’ll probably keep an eye on it.
Unfortunately, I don’t believe NumPy has any built-in acceleration (beyond being largely implemented in C, which is already fast), though I don’t really know the ins and outs. There are Python libraries that implement the NumPy API, or otherwise accelerate NumPy-style code on e.g. CUDA, but as far as I know Numpy.NET uses its own embedded Python and NumPy, so that wouldn’t be an option.
Unfortunately not, though I’d forgotten about SIMD! It doesn’t seem to support arbitrary-sized matrices or arrays out of the box, though I guess I could handle the indexing into the vector type myself. Still, as far as I can tell, it doesn’t offer the operations I’d like.
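For what it’s worth, handling the indexing yourself would usually mean walking the array in fixed-width chunks and finishing the leftover elements with a plain scalar loop. Here’s a minimal sketch of that pattern, written in C with the GCC/Clang vector extension standing in for a fixed-width SIMD vector type (an assumption for illustration only; `v4f`, `LANES` and `add_arrays` are hypothetical names, and the 4-lane width is arbitrary):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 4-lane float vector (GCC/Clang vector extension, not standard C). */
typedef float v4f __attribute__((vector_size(16)));
#define LANES 4

/* Element-wise out[i] = a[i] + b[i] for arrays of any length n. */
static void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));

    size_t i = 0;
    /* Main loop: one vector add per full LANES-wide chunk. */
    for (; i + LANES <= n; i += LANES) {
        v4f va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* load without alignment requirements */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                    /* lane-wise addition */
        memcpy(out + i, &vr, sizeof vr); /* store back into the output array */
    }
    /* Tail: the last n % LANES elements that don't fill a whole vector. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Calling `add_arrays` on two 7-element arrays does one 4-wide vector add followed by three scalar adds for the tail.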
Thanks though!
Great post. Full of useful tips I always need to remind myself of.