CUDA and Raymarching

category: code [glöplog]
Would there be any benefits to using CUDA to code our raymarchers, as opposed to GLSL/HLSL/Cg? I'm looking into it, but do you people know any pros/cons? Is it even possible?
added on the 2011-02-18 00:59:41 by Mewler Mewler
Not that I have any proof, lol, but theoretically, why not?! It's all the same code running on the GPU. But anyway, FUCK CUDA. It's NVIDIA only. This is not for the general audience, so better code safe for all vendors. ;)
added on the 2011-02-18 01:38:40 by yumeji yumeji
No reason why it shouldn't be possible. But yeah, what yumeji said. You don't want to produce something that won't run on all cards. Not acceptable in anything above 1k these days (and even then it's a bit sketchy).
added on the 2011-02-18 04:15:56 by ferris ferris
Yeah, I'm using ATI so it's not really an option. Luckily I did some digging, and it seems ATI have their own almost cross-compatible competition, APP. It seems it can run CUDA code with only small modifications, like HLSL/GLSL.
added on the 2011-02-18 04:24:49 by Mewler Mewler
I'm no expert, but I'm quite sure you don't want your size-optimized intro querying your gpu vendor and modifying your shader code accordingly :D
added on the 2011-02-18 04:31:18 by ferris ferris
(and please don't take that comment seriously, like some people *cough*vibrator*cough* have. I just like to joke around alot)
added on the 2011-02-18 04:35:33 by ferris ferris
ryg: Almost as funny as his pie vs cake comparison :D
added on the 2011-02-18 05:05:29 by ferris ferris
And I'm not quite sure how alot is less dangerous than baby.
added on the 2011-02-18 05:07:31 by ferris ferris
Oh! And my apologies!! Hadn't realized that blog belonged to a girl.

*her :)
added on the 2011-02-18 05:09:21 by ferris ferris
Good question. There are two CUDA advantages that come to mind:

1) You have a degree of control over the order of execution, that is, each pixel is not necessarily evaluated independently.
2) This allows you to share some data between pixels.

(It's not strictly correct to talk about pixels, since you process arbitrary data. But I guess you have to map the data processing to pixel processing in a way, so I use the term "pixel" loosely here)

I can't see any obvious benefits from those advantages in a raymarcher. Anyone?
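
One place where sharing data between pixels could plausibly help a raymarcher is a coarse pre-pass: march one conservative cone per tile of pixels, then let every pixel in that tile start its own march from the distance the cone already proved empty. The following is only a CPU-side C++ sketch of that idea, not something taken from this thread or from any particular intro. The scene (a unit sphere at z = 5), the names `tileSafeStart` and `march`, and all constants are assumptions made for illustration. In a CUDA or compute shader version the per-tile value would typically sit in shared memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }
Vec3 normalize(Vec3 a) { return scale(a, 1.0f / length(a)); }

// Hypothetical scene: a unit sphere centred at (0, 0, 5); the camera is at the origin.
float sceneDistance(Vec3 p) {
    return length({p.x, p.y, p.z - 5.0f}) - 1.0f;
}

struct MarchResult { float t; int evaluations; bool hit; };

// Plain sphere tracing along one pixel ray, starting at distance tStart.
MarchResult march(Vec3 dir, float tStart) {
    float t = tStart;
    for (int i = 1; i <= 128; ++i) {
        float d = sceneDistance(scale(dir, t));
        if (d < 1e-3f) return {t, i, true};   // close enough to the surface
        t += d;
        if (t > 20.0f) return {t, i, false};  // left the scene
    }
    return {t, 128, false};
}

// Conservative march of a cone around the tile's central ray (axis).
// k is the tangent of the cone's half-angle; every pixel ray of the tile
// is assumed to lie inside that cone. The returned distance is one that
// all of those rays can skip without missing geometry.
float tileSafeStart(Vec3 axis, float k) {
    assert(k >= 0.0f);
    float t = 0.0f;
    for (int i = 0; i < 64; ++i) {
        float d = sceneDistance(scale(axis, t));
        // Largest next with (next - t) + k * next <= d, so the cone's
        // cross-section at next still fits inside the empty sphere of radius d.
        float next = (t + d) / (1.0f + k);
        if (next <= t + 1e-3f) break;         // the cone may touch geometry: stop
        t = std::min(next, 20.0f);
        if (t >= 20.0f) break;
    }
    return t;
}
```

Usage would be something like `march(pixelDir, tileSafeStart(tileAxis, k))` instead of `march(pixelDir, 0.0f)`. Whether the saved steps pay for the extra pass and the synchronisation depends on the scene and the hardware.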
added on the 2011-02-18 07:08:35 by revival revival
For a cross-platform and cross-vendor GPGPU compute language, use OpenCL instead of CUDA. Strictly speaking, NVIDIA's CUDA is a platform (not a language) that actually includes OpenCL; the language itself is called CUDA C. For ATI/AMD, the GPGPU platform used to be called ATI Stream (comprising CAL, OpenCL, etc.), and is now called AMD Accelerated Parallel Processing (APP). See SmallLuxGPU for a nice example of OpenCL-based raytracing, though not raymarching.
added on the 2011-02-18 07:29:04 by eyebex eyebex
What eyebex is saying, is something like this. Imagine that OpenCL is a cat, and CUDA is a bear. Watch and learn:

BB Image

I think I've made my point quite clear, don't you gentlemen? Indeed.
added on the 2011-02-18 08:00:10 by ferris ferris
Another fine example:

BB Image
added on the 2011-02-18 08:13:47 by ferris ferris
There isn't much/any point in using CUDA over DirectCompute/OpenCL these days.
But sure, a compute API allows some raymarching optimizations that are hard to do with pixel shaders. I did some long ago when DX11 was new (and immature), but it's not as easy as it seems, as you'll quickly run into SIMD performance pitfalls.

However, (at least for 4ks) such optimizations also make the code more complex and thereby larger, and if you also want to use the faster renderer for more complex scenes, well... ;)

But DirectCompute (or actually just compute shaders in DX11) is clearly the API you should choose for these kinds of things. No DLLs to ship along with your code, and decently vendor agnostic (as long as DX11 is available).
It's even possible to do 1ks with compute shaders, but it will be slightly larger than dx9/ogl (but smaller than a dx10 pixel shader setup).
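
One well-known SIMD pitfall (possibly, but not necessarily, the kind meant above) is divergence: rays that run in lockstep in one warp or wavefront all wait for the slowest ray, and rays grazing a silhouette take far more steps than their neighbours. The C++ sketch below only estimates that effect on the CPU by counting distance evaluations per ray. The scene (a unit sphere at z = 5), the function names and the idea of treating one row of 32 pixels as one lockstep group are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Number of distance evaluations needed by the ray from the origin through
// the point (px, py, 1), against a hypothetical unit sphere at (0, 0, 5).
int marchEvaluations(float px, float py) {
    float inv = 1.0f / std::sqrt(px * px + py * py + 1.0f);
    float dx = px * inv, dy = py * inv, dz = inv;  // normalised ray direction
    float t = 0.0f;
    for (int i = 1; i <= 256; ++i) {
        float x = dx * t, y = dy * t, z = dz * t - 5.0f;
        float d = std::sqrt(x * x + y * y + z * z) - 1.0f;
        if (d < 1e-3f) return i;   // hit
        t += d;
        if (t > 20.0f) return i;   // miss
    }
    return 256;
}

// Evaluation counts for `lanes` pixels spread evenly from pxFirst to pxLast.
std::vector<int> rowEvaluations(float pxFirst, float pxLast, int lanes) {
    assert(lanes > 1);
    std::vector<int> out(lanes);
    for (int i = 0; i < lanes; ++i) {
        float px = pxFirst + (pxLast - pxFirst) * float(i) / float(lanes - 1);
        out[i] = marchEvaluations(px, 0.0f);
    }
    return out;
}

// Fraction of lane-iterations that do useful work if all lanes must keep
// iterating until the slowest lane has finished.
float lockstepUtilization(const std::vector<int>& evals) {
    assert(!evals.empty());
    int longest = *std::max_element(evals.begin(), evals.end());
    assert(longest > 0);
    long total = std::accumulate(evals.begin(), evals.end(), 0L);
    return float(total) / (float(longest) * float(evals.size()));
}
```

For example, `lockstepUtilization(rowEvaluations(0.0f, 0.3f, 32))` gives a rough figure for a row of pixels that crosses the sphere's silhouette. A value well below 1 suggests most lanes would sit idle while a few grazing rays finish.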
added on the 2011-02-18 09:09:02 by Psycho Psycho
There is no advantage in using CUDA/OpenCL for raymarching - at least from what I read in several other raymarching threads.
added on the 2011-02-18 09:49:52 by las las
yes, I ported Slisesix to CUDA, it was still 4k. But I got no speed up compared to GLSL or HLSL. I remember I had to do some tricks to have CUDA working in 4k because of some static data initialization. If I remember well I had to modify the exe in hexadecimal to replace some address to have it working in Release. It was working well tho. I also remember to have the Mandelbrot set in 1k with CUDA.

That's what iq said.
added on the 2011-02-18 10:07:59 by las las
In here xortdsc is doing raymarching with CUDA.
I would suggest OpenCL too btw.
added on the 2011-02-18 10:28:32 by raer raer
...if anything of that sorts...
added on the 2011-02-18 10:29:31 by raer raer
I would suggest GLSL/HLSL.
added on the 2011-02-18 12:43:58 by las las
If you're on Mac, OpenCL is there by default. But avoid it, hardware support is so patchy it makes CUDA look like the peak of compatibility! Even new systems are totally hit + miss, my new iMac with ATI 4670 doesn't support it at all :(
added on the 2011-02-18 12:46:16 by psonice psonice
BB Image
added on the 2011-02-18 12:50:29 by las las
Dell is pretty gay though so the comparision is rendered useless ;)

Still, I lol'ed.
added on the 2011-02-18 12:56:22 by cg_ cg_
unfortunately OpenCL is way too slow in its current state (and on ATI very fragile, too).. CUDA is the way to go if you want highest performance (as an example, my mandelbulb implementation in CUDA outperforms my GLSL one by far), unless your codebase is rather small (in that case the GLSL compiler might win performancewise as it tends to be more "brute force" optimizing)..
added on the 2011-02-18 13:20:52 by toxie toxie
I'm currently taking a close look at various OpenCL implementations as part of my thesis, and for my specific use-case, ATI's implementation (both on CPU and GPU) turned out to be the one with the fewest critical bugs. While NVIDIA's and Intel's implementations deliver better performance, they're still quite buggy, mostly regarding code generation (invalid PTX code, wrong results for vectorized kernels). But I believe this is mostly due to LLVM, which is also still quite immature. Once this is settled, I believe OpenCL is the way to go regarding GPGPU / computing languages.
added on the 2011-02-18 13:30:45 by eyebex eyebex