OpenCL is the new open standard for GPGPU (general-purpose computing on graphics processing units) programming. It supports both graphics cards and multi-core processors with the same C99-based source code. I've rewritten the NVIDIA CUDA version of my real-time ray tracer in OpenCL and run it on an ATI/AMD Radeon HD 5870 GPU. For comparison, I also ran it in CPU mode on an Intel i7-980X (6-core, hyperthreaded) processor.
The OpenCL CPU version gave results similar to the pthreads-generic version tested earlier. With 10 objects in the scene, the GPU was only slightly faster than the CPU. But as the number of objects increased, the GPU version held its performance much better than the CPU version. This suggests that at low object counts the actual work done by each GPU core is small relative to the overhead of setting up the run and copying the results back. Yes, a realistic ray tracer would cull the scene graph so that fewer objects are evaluated for each point. And the GPU version ran out of constant memory above 800 objects. But this is still much better scaling and performance than I saw in the NVIDIA CUDA test I did a few years ago.