Sunday, October 24, 2010

Hyper-Threading

I've re-run my real-time ray tracing and Go ACO TSP programs on a 6 core processor with hyper-threading. The Intel i7-980X supports two hardware threads per core, so the OS sees it as a 12-core processor.



Here hyper-threading has improved performance for both programs in all configurations. At one core both programs were sped up 28%. At 6 cores ray tracing was sped up 9% versus non-HT, while ACO TSP was sped up 18% versus non-HT. The higher speedup for ACO TSP is expected as it matches the general speedup from increased cores.

Both programs were run with 24 threads in all test configurations. Having more program threads than cores*hardware_threads_per_core is key. Otherwise the OS may schedule multiple threads on the same core while leaving others idle. Linux is hyper-theading aware and attempts to avoid this scheduling problem, but doesn't always get it right. In testing with 6 threads and 6 cores, I saw 28% decreases in performance for both programs when hyper-threading was enabled.

The Intel i7-980X also supports "Turbo Boost", where one core is automatically overclocked when other cores are idle. Turbo boost was disabled for the tests above. With one core and HT disabled, Turbo boost provides a speedup of 8% for both programs. Surprisingly, with six cores and HT enabled (so all cores should be fully utilized) Turbo boost provides a speedup of 5% for both programs.

The performance of Go has also improved relative to the single-threaded C version of ACO TSP. For 24 ants the C version finishes in 60 seconds, so on one core Go is only 1.7x slower than C. At 2 cores Go has exceeded the performance of the single-threaded C version. Previously Go was 2.8x slower than C on one core; I don't know whether this improvement is due to an improved Go compiler, the move to 64-bit compilers and OS, or architectural differences between the Intel Core 2 and i7-980X.

No comments: