Sunday, December 2, 2007


Intel C++ supports OpenMP, a standardized API for shared-memory multiprocessing in C++. By adding a simple #pragma loops can be split up to automatically run in parallel on multiple cores. I added a simple change to the per-screen-line rendering loop of my generic ray tracer:

int screenX;
#ifdef USE_OMP
#pragma omp parallel for firstprivate(portPoint)
for (screenX = 0;
screenX++) {

With this change each pixel is rendered by a separate thread, with a maximum of 4 (the number of cores) running simultaneously. This use of threads is an alternative to the explicit pthread threading tested previously, where a separate thread was used per line of the screen. In this case OpenMP is simpler than explicit threading, though it yields slightly lower performance.

GCC 4.2 also supports OpenMP, but its performance is much worse than Intel.

Monday, October 15, 2007

Genezzo Drive Performance

My Clustered Genezzo code is relatively slow because it doesn't use a write-ahead log. Its access pattern involves repeated writes and syncs alternating between blocks at opposite ends of the data file. Potentially it could run much faster on a solid-state drive. I tested the performance of inserting 1000 integers into a table, with a commit after each insert. I avoided SQL to eliminate Perl Parse::RecDescent overhead. I tested a internal SATA drive, a Coraid ATA-over-Ethernet SAN drive on 100Mbit ethernet, and a SanDisk USB 2.0 ("keychain") flash drive. The system monitor showed the internal SATA drive test was CPU-limited, while the ATA-over-Ethernet SAN drive test was network-limited. I also tested base Genezzo without Genezzo::Contrib::Clustered enabled. Base Genezzo also lacks a write-ahead log, but performs fewer syncs per commit and fewer widely-spaced writes. All tests were done on 500M databases, except for Genezzo::Contrib::Clustered on flash which used the default 600K (?) size. With a 500M database the flash test using Genezzo::Contrib::Clustered got 3 commits per second; this requires further investigation...

Monday, October 1, 2007

Updated Clustered Genezzo on CPAN

Revision 0.34 of Genezzo-Contrib-Clustered has been posted on CPAN. One module moved, one test fixed.

Monday, July 16, 2007

Intel C++ Compiler

I recompiled my generic Ray Tracer code using the Intel C++ compiler. Using SSE3 auto-vectorization I obtained a 21% speedup over GCC.

Wednesday, July 4, 2007

Cell PPC Hardware Threads

The PowerPC unit in the PS3 Cell processor supports two hardware threads. Earlier I added pthread support to the generic ray tracer code. I have updated Real-Time Ray Tracing on the Playstation 3 Cell Processor with PPC performance measurements with pthreads and the IBM XLC compiler. Pthreads gives a 63% speedup, and XLC gives a 26% speedup, and together they give an 89% speedup:

Sunday, July 1, 2007

CUDA GPGPU and Pthreads

I've added a new page, Real-Time Ray Tracing with NVIDIA CUDA GPGPU and Intel Quad-Core. I've created two new versions of the ray-tracer code, one utilizing NVIDIA CUDA, and another using pthreads to exercise multi-core CPUs. The CUDA code is fastest, but least realistic as a practical ray-tracer. Pthreads was certainly easiest -- a half hour change to pre-existing generic code. I think CUDA will perform much better on more constrained (in terms of control flow and random memory access) algorithms.

Saturday, June 9, 2007

PS3 Cell SDK 2.1

I've been getting some compatibility questions about my PS3 Ray Tracer code, so I just upgraded my PS3 to Fedora Core 6 and Cell SDK 2.1. IBM changed the SPU thread API (to pthreads); I now provide code for both versions. They also now have a PPC native version of the IBM XLC compiler. It provides an over 50% performance improvement over GCC:

Saturday, May 26, 2007

Blindsight Site

"Humans didn't really fight over skin tone or ideology; those were just handy cues for kin-selection purposes. Ultimately it always came down to bloodlines and limited resources."

Peter Watts, Blindsight. This Science Fiction author has an interesting website with tie-ins, excerpts, and whole texts of his novels.

Saturday, May 19, 2007

Gadgets from Maker Faire

I picked up some useful gadgets at Maker Faire today.

One is a bendable tripod called a Gorillapod. I've attached it to my webcam. This should be useful for remote board-gaming.

Another gadget is the Make Controller Kit. It is actually a complete board (no soldering required).

It attaches via USB or Ethernet to a PC and provides 8 analog inputs, 8 digital outputs, and 4 servo controllers. It can be controlled by the PC with simple commands embeddable in most programming languages. This is similar to the Ditch-Day Stack Controller I built years ago in college. The Make Controller's processor (55 MHz ARM RISC) can also be programmed directly in C using the GNU GCC toolchain. The embedded Real-time OS FreeRTOS and web server (!) also look interesting.

Wednesday, May 2, 2007

PS3 Cell Ray Tracing at MIT, GPGPU at UI

Earlier this year I wrote an article on Real-Time Ray Tracing on the Playstation 3 Cell Processor. MIT is offering a course on Multicore Programming on the PS3. One of the examples presented is the Blue Steel Ray Tracer.

Earlier I also wrote an article on Four-Dimensional Cellular Automata Acceleration Using GPGPU. The University of Illinois at Urbana-Champaign is offering a course on Programming Massively Parallel Processors featuring GPGPU programming on NVIDIA CUDA hardware.

Currently applications must be written specifically for the Cell, conventional multi-core CPUs, or GPGPU. Several organizations are working to develop a common programming model where the same application code could be compiled to run on any of these. One such system is from RapidMind. It's programming model is much like the use of OpenGL within a C++ program. Arrays are defined using RapidMind container types (like OpenGL graphical types), and then embedded programs are invoked on them (like OpenGL Shading Language).

Sunday, April 29, 2007

About the Name

The blog name I chose, Proliferation of Niches, is from the Herbert Simon quote I have had at the top my web site for years:

"If there is such a trend toward variety, then evolution is not to be understood as a series of tournaments for the occupation of a fixed set of environmental niches, each tournament won by the organism that is fittest for that niche. Instead evolution brings about a proliferation of niches... Vannevar Bush wrote of science as an "endless frontier." It can be endless, as can be the process of design and the evolution of human society, because there is no limit on diversity in the world." (The Sciences of the Artificial, p. 165)

I considered Endless Frontier, but that is too well known as the title of the Vannevar Bush report (and sounds too much like Star Trek :-)

After choosing the blog name I found the phrase is currently being used in discussions in and about the blogosphere:

"With an expanded network, individuals are able to reach out to a potentially larger and more varied pool of culture and information. While debates on globalization in the heyday of mass media suggested that interconnection would lead to the homogenization of culture, in the Internet era, the opposite appears to be more the case. What we are seeing now is a proliferation of niches. We see this in subcultures such as English-language fandoms of Japanese animation, a case described in the culture chapter. Now teens in the US can gain access to niche Japan-origin media that they would never have been able to get their hands on even a decade ago. In the blogosphere, this tendency has been criticized as creating an “echo chamber,” where bloggers and audiences are connecting with greater frequency and fidelity with people who share their opinions, relying less on the standards of neutrality espoused by the mainstream press. "

It is also popular with the "Long Tail" economic crowd:

"Pietro assumes that a proliferation of niche markets will lead to a proliferation of niche suppliers, and hence the dilution of the authority of the big suppliers. I don't see any reason to believe that this is the case. Indeed, one of Chris Anderson's own preferred examples is based on Amazon sales rank - and there's nothing very diffuse about Amazon, or the authority wielded by Amazon. Much of the buzz around the 'Long Tail' seems to derive, ultimately, from this confusion of the two meanings of 'niche'."

While I originally chose the quote for my web site for the biological, technical, and social evolution aspects, I expect the blogosphere and long tail meanings will tie in here. My site started long before blogs. Unless you count CHAOS MANOR in the back of Byte Magazine (which I used to read every month at the public library), now calling itself The Original Blog.

Saturday, April 28, 2007

Mail Bag

My web site gets a surprising amount of interesting email (beyond the usual headhunter spam and spam spam). This may be due to certain search engines regarding me as the world's foremost authority on four dimensional cellular automata, among other topics :-)

One email was from an author working on updating a book I had cited. I gave him and his assistant pointers on the latest I knew about emergent collective intelligence.

Another email was someone looking for a part name and source for a CPU chip socket in a photo from my robot project -- not for a robot or a CPU, so I have no idea how he found the page. I was able to help him.

The most amusing email was a 1337 haxor claiming to have broken the copy protection on one of my Palm games, so he could "fix" it (our backgammon didn't cheat, honest!). I replied with this link; he replied with this one. Touche.

Another person found my board game list, and was looking for a copy of the Avalon Hill Titan board game to play ($150 on e-bay!). Sorry, not selling & no leads.

I was also asked whether I have tried a SAN with more than one Coraid EtherDrive. I have not.

The latest email suggested improvements to Multi-Core Ant Colony Optimization for TSP in Erlang. At the time I had tried native compilation in Erlang, but didn't see any speedup and didn't mention it in the article. On Sun SPARC the correspondent was able to get significant speedup (2x +) on my application with native compilation after he separately native compiled the dict module. Turns out the native compiler only operates on a specific module, and you must native compile the system modules yourself. Otherwise the native speedup is lost in the translation between native and bytecode interpreted in calls between modules. I tried native compiling dict on my AMD system, and didn't get any improvement. I Also (re)discovered that native compilation is not yet compatible with SMP (multi-core) operation. The correspondent also got significant improvements by altering some of my data structure choices. I expect that all three ant algorithm implementations (Erlang, Haskell, and Standard ML (twice)) could be improved be re-visiting my data structure choices.

DNA as String Rewrite System

I'm currently reading Eric Baum's What is Thought. It presents an interesting comparison between the operation of DNA and Post production systems. These systems provide a set of regular expressions paired with corresponding substitutions. An example is the rewrite rule:

x B A B y -> x C B y

where x and y are variables.

When provided the string


the result of the substitution is

A C B C B C. (example from Baum)

These types of operations happen both at the levels of intron removal and protein synthesis. The more common (and easier to visualize) comparison is of DNA to a Turing Machine. Here the DNA (actually RNA) strand is the tape and the Ribosome is the read/write head.

Looking back, both comparisons were explored in Hofstadter's Godel, Escher, Bach: an Eternal Golden Braid and Metamagical Themas

UPDATE: In a later chapter Baum points out the Turing equalence (proven by Minsky, discussed by Wolfram and others) between Turing machines and Post production systems.

Fooled by Randomness

In Breaking the Spell: Religion as a Natural Phenomenon Daniel Dennett cites Julian Jaynes "brilliant but quirky and unreliable book" The Origins of Consciousness in the Breakdown of the Bicameral Mind as noting "that the very idea of randomness or chance is of quite recent origin" (Dennet, p. 133):

"Sortilege or the casting of lots differs from omens in that it is active and designed to provoke the gods' answers to specific questions in novel situations. It consisted of throwing marked sticks, stones, bones, or beans upon the ground, or picking one out of a group held in a bowl, or tossing such markers in the lap of a tunic until one fell out. Sometimes it was to answer yes or no, at other time to choose one out of a group of men, plots, or alternatives. But this simplicity -- even triviality to us -- should not blind us from seeing the profound psychological problem involved, as well as appreciating its remarkable historical significance. We are so used to the huge variety of games of chance, of throwing dice, roulette wheels, etc., all of the vestiges of the ancient practice of divination by lots, that we find it difficult to really appreciate the significance of this practice historically. It is a help here to realize that there was no concept of chance whatever until very recent times. Therefore, the discovery (how odd to think of it as a discovery!) of deciding an issue by throwing sticks or beans on the ground was an extremely momentous one for the future of mankind. For, because there was no chance, the result had to be caused by the gods whose intentions were being divined." (Jaynes, p. 339-240)

Software Transactional Memory

I saw an interesting tech talk recently by someone from Intel on their work on Software Transactional Memory. A question from the audience pointed out an interesting problem with exceptions (also discussed under "Implementation issues" in the Wikipedia article). In STM threads perform optimistic reads, and are retried at commit time if conflicts are detected. Because the reads are "dirty" inconsistent states may be seen, and null pointer or division by zero exceptions may be thrown. The application programmer must recognize this possibility and distinguish between transient (should be retried) and fatal errors. STM trades off the complexity of locking with the complexity of understanding the effects of inconsistent reads. Opposite to the suggestion in Wikipedia, perhaps the system could check for conflicts at the point of exception, and automatically retry if one is found. Of course STM still doesn't work for non-restartable operations, such as I/O. I find the ACID (Atomic, Consistent, Isolated, Durable) status of STM transactions debatable (and advocates don't claim to satisfy all of them).

First Post

This blog will probably be a collection of ideas for possible future articles on my web site.