Monday, November 17, 2008

Three Key Concepts, and Their Application

  • Occam's Razor
    "the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory"

  • Falsifiability
    "the logical possibility that an assertion can be shown false by an observation or a physical experiment"

  • Uniformitarianism
    "the assumption that the natural processes operating in the past are the same as those that can be observed operating in the present"

These three concepts form part of the bedrock of the Western scientific method. When taken seriously, they help refute most supernatural claims. Science asserts that physical law has been constant over the time frames of the evolution of life on Earth, the evolution of man, and the development of civilization.

While we cannot travel into the past to directly witness historical events, we can examine evidence that has been (or should have been) left behind. We can also closely examine contemporaries, in both our own and other cultures, making similar claims. In the case of current supernatural claims, not one has held up to scientific examination.

Psychology carefully avoids making judgments about the truth of supernatural claims, instead focusing the individuals ability to function within their local social group. In Western society today we still see individuals making supernatural claims very similar to those we read in ancient texts. Sometimes they are recognized as mentally ill, but this does not stop others from believing and following them. From this it is easy to assume ancient, pre-scientific peoples were even more credulous.

This leaves one with several options on how to partition one's world-view. One extreme is to assert that many contemporary supernatural claims, from many traditions, are all true. A more moderate position is that the world was different two to four thousand years ago, and that supernatural claims from that time period were true. Of course these positions are often subdivided where claims within one tradition are uniformly true while the others are false (or misguided interpretations of events actually powered by the one true source).

The simplest Occam's-Razor explanation is to take a Uniformitarian position and assert that all supernatural claims for all time are purely based in psychology.

Tuesday, October 28, 2008

Galapagos















From Galapagos October 2008

Did the First Giraffe Have a Navel, Revisited

In an earlier post I asked, "Did the First Giraffe Have a Navel?". Turns out this is an old question:

These authors seem no more startled at a miraculous act of creation than at an ordinary birth. But do they really believe that at innumerable periods in the earth's history certain elemental atoms have been commanded suddenly to flash into living tissues? Do they believe that at each supposed act of creation one individual or many were produced? Were all the infinitely numerous kinds of animals and plants created as eggs or seed, or as full grown? and in the case of mammals, were they created bearing the false marks of nourishment from the mother's womb? The Origin of Species, By Charles Darwin (1859)

Monday, July 21, 2008

Citations and Uses of Ray Tracer Code

The real-time ray tracing code I wrote for PS3 Cell and CUDA GPGPU is being used by others. David Yuen's group at the Department of Geology and Geophysics and Minnesota Supercomputing Institute at the University of Minnesota used my PS3 code as the starting point for tsunami modeling. From their IEEE Vis 2007 poster session Use of Ray Tracing Techniques on Tsunami Simulation Data with the PlayStation® 3:

"This poster will discuss how we have used the PlayStation® 3’s cell processor to implement and employ parallelized visualization, through the form of ray tracing techniques, on tsunami data. By adapting Eric Rollins’ ray tracing package, we were able to implement his ray casting techniques on triangles. This allowed us to triangulate our tsunami datasets, stored as [x, y, height] data, and feed the triangulated data into the program. We rendered 4000 triangle objects from the data and applied a colormap based on each triangle’s height value. Finally, we displayed the generated image."

They also published a paper Experiments in scientific computation on the PlayStation 3 in the Springer journal Visual Geosciences. This PowerPoint presentation shows images from my and their work.

Other students are trying CUDA GPGPU ray tracing. Michael Allgyer (DOC) cited my work in his masters thesis proposal. Umut Erturk asked me about PS3 versus GPGPU for his thesis. Others are trying to get my CUDA code working on windows.

Friday, July 18, 2008

ACO for TSP with SML/NJ

I was asked about the performance of SML/NJ on my usual ACO for TSP test. I used Standard ML of New Jersey v110.67 from Debian apt-get. I ran it in the interactive shell; no Compilation Manager. I didn't experiment with Concurrent ML (I don't think it supports multi-core). Results:

Time Multiple
in secs of C
C 0.4 1
MLton SML 1.6 4
SML/NJ 4.9 12
Alice SML 169.0 423

Results scaled up proportionally when I increased iterations by a factor of 10.

Tuesday, July 15, 2008

Shedskin Python

Shedskin is another experimental Python compiler. Previously I tested Psyco, a Just-in-Time Python compiler which operates while the program is running. Shedskin is a more traditional compiler, run in separate steps beforehand. Shedskin actually generates templated C++ code, which is then compiled with GCC.

Since Python is dynamically typed while C++ is statically typed, Shedskin performs type inference over your program. It generates C++ templates to support parametrically polymorphic functions. A simple example, previously used in my article Types and Programming Languages:

def palindrome(x, y):
return (x, y, x)

print palindrome(1, 2)
print palindrome("a", "b")

Results:

(1, 2, 1)
('a', 'b', 'a')

With shedskin, the following C++ template is generated:

template <class A> tuple2<A, A> *palindrome(A x, A y) {
return (new tuple2<A, A>(3, x, y, x));
}

and used:

str *const_0, *const_1;
const_0 = new str("a");
const_1 = new str("b");
print("%s\n", palindrome(1, 2));
print("%s\n", palindrome(const_0, const_1));

While support for many Python constructs and library functions is currently limited, Shedskin was able to compile unmodified the same Ant-Colony-Optimization Python code I had used previously with Psyco Python. It is significantly faster, coming within 3x the performance of the baseline C version:

Time Multiple
in secs of C
C 0.4 1
Shedskin Python 1.2 3
Psyco Python 5.5 14
Python 40.0 100

Monday, July 14, 2008

C baseline for ACO TSP

In my previous post on the Multi-Core Problem I discussed comparing the performance of multi-core-friendly languages to the baseline performance of single-threaded C code. The test case I have been using is Ant Colony Optimization for TSP. In a recent post Leonardo Maffi has rewritten my ACO TSP code in C. This provides a baseline for comparisons:

Time Multiple Speedup
in secs of C (1->2)
C 0.4 1 n/a
Scala 10.0 25 1.7
Haskell 22.0 55 1.4
Erlang 93.0 233 1.9

gcc (GCC) 4.2.4 (Debian 4.2.4-1).
Compiled gcc -Wall -O3 -s ant_c.c -o ant_c
Still 1000 iterations, 200 cities.
Times for other languages from here .
Initial speedups from here (Scala interpolated), and do not model decay at higher numbers of cores.

Sunday, July 13, 2008

The Multi-Core Problem

"Since 2002, the limits of power, available instruction-level parallelism, and long memory latency have slowed uniprocessor performance [growth] recently, to about 20% per year [down from 52% per year]." Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition (2007), p.3.

Post-2002 the Moore's Law growth in transistor count has been increasingly devoted to adding additional cores to CPUs, rather than trying to improve single-core performance. In the prior decade single-threaded C/C++ applications were sped up "for free" with successive CPU generations. This free ride has ended, and left programmers with a choice. Either they can rewrite their single-threaded C/C++ applications to be multi-threaded (either explicitly, with pthreads, or more implicitly, say with OpenMP), or they can rewrite their applications in a different more multi-core-friendly programming language.

In choosing an alternative language two important related factors must be examined. First, how much of a single-threaded performance hit is taken moving to the new language? If the new language is say 40 times slower than C/C++, you will need a 40-core CPU to get back to where you started (assuming perfect linear speedup). Second, how much worse than linear is the actual speedup? If the speedup starts out at 1.5 with 2 cores, and drops below 1 at 8 cores, the new language will probably never catch single-threaded C/C++.

Wednesday, July 9, 2008

Ant Colony Optimization for TSP in Psyco Python

My last post about Parametric Polymorphism and runtime efficiency got me curious about Psyco Python, so I've implemented Ant Colony Optimization for TSP in Psyco Python.

Psyco is a compiler for Python. It is a just-in-time compiler, and uses the run-time type usage information to determine which functions should be native-compiled. It is a specializing compiler -- it will create multiple instantiations of the same function with different type parameters. It is very easy to use; just add 2 lines to the top of your program:

import psyco
psyco.full()

It is an experimental project. The documentation warns that map and lambda perform poorly, and that list comprehensions (often a reasonable substitute) should be used instead. The original developer has moved on to work on the PyPy project; I should investigate that later.

Compared to other tested languages the performance of uncompiled Python is average; with Psyco compilation it is surprisingly good.



Erlang Haskell Scala MLton_SML Python Psyco
93 22 10 1.5 40 5.5

All times in seconds.

Update 7/12
Leonardo Maffi has pointed out numerous potential style and performance improvements for the Python code. Turns out Python already has a zip function (hard to find the info searching www.python.org, mixed with the zip file functions :-), so I have removed my implementation from the code. No performance change with my zip removed.

Update 7/15
See new post on Shedskin Python.

Monday, July 7, 2008

Types and Programming Languages

"The first type systems in computer science, beginning in the 1950s in languages such as Fortran, were introduced to improve the efficiency of numerical calculations by distinguishing between integer-valued arithmetic expressions and real-valued ones; this allowed the compiler to use different representations and generate appropriate machine instructions for primitive operations."

I've been reading Pierce's Types and Programming Languages. I've become interested in the interactions between Parametric Polymorphism and runtime efficiency. I do have a interesting (to me :-) point at the end, but it takes a lot of introduction...

rest of my article here

Friday, June 20, 2008

Multi-Core Ant Colony Optimization for TSP in Scala

I've added a new page Multi-Core Ant Colony Optimization for TSP in Scala.

Scala is a hybrid functional and object-oriented programming language. It is based on the Java JVM, so it has available Just-in-Time (JIT) native compilation and native thread support. It has access to the wide variety of class libraries available for Java.

Like Erlang, Scala supports an Actor-style message-passing concurrency model. More traditional thread-based concurrency with locks and shared memory is also available. Unlike Erlang and Haskell, Scala values are not immutable. Like SML and Haskell, typing is static with type inference.



Erlang and Alice SML omitted from graph. Note change from previous tests: 1000 interations, instead of 100. Still 200 cities.

Cores Erlang Haskell Scala OCaml MLton Alice
SML SML
1 372 86 41.6 6.0 676
2 192 63
3 141 58
4 98 54 14.6 1.6

All times in seconds.

If anyone knows how to reduce the number of cores used by the JVM, I'll fill in the rest of the table for Scala.

Tuesday, June 3, 2008

Functional Languages on Quad-Core (revised)

Yesterday's post generated lots of suggestions. I've revised the Erlang code to use dict mapping and updating functions. I didn't use the new Erlang arrays since they are one-dimensional and I need two dimensional arrays. Here are the new results:



Cores Erlang Haskell SML
1 38.4 8.6 0.76
2 19.6 6.1
3 14.5 5.5
4 10.3 5.1

Here is the revised Erlang source code, as well as the Haskell quad-core source code used in tests. The MLton Standard ML implementation is still single-core, as MLton does not have SMP support (see comments below).

Update: Added new post with Scala implementation.

Sunday, June 1, 2008

Functional Languages on Quad-Core

In an earlier series of articles ending with Multi-Core Ant Colony Optimization for TSP in Erlang I evaluated the performance of the Standard ML, Haskell, and Erlang functional programming languages using a test problem of solving the traveling salesman problem using ant colony optimization. At the time I only had available a dual-core machine. I have re-run the tests using an Intel quad-core machine and the latest language versions available for Debian Linux.



The results match my previous summary: Erlang is much slower than the others. Erlang is a bytecode interpreted virtual machine while the others are native code compilers. Erlang is also dynamically typed while the others are statically typed. Like Haskell, Erlang has immutable (single assignment) variables which force lots of copying. Unlike Standard ML and Haskell, no array type is provided so I had to inefficiently implement them using dictionaries. In Erlang each process (thread) does have a separate heap, so there is little interference between processes due to memory allocation or garbage collection. Near perfect speedup is achieved.



The CPU utilization graph (Erlang, then Haskell, then Standard ML) does show the lack of interference between Erlang threads which makes its speedup possible. Haskell achieves little speedup at higher numbers of cores.

Update: thanks, responses to comments:

As discussed at the end of an earlier posting, based on email feedback I did try HiPE compilation with Erlang 5.5 on AMD. It provided no significant improvement, and was not compatible with multi-core operation. This time I again tried HiPE compilation with Erlang 5.6.2, and get the following with erl:

$erl
Erlang (BEAM) emulator version 5.6.2 [source] [smp:4]
[async-threads:0] [kernel-poll:false]

Eshell V5.6.2 (abort with ^G)
1> c(ant,[native]).
./ant.erl:none: Warning: this system is not configured for
native-code compilation.
{ok,ant}
2>

Same for erlc:

$erlc +native ant.erl
./ant.erl:none: Warning: this system is not configured for
native-code compilation.

So it appears the latest i386 Erlang available from Debian Linux apt-get doesn't support native compilation, probably because they chose to enable SMP support instead.

Erlang arrays are new in the latest release. Thanks for the info, I'll try them out.

Yes, MLton SML with one core was the fastest. Older (pre-quad-core) versions of the code are available linked from the various ACO TSP pages and here: Standard ML, Haskell, Erlang.

Actually the next language I will probably try is Scala. Based on the Java JVM, it may inherit JIT native compilation and good SMP thread support. It also supports an Actor concurrency model like Erlang. Apparently Caml dialects don't support SMP. Of course for many problems (like ACO TSP :-) you are better off simply starting N different fast single-threaded instances of the program and collecting the results in Ruby or some-such. I am really looking for instances where using a SMP-capable language is a clear win over single-threaded C++. I haven't written a single-threaded STL C++ version of ACO TSP; I expect it could easily crush Mlton SML once you finally got the bugs out...

Here are the raw numbers, in seconds:

Cores Erlang Haskell SML
1 87.0 8.6 0.76
2 46.3 6.1
3 34.6 5.5
4 26.2 5.1

When I increase the iterations in Haskell by 5x the runtime goes up by 5x (43 secs with 1 core, 26 secs with 4 cores), still showing the same pattern of decreasing improvement with increased cores.

Update 2: see new post for revised Erlang code and results.

Update 3:
Turns out Erlang HiPE is available in Debian as a separate package (erlang-base-hipe), and is now SMP compatible. But the improvement is only 5%, or 11% if you also native compile dict.erl.

Sunday, May 18, 2008

What I.D. is really about

My earlier post Life is like Javascript, not like Java discussed the conflict between Plato's theory of Forms and evolution of new species [*]. I think this conflict is the real motivator behind Intelligent Design. The idea of Forms is pre-Plato; it is implied in Genesis in the animal Kinds and is actually part of our early cognitive heritage. For Plato the belief in Forms both presupposed and necessitated a deity.

The IDers believe in Platonic Forms (often without realizing it), and view species as Forms. They believe natural processes like evolution cannot produce new Platonic Forms, since the Forms preexist and reside in some other timeless dimension our minds somehow contact. Space aliens (an alternative proposed by the IDers in my earlier post Did the First Giraffe have a Navel?) are also unlikely to be able to add new Platonic Forms. The only viable alternative is a deity, the same one who set the mathematical and physical constants (like Pi, and the speed of light C) before the universe was created. The species Forms would have also all been predefined before the universe started.

[*] Note this summary also slides towards a common trap, not unique to IDers. My linked post also asserts species don't really exist, but are an arbitrary man-made construct. So in an important sense evolution doesn't produce species, only individuals. The IDers are asserting nature is bounded by preexisting species Forms, while I say nature in generating new individuals does not recognize or respect any such boundaries.

Tuesday, May 13, 2008

Why many of us dislike Pair Programming

From Extraversion and introversion on Wikipedia:

An extravert is energized when around other people. Extraverts tend to "fade" when alone and can easily become bored without other people around. Extraverts tend to think as they speak. When given the chance, an extravert will talk with someone else rather than sit alone and think...

An introvert is energized when alone. Introverts tend to "fade" when with people and can easily become overstimulated with too many others around...

Acting, teaching, directing, managing, brokering are fields that favor extraversion...

(Introverts) often take pleasure in solitary activities such as reading, writing, drawing, watching movies, and using computers. The archetypal artist, writer, sculptor, composer and inventor are all highly introverted...

Some careers such as computer programming may be more satisfying for an introverted temperament, while other areas such as sales may be more agreeable to the extraverted type.

Sunday, May 4, 2008

Maker Faire 2008




A bit more Burning Man,



a bit more Love Parade,



a bit more Disney.



The Theremin by the Steampunk section was cool.

Sunday, April 27, 2008

Waking Life

Kant says somewhere: "The lunatic is a dreamer in the waking state." According to Krauss, "Insanity is a dream in which the senses are awake." Schopenhauer terms the dream a brief insanity, and insanity a long dream. Hagen describes delirium as a dream-life which is inducted not by sleep but by disease. Wundt, in his Physiologische Psychologie, declares: "As a matter of fact we ourselves may in dreams experience almost all the manifestations which we observe in the asylums for the insane."

Sigmund Freud, The Interpretation of Dreams, p. 66.

Saturday, April 5, 2008

Black Swans in the Market

Modern portfolio theory is based on the assumption that changes in stock prices are normally distributed (follow a bell curve). Benoit Mandelbrot and Nassim Nicholas Taleb disagree with this assumption, and believe market crashes like that of 1987 are much more likely to occur than standard theory predicts.

For my spring 2006 U.C. Berkeley Extension Statistics class I created a presentation Are Daily Changes in Stock Prices Normally Distributed? I analyzed MSFT (used in the standard MBA Finance textbook as an example of normality!), BRKA, and the S&P 500 index over various date ranges. I found none of them satisfied statistical tests for normality.

Black Swans

"Black Swans", as unlikely or unexpected economic events, are currently in the news. This usage was popularized by Nassim Nicholas Taleb. It was based on the old European belief that all swans were white, which was contradicted by the discovery of black swans in Australia. But as I discussed earlier genus and species are human-defined categories. There is no logical contradiction inherent in a black swan, and probabilities related to finding one are really probabilities of finding individuals (animal instances) fitting into ill-defined categories.

Note the first black swan was discovered in 1697, over 150 years before publication of Darwin's On the Origin of Species (1859) and over 250 years before Watson and Crick's A Structure for Deoxyribose Nucleic Acid (DNA) (1953). So seventeenth century notions of species would not include modern concepts of evolutionary descent or DNA similarity. Instead they were based on ideas of fixed categories inherited from Aristotle.

Progress in GPGPU Ray Tracing

I've been getting traffic and questions about my pages on Cell and CUDA ray tracing. Saarland University has been doing some interesting work on Cell and CUDA ray tracing including this paper: Realtime Ray Tracing on the GPU with BVH-based Packet Traversal.

Saturday, March 1, 2008

Funes el memorioso

AKA "Funes, His Memory", "Funes the Memorious", "Funes the Elephant-Memoried." Borges short story about a man who never forgets.

Funes, we must not forget, was virtually incapable of general, platonic ideas. Not only was it difficult for him to see that the generic symbol "dog" took in all the dissimilar individuals of all shapes and sizes, it irritated him that the "dog" of three-fourteen in the afternoon, seen in profile, should be indicated by the same noun as the dog of three-fifteen, seen frontally.
...
He had effortlessly learned English, French, Portuguese, Latin. I suspect, nevertheless, that he was not very good at thinking. To think is to ignore (or forget) differences, to generalize, to abstract. In the teeming world of Ireneo Funes there was nothing but particulars -- and they were virtually immediate particulars.


Collected Fictions, by Jorge Luis Borges, translated by Andrew Hurley, p. 136.

Saturday, February 2, 2008

Spiral Galaxy Generation in Ruby


The logarithmic spiral form often appears in nature. It is seen in nautilus shells, hurricanes, spiral galaxies, and plants. A simple Ruby program can be used to generate this pattern, and the results can be displayed with the GNU statistics package R. My Ruby code is available here.

Friday, January 25, 2008

Life is like JavaScript, not like Java

Today most object-oriented programming languages, such as Java, are based around the class concept. The class describes the properties shared by all instances (objects) of that class; and individual objects are created as instances of a particular class. Class-based object-oriented programming originated with Simula and Smalltalk. An alternative is prototype-based object-oriented programming. This is used in JavaScript, and originated with Self. In prototype-based object-oriented programming there are no classes, only instances. New instances are created by copying the properties of an existing instance (the prototype) and adding new properties.

The difference between class-based and prototype-based programming is a useful metaphor for understanding the complexities of the biological concept of Species. Life consists of instances copied from one another (like JavaScript). Species are a class hierarchy (like Java) we try to impose on the instances after the fact -- we forget species is a man-made construct. Nature just does whatever it does, and we try to simplify it by mapping to categories and rules. Certainly species is a useful concept for living in the world. You don't want to take a child to the zoo and say "see the animal instance?" "see the plant instance?" (or just "see the instance?" !). You want to say "see the giraffe?". However, the usefulness of species (or folk categories) does not mean they always obey our expectations.

The difference between class-based and prototype-based programming is also related to the philosophical Problem of Universals. Do classes exist in the real world, or only instances? The most famous theory of real-world classes is Plato's Theory of Forms. In Platonic Idealism pure archetypes for different objects (such as chairs, or giraffes), and ideas (such as justice, or triangles), actually exist in some other space or dimension our minds can contact. This theory was modified, but not completely rejected, by Aristotle's Theory of Universals. Aristotle thought universals were constructed from the common properties of the instances, rather than existing as pure Forms somewhere else. From these Greeks we inherited the idea of dividing the world up into invariant categories, and performing logical operations on them. That works to a point, and makes science possible. But the simplified model is not the real world -- "the map is not the territory".

The psychology of folk-biology, along with folk-psychology, folk-physics, etc. arose naturally through our evolutionary heritage and individual maturation and learning via interaction with the natural world. Even beliefs in the soul, etc. are understandable in this context. A great book on all this is Philosophy in the Flesh : The Embodied Mind and Its Challenge to Western Thought, by George Lakoff and Mark Johnson. Yes, it is a 624 page slog. In part II some of the examples can be skimmed. But if you make it through it will completely change your understanding of what metaphor is and how we think. You will realize that 99% of philosophy is out of date. That we often just trade one set of absolutes for another, without understanding the real biological reasons why we are wired to search for and believe in absolutes in the first place.

Sunday, January 20, 2008

Did the First Giraffe Have a Navel?

Intelligent Design is turning into a hot topic. Given all the efforts being made by IDers to get attention, it is surprisingly hard to find details on how their alternative to Evolution is supposed to work.

A common criticism among IDers is about the lack of fossils of "intermediate forms" between species that evolved from one another. The ID answer is that there was no intermediate form; instead the new species was created whole. An example of an allegedly missing intermediate form is a short-necked ancestor of the giraffe.

What I'm not finding any details about is exactly how, according to IDers, the first giraffe appeared. Did 2 adults (one male, one female) miraculously poof into existence a million years ago? Or was the first pair of baby giraffes born to and raised by deer? The first seems more likely if the intelligent designer was a deity, the second if it was space aliens[1].

Any links to explanations would be appreciated. Does anybody know if The Design of Life (3rd edition of the ID high school biology textbook Of Pandas and People) has an answer?

[1] "It could be space aliens," said William Dembski, a mathematician and philosopher at Baylor University in Texas and author of No Free Lunch [and later The Design of Life], a new book on intelligent design. "There are many possibilities." San Francisco Chronicle, Sunday, March 17, 2002