Friday, March 27, 2009

Brain Scans, Machine Learning, and Trillion-word Web Text Corpus

"The question of how the human brain represents conceptual knowledge has been debated in many scientific fields. Brain imaging studies have shown that different spatial patterns of neural activation are associated with thinking about different semantic categories of pictures and words (for example, tools, buildings, and animals). We present a computational model that predicts the functional magnetic resonance imaging (fMRI) neural activation associated with words for which fMRI data are not yet available. This model is trained via a combination of data from a trillion-word text corpus, and observed fMRI data associated with viewing several dozen concrete nouns. Once trained, the model predicts fMRI activation for thousands of other concrete nouns in the text corpus, with highly significant accuracies over the 60 nouns for which we currently have fMRI data."

Predicting Human Brain Activity Associated with the Meanings of Nouns, Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, Marcel Adam Just, Science, 320, pp. 1191-1195, May 30, 2008.

This is the most interesting research I have seen on mapping internal brain representations using machine learning over large bodies of text from the web. When building machine-learned models from the fMRI data, the authors found that, for their set of concrete nouns, the most accurate intermediate semantic features were sensory-motor verbs. This fits with other theories of ideas being represented as the convergence of many related sensory patterns. For instance, apple as the convergence of the word "apple", redness, shininess, apple taste, apple texture, picking-by-hand, etc.
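The core idea can be sketched in a few lines: represent each noun by its co-occurrence statistics with a handful of sensory-motor verbs, then learn a linear map from those features to voxel activations. This is only an illustrative sketch with made-up stand-in data, not the paper's actual pipeline; the verb list, dimensions, and random inputs here are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: nouns are represented by co-occurrence counts with a
# small set of sensory-motor verbs (the "intermediate semantic features"),
# and a per-voxel linear model predicts fMRI activation from those features.

rng = np.random.default_rng(0)

verbs = ["see", "taste", "touch", "eat", "push"]   # illustrative feature verbs
n_nouns, n_voxels = 60, 500                        # 60 nouns, as in the study

# Stand-in data in place of real corpus counts and real fMRI recordings.
X = rng.random((n_nouns, len(verbs)))              # noun x verb co-occurrence
Y = rng.standard_normal((n_nouns, n_voxels))       # noun x voxel activation

# Fit all per-voxel linear models in one least-squares solve.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)          # verb x voxel weight matrix

# Predict activation for a noun never scanned, from its text features alone.
new_noun_features = rng.random(len(verbs))
predicted_activation = new_noun_features @ W       # one value per voxel
print(predicted_activation.shape)
```

The point of the linear structure is that the learned verb-to-voxel weights generalize: once fitted on a few dozen nouns, the model can produce a predicted activation image for any noun whose verb co-occurrence vector can be computed from the corpus.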