2blowhards.com: Family Trees

In which a group of graying eternal amateurs discuss their passions, interests and obsessions, among them: movies, art, politics, evolutionary biology, taxes, writing, computers, these kids these days, and lousy educations.



May 31, 2003

Family Trees

Michael:

In the June edition of Scientific American, there is an interesting article by Charles H. Bennett, Ming Li and Bin Ma on a computer algorithm they devised to study the evolution of chain letters. The same algorithm can also estimate the "relatedness" and evolutionary history of different organisms, and reconstruct the "family tree" of human languages. In short, this is one powerful algorithm.

Apparently Mr. Bennett, an IBM fellow, owns a collection of 33 chain letters, all of which are "descendants" of the same aboriginal chain letter. All were acquired or received by Mr. Bennett over a 15-year period. Because this period was prior to the computer age, such documents were reproduced on typewriters and Xerox machines. Mr. Bennett mentioned his collection during a hike in the Hong Kong mountains with University of Waterloo (Ontario) bioinformatics professor Li. It dawned on them that the chain letters had "evolved" through multiple generations as mutations (either deliberate or accidental) were introduced during the retyping of the letters. Reconstructing the "family tree" of the chain letters was very similar to figuring out from DNA evidence how closely related two different species are, a problem known as phylogeny. Moreover, the chain letters could serve as a test of the methods currently used to estimate such interrelatedness.

A Sample Phylogeny

Finding that most existing methods didn't work very well on the chain letter problem, Bennett, Li and Ma (a professor of computer science) devised their own algorithm of "relatedness." This involves retyping the letters into computer files in lower case, while ignoring the division of the text into paragraphs. This procedure converts each letter into a single string of characters. The various strings are then tested for relatedness by a file compression method. That is, the length of each compressed string is first measured independently, and then the sum of those two lengths is compared to the compressed length of a file made up of both strings arranged one after the other. If the two original strings were completely independent, the compressed length of the "summed" string would end up as the sum of the compressed lengths of the two original strings. The degree to which the compressed "summed" string is shorter than the sum of its parts indicates the degree of interrelatedness of the two strings.
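For the programmers in the audience, the compression test described above can be sketched in a few lines of Python. This is only a rough illustration, not the authors' actual program: the post does not name the compressor they used, so zlib is swapped in here as an assumption, and the function names, the `relatedness` ratio, and the sample texts are all made up for this sketch (the texts are short stand-ins, not letters from Mr. Bennett's collection).

```python
import zlib


def normalize(text: str) -> bytes:
    # Lower-case the text and collapse paragraph breaks and other
    # whitespace into single spaces, giving one long string.
    return " ".join(text.lower().split()).encode("utf-8")


def compressed_len(data: bytes) -> int:
    # zlib is an assumption; any general-purpose compressor could stand in.
    return len(zlib.compress(data, 9))


def relatedness(a: str, b: str) -> float:
    # Bytes saved by compressing the two strings end to end rather than
    # separately, as a fraction of the separately compressed total.
    # (This particular ratio is an illustrative choice, not the paper's formula.)
    x, y = normalize(a), normalize(b)
    separate = compressed_len(x) + compressed_len(y)
    together = compressed_len(x + y)
    return (separate - together) / separate


# Made-up stand-in texts for illustration only.
letter_a = (
    "With love all things are possible. This paper has been sent to you "
    "for good luck. The original is in New England. It has been around "
    "the world nine times. The luck has now been sent to you. You will "
    "receive good luck within four days of receiving this letter, "
    "provided you in turn send it on."
)
letter_b = (
    "With love all things are possible. This paper has been sent to you "
    "for good luck. The original is in England. It has been around "
    "the world seven times. The luck has now been sent to you. You will "
    "receive good luck within five days of receiving this letter, "
    "provided you in turn send it on."
)
unrelated = (
    "The quarterly budget review is scheduled for Tuesday morning. Please "
    "bring updated forecasts, vendor invoices, and a summary of travel "
    "expenses so that the committee can finalize department allocations "
    "before the fiscal deadline."
)

print("a vs b:        ", relatedness(letter_a, letter_b))
print("a vs unrelated:", relatedness(letter_a, unrelated))
```

The two near-identical letters should score well above the unrelated pair, since the compressor can describe the second letter mostly as "same as before, with a few changes."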

This method works much better on the chain letters, and apparently quite well on DNA strings as well. That's nice for biologists doing phylogenetics, but what I find more interesting is that it can be applied to a wider set of culturally evolving items.

Apparently three professors from La Sapienza University in Rome developed a phylogeny of human languages by applying this method to translations of the Universal Declaration of Human Rights into 52 different languages (which were conveniently available from the U.N.). The resulting family tree that the algorithm developed from the translations turns out to be quite close to the "standard" model of the historical relationships between the translation languages. Of course, the standard model has been developed by an immense analysis of the literature and history of each language, while the three professors had come up with a similar analysis just by running a computer program.
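The post does not say how a tree gets assembled once every pair of texts has a relatedness score. One common way to do it, offered here purely as an assumption and not as the method either research group used, is to keep merging the two closest groups until only one is left. A minimal sketch, with hypothetical labels `A` through `D` and made-up distances:

```python
from itertools import combinations


def build_tree(names, dist):
    """Greedy average-linkage clustering (an assumed technique, not the
    one from the article). `dist(a, b)` returns a distance between two
    labels; the result is a nested tuple describing the tree."""
    # Each cluster is (subtree, list of leaf labels in that subtree).
    clusters = [(name, [name]) for name in names]

    def cluster_dist(c1, c2):
        # Average distance over every cross-cluster pair of leaves.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist(a, b) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the indices of the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy, made-up distances between four hypothetical texts.
toy = {
    frozenset(("A", "B")): 0.2,
    frozenset(("C", "D")): 0.3,
    frozenset(("A", "C")): 0.8,
    frozenset(("A", "D")): 0.8,
    frozenset(("B", "C")): 0.8,
    frozenset(("B", "D")): 0.8,
}
tree = build_tree(["A", "B", "C", "D"], lambda a, b: toy[frozenset((a, b))])
print(tree)  # (('A', 'B'), ('C', 'D'))
```

In practice the `dist` function would be something like one minus a compression-based relatedness score, so that closely related texts end up on neighboring branches.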

After reading this story, I happened to remember a posting I'd done on a producer in the record industry, who deliberately "composed" new hits by slightly modifying older tunes and encasing them in the aural hallmarks of contemporary music (which you can read here.) At the time I read about the producer, I wondered how widespread such a practice is in the popular music industry. My suspicion is that it is pretty common, and always has been. But it dawned on me that because music could be converted to computer file strings quite easily, with this method we could construct "family trees" for pop tunes. (Some pretty interesting litigation might spring up.)

And although it would be more difficult to track the pedigree of works of fiction, I wondered if it would be possible to reduce stories, or at least their plots, to a standard alphabet of relationships between the characters. For example, Hamlet might be reduced to "father, son, step-father, mother, murder, revenge, madness, mistaken identity." It would be interesting to see the family tree of Hamlet's antecedents and its descendants.

In fact, it would be interesting to equip "Google" with such a relatedness testing device and use it to create family trees for "memes" propagating themselves through cyberspace.

Now all I need is a grant (a measly $10 million or so should do) and I'll set myself up to really study cultural evolution. Just mail the check to Friedrich's Southern California Institute of Culture -- I'll pick out some nice real estate near the beach.

Cheers,

Friedrich

posted by Friedrich at May 31, 2003

Comments

In "Middlemarch" by George Eliot, Dorothea marries a dry, old man named Casaubon who is trying to find the "key to all mythologies"....Sounds similar to your cultural evolution theory to me.

He fails, by the way, and she goes on to marry the handsome young buck, Will Ladislaw, and finds happiness in other ways.

Just a thought.

Posted by: Deb on May 31, 2003 11:02 AM

Neat info and thinking, thanks. Reminds me of the kinds of things Nikos Salingaros was talking to us about -- algorithms for better (rather than worse) growth. I find this an amazing cultural moment, what with developments like this relatedness algorithm, with people like Alexander and Salingaros doing what they're doing, with the revival of classical forms in a lot of different fields, with neurobiologists making some real (if still fairly primitive) headway in better understanding perception and emotion, with the genome itself finally beginning to be unravelled ... I mean, there really are built-in (if fluid), pre-given structures, methods, processes -- culture seems to flow from them, and then to feed right back into them too. Culture is like a language, which in turn is like genetics, which in turn is what makes culture possible.

Bizarre that the arts worlds aren't paying more attention to these kinds of thinking and discoveries. Why don't they, do you suppose? Just not smart enough? Too invested in and wrapped up in modernist styles of thought?

Hey, somewhere I've got a CD of Bach-esque music composed by computer. Some scientist analyzed a lot of pieces by Bach, devised rules and algorithms, fed 'em into a computer and let rip. If I remember right, he let the computer spew out a lot of compositions and then chose the ones he thought were best. Pretty good, actually. But maybe that's an interesting new role for the human to play in the process -- the chooser, the one with instincts and taste. Maybe there'll be ways of assigning the grunt work to the machine, and the fun work of making taste choices to the human...

Posted by: Michael Blowhard on May 31, 2003 11:17 AM

Actually, American jazz musicians have paid attention to underlying structures for quite some time. For example, many be-bop "originals" turned on the legal discovery that a tune could be copyrighted but the chord structure was public domain. That's how Charlie Parker came to base "Donna Lee" on the old chestnut "(Back Home Again in) Indiana" (couldn't afford the tune "Indiana," but hey, the chords were free), and why thousands of jazz standards have been worked on the 12-bar blues.

This is also the secret of Duke Ellington's extraordinarily prolific career as a composer: He wrote some 2000 pieces in his lifetime -- which puts him in roughly the same category, note for note, as Mozart. Rhythmically and harmonically, Ellington is more complex, of course, but he worked off of pre-existing forms just as Mozart did. (Sometime I might write more about the Ellington-Mozart connection, but I think for the moment you get the idea.)

Posted by: Tim Hulsey on May 31, 2003 12:52 PM

Hey Tim -- Couldn't agree more. Have you read Albert Murray? I'll blog about him some day, but he makes many similar points, and (son of a gun) Ellington is his hero. Murray is like a Southern literary precursor to evo-bio and neuro-science thought. I think two of his books are especially dazzling: "Stomping the Blues" (best thing I've ever read about jazz), and "The Omni-Americans," essays about race in America.

Posted by: Michael Blowhard on May 31, 2003 1:44 PM

Hi Friedrich,
Interesting post and comments.
I was going to say that we remember tunes we like very well, and this unconscious knowledge of other tunes might cause some "plagiarism" when a person writes a new melody which is really a variation of something he/she knows well. For instance I sometimes do long and mindless right hand [guitar] arpeggios around the chords for The House of the Rising Sun. I am just plinking as an amateur, but this simple chord structure is enough for good finger practice.

Anyway, I enjoy your blog guys, I will be back
Alan

Posted by: Alan McCallum on May 31, 2003 6:09 PM

The idea of using this algorithm to analyze pop music is a good one, but harder to pull off than in the areas where it's been demonstrated.

First, you have to get all of the songs into a standard format. Which means probably working from digitized sheet music, as getting there from actual recordings is a very tricky problem.

Then you have to normalize the chord progressions (effectively transpose them all into a standard key), compensate for tempo somehow for analyzing rhythms, and limit how many parts you are going to analyze.

Basically, squeeze the complexities of actual music down to its essential elements in ONE dimension. That one-dimensional representation, effectively a series of letters, can then be analyzed with the algorithm in question.

It's a difficult, but not insurmountable, question of data representation and information theory. I wouldn't be surprised if it's someone's ambitious PhD project, though :-)
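To make the "series of letters" idea concrete, here is one small, hedged sketch. It is not a chord-progression normalizer; it uses a different, simpler trick with the same goal, encoding a single melody line as the steps between consecutive notes, so the result comes out the same in any key. The input is assumed to be MIDI-style note numbers already extracted from sheet music, the function name and letter mapping are invented for this illustration, and rhythm and tempo are ignored entirely.

```python
def melody_to_string(midi_notes):
    """Encode a melody as a key-independent string of interval letters.

    Each step between consecutive notes is clamped to +/-12 semitones
    and mapped to a letter from 'a' (down an octave) to 'y' (up an
    octave), with 'm' meaning a repeated note. Hypothetical encoding,
    chosen only for illustration.
    """
    intervals = [b - a for a, b in zip(midi_notes, midi_notes[1:])]
    return "".join(
        chr(ord("a") + max(-12, min(12, step)) + 12) for step in intervals
    )


# The same four-note phrase in C and transposed up to D.
in_c = [60, 62, 64, 60]
in_d = [62, 64, 66, 62]
print(melody_to_string(in_c))  # ooi
print(melody_to_string(in_d))  # ooi
```

Strings produced this way could then be fed to a compression-based relatedness test like any other text.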

I like your idea of tracking memes with some sort of super-Google a lot. Daypop and Popdex are fairly successful attempts to do that kind of tracking for self-selected parts of the blogosphere. I'm interested in it being applied to the larger culture, as you seemed to be.

Cheers!

Posted by: David Mercer on May 31, 2003 8:31 PM

Oh, another thing, from knowing people in the music biz, yes, producers do that kind of thing ALL the time, especially in pop music!

Posted by: David Mercer on May 31, 2003 8:33 PM

I doubt that there's a sufficiently reliable way of reducing stories to plot elements for a computer program to trace family trees, but I'm reading _Google Hacks_ from O'Reilly, and it looks as though it wouldn't be that hard (or at least not that hard for someone who knows more about programming than I do) to write a program that would do family tree analysis for texts found on google.

Posted by: Nancy Lebovitz on June 1, 2003 10:21 AM

And think of the opportunities for eager-to-plagiarize students! Copy, paste, and then apply the algorithm so it's just original enough to get by ... Eek.

Posted by: Michael Blowhard on June 1, 2003 11:27 AM