Using Mechanical Turk for behavioral experiments

I’m seeing more and more work using Mechanical Turk as a subject pool.  Here’s another piece discussing some of the features, advantages and problems with Mechanical Turk – Rand, D (2011), The promise of mechanical turk: how online labor markets can help theorists run behavioral experiments, Journal of Theoretical Biology.


Combining evolutionary models with behavioral experiments can generate powerful insights into the evolution of human behavior. The emergence of online labor markets such as Amazon Mechanical Turk (AMT) allows theorists to conduct behavioral experiments very quickly and cheaply. The process occurs entirely over the computer, and the experience is quite similar to performing a set of computer simulations. Thus AMT opens the world of experimentation to evolutionary theorists. In this paper, I review previous work combining theory and experiments, and I introduce online labor markets as a tool for behavioral experimentation. I review numerous replication studies indicating that AMT data is reliable. I also present two new experiments on the reliability of self-reported demographics. In the first, I use IP address logging to verify AMT subjects’ self-reported country of residence, and find that 97% of responses are accurate. In the second, I compare the consistency of a range of demographic variables reported by the same subjects across two different studies, and find between 81% and 98% agreement, depending on the variable. Finally, I discuss limitations of AMT and point out potential pitfalls. I hope this paper will encourage evolutionary modelers to enter the world of experimentation, and help to strengthen the bond between theoretical and empirical analyses of the evolution of human behavior.


VCs Innovate VC

An interesting development in venture financing is the creation of the “lean finance” model. This is an adaptation to winner-take-all markets; i.e., markets in which the best performer captures a massive share of the market. The funding model is to provide the minimum funding necessary to reach the point at which it becomes apparent who the winner is likely to be. Then, investors do a huge, “shovel-in” round of funding to seal it. On Friday, the Swedish commerce startup Klarna raised $155m following its May, 2010 round of only $9m.

Dropbox, a company whose product is well known in academic circles, similarly raised $250m at the point it boasted 45m users, following a previous round of only $7m. An intriguing wrinkle is that different experts may have different opinions about who the winner is going to be. Around the same time Dropbox raised its shovel-in round of funding, so did one of its primary competitors, which raised $81m at the point it hit 7m users. The solution to the puzzle may be that this competitor is viewed as the likely winner of the enterprise segment (having turned down a $500m acquisition offer), while Dropbox is poised to take the personal user segment.

Technology Entrepreneurship by Chuck Eesley

Another class to add to the mix (here’s the previous post) — Chuck Eesley is teaching a free online Technology Entrepreneurship class.  I exchanged emails with Chuck and a mere 33,000 people have signed up for the course.  So far.

U got flu? Bio-surveillance, networks and twitter

Twitter is emerging as a popular source of data for scientists; see various Twitter-related arXiv articles here. For example, here’s a piece validating the Dunbar number by looking at social interactions among 1.7 million people on Twitter (now published in PLoS ONE). Elsewhere, I posted about a recently published Science piece attempting to measure aggregate mood by analyzing millions of tweets.

Here’s a set of papers studying twitter and health-related issues.  One paper suggests that monitoring the Twittersphere makes “bio-surveillance” possible – OMG U got flu? Analysis of shared health messages for bio-surveillance.

Here’s the abstract:

Background: Micro-blogging services such as Twitter offer the potential to crowdsource epidemics in real-time. However, Twitter posts (‘tweets’) are often ambiguous and reactive to media trends. In order to ground user messages in epidemic response we focused on tracking reports of self-protective behaviour such as avoiding public gatherings or increased sanitation as the basis for further risk analysis. Results: We created guidelines for tagging self protective behaviour based on Jones and Salathé (2009)’s behaviour response survey. Applying the guidelines to a corpus of 5283 Twitter messages related to influenza like illness showed a high level of inter-annotator agreement (kappa 0.86). We employed supervised learning using unigrams, bigrams and regular expressions as features with two supervised classifiers (SVM and Naive Bayes) to classify tweets into 4 self-reported protective behaviour categories plus a self-reported diagnosis. In addition to classification performance we report moderately strong Spearman’s Rho correlation by comparing classifier output against WHO/NREVSS laboratory data for A(H1N1) in the USA during the 2009-2010 influenza season. Conclusions: The study adds to evidence supporting a high degree of correlation between pre-diagnostic social media signals and diagnostic influenza case data, pointing the way towards low cost sensor networks. We believe that the signals we have modelled may be applicable to a wide range of diseases.
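For readers unfamiliar with the "kappa" figure in the abstract, here is a minimal sketch of how an inter-annotator agreement score can be computed. It assumes Cohen's kappa for two annotators; the abstract does not say which kappa variant the authors used. The function name, the label names, and the toy labels are all made up for illustration and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Fraction of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for everything.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six tweets (invented for illustration only).
annotator_a = ["avoid", "avoid", "sanitation", "none", "none", "none"]
annotator_b = ["avoid", "none", "sanitation", "none", "none", "none"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.86 indicates that the tagging guidelines were applied quite consistently.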

Innovations in Nordic Sport

I grew up in Finland where winter sports are huge.  Several winter sports saw quite significant transformations while I was following them in the late 80s.  One was skijumping, the other cross-country skiing.

In skijumping one of the big rivalries during the 1988-1989 season was between Sweden’s Jan Boklov and Finland’s Matti Nykanen.  Nykanen was a skijumping phenom – by the late 80s he was a veteran who had already won four previous world cup titles.  But Boklov introduced a style of skijumping that radically changed the physics and even aesthetics of the sport.  His V-style jumps carried him further and eventually led to a “paradigm shift” of sorts in the sport (judges at first discounted the technique, to an extreme).   “Style” points were quite important in skijumping (see Nykanen’s style versus Boklov’s style in the clip below). But the “uglier” V-style eventually had to be integrated given its clear superiority.  The V-style, introduced by Boklov (and a few others) in the late 80s, is now the exclusive approach in skijumping.

A similar stylistic innovation also radically shaped cross-country skiing.  Traditional cross-country skiing was largely about a gliding motion on an established track. But in the 70s and 80s it became increasingly clear that “skating” was actually a far better and faster approach to skiing (the Finn Pauli Siitonen apparently gets some credit, though the technique was used by many in different forms).  By the late 1980s, the skating technique led to the creation of two separate sports: classic cross-country and skate skiing (separate events in the Olympics as well).

There you go: a bit of random trivia you might need in Trivial Pursuit, or to impress your friends over Thanksgiving dinner or if you need a sporting-related example for a class discussion.

Inventions are inevitable

So, here’s the argument: inventions (including theories and technologies) are inevitably invented.  (This links nicely with Sid Winter’s thesis.) Thus we shouldn’t focus on or celebrate mythic “heroes” who happen to get credit for inventions that are inevitable – someone else would have invented them if the hero wasn’t around (Simonton highlights the increased incidence of simultaneous discovery; here’s a wiki site cataloguing simultaneous discoveries).  As Robert Merton put it – “discoveries become virtually inevitable when prerequisite kinds of knowledge and tools accumulate.”

Kevin Kelly talks about this in his book What Technology Wants. He pulls in examples from mathematics and physics.  For example, Einstein was ahead of his time with the theory of relativity, but some scholars were concurrently looking at similar questions and would inevitably have come up with the same theory.

Sort of an interesting issue embedded in here.  That is, discovering the realities and truths of nature is one thing – but clearly the possibilities and forms that technologies might take are a very different issue.  This is the space that the STS folks (Science & Technology Studies) have carved out – though they employ a confused epistemology and frequently overstep their bounds (Latour/Woolgar’s Laboratory Life is an example of this problem).  More perhaps on this later.

Here’s a figure from Kevin Kelly’s book (sorry, the quality isn’t the hottest).

Vertical Integration and a Teardown Analysis of the iPhone 4S

Last week there was a very useful WSJ article reporting on an analysis of the supplier relationships at the core of the new iPhone 4S (here … while it lasts). This seems like a nice mini-case for seeing how well our theories explain actual outcomes.

They note that Qualcomm “is the big winner” because it is supplying a suite of chips that adds up to $15 per phone. Intel is a loser because it acquired Infineon and then those chips were dropped from the product.

Samsung lost out on the memory chips to its Korean rival Hynix, a surprise since Samsung is known to have a more reliable product. Interestingly, however, Samsung did retain its role as the manufacturer of Apple’s proprietary A5 processor, which provides the iPhone 4S and the iPad 2 with the bulk of their computing power.
