Wednesday, September 5, 2012

Models of human skills: gaussian, bimodal or power-law?

The first thing I have to say is that this post actually contains my own reflections. I have not found clear scientific facts that actually prove these ideas wrong or right. On the other hand, if you have evidence (in both directions) I'd be glad to have it.

On of my first contacts with probability distributions was probably around 7-8th grade. My maths teacher was trying to explain to us the concept of gaussian distribution, and as an example, he took the grade distribution. He claimed that most of the students get average results and that comparatively fewer students got better or worse grades (and the more "extreme" grades were reserved for a few).

I heard similar claims from many teachers of different subjects and I have not been surprised. It looks reasonable that most people are "average" with fewer exceptionally good or exceptionally bad exemplars. The same applies to lots of different features (height, for example).


201209051915.jpg

Lots of years later, I read a very interesting paper claiming that, instead the distribution of programming results usually has two humps. For me, the paper was sort of an epiphany. In fact, my own experience goes in that direction (but here, I may be biased because I have not collected data). The paper itself comes with quite lot of data and examples, though.

So apparently there are two models to explain the distribution of human skills: the bell and the camel. My intuition is that for very simple abilities, the distribution is expected to be a gaussian. Few people have motivation to become exceptionally good, but the thing is simple enough that few remain exceptionally bad either. On the other hand, sometimes there is simply some kind of learning barrier. It's like a big step that not everybody is able to make. This is common for non-entirely trivial matters.

In this case, the camel is easily explained. There are people who climbed the step and people who did not. Both groups are internally gaussian distributed. Another example could be an introductory calculus exam. If the concepts of limit and continuity are not clear, everything is just black magic. However, some person may still get decent marks because of luck or a lower level form of intuition. Among those that have the fundamental concepts clear there is the usual gaussian distribution.

However, both models essentially assume that the maximum amount of skill is fixed. In fact, they are used to fit grades, that have, by definition, a maximum and a minimum value. Not every human skill has such property. Not every human skill is "graded" or examined in a school like environment. And even for those that actually are, usually grades are coarse grained. There is probably a big difference between a programming genius (like some FOSS super stars) and a regular excellent student. Or between, say, Feynman and an excellent student taking full marks. The same thing may be even more visible for things usually perceived as artistic. Of course, it is hard to devise a metric that can actually rank such skills.

My idea is that for very hard skills, the distribution is more like a power law (at least in the tail). Very few people are good, most people are not even comparable to those and the probability to find someone twice as good as a very good candidate is just a fraction.

Just to conclude, I believe that for very simple skills most of the time we have a gaussian. If we have "learning steps" or some distinctive non-continuous feature that is related, then we have multi-modal distributions (like the camel for programming: those that are able to build mental models of the programs are successful, those that only use trial and error until the programs "compile" are unsuccessful). Eventually, for skills related to exceptionally hard (and challenging and gratifying) tasks we have power-law distributions. May we even use such distributions to classify the skills and the tasks? And in those cases, a gaussian distribution is a good thing or not?

So that, given someone that is able to make mental models of programs, maybe Haskell programming remains multi-modal, because there are some conceptual steps that go beyond basic programming, while Basic is truly gaussian? Is OOP gaussian? And FP?

And in OOP languages, what about static typing? Maybe dynamic typing is a camel (no, not because of Perl) because those that write tests and usually know what they are doing fall in the higher hump, while those that just mess around fall in the lower hump. Ideas?

No comments: