Sometimes I get depressed about the quality of statistical work in economics. Then I read something from another social science. Here is a recent study where psychologists find that having the initial "K" increases your chance of striking out when playing professional baseball. Why? Well, it’s obvious isn’t it? The letter "K" is used when keeping score in baseball to represent striking out. So it’s obvious now isn’t it? Still don’t get it? Neither do I. But hey, it’s in the data. Between 1913 and 2006, players with first or last initial "K" struck out 18.8% of the time compared to 17.2% for the fortunate players unhandicapped by their initials. Here is the "explanation" of the authors:
Despite a universal desire to avoid striking out, K-initialed players strike out more often. For those players, we argue that the explicitly negative performance outcome may feel implicitly positive. Even Karl “Koley” Kolseth would find a strikeout aversive, but on the whole, he might find it a little less aversive than players who do not share his initials, and avoid it less enthusiastically.
But why? Why would having the initial "K" make striking out more pleasant? I just don’t get it. The authors go on to "test" their theory by looking at grades of a sample of MBA students:
The MBA students in our sample are well aware of a direct connection between academic performance and successful job placement. Nevertheless, despite the pervasive desire to achieve high grades, students with an unconsciously-driven fondness for C’s and D’s were slightly less successful at achieving their conscious goal.
That is, Charles Darwin received poorer grades than Alan Alda. But it turns out that Alan Alda didn’t do better than the non-ABCD initialed:
Interestingly, A- or B-initialed students did not perform better than students whose initials were grade-irrelevant. There are two possible explanations for this. First, students with grade-irrelevant initials may already be maximally motivated to succeed. Second, because performance is determined by motivation and ability, any increased motivation to succeed that arises from having initials that match positive performance outcomes may not necessarily translate into increased performance.
There is, of course, a third explanation: there is no real relationship and the authors have been fooled by randomness. Yes, their results are statistically significant. But how many relationships did they explore before finding the ones that were statistically significant? And how many relationships are there to explore? To really test the theory, you’d have to look at baseball players with the initial "E" and see if they commit more errors than others. You’d have to look at guards in the NBA to see if those with initials "A" have more assists. Centers whose initials include an "R" should be better rebounders. You’d have to look and see whether students with the initials IC were more likely to take an "incomplete" in a class.
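To put a rough number on the "how many relationships" worry, here is a minimal sketch, not the authors' method. It assumes a made-up league where every plate appearance ends in a strikeout with the same probability regardless of initial, then tests each of the 26 letters against everyone else. The strikeout rate, the sample sizes, and the function names are all hypothetical choices for illustration.

```python
import math
import random


def two_sided_p(k1, n1, k2, n2):
    """Two-sided p-value from a two-proportion z-test (normal approximation)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(rng, n_letters=26, pa_per_letter=1000,
                          rate=0.17, alpha=0.05):
    """Count letters that look 'significant' when no letter has any effect."""
    # Every plate appearance has the same strikeout probability (assumed).
    strikeouts = [
        sum(rng.random() < rate for _ in range(pa_per_letter))
        for _ in range(n_letters)
    ]
    total_k = sum(strikeouts)
    total_n = n_letters * pa_per_letter
    hits = 0
    for k in strikeouts:
        # Compare this letter's group with all other letters pooled.
        p = two_sided_p(k, pa_per_letter, total_k - k, total_n - pa_per_letter)
        if p < alpha:
            hits += 1
    return hits


def average_false_positives(trials=100, seed=0):
    """Average number of spuriously significant letters per simulated league."""
    rng = random.Random(seed)
    return sum(count_false_positives(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("average 'significant' letters per league:", average_false_positives())
    # If the 26 tests were independent, the chance of at least one hit:
    print("chance of at least one hit:", 1 - 0.95 ** 26)
```

With 26 letters and a 5% threshold, you should expect a little more than one "significant" initial per league even when initials mean nothing, and that is before anyone starts adding errors, assists, rebounds, and incompletes to the list.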
I guess Rabbi Jonathan Sacks, the Chief Rabbi of England, should have been a football player. Or maybe he just gets fired more often than the average Briton because it doesn’t bother him as much as someone with a different last name.
Did Kafka know baseball scoring? Does this explain why he found success in life so difficult? Is this why he named a character "K"?
Do players whose initials are a backwards "K" strike out looking more than the average?









{ 38 comments }
These guys need to be introduced to the word "spurious" and forced to pay back the federal grant they undoubtedly received to engage in this nonsense.
Articles like this make me chuckle. Thanks, Russ!
Were pitchers over-represented in this population? And if so, would a K named pitcher strike more batters out?
The sad part is I'm sure these guys take themselves way too seriously.
"These findings provide striking evidence that unconscious wants can insidiously undermine conscious pursuits."
I can't decide whether to laugh or cry.
This is an obvious example of data mining. Although it is obvious in this example, there are many similar examples that are not obvious to others. Just pick up books or articles on picking stocks, and one can find all sorts of examples of data mining.
Noticed that the authors are from schools of management. Wondering if they are in training to be pointy-haired bosses or being paid to turn out pointy-haired bosses.
My last name starts with a "K" and I never liked striking out. And I was generally more patient at the plate and took more walks than my teammates, but my initials do include "P" or "W".
Of course, I'm a pretty small sample size.
Oops, "do include" should be "do not include".
I wonder what happens when you throw Kevin Mitchell out. But seriously, what percentage of players had K initials? The smaller the percentage, the less statistically significant the difference.
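A quick sketch of that last point, using the 18.8% and 17.2% rates quoted in the post. The plate-appearance counts below are invented, since the post does not report them, and the test treats every plate appearance as independent, which ignores the fact that the data come in clusters by player.

```python
import math


def z_score(p1, n1, p2, n2):
    """z statistic for the difference between two observed proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


if __name__ == "__main__":
    # Hypothetical sizes for the K group and for everyone else.
    for n_k, n_rest in [(1_000, 100_000), (100_000, 1_000_000)]:
        z = z_score(0.188, n_k, 0.172, n_rest)
        print(f"K group = {n_k:>7} plate appearances: z = {z:.2f}")
```

The same 1.6 point gap is unremarkable when the K group is small and looks overwhelming when it is large, so the share of K players matters a great deal.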
Don't be put off by your small sample size, Avatar!
These great researchers certainly wouldn't let a thing like that stand in their way!
"Centers whose initials include an "R" should be better rebounders."
Dennis Rodman… although he was a forward. Coincidentally, his middle name is Keith.
This was discussed on various blogs related to baseball and statistics about six months ago. For some interesting insights, see this post: http://www.hardballtimes.com/main/blog_article/ridiculous-science
They found that yes indeed, batters with an initial K struck out slightly more often than average. But there were eight other initials that were even worse.
And he teaches at Yale? Really? Somebody needs to point him to Andrew Gelman's papers and others in that literature (Bonferroni bounds, Hal White's data snooping tests….)
Av300, I'll second the point about not being discouraged by your small sample size. Just put yourself in the data 1,000 times and you'll have a big sample.
Truly a study devoid of merit and one that wasted resources. I suppose now Dave Kingman will go on the ESPN lecture tour and claim that his massive strikeout ratio wasn't really his fault…he was inherently doomed from birth.
My question is, how in the heck did anyone even think to ask such an asinine question? Are psychologists really that short on topics about which they can write? Honestly, who even consciously or unconsciously automatically associates initials with a scoring metric?
I'm wondering. How do they explain variations from the mean for other letters?
I'm reminded of the old joke about the economist who drowned in a river with an average depth of six inches.
Economists (yes even here) torque around numbers with the best of them, and even engage in selectivity designed to mislead.
My question is, how in the heck did anyone even think to ask such an asinine question?
Two words:
Grant money
Am I an asshole for suggesting that Latin American baseball players are less likely to have a K in their name than descendants of Eastern European families, and perhaps it is a cultural phenomenon that the former group are less likely to strike out than the latter? (I have two K's in my last name, and that's only one out away from retiring the side.)
mpk…HA…i love it; perfect point.
After some initial bewilderment, my first thought was that people from different ethnic groups and cultures were more likely to have certain initials than the rest of the population.
Did they control for race and culture at all? I'm assuming they at least controlled for gender?
I wonder if the predicted values from their regression went anywhere near the mean of the actual data…..
….sarcasm, of course, as this is a property of all regressions.
It just occurred to me how odd it is that Russell Roberts, or R.R., which is to say, R-squared, isn't a bigger fan of regression.
Russ’ kurtosis precludes that.
The progressive thing to do for the sake of equity would be to allow players with K's in their names to have more strikes before being out.
Unfortunately, that was from a business school not a school of social science. It could easily have come from a medical school. I stopped paying attention to reports of medical findings because most of them are innumerate as well.
My question is how do referees let this through? Now THAT's scary.
Several flaws in the report are pretty obvious. First, "Kingman", a player from the 1970s, alone accounts for nearly a third of the deviation from the null hypothesis. A cursory review from a similar (but not identical) data set suggests they used a binomial distribution with all batters in the same letter category having the same mean strikeout rate. Since there is a significant variation among individual players, the null is almost certain to be false once you partition your data set among the smaller subsets. Kingman easily skews "K". In fact, I found huge deviations for every single letter. Their model was wrong; every subset was skewed because there are not enough individual players for the differences to wash out.
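The mechanism described here can be sketched with a small simulation. This is an illustration under invented assumptions, not a reconstruction of the paper's data or of the data set mentioned above: each simulated player gets his own strikeout rate drawn uniformly around 17%, initials carry no effect at all, and each letter is then tested against the rest with a pooled binomial-style test that wrongly treats all plate appearances as interchangeable. Player counts, plate appearances, and the spread are all hypothetical.

```python
import math
import random


def two_sided_p(k1, n1, k2, n2):
    """Two-sided p-value from a two-proportion z-test (normal approximation)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(spread, trials=30, n_letters=26, players_per_letter=10,
                        pa_per_player=300, mean_rate=0.17, alpha=0.05, seed=1):
    """Share of letter-vs-rest tests that reject when initials have no effect.

    spread is the half-width of the uniform range of per-player strikeout
    rates; spread=0 means every player has exactly the same rate.
    """
    rng = random.Random(seed)
    group_n = players_per_letter * pa_per_player
    total_n = n_letters * group_n
    rejections = 0
    for _ in range(trials):
        strikeouts = []
        for _ in range(n_letters):
            k = 0
            for _ in range(players_per_letter):
                # Each player has his own underlying rate (assumed uniform).
                p = rng.uniform(mean_rate - spread, mean_rate + spread)
                k += sum(rng.random() < p for _ in range(pa_per_player))
            strikeouts.append(k)
        total_k = sum(strikeouts)
        for k in strikeouts:
            if two_sided_p(k, group_n, total_k - k, total_n - group_n) < alpha:
                rejections += 1
    return rejections / (trials * n_letters)


if __name__ == "__main__":
    print("identical players:", false_positive_rate(spread=0.0))
    print("varied players:   ", false_positive_rate(spread=0.07))
```

When players differ from one another and each letter holds only a handful of them, a single Kingman-like batter can tilt his letter, and the pooled test rejects far more often than its nominal 5%.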
For grades, there was no difference between A&B, nor between C&D. That should have been the end of it. Besides, the absolute difference between AB and CD was about 0.02 with a mean around 3.4. That's one letter grade in 50 for a poor CD? That's a tiny effect even before being swamped by different standards in different classes and schools. Besides, with an average around 3.4, just how many grades of C, let alone D, could have been in that data set? I'd have expected A-B to be the much bigger contributor, but it's not there.
It's funny these guys are both from management schools. I'm a development econ student currently taking an international finance class in the business school, and I'm doing a regression on foreign exchange rates for a project we're working on.
The instructions for the regression analysis are ridiculous. They ignore autocorrelation and multicollinearity effects, and when I brought this up to my groupmates and the professor, it was clear that none of them knew anything about stats past how to run a regression in Excel. Meanwhile, I've only got two methods classes under my belt compared to my instructor's PhD.
It makes me think I could make a fortune in the finance world as one of the only competent statisticians.
Would this work in reverse? Would pitchers with a K be better hurlers? Worked for Kevin Brown. Not so much for Knolan Ryan.
Skepticism is always good, but one should examine the evidence at least before concluding that it's bunk!
I learned about these findings years ago. This paper is a new study that came out last year, replicating the results of the first one. This result has been replicated time and time again from different data sets, so it deserves some attention.
It's possible there's a different explanation, but the fact that people with the letter K in their initials strike out more often has been shown many many times. Just as the result that people with C's and D's in their initials get more C's and D's has been shown many different times.
"The progressive thing to do for the sake of equity would be to allow players with K's in their names to have more strikes before being out.
Posted by: Justin Ross"
Assuming you say this in jest, am I to understand that you oppose handicaps in golf because they are progressive?
Are you sure this wasn't an April Fools study?
Did they make some sort of Bonferroni adjustment for multiple comparisons?
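For anyone who has not met it, a Bonferroni adjustment simply divides the significance threshold by the number of hypotheses tested. A minimal sketch follows; the p-values are invented for illustration and are not taken from the paper.

```python
import string


def bonferroni_significant(p_values, alpha=0.05):
    """Map each hypothesis name to whether it survives a Bonferroni correction."""
    threshold = alpha / len(p_values)  # stricter cutoff as tests multiply
    return {name: p < threshold for name, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical: one test per initial, with only "K" looking interesting.
    p_values = {letter: 0.5 for letter in string.ascii_uppercase}
    p_values["K"] = 0.01
    survivors = [name for name, ok in bonferroni_significant(p_values).items() if ok]
    print("survives correction:", survivors)
```

With 26 initials the cutoff drops from 0.05 to about 0.0019, so a result that looks comfortable at p = 0.01 no longer counts.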
If they think they have discovered a "significant" correlation, they should test the hypothesis prospectively.
Assuming this wasn't an April Fools study.
A fine example of junk science. This paper would be an F in a methods class. An example of spurious correlation. I guess if there is a consensus then it is correct.
Wow. I am amazed at the effort that we as a society go to, to prove a point by a point (% point, that is). I once had a boss who asked me to prepare a report to prove the worth of his proposal to top management by using statistics gathered from manufacturing records. I told him, "Tell me what you want to see and I will make it happen." The figures were all true, just presented in a different manner.