Striking out

by Russ Roberts on April 8, 2008

in Data, Sports

Sometimes I get depressed about the quality of statistical work in economics. Then I read something from another social science. Here is a recent study where psychologists find that having the initial "K" increases your chance of striking out when playing professional baseball. Why? Well, it’s obvious isn’t it? The letter "K" is used when keeping score in baseball to represent striking out. So it’s obvious now isn’t it? Still don’t get it? Neither do I. But hey, it’s in the data. Between 1913 and 2006, players with first or last initial "K" struck out 18.8% of the time compared to 17.2% for the fortunate players unhandicapped by their initials. Here is the "explanation" of the authors:

Despite a universal desire to avoid striking out, K-initialed players strike out more often. For those players, we argue that the explicitly negative performance outcome may feel implicitly positive. Even Karl “Koley” Kolseth would find a strikeout aversive, but on the whole, he might find it a little less aversive than players who do not share his initials, and avoid it less enthusiastically.

But why? Why would having the initial "K" make striking out more pleasant? I just don’t get it. The authors go on to "test" their theory by looking at grades of a sample of MBA students:

The MBA students in our sample are well aware of a direct connection between academic performance and successful job placement. Nevertheless, despite the pervasive desire to achieve high grades, students with an unconsciously-driven fondness for C’s and D’s were slightly less successful at achieving their conscious goal.

That is, Charles Darwin received poorer grades than Alan Alda. But it turns out that Alan Alda didn’t do better than the non-ABCD initialed:

Interestingly, A- or B-initialed students did not perform better than students whose initials were grade-irrelevant. There are two possible explanations for this. First, students with grade-irrelevant initials may already be maximally motivated to succeed. Second, because performance is determined by motivation and ability, any increased motivation to succeed that arises from having initials that match positive performance outcomes may not necessarily translate into increased performance.

There is, of course, a third explanation: there is no real relationship and the authors have been fooled by randomness. Yes, their results are statistically significant. But how many relationships did they explore before finding the ones that were statistically significant? And how many relationships are there to explore? To really test the theory, you’d have to look at baseball players with the initial "E" and see if they commit more errors than others. You’d have to look at guards in the NBA to see if those with initials "A" have more assists. Centers whose initials include an "R" should be better rebounders. You’d have to look and see whether students with the initials IC were more likely to take an "incomplete" in a class.
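The worry about how many relationships were explored can be put in numbers. Here is a minimal sketch, assuming 26 independent tests (one per initial letter, a count chosen for illustration and not taken from the study), each run at the 5% level with no true effect anywhere. The function names are hypothetical.

```python
import random

def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent null tests looks 'significant'."""
    return 1.0 - (1.0 - alpha) ** num_tests

def simulate(num_tests: int, alpha: float = 0.05, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: under the null, each p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study": draw num_tests null p-values and see whether any clears alpha.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(prob_at_least_one_false_positive(26))  # exact value under independence
    print(simulate(26))                          # simulated estimate of the same thing
```

Under these assumptions the chance of at least one "significant" letter is about 0.74, so finding one is closer to the expected outcome than to a discovery.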

I guess Rabbi Jonathan Sacks, the Chief Rabbi of England, should have been a football player. Or maybe he just gets fired more often than the average Briton because it doesn’t bother him as much as someone with a different last name.

Did Kafka know baseball scoring? Does this explain why he found success in life so difficult? Is this why he named a character "K"?

Do players whose initials are a backwards "K" strike out looking more than the average?

77 comments


Methinks April 8, 2008 at 11:42 am

These guys need to be introduced to the word "spurious" and forced to pay back the federal grant they undoubtedly received to engage in this nonsense.

marysienka April 8, 2008 at 12:04 pm

Articles like this make me chuckle. Thanks, Russ!

Stretch April 8, 2008 at 12:06 pm

Were pitchers over-represented in this population? And if so, would a K named pitcher strike more batters out?

The sad part is I'm sure these guys take themselves way too seriously.

"These findings provide striking evidence that unconscious wants can insidiously undermine conscious pursuits."

I can't decide whether to laugh or cry.

PaulD April 8, 2008 at 12:11 pm

This is an obvious example of data mining. Although it is obvious in this example, there are many similar examples that are not obvious to others. Just pick up books or articles on picking stocks, and one can find all sorts of examples of data mining.

Randy April 8, 2008 at 12:14 pm

Noticed that the authors are from schools of management. Wondering if they are in training to be pointy-haired bosses or being paid to turn out pointy-haired bosses.

Avatar300 April 8, 2008 at 12:35 pm

My last name starts with a "K" and I never liked striking out. And I was generally more patient at the plate and took more walks than my teammates, but my initials do include "P" or "W".

Of course, I'm a pretty small sample size.

Avatar300 April 8, 2008 at 12:36 pm

Oops, "do include" should be "do not include".

Brad Hutchings April 8, 2008 at 12:54 pm

I wonder what happens when you throw Kevin Mitchell out. But seriously, what percentage of players had K initials? The smaller the percentage, the less statistically significant the difference.
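A rough sketch of that point, using the 18.8% and 17.2% rates quoted in the post. The sample sizes are hypothetical, and the pooled two-proportion z statistic treats every plate appearance as an independent trial, which is itself a strong assumption.

```python
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the difference of two sample proportions (pooled standard error)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

if __name__ == "__main__":
    others = 1_000_000  # hypothetical plate appearances for non-K players
    for k_group in (1_000, 10_000, 100_000):  # hypothetical K-player plate appearances
        z = two_proportion_z(0.188, k_group, 0.172, others)
        print(f"K plate appearances = {k_group:>7}: z = {z:.2f}")
```

The same 1.6-point gap produces a larger z statistic as the K group grows, so how many K players there are matters as much as the size of the gap.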

Methinks April 8, 2008 at 12:55 pm

Don't be put off by your small sample size, Avatar!

These great researchers certainly wouldn't let a thing like that stand in their way!

Brad Hutchings April 8, 2008 at 12:56 pm

"Centers whose initials include an "R" should be better rebounders."

Dennis Rodman… although he was a forward. Coincidentally, his middle name is Keith.

John S. April 8, 2008 at 1:05 pm

This was discussed on various blogs related to baseball and statistics about six months ago. For some interesting insights, see this post:

They found that yes indeed, batters with an initial K struck out slightly more often than average. But there were eight other initials that were even worse.

Jack April 8, 2008 at 1:07 pm

And he teaches at Yale? Really? Somebody needs to point him to Andrew Gelman's papers and others in that literature (Bonferroni bounds, Hal White's data snooping tests….)

dave smith April 8, 2008 at 1:21 pm

Av300, I'll second the point about not being discouraged by your small sample size. Just put yourself in the data 1,000 times and you'll have a big sample.

tw April 8, 2008 at 1:26 pm

Truly a study devoid of merit and one that wasted resources. I suppose now Dave Kingman will go on the ESPN lecture tour and claim that his massive strikeout ratio wasn't really his fault…he was inherently doomed from birth.

Matt C. April 8, 2008 at 1:40 pm

My question is, how in the heck did anyone even think to ask such an asinine question? Are psychologists really that short on topics about which they can write? Honestly, who even consciously or unconsciously automatically associates initials with a scoring metric?

Marcus April 8, 2008 at 1:48 pm

I'm wondering. How do they explain variations from the mean for other letters?

save_the_rustbelt April 8, 2008 at 1:51 pm

I'm reminded of the old joke about the economist who drowned in a river with an average depth of six inches.

Economists (yes even here) torque around numbers with the best of them, and even engage in selectivity designed to mislead.

Methinks April 8, 2008 at 1:52 pm

My question is, how in the heck did anyone even think to ask such an asinine question?

Two words:

Grant money

mpkomara April 8, 2008 at 2:14 pm

Am I an asshole for suggesting that Latin American baseball players are less likely to have a K in their name than descendants of Eastern European families, and perhaps it is a cultural phenomenon that the former group are less likely to strike out than the latter? (I have two K's in my last name, and that's only one out away from retiring the side.)

shawn April 8, 2008 at 2:17 pm

mpk…HA…i love it; perfect point.

Grant April 8, 2008 at 3:18 pm

After some initial bewilderment, my first thought was that people from different ethnic groups and cultures were more likely to have certain initials than the rest of the population.

Did they control for race and culture at all? I'm assuming they at least controlled for gender?

dave smith April 8, 2008 at 3:25 pm

I wonder if the predicted values from their regression went anywhere near the mean of the actual data…..

….sarcasm, of course, as this is a property of all regressions.
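For anyone who wants to check the property behind the joke, here is a small sketch. It assumes ordinary least squares with an intercept (the property depends on the intercept), and the data are made-up noise with no relationship to the predictor.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def demo(seed: int = 1):
    """Return (mean of fitted values, mean of actual values) for pure-noise data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
    ys = [rng.gauss(0.0, 1.0) for _ in xs]  # unrelated to xs by construction
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    return mean(fitted), mean(ys)

if __name__ == "__main__":
    print(demo())
```

The two means agree up to floating-point error even though the predictor explains nothing, which is why matching the mean says nothing about whether a model is any good.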

noahpoah April 8, 2008 at 6:53 pm

It just occurred to me how odd it is that Russell Roberts, or R.R., which is to say, R-squared, isn't a bigger fan of regression.

Mesa Econoguy April 8, 2008 at 7:14 pm

Russ’ kurtosis precludes that.

Justin Ross April 8, 2008 at 7:19 pm

The progressive thing to do for the sake of equity would be to allow players with K's in their names to have more strikes before being out.

Bill April 8, 2008 at 10:15 pm

Unfortunately, that was from a business school not a school of social science. It could easily have come from a medical school. I stopped paying attention to reports of medical findings because most of them are innumerate as well.

My question is how do referees let this through? Now THAT's scary.

Several flaws in the report are pretty obvious. First, "Kingman", a player from the 1970s, alone accounts for nearly a third of the deviation from the null hypothesis. A cursory review of a similar (but not identical) data set suggests they used a binomial distribution, with all batters in the same letter category assumed to have the same mean strikeout rate. Since there is significant variation among individual players, the null is almost certain to be false once you partition your data set into smaller subsets. Kingman easily skews "K". In fact, I found huge deviations for every single letter. Their model was wrong: every subset was skewed because there are not enough individual players for the differences to wash out.
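The argument above can be sketched with a simulation. This is an illustration under made-up assumptions, not a reconstruction of the study: 1,300 players with 1,000 plate appearances each, individual strikeout rates drawn from a Beta(8.5, 41.5) distribution (mean 0.17), and initials assigned completely at random, so no letter effect exists by construction.

```python
import math
import random

def letter_z_scores(num_players: int = 1300, pa_per_player: int = 1000, seed: int = 0):
    """Simulate players with NO letter effect; return naive binomial z-scores per letter."""
    rng = random.Random(seed)
    num_letters = 26
    strikeouts = [0] * num_letters
    appearances = [0] * num_letters
    for _ in range(num_players):
        letter = rng.randrange(num_letters)     # initial assigned at random
        true_rate = rng.betavariate(8.5, 41.5)  # this player's own strikeout rate
        k = sum(rng.random() < true_rate for _ in range(pa_per_player))
        strikeouts[letter] += k
        appearances[letter] += pa_per_player
    overall = sum(strikeouts) / sum(appearances)
    z_scores = []
    for k, n in zip(strikeouts, appearances):
        if n == 0:
            continue  # no players drew this letter
        # Naive model: every plate appearance shares one common strikeout probability.
        naive_se = math.sqrt(overall * (1.0 - overall) / n)
        z_scores.append((k / n - overall) / naive_se)
    return z_scores

if __name__ == "__main__":
    zs = letter_z_scores()
    flagged = sum(abs(z) > 1.96 for z in zs)
    print(f"{flagged} of {len(zs)} letters look 'significant' under the naive model")
```

Because players differ from one another, the pooled binomial standard error is far too small, and the naive test tends to flag many more letters than the one or two that a 5% test would produce by chance.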

For grades, there was no difference between A&B, nor between C&D. That should have been the end of it. Besides, the absolute difference between AB and CD was about 0.02 with a mean around 3.4. That's one letter grade in 50 for a poor CD? That's a tiny effect even before being swamped by different standards in different classes and schools. Besides, with an average around 3.4, just how many grades of C, let alone D, could have been in that data set? I'd have expected A-B to be the much bigger contributor, but it's not there.

Mike April 8, 2008 at 10:32 pm

It's funny these guys are both from management schools. I'm a development econ student currently taking an international finance class in the business school, and I'm doing a regression on foreign exchange rates for a project we're working on.

The instructions for the regression analysis are ridiculous. They ignore autocorrelation and multicollinearity effects, and when I brought this up to my groupmates and the professor, it was clear that none of them knew anything about stats past how to run a regression in Excel. Meanwhile, I've only got two methods classes under my belt compared to my instructor's PhD.

It makes me think I could make a fortune in the finance world as one of the only competent statisticians.

Christopher W. April 9, 2008 at 5:43 am

Would this work in reverse? Would pitchers with a K be better hurlers? Worked for Kevin Brown. Not so much for Knolan Ryan. :)

brian April 9, 2008 at 7:44 am

Skepticism is always good, but one should examine the evidence at least before concluding that it's bunk!

I learned about these findings years ago. This paper is a new study that came out last year, replicating the results of the first one. This result has been replicated time and time again from different data sets, so it deserves some attention.

It's possible there's a different explanation, but the fact that people with the letter K in their initials strike out more often has been shown many many times. Just as the result that people with C's and D's in their initials get more C's and D's has been shown many different times.

brian April 9, 2008 at 7:45 am

"The progressive thing to do for the sake of equity would be to allow players with K's in their names to have more strikes before being out.

Posted by: Justin Ross"

Assuming you say this in jest, am I to understand that you oppose handicaps in golf because they are progressive?

liberty April 9, 2008 at 12:52 pm

Are you sure this wasn't an April Fools study?

Paris April 9, 2008 at 5:07 pm

Did they make some sort of Bonferroni adjustment for multiple comparisons?

If they think they have discovered a "significant" correlation, they should test the hypothesis prospectively.

Assuming this wasn't an April Fools study.
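For reference, the Bonferroni adjustment mentioned above is simple to state: with m comparisons, each one is tested at alpha / m instead of alpha. A minimal sketch follows; the p-values in the example are invented and the count of 26 comparisons is only an assumption.

```python
def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance cutoff that keeps the family-wise error rate at most alpha."""
    return alpha / num_tests

def bonferroni_significant(p_values, alpha: float = 0.05):
    """Flag which p-values survive a Bonferroni correction across the whole list."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p <= cutoff for p in p_values]

if __name__ == "__main__":
    # Assumed count: one comparison per initial letter.
    print(bonferroni_threshold(0.05, 26))
    # Invented p-values, for illustration only.
    print(bonferroni_significant([0.03, 0.001, 0.2]))
```

A result that clears 0.05 on its own, such as the 0.03 in the example, no longer counts once the cutoff reflects how many comparisons were made.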

bee April 9, 2008 at 8:36 pm

A fine example of junk science. This paper would be an F in a methods class. An example of spurious correlation. I guess if there is a consensus then it is correct.

Joe Jaegers April 17, 2008 at 8:53 pm

Wow. I am amazed at the effort that we as a society go to, to prove a point by a point (% point, that is). I once had a boss who asked me to prepare a report to prove the worth of his proposal to top management by using statistics gathered from manufacturing records. I told him, tell me what you want to see and I will make it happen. The figures were all true, just presented in a different manner.

Donald September 18, 2008 at 1:10 pm

I agree, this is the way we should all run our lives, sports, personal or otherwise.

Athletic College Recruiting October 17, 2008 at 11:53 pm

Dang… I just don't think I can get enough sports. Shhh my wife is coming. lol Hey thanks for the post and for satisfying my need to "feed on sports" info. Kenney

Chelsea Football club October 30, 2008 at 1:04 am

Yaa, you are right, this was a very good post, thanks for it.
