Wednesday, June 09, 2004

Sean Casey and hitting .400

There was a story in last week's New York Times specifically about Sean Casey's torrid start and more generally about his quest to hit .400. Of course, the last time someone hit .400 he was named Ted Williams and it was 1941. Since then there have been several near-misses (Carew in 1977, Brett in 1980, Gwynn a few times in the 1990s), but the article got me thinking about what it means to hit .400. I know, you have to average two hits per game, but I'm more interested in how much better a player was than his peers, regardless of whether he hit .400 or not.

In other words, even though Wade Boggs hit .366 in 1988 (34 points short of .400), how much better was his .366 than the other batting averages that year? To find out, I looked at a short list of seasons (I basically picked players who had historically high batting averages, with the exception of Musial in 1951, and looked at how much better they performed than their peers in a given season). I used zscores to standardize all batting averages within each season and then ranked the player performances by zscore. A quick note on zscores: a zscore measures how many standard deviations an observation sits above or below the average. Increasingly positive zscores are increasingly better than the average, and increasingly negative zscores are increasingly worse.
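For anyone who wants to try this at home, here is a minimal sketch of the within-season standardization: zscore = (player's BA - season mean BA) / season standard deviation, then rank across seasons. It assumes you already have a dictionary of batting averages for each season's player pool; the function names `season_zscores` and `rank_seasons` are hypothetical, and the choice of population standard deviation (`pstdev`) is an assumption. The sample standard deviation would give slightly different numbers than the table below.

```python
from statistics import mean, pstdev

def season_zscores(averages):
    """Map each player's batting average to its zscore within one season.

    `averages` is a hypothetical dict of player -> batting average for a
    single season; which players belong in that pool is an assumption.
    """
    mu = mean(averages.values())
    sigma = pstdev(averages.values())  # population SD (assumed, not confirmed)
    return {player: (ba - mu) / sigma for player, ba in averages.items()}

def rank_seasons(seasons):
    """Rank (year, player, ba, zscore) rows across all seasons, best first."""
    rows = []
    for year, averages in seasons.items():
        z = season_zscores(averages)
        for player, ba in averages.items():
            rows.append((year, player, ba, round(z[player], 2)))
    return sorted(rows, key=lambda row: row[3], reverse=True)

# Made-up toy seasons, only to show the mechanics: player "d" hits .290,
# lower than "a" at .300, but stands further above a tightly bunched league.
toy = {
    1: {"a": 0.300, "b": 0.250, "c": 0.200},
    2: {"d": 0.290, "e": 0.260, "f": 0.260, "g": 0.260},
}
for row in rank_seasons(toy)[:2]:
    print(row)
```

With these toy numbers, "d" (.290) ranks ahead of "a" (.300), which is the same kind of result that shows up in the real table.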

year  player    BA     zscore
1977  Carew     0.388  4.19
1980  Brett     0.390  4.05
1941  Williams  0.406  3.71
1988  Boggs     0.366  3.50
1999  Walker    0.379  3.40
2000  Helton    0.372  2.82
2004  Casey     0.377  2.75
1951  Musial    0.336  2.29

A quick look at these eight seasons, including Ted Williams' .406 season in 1941, shows a few things. First, Williams has a zscore of 3.71 for 1941, which means his batting average was 3.71 standard deviations above the average batting average that year. What's more impressive is that even though George Brett hit .390 in 1980 and Rod Carew hit .388 in 1977, both had higher zscores than Williams (4.05 and 4.19, respectively).

So what does this mean? Measured against their own peers, Carew and Brett stood further above the rest of the league than Williams did against his. On average, the league batting average was lower in 1977 and 1980 than it was in 1941, and a tighter spread of averages around that mean would also push a zscore up, since the zscore divides by the standard deviation. Why were batting averages lower in those years? Who knows. But if we're only interested in comparing performances within seasons, then Carew and Brett fared better than Williams.
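To see how a lower batting average can still earn a higher zscore, here is a small worked example. The league means and standard deviations below are made-up round numbers chosen only to illustrate the arithmetic; they are not the actual 1941, 1977, or 1980 figures, and `zscore` is a hypothetical helper.

```python
def zscore(ba, league_mean, league_sd):
    """Standard deviations a batting average sits above the league mean."""
    return (ba - league_mean) / league_sd

# Hypothetical league figures for illustration only (not real season data).
print(round(zscore(0.390, 0.270, 0.030), 2))  # lower mean, tight spread -> 4.0
print(round(zscore(0.406, 0.280, 0.035), 2))  # higher mean, wider spread -> 3.6
print(round(zscore(0.390, 0.270, 0.035), 2))  # same mean as first, wider spread -> 3.43
```

The .390 hitter beats the .406 hitter in the first two lines because both a lower league mean and a narrower spread work in his favor; the third line shows that widening the spread alone pulls his zscore back down.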

Of course, Williams still hit .400, and that, at least in the eyes of the media, is what people still care about. I doubt very seriously that if Harold Reynolds said something about zscores on Baseball Tonight he'd still have a job come morning. Either way, I think it's more interesting to ask a question like, "how did Williams do compared to his peers?" instead of looking at a single number as a benchmark of greatness.

As an aside, the table above also includes Casey's numbers (as of a few days ago). His zscore is only 2.75, which indicates that batting averages this season are relatively spread out (or the league average is relatively high), at least compared to history, so even a .377 doesn't stand as far above the pack. We'll see if that changes by October.