Wednesday, March 10, 2004

Who is the most prolific home run hitter of all time? Well, it depends.
As opening day approaches, and Barry Bonds advances on the all-time home run record, I got to thinking about different ways of determining who the most prolific home run hitter of all time is. To answer that question, I'll need to be more specific and define 'prolific.'

A quick look at the record books turns up home run leaders by season, by career and by home runs per at-bat. As I looked through the numbers, I realized that there wasn't a statistic for how many home runs a player hit relative to other players in a season (or over a career). Let me explain what I mean. In 2003, Alex Rodriguez and Jim Thome led the major leagues with 47 home runs each. Now, 47 home runs falls well short of the single-season record of 73 set by Barry Bonds in 2001, but how much better is 47 HRs in 2003 when compared to how the rest of the league did that year? Likewise, how much better is 73 HRs in 2001 when compared to how the rest of the league did that year? Or, in broader terms, which players, over the course of their careers, were more productive than their peers, and by how much? The method I'll describe below not only allows you to answer these questions, but it also allows you to compare players across seasons and careers.

Historically, baseball statistics have focused primarily on averages (batting and on-base, for example) and sums (total home runs, RBI, runs scored). Who is tenth on the all-time HR list? Look it up and you'll find that it's Sammy Sosa (539 HR). Who was tenth for the 2003 season? Coincidentally, it was also Sosa (40 HR). The point is, finding this information is a pretty easy exercise. But my question is how much better is A-Rod than Sosa in 2003 or Babe Ruth than Sosa over the course of their careers when compared to all other players? That required a little more work than opening the baseball almanac and here's what I found.

Looking at all the HR data from 1914 to 2003, I converted each player's HR total, season by season, to something called a z-score. (I should note that I only used players with at least 10 HRs and at least 100 ABs in a season.) This conversion tells you, in standard-deviation units, how far a player's HR total sits above or below the average for that season, which gives you an idea of how much better player A is than player B relative to the rest of the field. (If you're really interested in the details, send me a note and I'll forward them on to you.)
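For anyone who wants to try this at home, here is a minimal Python sketch of the season-by-season conversion. It is my reading of the method, not the exact calculation behind the tables: the function name, the tuple layout and the sample rows are made up for illustration, and I am assuming the population standard deviation (a sample standard deviation would give slightly different numbers).

```python
from collections import defaultdict
from statistics import mean, pstdev

def season_z_scores(rows, min_hr=10, min_ab=100):
    """rows: iterable of (player, year, hr, ab) tuples.
    Returns {(player, year): z-score of hr within that year}."""
    by_year = defaultdict(list)
    for player, year, hr, ab in rows:
        # Keep only qualifying seasons (at least 10 HR and 100 AB).
        if hr >= min_hr and ab >= min_ab:
            by_year[year].append((player, hr))

    z = {}
    for year, entries in by_year.items():
        hrs = [hr for _, hr in entries]
        mu = mean(hrs)
        sigma = pstdev(hrs)  # assumption: population standard deviation
        if sigma == 0:
            continue  # z-score is undefined when every total is identical
        for player, hr in entries:
            z[(player, year)] = (hr - mu) / sigma
    return z

# Made-up rows for illustration only (not real player data).
rows = [
    ("A", 2000, 10, 400),
    ("B", 2000, 20, 400),
    ("C", 2000, 30, 400),
    ("D", 2000, 5, 400),   # dropped: fewer than 10 HR
]
z = season_z_scores(rows)
```

Because the average and the spread are recomputed for every year, the same HR total can produce a different z-score in different seasons.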

I looked at how HR leaders compared by season for the years 1914 to 2003. I also looked at how players compared when considering their career HR statistics (for example, how did Babe Ruth, who played 16 seasons, compare to Carlos Delgado, who has played eight seasons, when talking about HR production?).

First, let's look at the single-season HR leaders:

Single-Season HR Leaders
Rank Player Year HR
1 Barry Bonds, SF 2001 73
2 Mark McGwire, STL 1998 70
3 Sammy Sosa, CHI 1998 66
4 Mark McGwire, STL 1999 65
5 Sammy Sosa, CHI 2001 64
6 Sammy Sosa, CHI 1999 63
7 Roger Maris, NY 1961 61
8 Babe Ruth, NY 1927 60
9 Babe Ruth, NY 1921 59
10 Mark McGwire, OAK/STL 1997 58
11 Jimmie Foxx, PHI 1932 58
12 Hank Greenberg, DET 1938 58
13 Alex Rodriguez, TEX 2002 57
14 Luis Gonzalez, ARI 2001 57
15 Ken Griffey, SEA 1998 56
16 Hack Wilson, CHI 1930 56
17 Ken Griffey, SEA 1997 56
18 Ralph Kiner, PIT 1949 54
19 Babe Ruth, NY 1920 54
20 Mickey Mantle, NY 1961 54
21 Babe Ruth, NY 1928 54
22 George Foster, CIN 1977 52
23 Mark McGwire, OAK 1996 52
24 Willie Mays, SF 1965 52
25 Mickey Mantle, NY 1956 52


No surprises here. Now what happens if we revisit the single-season HR leaders, but this time, instead of measuring success by the raw number of HRs, we measure success by how many more HRs a player hit than other players in a given year? For example, Barry Bonds leads the single-season HR list with 73, Mark McGwire is second (70) and Sammy Sosa third (66). However, if we convert HRs to z-scores, we get a better idea of how far ahead of the field a player was in that season. Looking at the z-scored table for the above players, Bonds is still first (73 HRs is a lot of HRs), McGwire is still second (in 1998, the McGwire/Sosa race helped distance them from the rest of the pack), but Babe Ruth is now third with 54 HRs. Fifty-four HRs is nothing to dismiss (it would have won the title last year), but in 1928, 54 home runs was a mammoth feat. Because it was so much better than what other top players did that year, Ruth's z-score is very high (in case you're interested, Sosa ranked ninth on the z-scored list--see the table below). So, based on seasonal success, we can see how players from different seasons stack up (like McGwire in 1999 and McGwire in 1998 or Cecil Fielder in 1990 and Babe Ruth in 1921).

Here is the z-scored table of single-season HR leaders:
(A quick note on z-scores: A z-score basically tells you, in standard deviations, how far a player is from some average value. For this post, if HR totals are roughly normally distributed, a z-score of 1.00 means that a player is better than about 84% of all players, a z-score of 2.00 means that a player is better than about 98% of all players and a z-score greater than 3.00 means that a player is better than about 99.9% of all players.)
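The link between a z-score and "better than X% of players" can be checked with the normal cumulative distribution. The short Python sketch below assumes HR totals follow a normal curve, which is only an approximation (a list cut off at 10 HRs is lopsided), and the function name is my own.

```python
import math

def normal_percentile(z):
    """Fraction of a normal distribution that lies below a given z-score."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (1.0, 2.0, 3.0):
    print(z, round(normal_percentile(z), 3))
# prints:
# 1.0 0.841
# 2.0 0.977
# 3.0 0.999
```

Treat these percentages as rough guides rather than exact shares of the league.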

Z-scored HR Leaders
Year Player Z-score Z-rank HR HR-rank
2001 B. Bonds 4.62 1 73 1
1998 M. McGwire 4.41 2 70 2
1928 B. Ruth 4.37 3 54 17
1965 W. Mays 4.29 4 52 21
1921 B. Ruth 4.16 5 59 9
1990 C. Fielder 4.13 6 51 27
1980 M. Schmidt 4.12 7 48 54
1999 M. McGwire 4.11 8 65 4
1998 S. Sosa 4.04 9 66 3
1977 G. Foster 4.03 10 52 21
2002 A. Rodriguez 3.95 11 57 12
1989 K. Mitchell 3.95 12 47 62
1949 R. Kiner 3.95 13 54 17
1999 S. Sosa 3.92 14 63 6
1961 R. Maris 3.85 15 61 7
1997 K. Griffey 3.82 16 56 14
2001 S. Sosa 3.82 17 64 5
1926 B. Ruth 3.74 18 47 62
1978 J. Rice 3.74 19 46 81
1932 J. Foxx 3.70 20 58 10
1946 H. Greenberg 3.66 21 44 112
1971 W. Stargell 3.66 22 48 54
1995 A. Belle 3.65 23 50 31
1920 B. Ruth 3.63 24 54 17
1994 M. Williams 3.63 25 43 136

When compared to the single-season HR leaders table, a few things stick out. First, all the z-scores are extremely high, but that's to be expected. More interesting, though, is that using z-scores allows you to compare Willie Mays with Barry Bonds (Mays ranks 4th using z-scores; using the single-season HR table he ranks 21st). Kevin Mitchell, who in 1989 hit 47 HRs, ranks 12th using z-scores but only 62nd using the single-season HR table. Seeing this, one question you might be tempted to ask is, "A-Rod hit 47 HRs in 2003, why isn't he in the top 25?"

Well, remember that z-scores are based on how all other players did in that season. In 1989, there were fewer players hitting close to 47 HRs than in 2003. In fact, A-Rod ranks 81st on this list with a z-score of 3.00 (he's of course tied with Mitchell on the single-season HR list at 62nd). Using that same reasoning, we can see why Sosa fell from 6th (on the single-season HR list) to 14th (on the z-score list)--there were many more players close to 63 HRs in 1999, and as a result his z-score was lower (3.92) than it would have been in, say, 1921 (where it would have been about 4.20).
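To see why the same 47 HRs can earn different z-scores, here is a toy Python comparison. Both lists of season totals are invented purely for illustration (they are not the real 1989 or 2003 fields), and I am again assuming the population standard deviation.

```python
from statistics import mean, pstdev

def z_score(value, field):
    """z-score of value relative to a list of season HR totals."""
    return (value - mean(field)) / pstdev(field)

# Invented fields: in the first, nobody else is near 47;
# in the second, several hitters are right behind the leader.
thin_field = [12, 15, 18, 20, 22, 25, 47]
crowded_field = [12, 20, 30, 38, 42, 45, 47]

z_thin = z_score(47, thin_field)
z_crowded = z_score(47, crowded_field)
print(z_thin > z_crowded)  # the leader stands out more in the thin field
```

The leader's total is identical in both cases; only the company he keeps changes.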

Next, let's look at the all-time HR list:

All-time HR Leaders
Rank Player HR
1 Hank Aaron 755
2 Babe Ruth 714
3 Willie Mays 660
4 Barry Bonds 658
5 Frank Robinson 586
6 Mark McGwire 583
7 Harmon Killebrew 573
8 Reggie Jackson 563
9 Mike Schmidt 548
10 Sammy Sosa 539
11 Mickey Mantle 536
12 Jimmie Foxx 534
13 Rafael Palmeiro 528
14 Ted Williams 521
15 Willie McCovey 521
16 Ernie Banks 512
17 Eddie Mathews 512
18 Mel Ott 511
19 Eddie Murray 504
20 Lou Gehrig 493
21 Fred McGriff 491
22 Ken Griffey 481
23 Willie Stargell 475
24 Stan Musial 475
25 Dave Winfield 465


This also looks pretty familiar. But what happens if we use z-scores for all-time HR leaders like we did for single-season HR leaders? More specifically, I averaged, over the course of a career, the number of HRs hit by each player and converted that number to a z-score. This z-score value gives an idea of how dominant a HR hitter each player was when compared to other players during their career.
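One way to read the career calculation is sketched below in Python: average each player's HRs per qualifying season, then z-score those averages across all players. This is only one interpretation (averaging a player's seasonal z-scores would be another), and the function name and sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def career_z_scores(rows):
    """rows: iterable of (player, year, hr) for qualifying seasons.
    Returns {player: z-score of that player's average HR per season}."""
    seasons = defaultdict(list)
    for player, _year, hr in rows:
        seasons[player].append(hr)

    # Average HR per season, so career length does not matter.
    averages = {p: mean(hrs) for p, hrs in seasons.items()}
    values = list(averages.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {p: (avg - mu) / sigma for p, avg in averages.items()}

# Made-up careers for illustration only.
career_rows = [
    ("A", 2000, 40), ("A", 2001, 50),   # averages 45
    ("B", 2000, 20),                    # averages 20
    ("C", 2000, 25), ("C", 2001, 25),   # averages 25
]
career_z = career_z_scores(career_rows)
```

Because the input is a per-season average, a player with a short but productive career can rank alongside players with far larger career totals.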

Here's what the z-scored table looks like (same z-score rules apply as above):

Z-scored All-time HR Leaders
Player Z-score Z-rank Total HR HR-rank
B. Ruth 2.67 1 688 2
M. Schmidt 2.23 2 541 9
M. McGwire 2.23 3 562 6
A. Rodriguez 2.14 4 340 67
R. Kiner 2.09 5 362 52
S. Sosa 2.05 6 527 11
A. Belle 1.96 7 373 43
J. Foxx 1.96 8 516 12
B. Bonds 1.85 9 658 3
H. Killebrew 1.82 10 557 8
A. Pujols 1.82 11 114 416
H. Greenberg 1.74 12 328 73
J. Thome 1.71 13 371 45
L. Gehrig 1.68 14 492 20
W. Mays 1.67 15 642 4
H. Aaron 1.64 16 755 1
F. Thomas 1.62 17 407 35
Ju. Gonzalez 1.60 18 416 33
C. Delgado 1.54 19 292 93
K. Griffey 1.51 20 473 23
R. Colavito 1.48 21 358 55
R. Palmeiro 1.46 22 509 14
T. Helton 1.44 23 214 188
D. Kingman 1.42 24 421 31
M. Ramirez 1.41 25 345 62
T. Williams 1.39 26 507 16


What immediately stands out is that players like A-Rod and Ralph Kiner rank very high on the z-scored table but only rank 67th and 52nd, respectively, when looking at the all-time HR list. Again, this is because A-Rod and Kiner hit a lot more HRs during their careers than did their peers, and as a result, had high z-scores. Also notice that the length of career is unimportant--all that matters is how many HRs a player hits in relation to his peers. A glaring example is Albert Pujols. He ranks 11th on the z-scored table but 416th on the all-time list for no other reason than he's played only three seasons. But in those three seasons, he's been very productive. (This could be a strength or a weakness of using z-scores depending on who you ask).

It's also interesting to note that Hank Aaron only ranks 16th on the z-score list even though he is first on the all-time HR list. This is due primarily to the fact that he was so consistent over such a long time (23 seasons) but never had a season in which he hit an inordinate number of HRs--at least when compared to his peers. On the other hand, because Babe Ruth hit so many more HRs than his peers, he's now regained the top spot (at least on this list).

Using z-scores allows you to compare HR hitters to their peers; z-scores also allow you to compare players across different seasons. This method offers insight into why Willie Mays had a better season in 1965 than Sammy Sosa did in 1998 even though Mays hit 52 HRs and Sosa hit 66 (Mays had a z-score of 4.29 while Sosa's was only 4.04; Mays's season was more impressive because far fewer players were hitting close to 50 HRs in 1965 than were hitting close to 60 HRs in 1998).

At the very least, I hope this post will encourage discussion and serve as a good starting place for more research.