Tuesday, September 29, 2015

Batting Average / BABIP Gap

Albert Pujols is having an odd season. According to reporter Pedro Moura:
As of press time (after September 28th's games), he's batting .237. That's a gap of -26 points between his BABIP and his batting average.

As a refresher, BABIP is Batting Average on Balls In Play. It's (Hits - Home Runs) / (At Bats - Strikeouts - Home Runs). (The standard version also adds sacrifice flies to the denominator; my own calculations below leave them out.) The idea is that a batter has the least control over what happens to balls he hits onto the field, so the stat removes the at bats that end without a ball in play: strikeouts and home runs. The league average hovers within a few points of .300 almost every year. A hitter's BABIP has to be taken in context with other numbers. Alone, it's not always clear whether it means the batter is hitting the ball really well and fielders can't get to it, or the batter is just getting lucky and hitting weak balls that fall in for hits.
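The formula translates directly into a couple of small R helpers. The function names and the stat line below are made up purely to show the arithmetic, and like the formula above, this sketch ignores sacrifice flies.

```r
# Hypothetical helper: batting average is hits per at bat.
batting_avg <- function(h, ab) {
  h / ab
}

# Hypothetical helper: BABIP as defined above. Home runs come out of the
# hits, and home runs plus strikeouts come out of the at bats.
# Sacrifice flies are ignored, matching the formula in the text.
babip <- function(h, hr, ab, so) {
  (h - hr) / (ab - so - hr)
}

# A made-up season: 550 AB, 150 H, 25 HR, 100 SO.
avg_example   <- batting_avg(150, 550)
babip_example <- babip(150, 25, 550, 100)
cat(sprintf("AVG %.3f  BABIP %.3f\n", avg_example, babip_example))
# prints: AVG 0.273  BABIP 0.294
```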

I expressed the difference above as -26 points because, for most batters, BABIP is higher than batting average: taking strikeouts out of the denominator usually matters more than taking home runs out. Among hitters qualifying for the batting title, the league average BABIP is .311* and batting average is .282. That's a difference of +29 points.

Pujols, on the other hand, is nearly that far in the other direction. His BABIP is 26 points lower than his batting average. Generally, players whose batting average is higher than their BABIP hit a high percentage of their hits for home runs and don't strike out a lot.
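Why home runs and few strikeouts push the gap negative falls out of comparing the two formulas directly. Writing $H$, $HR$, $AB$ and $SO$ for hits, home runs, at bats and strikeouts, and assuming $H > 0$, $AB > H$ and $AB - HR - SO > 0$ (true for any qualified hitter), batting average beats BABIP exactly when:

```latex
\frac{H}{AB} > \frac{H - HR}{AB - HR - SO}
\iff H\,(AB - HR - SO) > AB\,(H - HR)
\iff HR\,(AB - H) > H \cdot SO
\iff \frac{HR}{H} > \frac{SO}{AB - H}
```

In words: the gap goes negative when the share of a hitter's hits that are home runs is larger than the share of his hitless at bats that end in a strikeout.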

Following Sam Miller, I ran the numbers for all qualified hitters of the modern era (since 1988).

# Load sqldf, installing it first if needed (pacman::p_load does both)
pacman::p_load("sqldf")

# Lahman batting table: one row per player, season and stint (team)
batters = read.csv("../r-scripts/CSV/lahman/Batting.csv")

# Sum stints so a traded player's season counts once, then keep seasons after
# 1988 with more than 480 total at bats (a rough stand-in for qualifying)
batters.modern = sqldf('select playerID, yearID, sum(AB) as AB, sum(H) as H, sum(HR) as HR, sum(SO) as SO from batters where yearID > 1988 group by playerID, yearID having sum(AB) > 480')

batters.modern$AVG = batters.modern$H / batters.modern$AB
batters.modern$BABIP = (batters.modern$H - batters.modern$HR) / (batters.modern$AB - batters.modern$HR - batters.modern$SO)
# Gap in points: positive when BABIP is higher than batting average
batters.modern$Diff = (batters.modern$BABIP - batters.modern$AVG) * 1000
batters.modern$HomerPct = batters.modern$HR / batters.modern$H * 100

# Seasons where batting average beat BABIP
higherAvg = batters.modern[batters.modern$Diff < 0, ]
higherAvg = na.omit(higherAvg)

# Smallest gap first, so the biggest negative gaps print last
print(higherAvg[order(-higherAvg$Diff), ])
cat("Mean difference:", mean(batters.modern$Diff, na.rm = TRUE), "\n")
cat("Mean Batting Average:", sum(batters.modern$H) / sum(batters.modern$AB), "\n")
cat("Mean BABIP:", (sum(batters.modern$H) - sum(batters.modern$HR)) / (sum(batters.modern$AB) - sum(batters.modern$HR) - sum(batters.modern$SO)), "\n")

# Fangraphs data for 2015 -- It has BABIP already. Nice! Go to bat!
fg2015 = read.csv("BABIPvsBA/FanGraphs Leaderboard-2015.csv")
fg2015$Diff = (fg2015$BABIP - fg2015$AVG) * 1000

higher2015 = fg2015[fg2015$Diff < 0, ]
# Most negative gap first
print(higher2015[order(higher2015$Diff), ])
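The Lahman batting table has one row per team stint, so a player traded midseason shows up on two rows. Here's a base-R sketch on made-up data (the player IDs and numbers are hypothetical) of why the stints should be summed before the 480 at-bat cutoff is applied, not after.

```r
# Made-up stint-level rows in the shape of Lahman's Batting table.
stints <- data.frame(
  playerID = c("playerX", "playerX", "playerY", "playerZ"),
  yearID   = c(2010, 2010, 2010, 2010),
  stint    = c(1, 2, 1, 1),
  AB       = c(300, 250, 500, 200),
  H        = c(80, 70, 140, 50)
)

# Filtering rows before summing drops playerX, whose 550 at bats
# are split across two teams.
per_stint <- stints[stints$AB > 480, ]

# Summing stints per player-season first, then filtering, keeps playerX.
totals    <- aggregate(cbind(AB, H) ~ playerID + yearID, data = stints, FUN = sum)
qualified <- totals[totals$AB > 480, ]
print(qualified)
```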

Turns out Pujols's -26 would be a modern record, if it weren't for... Albert Pujols. Since 1988, there have been 154 seasons with a negative gap. Pujols holds the largest negative difference: -37.3 in 2006. It's not even close; the next largest is Pujols again, with -27.4 in 2004. Jose Bautista's 2010 season follows with -24.5. Bautista will drop to 4th if Pujols keeps up his pace this season. (The 2015 data from FanGraphs confirms that no one else this year is lower than -13; Pujols is unrivaled.)

Of those 154 seasons, 11 belong to Pujols (this will most likely be his 12th). That covers every season of his career other than his rookie year and his injury-marred 2012-2013 seasons. In the modern era, Rafael Palmeiro had 10 such seasons, Gary Sheffield 9, and no one else has more than 6.
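The per-player tally above is a one-liner with `table()`. The sketch below runs it on a tiny made-up data frame shaped like `batters.modern` (the player IDs and gaps are invented); on the real data, the same call would go on the `playerID` column of `higherAvg` from the script above.

```r
# Made-up player-seasons; Diff is BABIP minus AVG, in points.
seasons <- data.frame(
  playerID = c("playerA", "playerA", "playerA", "playerB", "playerB", "playerC"),
  yearID   = c(2001, 2002, 2003, 2001, 2002, 2003),
  Diff     = c(-8.0, -15.5, -2.1, -4.0, 10.3, -1.7)
)

# Keep only seasons where batting average beat BABIP (negative gap).
negative <- seasons[seasons$Diff < 0, ]

# Count those seasons per player, most first.
per_player <- sort(table(negative$playerID), decreasing = TRUE)
print(per_player)
```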

As a Cardinals fan, it's sad to see ALBERT FREAKING PUJOLS turn into Albert "Batting .237 and sharing the lineup with Mike Trout" Pujols. But those BABIP numbers tell a story. Pujols has not only hit a large percentage of his hits for home runs, he has at the same time limited his strikeouts (a rare combination). That punishes his BABIP: his homers don't count toward it, and nearly all of his other at bats end with the ball in play, where fielders are bound to be standing in the way of a certain number of them. But it means he's rarely giving away an out without at least putting the bat on the ball.
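To see that effect in isolation, here's a sketch comparing two made-up hitters with identical hits and at bats, and therefore identical .250 batting averages. One hits lots of homers and rarely strikes out; the other hits few homers and strikes out often. The `gap_points()` helper is hypothetical and mirrors the `Diff` column in the script above.

```r
# Gap in points: BABIP minus batting average, times 1000.
gap_points <- function(h, hr, ab, so) {
  avg   <- h / ab
  babip <- (h - hr) / (ab - so - hr)
  (babip - avg) * 1000
}

# Both hitters: 150 hits in 600 at bats.
power_contact    <- gap_points(h = 150, hr = 40, ab = 600, so = 60)
low_power_high_k <- gap_points(h = 150, hr = 5,  ab = 600, so = 120)

cat(sprintf("Power, few strikeouts:  %.1f points\n", power_contact))
cat(sprintf("Little power, many Ks:  %.1f points\n", low_power_high_k))
```

The first hitter's BABIP is 110/500 = .220, 30 points below his average; the second's is 145/475, about 55 points above it.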

As Pujols keeps climbing past the rest of the modern players, he's moving into territory no one has reached in decades. Fittingly, his 12th such season will put him in a tie with Cardinals legend Stan Musial. Only Hank Aaron has more, with 13.

* I said above that the league average hovers around .300. That figure includes pitchers, utility players, and others who don't qualify for the batting title. They are typically worse hitters and bring the numbers down quite a bit.
