Tuesday, August 18, 2015

Isolated On Base

If you follow baseball's advanced stats at all, you're probably familiar with Isolated Slugging (ISO). It subtracts a player's batting average from his slugging percentage, to get an idea of how much of his slugging comes from extra-base hits vs. how much is just a load of singles propping up a lack of power (I didn't say anything, Jon Jay, I don't know why you're looking at me like that).

The cool thing about ISO is its simplicity. You can calculate it in your head, just based on the slash line you would get on the back of a baseball card, or on the jumbotron at a stadium.

Is there a comparable stat for OBP? My hypothesis is that we can subtract batting average from OBP to get a fairly good estimate of walk rate. In homage to ISO, I call this stat IBO.

Comparing OBP to average is trickier than comparing slugging to average. Both SLG and AVG use AB as the denominator. OBP is more complicated: (H + BB + HBP) / (AB + BB + HBP + SF), per Baseball-Reference.
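To make the mismatched denominators concrete, here is a minimal R sketch that works through the arithmetic for a single season line. The counting stats are invented for illustration, and the walk-rate line assumes BB% means walks per plate appearance with sacrifice bunts left out for simplicity.

```r
# Hypothetical season line, invented purely for illustration.
ab <- 500; h <- 150; bb <- 60; hbp <- 5; sf <- 5

avg <- h / ab                                  # denominator: AB
obp <- (h + bb + hbp) / (ab + bb + hbp + sf)   # denominator: AB + BB + HBP + SF
ibo <- obp - avg                               # the stat this post calls IBO

# Assumption: walk rate is walks per plate appearance, ignoring sacrifice bunts.
bb_pct <- bb / (ab + bb + hbp + sf)

print(round(c(AVG = avg, OBP = obp, IBO = ibo, BBPct = bb_pct), 4))
```

Because the two rates sit on different denominators, IBO and walk rate come out as different numbers even for the same hitter.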

To test this out, I'm using the Lahman Baseball database data from 2014, for all qualified hitters (a total of 135 players).

First step, I created a table to store IBO values. I'm going to get a little hand-wavey here, because I don't want to post all the database structure code. Trust me on this, I'm pretty sure I subtracted one number from another successfully.*
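For anyone who wants to check the subtraction without the database, the same kind of calculation can be sketched in plain R. This is not the actual table or view code; it is a rough sketch on toy rows shaped like the Lahman Batting table, which stores one row per player stint, so a traded player's rows need to be summed first. The player IDs and numbers are made up, and the plate appearance formula (AB + BB + HBP + SH + SF) is an assumption.

```r
# Toy rows shaped like the Lahman Batting table (one row per player stint).
# Player IDs and counting stats are invented for illustration.
batting <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01"),
  yearID   = c(2014, 2014, 2014),
  AB  = c(300, 200, 550),
  H   = c(90, 60, 140),
  BB  = c(40, 20, 30),
  HBP = c(3, 2, 4),
  SH  = c(0, 1, 2),
  SF  = c(2, 3, 4)
)

# Sum counting stats across stints so each player gets one season line.
season <- aggregate(cbind(AB, H, BB, HBP, SH, SF) ~ playerID + yearID,
                    data = batting, FUN = sum)

# Assumption: plate appearances approximated as AB + BB + HBP + SH + SF.
season$PA    <- with(season, AB + BB + HBP + SH + SF)
season$AVG   <- with(season, H / AB)
season$OBP   <- with(season, (H + BB + HBP) / (AB + BB + HBP + SF))
season$IBO   <- season$OBP - season$AVG
season$BBPct <- season$BB / season$PA

print(season[c("playerID", "BBPct", "IBO")])
```

Summing the counting stats before dividing matters: averaging each stint's rates would weight a short stint the same as a long one.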

That accomplished, I wrote an R script to pull the data and do some MATH!

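library(RMySQL)

# Connect to the local MySQL database that holds the Lahman data
ammlb <- dbConnect(MySQL(), user='myuser', password='mypassword', dbname='AMMLB', host='localhost')

# Pull every qualified hitter from 2014; fetch() already returns a data frame
rs <- dbSendQuery(ammlb, "select * from vw_QualifiedIBO where yearID = 2014")
query_results <- fetch(rs, n=-1)
dbClearResult(rs)

# Keep name, walk rate, and IBO, sorted by walk rate (highest first)
leaders <- query_results[c("nameLast", "nameFirst", "BBPct", "IBO")]
leaders <- leaders[order(-leaders$BBPct),]
print(leaders)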

The leaders are who you would expect, the guys who walk a lot:

Last Name First Name BB% IBO
Santana Carlos 0.1712 0.1341
Bautista Jose 0.1548 0.1176
Stanton Giancarlo 0.1473 0.1074
LaRoche Adam 0.1399 0.1027
Carpenter Matt 0.1344 0.1025
Smith Seth 0.1327 0.1009
Werth Jayson 0.132 0.1022
Fowler Dexter 0.131 0.0985
McCutchen Andrew 0.1296 0.0966
Freeman Freddie 0.1271 0.0973
Dozier Brian 0.1264 0.1027
Ortiz David 0.1246 0.093
Crisp Coco 0.1234 0.0902
Granderson Curtis 0.1208 0.0987
Valbuena Luis 0.119 0.0917
Rizzo Anthony 0.1185 0.1001
Trout Mike 0.1177 0.0899
Duda Lucas 0.1158 0.0961
Mauer Joe 0.1158 0.0841
Moss Brandon 0.1155 0.1005
Zobrist Ben 0.115 0.0824
Davis Chris 0.1145 0.104
Encarnacion Edwin 0.1144 0.0859
Teixeira Mark 0.1142 0.0971
Holliday Matt 0.1109 0.0985
Choo Shin-Soo 0.1096 0.0985
Donaldson Josh 0.1094 0.0875
Ramirez Hanley 0.1094 0.0862
Martinez Victor 0.1092 0.0736
Yelich Christian 0.1065 0.0788
Rollins Jimmy 0.1056 0.0799
Crawford Brandon 0.105 0.0774
Puig Yasiel 0.105 0.0867
Howard Ryan 0.1034 0.087
Heyward Jason 0.1032 0.0808
Gordon Alex 0.1011 0.0851
Lucroy Jonathan 0.1008 0.0716
Montero Miguel 0.1 0.0852
Upton BJ 0.0984 0.0786
Carter Chris 0.0979 0.0809
McGehee Casey 0.097 0.0673
Upton Justin 0.0938 0.0719
Beltre Adrian 0.0928 0.0634
Peralta Jhonny 0.0924 0.0735
Cano Robinson 0.0917 0.0677
Plouffe Trevor 0.0911 0.0705
Lowrie Jed 0.0904 0.0719
Kipnis Jason 0.0903 0.0705
Gardner Brett 0.0899 0.0715
Jennings Desmond 0.0882 0.0746
Cabrera Miguel 0.0876 0.0582
Markakis Nick 0.0873 0.0666
Kemp Matt 0.0868 0.0591
Rendon Anthony 0.0852 0.0639
Gonzalez Adrian 0.0848 0.059
Jones Garrett 0.0841 0.063
Pedroia Dustin 0.0837 0.0589
Abreu Jose 0.082 0.0661
Escobar Yunel 0.0819 0.0654
Longoria Evan 0.0815 0.0673
Cruz Nelson 0.0811 0.0625
Bruce Jay 0.0809 0.0643
Eaton Adam 0.0802 0.0615
Utley Chase 0.0798 0.069
Seager Kyle 0.0796 0.066
Aoki Nori 0.0795 0.0643
Walker Neil 0.0789 0.0706
Frazier Todd 0.0788 0.0634
Posey Buster 0.0777 0.0528
Ellsbury Jacoby 0.0773 0.0568
Brantley Michael 0.0769 0.0573
Span Denard 0.0752 0.0533
Freese David 0.0744 0.0612
Chisenhall Lonnie 0.0739 0.0625
Pence Hunter 0.0734 0.055
Gomez Carlos 0.0731 0.0721
Wright David 0.0717 0.055
Kendrick Howie 0.0715 0.0538
Gillaspie Conor 0.0711 0.0537
Calhoun Kole 0.071 0.0534
Desmond Ian 0.071 0.0587
Braun Ryan 0.0707 0.0581
Cabrera Melky 0.0695 0.0495
Pujols Albert 0.0691 0.052
Andrus Elvis 0.0681 0.0508
Butler Billy 0.068 0.052
Martin Leonys 0.0677 0.0508
Ozuna Marcell 0.067 0.048
Castro Jason 0.0665 0.0642
Brown Domonic 0.0664 0.0505
Bogaerts Xander 0.0659 0.0575
Hosmer Eric 0.064 0.0477
Mercer Jordy 0.0636 0.0506
Loney James 0.063 0.0464
Castellanos Nick 0.0622 0.0468
LeMahieu DJ 0.0621 0.0473
Morneau Justin 0.0618 0.0449
Castro Starlin 0.0615 0.0475
Navarro Dioner 0.0615 0.0429
Sandoval Pablo 0.0611 0.0456
Murphy Daniel 0.0607 0.0432
Marte Starling 0.0606 0.0651
McCann Brian 0.0595 0.0539
Ackley Dustin 0.0594 0.0481
Davis Khris 0.0583 0.0552
Reyes Jose 0.0582 0.0408
Infante Omar 0.0579 0.0428
Viciedo Dayan 0.0568 0.0492
Hamilton Billy 0.0565 0.042
Aybar Erick 0.0564 0.0429
Jeter Derek 0.0559 0.047
Simmons Andrelton 0.0557 0.0413
Byrd Marlon 0.0549 0.0484
Hill Aaron 0.0518 0.043
Hardy JJ 0.0512 0.0408
Segura Jean 0.0512 0.0432
Altuve Jose 0.051 0.0359
Blackmon Charlie 0.0483 0.0465
Dominguez Matt 0.0479 0.0417
Gordon Dee 0.0479 0.0371
Cozart Zack 0.0465 0.0464
Gomes Yan 0.0463 0.0343
Adams Matt 0.0462 0.0331
Hechavarria Adeiny 0.0457 0.0315
Rios Alex 0.0441 0.0304
Harrison Josh 0.0401 0.0313
Kinsler Ian 0.0401 0.0322
Ramirez Aramis 0.0395 0.0442
Hunter Torii 0.0392 0.0331
Johnson Chris 0.0378 0.0294
Escobar Alcides 0.0376 0.032
Ramirez Alexei 0.0366 0.0316
Perez Salvador 0.0363 0.0293
Jones Adam 0.0279 0.0298
Revere Ben 0.021 0.0185

So how do those numbers actually correlate?
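A scatter plot of walk rate against IBO is the quickest way to eyeball the relationship. The sketch below uses only five rows copied from the table above so that it runs on its own; with the full query results you would plot leaders$BBPct against leaders$IBO instead. The data frame name sample_rows is just a placeholder.

```r
# Five rows copied from the 2014 table above, enough to show the idea.
sample_rows <- data.frame(
  nameLast = c("Santana", "Trout", "Cabrera", "Altuve", "Revere"),
  BBPct    = c(0.1712, 0.1177, 0.0876, 0.0510, 0.0210),
  IBO      = c(0.1341, 0.0899, 0.0582, 0.0359, 0.0185)
)

# Scatter plot of walk rate against IBO, with a least-squares line on top.
plot(sample_rows$BBPct, sample_rows$IBO,
     xlab = "BB%", ylab = "IBO", main = "Walk rate vs. IBO (sample rows)")
abline(lm(IBO ~ BBPct, data = sample_rows))
```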



That looks pretty good... but who are we, the old scouts from Moneyball? Let's do the numbers.

print("Correlation between BBPct and IBO:")
correlation <- cor(leaders$BBPct, leaders$IBO)
print(correlation)
[1] 0.9704839

0.97 correlation. Yeah, that seems alright. So we can get a general sense of what IBO looks like (at least last year):

.090+ Among the league leaders
.065 Above average
.050 Middle of the pack
.040 Below average
.019 Ben Revere

EDIT: it occurs to me that quartiles might be a useful measure, and I figured out you can do it with a single command in R!

quantile(leaders$IBO)
     0%     25%     50%     75%    100%
0.01850 0.04715 0.06150 0.08035 0.13410

So that's nice, but can we use it to extrapolate walk rate?

Tune in next time!

* Ha! Double check on me when this gets up to BitBucket. I'm working on getting it ready to open source.
