Finally…Some Reading Recommendations

I’ve been delayed posting today due to (a) a job interview and (b) writing an email response.  The email was a reply to an aunt who found out I was writing a baseball blog but has no experience with the game, and so asked me to recommend some reading materials.  Well, I got wild and provided a lengthy response.

Since it is already 5:40 EDT, and since the Twins were simply overwhelmed by the Mariners’ bats (!!!), I think I’m just going to post the email I wrote, with some light editing, here as my post for the day.
So here it is, First Pitch Strike’s essential baseball reading list.

The Official Rules of Major League Baseball can be found on the web at the always awesome web site, MLB.com (this link is simply the homepage for the site), beginning on this page; at the left you will find links to the later sections of the document.

However, the best one-source guide to baseball rules, along with illustrations and a very helpful text, is the really cool new book Baseball Field Guide.  

One thing to remember about baseball is the unique nature of the game: the defense controls the ball and the pace of the game.  In tennis the server puts the ball in play and can score as a result of superior service.  In baseball, the pitcher puts the ball in play, but he and his teammates can only prevent the other team from scoring.
There have been many guides to watching the game published, but here are the few I recommend with a few caveats.  (I think watching a lot of games is the best means to figure stuff out at first, though watching them on TV is deceptive since the cameras can only see little bits of the field at a time.)

Tim McCarver, a former Major League catcher and a color analyst for FOX Sports’ baseball coverage, wrote a decent book, Baseball for Brain Surgeons and Other Smart People, but the book demonstrates McCarver’s smarty-pants demeanor and his tendency to forget that he’s supposed to be helping people understand the game rather than impressing upon them how much they don’t know.

From the “X for Dummies” line of titles is Joe Morgan’s Baseball For Dummies, which is very basic and reflects a few of Morgan’s out-of-date biases.  Don’t get me wrong: Morgan is a Hall of Fame player, the best second baseman to ever lace up cleats, and one of the great, great players from my childhood.  But some of his thinking about the game…is practically medieval.  He really is coming close, these days, to crossing over into bitter old man territory when he discusses the game and his distaste for the kind of advanced statistical analysis that I think expands (at least my own) understanding of the game.
A better book than either McCarver’s or Morgan’s–though its author did not play Major League baseball nor does he broadcast the game–is a relatively new book, Zack Hample’s Watching Baseball Smarter.  I read (uhm, devoured) this book in about four hours one day sitting in the local Borders bookstore.  It is very thoughtful and enthusiastic.  The only caveat is that one must be willing to sit down and look out for the things Hample discusses or else it doesn’t hold much value.  So watch some baseball!

My favorite of the “fan’s guide to the game” genre is an old book in its third edition, Leonard Koppett’s Thinking Fan’s Guide to Baseball.  Now, this book gets a mediocre rating at Amazon.com, but that’s because most of the reviewers are already pretty savvy baseball watchers and they wanted more, so much more, out of the book.  Oh well.

The best overall book about the game is also an oceanic immersion in its history and assumes a great deal of knowledge about the game: The New Bill James Historical Baseball Abstract.  I actually hear that the original (1988) Bill James Historical Baseball Abstract may be the best baseball book ever written, and I just ordered my copy from a used bookstore last night.

The magisterial documentary, Ken Burns’ Baseball, is also worth one’s time, and it is available from Netflix.  There is a companion text.  I have been watching the documentary, and I am learning a great deal.

Baseball, due to its non-continuous action and the primacy of the pitcher-batter interaction, is actually a sport that is very, very amenable to statistical analysis.  Scoring chances are made up of discrete moments during batter-pitcher interactions.  What happens in those moments can be described in numbers.  Therefore, statistics describe what is actually going on in baseball with a much greater degree of precision than they do in any other sport.

You might think I would refer you to a baseball encyclopedia, but why bother with an 1,800-page–and fifteen-pound–book when the website Baseball-Reference.com has every piece of data in such books, and much, much more, and is updated daily?  It is simply the best one-stop resource for baseball research (such as the First Pitch Strike study I am doing about the results of the pitcher throwing, believe it or not, a first pitch strike to a batter…it really improves things for the pitcher and his team).

The Baseball Prospectus Team of Experts produced Baseball Between the Numbers, which may be the best pure number-crunching tome I’ve ever read.  They don’t just slice and dice numbers better than pretty much anyone else; they really can write.  As much as I read, the quality of writing comes to mean more and more to me, and I would stack their baseball writing up against almost anyone else’s except for Roger Kahn, Roger Angell, and Bill James, all of whom are so good as to make me embarrassed by what I am doing at this blog.  I recommend that book for those willing to do lots and lots of hard thinking about the stuff.  It’s not light reading.

For an introduction to the kind of spreadsheet fun I like to play around with, some mathematics professors have written Understanding Sabermetrics, though for a lot of people who bother to read this blog, or, more to the point, FanGraphs, The Hardball Times, Baseball Prospectus, the Baseball Analysts, or similar sites, it will likely prove too basic.

While some intellectually lazy sportswriters (Dan Shaughnessy, Steve Cameron, Randy Galloway, to name a few) sneer at advanced statistical analysis, they are really highlighting their own ignorance: addiction to statistical descriptions of players has always pervaded the game of baseball, ever since Henry Chadwick discovered baseball and began writing about it.  Alan Schwarz’s The Numbers Game: Baseball’s Lifelong Fascination With Statistics recounts the history of the game’s statistics and nicely discusses how growing interest in the game by professional statisticians and analysts generated all sorts of interesting (and useful) insights into the game.  What prevented this kind of work from really reaching large audiences in previous decades was the lack of computing power.  With an Excel spreadsheet and Baseball-Reference.com I can crank out more analysis in mere moments than people could do in months in the fifties, sixties, seventies, and a lot of the eighties.

At this point, we’re all tired, so let’s all watch baseball smarter! To the ballpark!!!


Twins Review: Their Goodness and Justin Morneau’s Awesome Early Season

“Twinkie Talk” answers the question “How Good Are the Twins?” in two posts, one about the offense and one about the pitching.  (These links will appear below, as well, just so no one can miss the terrific analysis Erin has done at “Twinkie Talk”; in fact, you should take a moment and bookmark that blog, for it is always worth a Twins fan’s time.)

I linked to the hitting post, and I’ll re-link here.  
Two things to note: (1) Kubel, while “disappointing,” has also been very unlucky thus far; (2) Morneau has been outstanding, in that junior high joke sort of way, since he’s standing out by himself, at the very top of the heap, with a cumulative Win Probability Added (WPA) value that equals the value put up by two different groups made up of three good first basemen (including Albert Pujols(!)).  To reiterate with a (very) bad pun, Morneau’s awesome act-ivity at the plate has been fueled by “select-ivity and connect-ivity.”
The pitching part of the “How Good Are the Twins?” series is here, and while I agree with almost all of it, I am still wondering whether the Twins’ bullpen could be easily improved, particularly after Saturday.  I am wondering, in fact, whether Anthony Slama could help.  Seriously, Twins front office, why won’t you believe what this guy’s performance has been screaming at you?  Aaron Gleeman wonders the same thing, at the bottom of this long post, saying:

There’s speculation the Twins could dump Crain once Hardy returns, which would involve eating over $1 million. He figures to eventually settle into the same 4.50 ERA as usual, but I’d be just fine cutting Crain loose if it meant finally giving Anthony Slama a chance.

Much has been made of the fact that the Twins don’t have to add Slama to the 40-man roster until the offseason, but they’re free to do so whenever they want and keeping a 26-year-old at Triple-A because they’re not required to do otherwise is absurd. Slama has a 1.85 ERA and .123 opponents’ batting average in 24 innings at Triple-A and a 1.86 ERA in 208 career innings. For a team carrying eight relievers, not giving him a shot post-Crain would be laughable.

However, this post, over at “The Big Puffy Hand,” says it all, which I will summarize as follows:  Jeez, Twins’ Front Office, remove your collective head from your fundamental posterior orifice, do some statistical analysis and realize that Slama belongs in the Show!
By the way, it wouldn’t be that hard for them, as Slama’s current Triple-A statistics are available both at Baseball-Reference.com and at the Rochester Red Wings’ website.

Missing the Forest for the Trees, Ranger-Watching Style

The question for the day is whether the Rangers miss Rudy Jaramillo, their old hitting coach.

First, Randy Galloway writes in the Fort Worth Star-Telegram that the answer is yes, the Rangers do, indeed, miss Jaramillo.  As evidence he cites the following “statistics”: 

Meanwhile, a comparison of stats from a year ago, and for those who doubted or blamed Rudy for the Rangers’ ’09 offensive slide, the early “numbers,” even the “deep count” geek numbers, work against them.

Not counting Thursday night:

Pitches per plate appearance: 3.8 then and now.

Walks: 140 now, 122 a year ago. (Thank you, Elvis, and also Justin Smoak.)

Strikeouts: 288 now, 335 a year ago.

Batting average: .265 now, .272 a year ago.

Home runs: 37 now, 57 a year ago.

Runs: 194 now, 221 a year ago

And worst of all:

Hitting with runners in scoring position: .241 now and .267 a year ago.

Hmm.  Pretty convincing, right?  Well, not really.  
Galloway tries to preempt what I’m going to say/write by referring to “deep stat geek numbers” a couple of times in his piece and then throws some numbers against the page to “prove” that the Rangers’ offense is terrible, just terrible, just well and truly terrible this season.
*Sigh* If only he had a point.
Rob Neyer of ESPN.com, who referred me to Galloway’s silly piece, thinks Galloway’s “analysis” is either all wet or that it doesn’t hold water (hmm, funny how those opposite images mean the same thing, isn’t it?).  Neyer concludes his piece with the following analysis of his own:
In 2009, the Rangers finished seventh in the American League in OPS and seventh in scoring. 

In 2010, the Rangers rank sixth in the American League in OPS and fifth in scoring. 

I’m not sure what else to say about this. The Rangers apparently decided that no hitting coach, even one with Jaramillo’s track record, is worth “big money and a multiyear deal.” Nothing that’s happened this season would support the argument that they were wrong. 

About their team, anyway. The Cubs’ hitting has improved some under Jaramillo, thanks largely to Alfonso Soriano’s and Kosuke Fukudome’s twin rebirths. 

But analyzing the Rangers’ hitters without accounting for league context just isn’t good enough. I don’t know how much money the Rangers saved when they didn’t match (or exceed) the Cubs’ offer to Jaramillo. But a fourth of the way into the season, it looks like that money was probably better spent elsewhere.

Neyer hits it right on the head by noting that citing the stats without providing a wider context isn’t good enough.  I think he’s being rather charitable, for I think that citing the stats without evaluating the larger context is largely meaningless.  What’s worse, if Galloway had done about seven minutes of work with an internet connection and Excel, he would have seen that, well, things have changed in the AL between 2009 and 2010.  
More on the “change” thing below, for there are some other matters that need to be addressed right away.
First, batting average with runners in scoring position (let’s call it BARISP for short) is extremely subject to variation and fluctuation due to small sample size.  Additionally, it is prone to extreme turns as things “regress to the mean.”  The best example–albeit an extreme one–would be the Minnesota Twins in 2008 and 2009.  The Twins had a BARISP of .305 in 2008, allowing them to score the fourth most runs in the AL despite having an OBP that was in the middle of the pack.  But in 2009, the Twins experienced a sharp decline in BARISP compared to the previous season (it was .278, a decline of .027, which was larger than the Rangers’ decline in BARISP between ’09 and ’10); the small sample size just caught up with them.
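For anyone who wants a feel for how noisy BARISP is, here is a minimal Python sketch.  It treats every at-bat with runners in scoring position as an independent coin flip, which is a simplification, and the 450 at-bat figure is a hypothetical stand-in for roughly two months of a team’s at-bats in those situations, not a number taken from Galloway’s piece.

```python
import math


def batting_average_standard_error(avg, at_bats):
    """Binomial standard error of a batting average over a number of at-bats."""
    return math.sqrt(avg * (1 - avg) / at_bats)


# ASSUMPTION: hypothetical count of team at-bats with runners in scoring
# position over about two months; swap in the real figure if you have it.
ASSUMED_AT_BATS = 450

# The Rangers' BARISP figures quoted by Galloway.
barisp_2009 = 0.267
barisp_2010 = 0.241

se = batting_average_standard_error(barisp_2009, ASSUMED_AT_BATS)
drop = barisp_2009 - barisp_2010

print(f"standard error of one two-month BARISP: {se:.3f}")
print(f"observed drop, in standard errors: {drop / se:.1f}")
```

Under those assumptions the drop Galloway calls “worst of all” is only a little more than one standard error of a single two-month sample, which is the kind of gap that shows up by chance all the time.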
Second, one of the factors that Galloway completely glosses over is the vast improvement in the Rangers’ walk to strikeout ratio.  In the first two months of 2009 it was 122 to 335, or .36 walks for every K; thus far in 2010 it is 140 to 288, or .49 walks per K.  Why is this important?  Well, for one thing, they are seeing more pitches and making more contact.  While contact can lead to double plays–just ask Michael Cuddyer of the Twins–it also tends to lead to more base hits over time, for a ball has to be in play for it to have a chance to become a hit.
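The walks-per-strikeout arithmetic is easy to check.  Here is a small Python sketch using the walk and strikeout totals from Galloway’s own list; the function name is just something I made up for illustration.

```python
def walks_per_strikeout(walks, strikeouts):
    """Walks drawn for every strikeout; higher means better plate discipline."""
    return walks / strikeouts


# Totals quoted in Galloway's column (not counting Thursday night).
ratio_2009 = walks_per_strikeout(122, 335)
ratio_2010 = walks_per_strikeout(140, 288)

print(f"2009: {ratio_2009:.2f} walks per K")
print(f"2010: {ratio_2010:.2f} walks per K")
print(f"improvement: {100 * (ratio_2010 / ratio_2009 - 1):.0f}%")
```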
Third, saying that the number of runs scored is down can only show that the offense is worse if the overall number of runs scored is the same league-wide.  Neyer points out that the Rangers are fifth in scoring in 2010 compared to seventh in scoring in 2009.  It is the relative standing that matters here.  That is, the context matters.
Fourth, run scoring in the AL is down.  (Heck, offense across the Majors is down.)  In April and May of 2009, scoring in the AL was 4.92 per team per game; in 2010 it is down to 4.52 per team per game.  In April and May of ’09, the Rangers scored 5.42 runs per game, which worked out to 110% of the AL average.  So far in ’10, the Rangers are averaging 4.86 runs per game, which is 107% of the AL average.  The Rangers’ scoring relative to the league is down, but it is only slightly down: 3 percentage points of the league average (or about one-sixth of a run in 2009 terms).  However, the Rangers, who averaged 4.43 runs per game in April ’10, are averaging 5.35 runs per game so far in May.  And that May figure works out to 116% of the AL average for May of ’10 (which is 4.60 runs per game per team).
By the way, while for April combined with May of ’10 the Rangers’ offensive numbers look down, their May numbers alone are way, way up from both their April numbers of ’10 and their April and May numbers of ’09. In short, Galloway’s time-slice doesn’t account for change within the time period he chooses.
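Here is a minimal Python sketch of that league-context adjustment, using only the runs-per-game figures given above.  The helper name and the labels are my own inventions for illustration.

```python
def percent_of_league(team_runs_per_game, league_runs_per_game):
    """Team scoring as a percentage of the league average (100 = average)."""
    return 100 * team_runs_per_game / league_runs_per_game


# (Rangers runs per game, AL runs per team per game) for each time slice.
splits = {
    "April-May 2009": (5.42, 4.92),
    "April-May 2010": (4.86, 4.52),
    "May 2010 only": (5.35, 4.60),
}

relative = {
    label: percent_of_league(team, league)
    for label, (team, league) in splits.items()
}

for label, pct in relative.items():
    print(f"{label}: {pct:.1f}% of the AL average")
```

The raw runs-per-game figure falls by more than half a run between the two April-May slices, but the league-relative figure barely moves, and the May-only slice is the best of the three.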
Fifth, all offensive numbers are down across the American League.  The League’s batting average, .268 through May of ’09, sits presently at .258; the League’s OBP, .335 in ’09, is sitting at .328 in ’10; the League’s SLG, .429 in ’09, is .406 in ’10, a decline of .023.  So, the League’s “slash line” for 2009 was .268/.335/.429.  In ’10 it is: .258/.328/.406.  The Rangers’ slash line in 2009 was .272/.331/.489; for 2010 it is .270/.335/.406.
The Rangers’ slash lines show three things: their average relative to the League’s is up, their OBP is up in absolute terms, and their slugging percentage is way down.  Fueling that decline in SLG is a sharp decline in home runs, measured in home runs per plate appearance, which in 2009 was 4.4%, while in 2010 is only 2.5%.  The League has seen a much smaller decline: from 2.9% to 2.6%.  Perhaps this decline is attributable to Jaramillo’s absence, but does it matter thus far?  The Rangers are still scoring runs, and doing so relative to the League average at a higher rate: they are fifth in scoring in ’10, while having been 7th in scoring in ’09, remember?
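To make the slash-line comparison concrete, here is a short Python sketch that subtracts the 2009 lines from the 2010 lines for both the League and the Rangers.  The SlashLine type and the change function are hypothetical names of my own; the numbers are the ones quoted above.

```python
from typing import NamedTuple


class SlashLine(NamedTuple):
    avg: float  # batting average
    obp: float  # on-base percentage
    slg: float  # slugging percentage


def change(before, after):
    """Year-over-year change in each slash-line component, to three decimals."""
    return SlashLine(*(round(a - b, 3) for b, a in zip(before, after)))


league_2009 = SlashLine(0.268, 0.335, 0.429)
league_2010 = SlashLine(0.258, 0.328, 0.406)
rangers_2009 = SlashLine(0.272, 0.331, 0.489)
rangers_2010 = SlashLine(0.270, 0.335, 0.406)

league_change = change(league_2009, league_2010)
rangers_change = change(rangers_2009, rangers_2010)

print("League: ", league_change)
print("Rangers:", rangers_change)
```

Laid out this way, the Rangers lost less batting average than the League did and gained on-base percentage while the League lost it; slugging is the only component where they fell further than everyone else.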
I prefer the stat wOBA to OPS, because it is scaled to OBP and it directly correlates to run scoring.  At any rate, the AL had a wOBA of .336 through May of ’09, and the AL wOBA is .324 so far in ’10, another indication that offense is down in ’10.  The Rangers’ wOBA through May of ’09 was .352, which was 105% of the AL’s, while their wOBA so far in ’10 is .330, which is 102% of the League’s.
So, yes, thus far in ’10 the Rangers’ offense is slightly down, but note that it’s down only slightly.  Don’t confuse the Rangers’ offense with the Mariners’, for goodness’ sake!  And that’s the problem with Galloway’s piece, in a nutshell: his tone makes it sound as though the Rangers are displaying Mariner-level offensive ineptitude, and (1) they aren’t, and more importantly, (2) they are winning and in 1st place.
Sixth and finally (finally!), a couple of key bats have missed a lot of playing time for the Rangers so far this season: Ian Kinsler, who absolutely scorched the ball all through April of ’09 (with a wOBA of .428), missed all of April and substantial chunks of May in ’10.  Since he’s been back, he’s been pounding the ball like he did last spring, which may account for the improvement in the Rangers’ output in May.  Nelson Cruz has also spent time on the DL already this season, but he, too, is crushing the ball when he is in the lineup.

Statistical Primer and Setting the Table

I realize that if you are reading this post, you almost certainly don’t need this primer, but it is a well-written and informative post, so I thought I’d link to it.  

(By the way, this post simply sets the table for some venting and ranting I will be doing tomorrow.)

If you take just one thing away from reading Dingers’ post, remember that runs batted in (RBI) numbers are a terrible way to evaluate a player’s offensive ability:
Now just one final note: Runs Batted In is the worst cited statistic. RBIs are a result of inherited runners on base more than they are of the batter’s ability. Theoretically, yes, a batter with a higher batting average and/or slugging will get more RBIs, but they’re very unfair to good hitters in bad line ups. There have been a lot of damn seasons in history where a hitter had a better than average offensive season and still had 60 or fewer RBIs.