A month or so ago, I wrote a few words in a Stat of the Week piece for my employer Baseball Info Solutions titled The Ray Searage Effect. Let me tee it up for you.
Since taking over as the Pirates pitching coach in August 2010, Ray Searage has demonstrated a unique ability to maximize pitcher effectiveness. Collectively, over the last five seasons, the Pirates have led the majors in soft hit percentage, groundball rate, and Batting Average on Grounders and Short Liners, and are second in two-seam fastball and sinker usage—all ingredients for sustainable success on the pitcher’s mound.
Take a look for yourself—the Buccos are atop the league in each category. The bar graph doesn’t offer Batting Average on Grounders and Short Liners since that’s a proprietary metric of Baseball Info Solutions, so you’ll just have to take my word for it.
What Searage has done in the Steel City over the last half decade has been nothing short of sensational. He's been handed countless reclamation-project pitchers, sprinkled on some of his magic, and voilà, career revived. Thanks to our good friend Travis Sawchik and his phenomenal book Big Data Baseball, readers got an insightful look at Searage's philosophy and the analytical movement set forth in Pittsburgh. Sawchik breaks down the Pirates' willingness to test the waters of sabermetric principles beginning in 2013, making radical changes that bridged the analytical insight of the front office to the on-field personnel. Before we knew it, the field staff and players started embracing and employing these new-age strategies.
Notably, most of the cornerstone moves the Pirates made built off each other. Sawchik highlights the Pirates' propensity to increase the usage of two-seam fastballs, especially up and in on hitters. Two-seamers have a natural sinking, tailing action, which tends to generate more groundballs than four-seamers. Combine the motive of inducing grounders with their increasing deployment of defensive shifts, and easy outs can quickly become a recurrent theme. The Pirates also put greater emphasis on pitch framing and began targeting underperforming pitchers in hopes that Searage and their battery mates would be able to help them right the ship.
These concepts in and of themselves aren't algorithm-heavy or anything. They're simply about asking the right questions, working through the data you have to find an advantage, and then collaborating with the coaching staff to get those tactics onto the field. Easier said than done, obviously. But that's the message I think Sawchik is conveying to his audience: it's about more than just the nuts and bolts of data mining and fancy algorithms; it's about building trusting relationships, asking the right questions, communicating ideas, and ultimately embracing uncomfortable territory in an attempt to achieve greatness. It's well worth the read.
So in the Stat of the Week analysis, we only examined a select group of pitchers that Searage worked with: offseason starting pitcher acquisitions. Obviously, the use of a limited sample hampers our ability to gauge our findings as Searage's true influence. This time around, we obtained every pitcher who was acquired after Searage took over in 2010. For in-season trade acquisitions, we split single-season performance at the day they changed teams. For guys who joined the Pirates in the winter months, we used the season prior to coming to Pittsburgh and their first season with Searage as comparison points. Also, I used Batting Average on Balls in Play (BABIP) as a surrogate for Batting Average on Grounders and Short Liners.
Here’s the list of the 35 hurlers Searage and Co. have brought on board since he took the reins in 2010. The group consists of 24 offseason acquisitions and 11 in-season acquisitions, mostly back-end pieces that the limited-payroll Pirates acquired on the cheap.
To get a more complete and valid read on Searage’s effect on newcomers, I calculated the weighted average of their change in performance before and after joining forces. Each metric was weighted by the minimum number of pitches thrown or balls put in play across the two seasons, depending on the statistic of study. Drum roll please.
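For the curious, the weighted-average calculation is simple enough to sketch in a few lines of Python. The pitcher values below are invented purely for illustration; the real weights were the minimum pitches thrown (or balls put in play) across the two compared seasons.

```python
# Minimal sketch of the weighted-average change calculation.
# All numbers here are hypothetical stand-ins, not the article's data.
def weighted_avg_change(before, after, weights):
    """Weighted mean of (after - before) across pitchers."""
    total_w = sum(weights)
    return sum(w * (a - b) for b, a, w in zip(before, after, weights)) / total_w

# Hypothetical groundball rates before/after joining the Pirates.
gb_before = [0.42, 0.47, 0.39]
gb_after  = [0.48, 0.50, 0.45]
pitches   = [1800, 2400, 900]   # min pitches thrown across the two seasons

print(round(weighted_avg_change(gb_before, gb_after, pitches), 4))
```

The weighting simply keeps a 900-pitch cameo from counting as much as a full season's worth of work.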
[Table: The Searage Effect — weighted average change in FT & SI usage, groundball rate, soft hit percentage, and BABIP]
Whether it’s mechanics, approach, or some undistinguished insight into pitching, Searage’s philosophy centers on getting the most out of what, or who, he’s given. It’s undeniable: Uncle Ray gets his staff and ‘pen to go about their business the Pirate way. Given the upticks in what most would generally perceive as improvement, how does this influence translate into quantifiable on-field performance?
As we know, performance can be defined in many different ways. The four pitching triggers that Searage has proven to emphasize can all be accounted for in batted ball performance, besides maybe two-seam fastball and sinker usage. It’s hard to use all-encompassing metrics because we haven’t established any trends in strikeout surges or command tendencies, and most ERA estimators have leaves that fall far from this tree.
So, for the actual assessment of Searage’s impact, I grabbed five everyday metrics that are applicable in the sense that they rely heavily on batted ball results, while trying to reach audiences of traditionalists and the sabermetric savvy:
- Opponent Batting Average (AVG) – good old-fashioned back-of-the-card batting average
- Opponent True Average (TAv) – Baseball Prospectus’ measure of total offensive value, scaled to batting average with adjustments for park and league quality
- Opponent Slugging Percentage (SLG) – a popular measure of gap-power/extra-base ability
- Opponent Isolated Power (ISO) – a proxy for raw offensive power at the plate
- Deserved Run Average-Minus (DRA-) – Baseball Prospectus’ context-neutral all-inclusive pitching metric scaled to 100, where each point under 100 represents a percentage better than league average (e.g. a 78 DRA- means a pitcher performed 22 percent better than league average)
To stay consistent with the methodology used in the Pirates-specific analysis, I grabbed pitchers who threw in back-to-back seasons between 2010 and 2015, only including guys who faced at least 250 batters in each of the consecutive seasons, since statistic stabilization is somewhat of a priority here. In total we extracted 897 pitcher couplets. For perspective, the first of the two seasons had a median of 145.7 innings pitched and a mean of 139.3; the second season had a median of 143.0 and a mean of 137.1. From there, I calculated each pitcher’s change in performance for the four predictor variables (groundball rate, BABIP, soft hit percentage, and two-seamer plus sinker usage) and the five performance-based response variables listed above.
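The couplet construction described above boils down to pairing each pitcher's season with his prior one and applying the batters faced filter. Here's a rough pandas sketch under assumed column names (the real data schema isn't shown in the article):

```python
import pandas as pd

# Tiny hypothetical season-level table; column names are assumptions.
df = pd.DataFrame({
    "pitcher": ["A", "A", "B", "B", "C"],
    "season":  [2013, 2014, 2013, 2014, 2014],
    "bf":      [300, 410, 260, 255, 500],       # batters faced
    "gb_rate": [0.44, 0.49, 0.40, 0.41, 0.47],
})

df = df.sort_values(["pitcher", "season"])
prev = df.groupby("pitcher").shift(1)           # prior season, same pitcher
pairs = df.assign(prev_season=prev["season"],
                  prev_bf=prev["bf"],
                  prev_gb=prev["gb_rate"])
# Keep only back-to-back seasons with >= 250 BF in both years.
pairs = pairs[(pairs["season"] == pairs["prev_season"] + 1)
              & (pairs["bf"] >= 250) & (pairs["prev_bf"] >= 250)]
pairs = pairs.assign(d_gb=pairs["gb_rate"] - pairs["prev_gb"])
print(len(pairs), pairs["d_gb"].round(2).tolist())
```

Pitcher C, with only one qualifying season, drops out, which is how a league-wide pool shrinks to the 897 couplets used here.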
Using the four areas pitchers have excelled in under Searage, I wanted to work through some multiple linear regression models to determine how his influence plays out on the field. I’m more concerned with the interpretability of the models than their actual prediction accuracy. Multiple linear regression should establish the association between the change in each metric and change in performance, which should grant us the ability to interpret his true impact on newly acquired pitchers, aside from externalities such as park factors, catcher framing, and the Pirates ability to defend.
We’ll start with a correlation matrix of the four possible predictors and the five response variables.
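For readers who want to reproduce that kind of matrix, the computation is a one-liner once the deltas are in a DataFrame. The data below are random stand-ins (with a response deliberately wired to BABIP and groundball changes), not the actual couplets:

```python
import numpy as np
import pandas as pd

# Synthetic deltas for illustration only; the real inputs are the 897 couplets.
rng = np.random.default_rng(0)
n = 897
d = pd.DataFrame({
    "d_ft_si": rng.normal(0, 0.05, n),   # change in two-seam/sinker usage
    "d_gb":    rng.normal(0, 0.04, n),   # change in groundball rate
    "d_soft":  rng.normal(0, 0.03, n),   # change in soft hit percentage
    "d_babip": rng.normal(0, 0.02, n),   # change in BABIP
})
# A response loosely driven by BABIP and groundball changes, as in the article.
d["d_avg"] = 0.9 * d["d_babip"] + 0.1 * d["d_gb"] + rng.normal(0, 0.005, n)

print(d.corr().round(2))
```

With data built this way, `d_babip` correlates strongly with `d_avg` while the other predictors sit near zero, mirroring the pattern described below.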
The relationships vary pretty significantly throughout. Change in BABIP associates strongly with four of the five response variables, and change in groundball rate shows a moderate relationship with several as well. Otherwise, not a lot going on here.
Using a statistical significance level of α = 0.05 for our independent variables, we ran separate models for each of the five statistics of pitcher evaluation. Here’s what our models churned out.
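As a sketch of what one of these five fits looks like, here is a bare-bones least squares regression with significance screening done by hand in NumPy. The coefficients and noise level are invented; this only illustrates the mechanics, not the article's actual fits:

```python
import numpy as np

# Synthetic stand-in for one model (change in opponent AVG on the four deltas).
rng = np.random.default_rng(1)
n = 897
X = rng.normal(0, 0.03, size=(n, 4))          # d_ft_si, d_gb, d_soft, d_babip
beta = np.array([0.0, -0.15, -0.10, 0.85])    # assumed "true" effects
y = X @ beta + rng.normal(0, 0.004, n)

Xc = np.column_stack([np.ones(n), X])         # add intercept
coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
resid = y - Xc @ coef
r2 = 1 - resid.var() / y.var()

# t-statistics; with n - 5 df, |t| > 1.96 approximates alpha = 0.05.
sigma2 = (resid @ resid) / (n - 5)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
t = coef / se
keep = [j for j in range(4) if abs(t[j + 1]) > 1.96]
print("R-squared:", round(r2, 3), "significant predictors:", keep)
```

Predictor 0 was given a zero coefficient on purpose, so it usually fails the significance screen, the same way two-seamer/sinker usage dropped out of all five real models.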
The R-squared values represent the proportion of variance in the response that can be explained by the predictors; the higher the value, the better the model fits. The batting average model has an R-squared of 0.7517, showing that changes in groundball rate, BABIP, and soft hit percentage tell the majority of the story (75 percent) of the change in opponent batting average. The R-squared values for the True Average, slugging percentage, and DRA- models reside at 0.4763, 0.3701, and 0.4199, respectively, explaining about 35 to 50 percent of the variance across the three. On the other hand, the low R-squared value of the isolated power model (0.1209) implies that predicting change in isolated power entails a variety of elements not included here. Change in two-seamer and sinker usage was left out of all five models, meaning there’s no evidence of an association between change in their usage and change in pitching performance in the presence of the other predictors.
Inserting Searage’s quantified contributions back into the models’ coefficients, when a pitcher joins the Pirates we can come to expect, on average, a 7-point decrease in opponent batting average, a 6-point decrease in opponent True Average, a 10-point decrease in slugging percentage, a 3-point decrease in isolated power, and a 4.2-point decrease in DRA-minus. These aren’t mutually exclusive, but collectively they demonstrate the effects one could attribute to Ray Searage and his theory of coaching. Admittedly, it’s difficult to hand all the credit to Searage, as the front office must’ve seen something in each individual to target them in the first place. But precisely valuing the contributions of a front office versus a coaching staff will have to be a topic back-pocketed for another day. If nothing else, it’s good news for the newly acquired Jon Niese, Ryan Vogelsong, Kyle Lobstein, Rob Scahill, Juan Nicasio, and Neftali Feliz, who should all figure into the mix this season for the Bucs.
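Plugging the Searage-effect deltas back into a fitted model is nothing fancier than a dot product of coefficients and average changes. The numbers below are made up to show the arithmetic, not the article's actual coefficients or deltas:

```python
# Hypothetical AVG-model coefficients and weighted-average Searage deltas.
coefs   = {"d_gb": -0.05, "d_soft": -0.10, "d_babip": 0.80}
searage = {"d_gb": 0.03,  "d_soft": 0.02,  "d_babip": -0.007}

# Expected change in opponent AVG = sum of coefficient * observed delta.
expected_d_avg = sum(coefs[k] * searage[k] for k in coefs)
print(round(expected_d_avg * 1000, 1), "points of opponent AVG")
```

The same substitution, repeated for each of the five fitted models, yields the 7-, 6-, 10-, 3-, and 4.2-point figures quoted above.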
Multiple linear regression models make several assumptions that need to be examined a bit more thoroughly to assure our interpretations are accurate, or at least not heavily skewed. The residual plots below help handle the first assumption, linearity. The top-left plot shows the residuals versus the fitted values, with the red line exhibiting a relatively smooth fit. In the absence of any discernible pattern, we can move on to our Q-Q plot (top right) to check that the residuals are roughly normally distributed. A reasonably straight line means we can check this box, thankful we don’t need to apply any non-linear transformations to our variables.
In our models’ summary statistics above, the variance inflation factor (VIF) of the coefficients shows multicollinearity isn’t present in our least squares regressions in any meaningful fashion. For those wondering, multicollinearity is when two or more predictor variables are closely related to one another, meaning one variable explains another with a fair degree of accuracy. Our VIF values all come in under 1.02, which is more than acceptable. You can also revert back to the correlation matrix to look at the relationships between all of our predictors. Again, no sweat here.
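The VIF itself is easy to compute by hand: regress each predictor on the others and take 1 / (1 − R²). With near-independent predictors, as in this random example, the values sit just above 1, mirroring the sub-1.02 figures reported here:

```python
import numpy as np

# Random, roughly independent predictors standing in for the four deltas.
rng = np.random.default_rng(2)
X = rng.normal(size=(897, 4))

def vif(X, j):
    """VIF of column j: regress it on the other columns, return 1/(1-R^2)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 3) for j in range(4)])
```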
You’ve probably noticed several points in our residual plots check in as outliers. The presence of extreme outliers and high leverage points can be problematic at times, affecting the interpretation of the fitted data and severely deteriorating R-squared values. Most of these points should’ve been removed with our batters faced requirements, and besides a few standouts I think we did a rather solid job of tackling this issue beforehand.
Using the standard cutoff for Cook’s distance to identify influential data points, only two observations were flagged as influential across our five models. The second season of our first point reflects Wade Davis’ historically dominant 2014 campaign, which saw a severe spike in groundball rate while his performance metrics plummeted. The second concerning data point is a couplet of Steve Delabar’s 2012-2013 performance, in which he barely reached the batters faced threshold in both seasons. Between the two years, he had a 12.9 percent dip in groundball rate yet saw his BABIP jump by 93 points. Both cases are quite anomalous. The distributions in our histograms demonstrate the normality of our studentized residuals. On this notion, we’ll let the two outliers slide, as everything else looks clean.
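A sketch of the Cook's distance screen, using the common D > 1 rule of thumb (one of several standard cutoffs). The data are synthetic, with two planted influential points standing in for the Davis and Delabar couplets:

```python
import numpy as np

# Simple regression with two deliberately influential observations.
rng = np.random.default_rng(3)
n = 200
x = rng.normal(0, 1, n)
y = 0.5 * x + rng.normal(0, 0.2, n)
x[0], y[0] = 4.0, -3.0          # planted influential point
x[1], y[1] = -4.0, 3.0          # planted influential point

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)                                # leverage of each point
resid = y - H @ y
p = X.shape[1]
mse = (resid @ resid) / (n - p)
cooks = resid**2 / (p * mse) * h / (1 - h)**2 # Cook's distance

flagged = np.where(cooks > 1.0)[0]            # D > 1 rule of thumb
print(sorted(flagged.tolist()))
```

Only the two planted points clear the threshold; everything generated by the true model stays well below it.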
The last assumption of linear regression is constant variance among the error terms. Ideally, you want equal variability between your errors; unequal variance is what’s termed heteroscedasticity in predictive modeling. Heteroscedasticity is pretty straightforward to spot and can be viewed in the residuals versus fitted plot. Our residuals seem reasonably dispersed and show no sign of a funnel shape in any of our plots. All systems go from here.
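A quick numeric stand-in for eyeballing the residuals-versus-fitted plot: compare residual spread across the lower and upper halves of the fitted values. A ratio near 1 is consistent with constant variance; a funnel shape would push it well away from 1. Synthetic data again, built with constant error variance by design:

```python
import numpy as np

# Simple fit with homoscedastic errors, then a two-bucket spread comparison.
rng = np.random.default_rng(4)
n = 897
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(0, 0.3, n)       # constant error variance by design

coef = np.polyfit(x, y, 1)
fitted = np.polyval(coef, x)
resid = y - fitted

lo = resid[fitted <= np.median(fitted)]   # residuals, lower half of fits
hi = resid[fitted > np.median(fitted)]    # residuals, upper half of fits
print(round(lo.std() / hi.std(), 2))      # near 1.0 -> no funnel shape
```

Formal alternatives exist (the Breusch-Pagan test is a common choice), but for a diagnostic pass like the one above, the visual check usually suffices.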