Thanks to the recent statistical movement in major sports, basketball now has its share of “advanced” metrics to dazzle the eye and confuse the mind. All the new acronyms can be a little overwhelming at times: PER. WS. APM. TNT. OK, so the last one’s a TV network, but there are enough formulae that we need genius George Costanza to explain things:
Player Efficiency Rating (PER)
PER is a decent all-encompassing stat for summarizing basic box score data. But it has issues. The major failures of PER can be read about here, with the big flaw explained nicely:
Given these values, with a bit of math we can show that a player will break even on his two point field goal attempts if he hits on 30.4% of these shots. On three pointers the break-even point is 21.4%. If a player exceeds these thresholds, and virtually every NBA player does so with respect to two-point shots, the more he shoots the higher his value in PERs. So a player can be an inefficient scorer and simply inflate his value by taking a large number of shots.
Which means that volume shooting is rewarded by that metric as opposed to volume scoring. 11-30 shooting (37%) scores better than 5-12 shooting (42%). Yet at low percentages, the more attempts there are the worse the outcome is for overall offensive efficiency. Sorry, but I don’t find too much “advanced” information in a stat like that.
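To see where break-even points like those come from, here is a minimal Python sketch. The weights are not stated in the quoted passage; I am assuming a made two is worth roughly 1.65, a made three roughly 2.65, and a miss costs roughly 0.72, because those values reproduce the quoted thresholds. The function names are my own.

```python
# Assumed per-shot weights (not given in the quoted passage; chosen because
# they reproduce its 30.4% and 21.4% break-even points).
MADE_TWO = 1.65
MADE_THREE = 2.65
MISS = 0.72


def break_even(made_value, miss_cost=MISS):
    """Shooting percentage at which credit for makes equals the penalty for misses."""
    return miss_cost / (made_value + miss_cost)


def shot_credit(made, attempts, made_value=MADE_TWO, miss_cost=MISS):
    """Net credit for a shooting line, treating every attempt as the same shot type."""
    return made * made_value - (attempts - made) * miss_cost


print(f"2PT break-even: {break_even(MADE_TWO):.1%}")
print(f"3PT break-even: {break_even(MADE_THREE):.1%}")
print(f"11-30: {shot_credit(11, 30):.2f}")
print(f"5-12:  {shot_credit(5, 12):.2f}")
```

Under these assumed weights, and treating every shot as a two, the 11-30 line earns more credit than the 5-12 line despite the lower percentage.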
Wins Produced (WP)
Wins Produced is the other advanced stat that I rarely use. There is a good detailed discussion of Wins Produced here.
The biggest issue with the stat is how it treats scoring and rebounding, which is how it was possible to predict 57 wins for the Warriors this year after summer transactions, while simultaneously identifying shooting guard as their weakest position. Monta Ellis, the team’s leading scorer and ostensibly its best player, happens to play SG.
With WP, a player who shoots under 50% from 2-point range hurts his score, regardless of how much he scores. Rebounding is given tremendous importance and there is no way to generate a negative rebounding score. Compare the following results based on the marginal values:
- Player A: 17-30, 5 rebounds
- Player B: 3-3, 15 rebounds
Player A earns 0.288 WP. Player B earns 0.576. Which means Ben Wallace is way better than Kobe Bryant. In other words, Wins Produced just assumes that scoring at a reasonably high baseline rate will happen automatically. So the flaw with both WP and PER is the way they treat scoring, which happens to be the most important element of the game. Kind of a major flaw in a metric.
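The two figures above are consistent with a simplified scoring of about 0.032 per point and per rebound, minus 0.032 per field goal attempt, with every made shot counted as a two. That is my reconstruction from the numbers in this post, not the full Wins Produced formula, so treat the constant and the function name as assumptions.

```python
# Assumed marginal value per point, per rebound, and (negatively) per field
# goal attempt. Reconstructed from the 0.288 and 0.576 figures above; this is
# not the full Wins Produced formula.
MARGINAL_VALUE = 0.032


def simple_wp(made, attempts, rebounds):
    """Simplified WP-style score, treating every made shot as a two-pointer."""
    points = 2 * made
    return MARGINAL_VALUE * (points - attempts + rebounds)


player_a = simple_wp(made=17, attempts=30, rebounds=5)
player_b = simple_wp(made=3, attempts=3, rebounds=15)
print(f"Player A: {player_a:.3f}")
print(f"Player B: {player_b:.3f}")
```

In this sketch a player shooting exactly 50% on twos gets nothing from scoring no matter how many shots he takes, which is the under-50% penalty described above.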
Plus-Minus Statistics
There is an entire family of +/- numbers that have been tracked this decade stemming from raw plus-minus.
- Raw +/- is borrowed from hockey, and measures the team’s net result with a player on the court. When player A is in the game, if his team is 10 points better than the opponent, Player A’s +/- is +10.
- On/Off looks at what happens when Player A leaves the game as well. If Player A is +10 for the half of the game he plays, and in the other half his team is +10 without him, his on/off is 0. In theory, he didn’t affect team play much.
- Adjusted Plus-Minus (APM) attempts to correct for teammate and opponent quality when a player is on the court.
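The first two definitions above are simple enough to sketch in a few lines of Python. The game below is hypothetical and the function names are my own. Published on/off figures are usually scaled to a per-minute or per-possession rate, which this sketch skips.

```python
def raw_plus_minus(team_points, opp_points):
    """Net score margin over a stretch of play."""
    return team_points - opp_points


def on_off(on_margin, off_margin):
    """Team margin with the player on the court minus the margin with him off."""
    return on_margin - off_margin


# Hypothetical game: the team wins each half 55-45, and Player A plays only
# the first half.
on_margin = raw_plus_minus(55, 45)    # Player A on the court
off_margin = raw_plus_minus(55, 45)   # Player A on the bench
print(on_margin)                      # 10
print(on_off(on_margin, off_margin))  # 0
```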
Raw +/- has its obvious issues, some fleshed out in this New York Times article. Namely, team quality is the primary force behind the stat. Derek Fisher has an enormous +/- figure, but that probably has something to do with being on the floor with Kobe Bryant and Pau Gasol. It’s hard to find too much value in raw PM, particularly over short periods.
On/Off corrects much of that issue by looking at how a player performs relative to his own team. It’s an incredibly good stat for measuring situational value, assuming adequate sample sizes. APM models also have some theoretical value, although not without issues.
There are three important things to be mindful of when comparing on/off and APM:
- The worse a team is, the easier it is to have a large number
- Noise
- The problem of Multicollinearity
First, when Kevin Garnett used to leave the game for the Minnesota Timberwolves, they’d fall apart. In a nutshell, Garnett’s impact measured in on/off and APM was enormous in Minnesota because it’s a lot easier to improve a 20-win team than a 60-win team. When Michael Jordan subbed off the court for the Dream Team, they probably didn’t miss a beat.
Second, there is an incredible amount of noise in plus-minus figures. A 10-0 run here, an injury there, some garbage time, whatever. In a small sample — something that plagues the plus-minus family — these make a big difference. Ken Pomeroy ran an interesting simulation to show how profound this effect can be in small samples.
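A toy simulation makes the small-sample problem concrete. This is not Pomeroy's simulation, just a rough sketch under a crude assumption of my own: two exactly equal teams, where every possession is a coin flip worth 0 or 2 points. Any margin that shows up is pure noise.

```python
import random
import statistics


def simulated_margin_per_100(possessions, rng):
    """Net points per 100 possessions when both teams are exactly equal.

    Each possession is a coin flip worth 0 or 2 points (a deliberately crude
    assumption), so any nonzero margin is pure noise.
    """
    team = sum(2 for _ in range(possessions) if rng.random() < 0.5)
    opponent = sum(2 for _ in range(possessions) if rng.random() < 0.5)
    return 100 * (team - opponent) / possessions


def spread(possessions, trials=200, seed=0):
    """Standard deviation of the simulated margin across repeated samples."""
    rng = random.Random(seed)
    margins = [simulated_margin_per_100(possessions, rng) for _ in range(trials)]
    return statistics.stdev(margins)


small = spread(possessions=500)    # a short stretch of games
large = spread(possessions=5000)   # a much longer sample
print(f"spread over 500 possessions:  {small:.1f} per 100")
print(f"spread over 5000 possessions: {large:.1f} per 100")
```

Even with zero true difference between the teams, the short sample swings by several points per 100 possessions, and the spread shrinks only with the square root of the sample size.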
Third, multicollinearity is a statistical phenomenon that, in this context, means the same players are constantly playing together. For example, Odom and Bryant in LA or Varejao and LeBron in Cleveland. It becomes difficult for any plus-minus model to differentiate between two players if they always play together.
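Here is a small Python sketch of why that is a problem. It is a stripped-down stand-in for a plus-minus model, not any published APM method: the stints are hypothetical, opponents are ignored, and each stint's margin is predicted by simply adding up the ratings of the players on the court.

```python
def squared_error(ratings, stints):
    """Sum of squared errors when each stint's margin is predicted by
    adding up the ratings of the players on the court."""
    total = 0.0
    for players, margin in stints:
        predicted = sum(ratings[p] for p in players)
        total += (margin - predicted) ** 2
    return total


# Hypothetical stints: (players on the court, net margin). "A" and "B" never
# appear without each other, while "C" is observed both with and without them.
stints = [
    (("A", "B", "C"), 8),
    (("A", "B"), 6),
    (("C",), 2),
    (("A", "B", "C"), 8),
]

all_credit_to_a = {"A": 6.0, "B": 0.0, "C": 2.0}
all_credit_to_b = {"A": 0.0, "B": 6.0, "C": 2.0}
even_split = {"A": 3.0, "B": 3.0, "C": 2.0}

for ratings in (all_credit_to_a, all_credit_to_b, even_split):
    print(ratings, squared_error(ratings, stints))
```

All three sets of ratings fit the data perfectly, so the data alone cannot say how the combined +6 should be divided between A and B. Only C, who is seen apart from the pair, gets a rating the data pins down.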
For further issues facing APM models, I suggest Joe Sill’s paper on Regularized APM (RAPM). I will not delve into RAPM models because even Genius Costanza would be bored by machine learning techniques.
Win Shares
The good news about Win Shares (WS) is that they require no Ph.D. and yet they aren’t bland (the full explanation is here). WS are the resident stat over at Basketball-Reference and based largely on Dean Oliver’s great work with the box score. Win Shares are an extension of the individual ORtg and DRtg stats based on Oliver’s work.
Win Shares use box score values, compare them to league averages (like points per possession), and finally adjust for possessions played. The number of win shares is not based on a team's actual win-loss record, but the formula is designed to estimate contributions to a win based on these numbers, and the resulting shares approximate win-loss record quite well.
So, how in the world do we interpret Win Shares? First, similar to PER, they are limited by using only traditional box score metrics. However, unlike PER, I’ve yet to find a glaring flaw in WS. PER will break down if there aren’t enough shots to go around, which happens on a balanced team like the 2008 Celtics. Which result passes the sniff test?
Player | PER | PER Rank | WS per 48 | WS Rank |
---|---|---|---|---|
K. Garnett | 25.6 | 5th | .265 | 2nd |
P. Pierce | 19.6 | 30th | .207 | 10th |
R. Allen | 16.4 | 81st | .177 | 21st |
It’s fairly obvious that Win Shares is doing a pretty decent job ball-parking a player’s value using just the box score. As such, I think it’s a really good summary/quick glance stat and much prefer it over PER. Of course, it will sometimes incorrectly assign credit based on certain roles because of failures in the box score. For instance, distributors like Steve Nash are thought to be slightly undervalued by Win Shares. Defensive players who don’t fare well in rebounding, blocks or steals will be undervalued by WS.
The other factor to be wary of when looking at the per 48 minute rate is just how much that player plays. Manu Ginobili is a great example of a player who can do more per minute because he’s logging shorter stretches on the court.
Your explanation of WP didn’t show any flaws, and certainly none connected with scoring. If anything, it only showed that rebounders are more valuable than inefficient scorers.
And it’s completely untrue that it’s easier to have big APM numbers on weaker teams (that’s why it’s ADJUSTED). The examples of Duncan and Manu (both in the top 10 in APM for the last decade) show that.
It’s the definition of “inefficient” scoring that is flawed. When Ben Wallace-types come out as twice as good as Kobe Bryant-types, there is something seriously wrong. That’s *because* of the way it treats scoring. So I’m not sure what you mean…
re: +/- The point is it’s easier to have a bigger impact on a weaker team. I included APM in that sentence, because, for APM models I’ve read, my understanding of the mathematics is that it’s still not correcting for that phenomenon. Every unique 5 v 5 lineup is an event, but players are still bound to their own teammates. Duncan can’t also play with KG’s teammates, no?
My degree was not in math – if you want to explain how every APM model accounts for this, please do. Thanks.