Some time ago, I began wondering what the values of certain actions in a game were. How do they translate into points, since points are what determine who wins under the rules of the game?
So much goes into scoring points: first, one needs possession of the ball. Well, stop right there. How much is possession of the ball worth? On average, about 1.07 points. How much is a turnover worth, then? If it’s the loss of a possession, it should be worth about -1.07. Then what is the value of a missed field goal? It’s a failure to score on a possession, but on average, 26% of the time the offense still recovers the rebound, so it’s not quite as bad as a turnover. So -1.07 * 0.74 puts a missed field goal at approximately -0.79 points. And so on…
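The back-of-the-envelope arithmetic above can be written in a few lines. A minimal sketch, using only the league-average figures quoted in the paragraph (the variable names are mine, for illustration):

```python
# Sketch of the possession-value arithmetic behind EV's baseline weights.
POINTS_PER_POSSESSION = 1.07  # league-average value of one possession
OREB_RATE = 0.26              # share of missed FGs the offense rebounds

# A turnover forfeits the whole possession.
turnover_value = -POINTS_PER_POSSESSION

# A missed FG forfeits the possession only when the defense rebounds (74%).
missed_fg_value = -POINTS_PER_POSSESSION * (1 - OREB_RATE)

print(round(turnover_value, 2))   # -1.07
print(round(missed_fg_value, 2))  # -0.79
```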
I call this model “Expected Value,” as a tip of the cap to my poker background. Viewing actions this way isn’t new — PER uses value of possession concepts — but there are stats in play here that haven’t been used before.
A huge component of this rating is the introduction of defensive statistics in order to ballpark players’ defensive performances. On the offensive end, its major novel component is distributing the share of offensive scoring credit between creators and those they create for. The “helped” elements in the tables below estimate how much credit should go to the creator of an open layup, open shot, or layup foul (one of those fouls in which a player is intentionally fouled from behind to prevent a dunk).
Since EV uses novel stats from my stat-tracking (linked to in the tables below), it’s based on fewer than 400 games (stat-tracking from the 2010 playoffs and the 2011 regular season). Nonetheless, on the last round of correlations I tested, the correlation coefficients were as follows:
- Offensive EV to ORtg: 0.97
- Defensive EV to DRtg: -0.80
- Expected Value to Overall Efficiency: 0.91
Interesting correlations given that the model is built around causality. And the correlations are even stronger when including Help Needed, the defensive counterpart to Opportunities Created. Without further ado, here are the marginal values used for EV. First, the defensive values:
Event | Marginal Value |
---|---|
3-point FG Against | -1.93 |
2-point FG Against | -0.93 |
Defensive Error | -0.65 |
Shooting Free Throw | -0.47 |
Charge | 1.20 |
Forced Turnover | 1.07 |
Missed FGA Against | 0.79 |
Defensive Rebound | 0.28 |
Block | 0.15 |
Offensive Values:
Event | Marginal Value |
---|---|
Made 3-pt FG | 1.93 |
Made 2-pt FG | 0.93 |
Offensive Rebound | 0.79 |
Opportunity Created | 0.50 |
Made FT | 0.47 |
Foul Drawn | 0.30 |
Assist | 0.30 |
Turnover | -1.07 |
Missed FGA | -0.79 |
Missed FTA | -0.40 |
Helped Layup | -0.70 |
Helped 2-pt FG | -0.37 |
Helped 3-pt FG | -0.35 |
Helped FTA | -0.27 |
Other:
Event | Marginal Value |
---|---|
Technical Foul | -0.76 |
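As a rough illustration of how these weights combine, here is a minimal sketch that applies the offensive table to a stat line. The event keys and the sample line are invented for illustration; they are not the actual tracking format:

```python
# Offensive marginal values, mirroring the table above.
OFFENSIVE_VALUES = {
    "made_3": 1.93, "made_2": 0.93, "oreb": 0.79,
    "opportunity_created": 0.50, "made_ft": 0.47,
    "foul_drawn": 0.30, "assist": 0.30,
    "turnover": -1.07, "missed_fga": -0.79, "missed_fta": -0.40,
    "helped_layup": -0.70, "helped_2": -0.37,
    "helped_3": -0.35, "helped_fta": -0.27,
}

def offensive_ev(events):
    """Sum marginal values over a {event: count} dict."""
    return sum(OFFENSIVE_VALUES[e] * n for e, n in events.items())

# Hypothetical line: 5 made 2s, 1 made 3, 4 assists, 2 TOs, 6 missed FGAs.
line = {"made_2": 5, "made_3": 1, "assist": 4,
        "turnover": 2, "missed_fga": 6}
print(round(offensive_ev(line), 2))  # 0.9 points above expectation
```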
What’s Missing
Astute observers will notice there is absolutely no accounting for screens. Most players set similar screens, with a few outliers on both ends, generally determined by size. Likewise, most defenders handle screens in similar fashion (I’ve yet to see the player who can run through a screen), although some outliers hedge them better than others.
Which leads to another, difficult-to-quantify issue: spacing. This too is a small issue in most cases, but great shooters will prevent defenses from collapsing too far into the lane. It’s extremely hard to quantify how often a defender is reluctant to sag off of a shooter, especially since most players will double and rotate appropriately even if they are guarding Ray Allen.
Some other major elements still missing that might be possible to quantify in the future with devices like optical tracking:
- Shots deterred
- Quality of “closeouts”
- How long a player holds the ball and mucks up an offensive possession
A final note: player ratings and comprehensive metrics are often polarizing. Fans tend to cling to metrics they intuitively like or that their favorite players do well in, or they ignore metrics for the converse reasons. Both extremes often miss exactly what the metric represents. I’ve written about the major players in the advanced stat community before, and my hope is that people keep that in mind when viewing Expected Value. It is still a work in progress. (For example, an obviously superior model would be “Dynamic EV,” which would incorporate how the values of events change as the shot clock runs down and how they change based on opponent.)
In the next post I will look at the defensive leaders in this metric from the 2010 playoffs.
Absolutely brilliant, ElGee! I’ve been hoping to move toward a “state-based” plus/minus, and this is an even better approach because of the added “states”.
Question: will you give all of the credit for each of these events to the player doing them, or will you split some out to the other players on the floor?
This is pretty amazing. How much work is it to code games to create the novel stats you use? Do you do all the coding yourself?
Thanks. It’s a decent amount of work, although with the DVR I have it down to about an hour a game. I’m actually in the process of improving the data-collection process to make it easier in the future.
Yes, all the coding is me so far. I’d love to change that in the future though. 🙂
A lot of the values are very similar to those that I am using for ezPM.
http://thecity2.com/2010/12/28/ezpm-1-0-now-with-play-by-play-data/
Evan – just checked out the site and ezPM. Very cool stuff. Nice to see a model like this for an entire season.
I’m loving seeing it gradually become clear how ambitious you are being with your tracking analysis. Wow.
This “Help Needed” stat you mentioned, I haven’t seen you speak of it before. Clearly this cannot simply be the defensive error factor you’ve discussed already right? What is it?
Also, while what you’re doing is absolutely valuable, I get nervous whenever I see weights attached to steals and blocks. I worry that you might actually come up with something less valuable when you take blocks at face value and don’t try to account for the effect of block intimidation.
Obviously it’s impossible to nail the intimidation factor, but when you push off “shots deterred” as part of optical tracking, it makes me think you can’t make any headway in that area. What are your thoughts there?
“Help Needed” values are derived from looking at the difference between guarded and unguarded shots from those places on the court. It’s the “end” of the Opportunity Created, if you will. I intended to write about it last month, but never did. You’re right though, perhaps it deserves its own post.
–
Steals and blocks are tricky. Steals (a type of Forced Turnover), I’m more comfortable with because I view them as taking the possession (the defensive counterpart of an offensive turnover). Missing a steal is covered because the player is hit with a defensive error, so it’s not a free lunch.
Blocks I view as a defended FGA on steroids (because it literally has no chance to go in). So the block is a bonus of sorts. I did some math to try and ballpark a value on that, but yes, great point, blocks are hazy.
re: deterrence. I’m not overly concerned about shot-blockers “intimidating” shots because those same players still end up having a boatload of shots attempted against them in the lane. So Dwight Howard’s deterrence of shots doesn’t seem that different from Bruce Bowen’s, who can force you to pick up your dribble and sit on your shooting hand. It’s a similar problem for me, and I don’t have a good quantification method to handle it, even simplistically.
Always impressed at how quickly you find the gray areas Matt – love the feedback.
This is amazing work. The only thing I’m wondering about is varying values of different types of field goal misses. For example are 3 pt attempts recovered by the offense at the same rate as 2 pt attempts? What about post-ups vs. spot up jumpers? I don’t have any data supporting this just a hunch and I’m not really sure if it’s quantifiable but it’s something I’ve always wondered about.
Thanks Robby.
There was a debate on realgm about the ORB% of 3-pt shots vs. 2-pt shots. Don’t remember seeing anything historically (looking at 3-pta league-wide vs ORB%) that indicated it mattered. Then again, if someone has a great play-by-play script, he/she could check the data from this year.
I too have wondered about the relevance of court location in these matters. Good thought!
Re: Matt above, I’m more concerned when people weight box-score stats by the ‘indicated’ value than the ‘observed’ value.
By that I mean, I prefer an approach that says, ‘a blocked shot creates a miss, and that’s where you find the value of blocks’ over approaches that say, ‘blocks indicate and correlate with defensive performance, let’s regress and or theorize to find the value of blocks’.
This is probably all in the realm of the judgment call. Especially if you’re going to include both observed-value inputs and a regression element (like people do when they create coefficients for SPM models).
Splitting credit for assisted field goals is another example where different tools will give you different results. I don’t pretend to understand the recent posts on this subject – in Scott Sereday’s SPM posts, at EvanZ’s site – heck, I don’t even understand how ORTG for players splits it up! But there isn’t necessarily consensus for who should get how much credit, especially once you break it down by shot type (layups, 2 pointers, 3s).
@Greyberger
The ‘a blocked shot creates a miss, and that’s where you find the value of blocks’ to me is essentially is a measuring of effect vs measuring the cause. If you’re doing raw tracking like ElGee is doing, I think you can make a good argument that measuring effect is the best way to measure total impact, which is good to know. However, it’s also good to know what the specific causes were for that impact.
When doing SPM, though, I think measuring cause over effect is definitely the right approach. The whole idea behind +/- statistics is an acknowledgment that there is just a heck of a lot more to the game than we can measure exactly, so I’d want to know what the average value of a particular causal action is.
Really though the right answer is: Let’s keep breaking this stuff down further and further, and have people track it. You’d like to track the cause, the effect, and the relationship between the two. So, uh, ElGee, get on that would ya? 😉
“When doing SPM, there I think measuring cause over effect is definitely the right approach.”
I agree, but when you actually set out on a project like this, I betcha dilemmas like this pop up all the time. Take unassisted and assisted 3 pointers for example. If your regression includes those, you’ll find the unassisted variety much more valuable than regular catch-and-shoot threes. More valuable in fact than the sum of one assisted three _and_ the assist.
Do you want your model to include this relationship? Or do you politely ignore this new information and find a different basis for splitting credit that does sum assisted and unassisted scoring to the same value? Different authors will come to differing conclusions, I bet.
I agree that there are some decisions to be made along these lines. As I said to ElGee, I worry about the unevenness of difficulty between tracking counterparts. It’s easier to track the steal than the failed steal attempt, and thus SPM suffers.
In the end, I just love seeing anyone track any piece of information that is of value. I’d like to see it all get tracked, and then some day in stat heaven, we can regress to our heart’s content.
Great stuff. I am unclear though, on a single possession, can the offensive player be credited for a 3PM and his defender also punished for it? Because it seems like this would be double-crediting the amount of differential.
I’m sure you’ve thought of this, but I don’t see how it works out.
Also, I have to say, I think stuff like this and Synergy Sports is the future of advanced stats. There are certain things you could do to improve this per-player, such as giving a player more EV for drawing free throws if they’re a better free throw shooter (Chauncey Billups’ FTs are more valuable than Dwight Howard’s).
Thanks Austin. Absolutely agree about improving it per-player, and I have plans to do that in the future. The FT one has been on my mind from the beginning, but I begrudgingly went with the league average. I have a simple fix for future iterations that will make it more player-specific.
re: double-credit. This is how I think about it: Over 100 possessions, we’d expect 107 points for Team A’s offense, which means Team B’s defense gave up 107 (those are the baselines).
But, if someone scored a bucket on every possession, Team A has 200 points (wow, what a game!). Team B surrendered 200 (let’s call them the “Knicks”). If it’s the same players involved, every time, Player A is +93 over expected value (the +0.93 value for a 2) and player B is -93 under expected value (the -0.93 value for allowing a 2).
So I don’t see the “double”-credit, but feel free to tell me if that’s not quite what you were getting at.
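In code, the bookkeeping from that example looks like this (a sketch of the same arithmetic from the two comments above, nothing more):

```python
# 100 possessions, a made 2 on every one, same players involved each time.
POSSESSIONS = 100
baseline = round(POSSESSIONS * 1.07)  # expected: 107 points either way

actual = POSSESSIONS * 2              # 200 points scored (and allowed)

# Player A earns +0.93 per made 2; Player B is debited -0.93 per made 2 against.
offense_ev = POSSESSIONS * 0.93
defense_ev = POSSESSIONS * -0.93

print(actual - baseline)  # 93: Team A is +93 over expectation, Team B -93
```

The two ledgers are mirror images of each other against separate baselines, which is why no point gets counted twice.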
Okay, that makes sense. I think I may have been confusing +/- with differential, or something like that.
In a similar vein, but more just out of curiosity, do the defensive numbers and offensive numbers match up? In a typical game, will team B’s defensive EV be the opposite of team A’s offensive?
I’m puzzled as to why a charge is more valuable than a forced TO. Seems to me it would be the opposite as a charge necessarily results in a dead ball while forced TOs sometimes result in open floor opportunities. Along this vein, aren’t some possessions more valuable than others? i.e. the EV of a possession after a made FT is (I would presume) lower than the EV of a possession after a steal. Not sure there would be an easy way to take this into account though.
And generally awesome ideas. The value that a player contributes to each possession on each end of the floor is, I believe, where really useful statistical analysis ultimately lies.
If it were linear, a charge should be a TOV+PF = 1.37. The reason it’s only 1.20 is probably for exactly the reason you gave, that a TO results in more efficient offensive possession.
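A quick sketch of that arithmetic, using the values from the tables above (the gap being, presumably, the dead-ball discount Dre describes):

```python
# If event values simply added, a charge would equal a forced TO plus a foul drawn.
forced_to = 1.07
foul_drawn = 0.30

linear_charge = forced_to + foul_drawn  # 1.37 if values were additive
listed_charge = 1.20                    # value actually used in the table

discount = linear_charge - listed_charge
print(round(discount, 2))  # ~0.17 shaved off for the dead-ball/no-transition effect
```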
As Evan said, it’s because of the PF as well.
As mentioned in the post, the biggest change that I’d like to see happen in the future is making the value of events dynamic — so a steal might be 0.80 if it occurs with 2 seconds left on the shot clock near the baseline, but worth more if it’s early in the clock and leads to a runout.
Evan, how did you derive your value of the foul? I’d love more data on that…
Good question about the foul. I actually only debit players when the foul leads to a free throw, and use the same -0.5 per made FTA. So, for example, if you foul someone, and they hit one free throw, no deductions are made.
In theory, I could debit a player when he commits a foul that doesn’t lead to FTA (and credit the player who draws the foul), but I did not have a good idea of what that value should be. You suggest it is 0.3, so maybe I can incorporate that into my model, and see if it makes a difference.
I use 0.30 because there are two events: the team foul toward the penalty, and the personal foul on the individual. My estimate is based on the value of a starter over a bench player for the average period of time someone sits in “foul trouble” (about 0.12 or 0.13; that’s also why the charge is worth extra). As I said, ballparking this is fuzzy to me, so I’m open to analyzing much larger data sets.
One question I have, which might be impossible to track at a certain point, relates to the assist value. Assists can be very different depending on whether that assist was created by the ball handler or the player receiving the pass.
A dribbler, like Nash or Rose, penetrates into the lane, drawing a help defender away from his primary assignment, then kicks the ball out to the player who was being guarded by the defender the dribbler just pulled into help defense.
Another player gets away from his defender and the passer sees that and passes it to the open player.
Both players make the shot, both passers are credited with the assist, but in one case the recipient created his own shot whereas in the other the ball handler created the shot for him.
Thoughts?
Kelly, you might want to check this out: https://elgee35.wordpress.com/2011/02/18/opportunities-created-oc/
The estimate for the actual value of the different events based on these opportunities created are found in the “helped” values at the bottom of the offensive section. (I’m concerned with OC more than assists.)
That said, I have the numbers on actual points created – versus the *opportunity* created – so I do think it’s worth tracking and have considered it. It averages close to the OC values, depending on teammate quality. Couldn’t tell you the values strictly based on “assists” though – I’m not wild about assists as a stat.
And yet all this information is still useless. Defense isn’t based solely on individual ability; it is based on the team. That is why the Celtics, Bulls, Lakers, and Spurs always seem to pick up the right players. It’s not because they have better defensive analytics; it’s because they have better coaching and higher accountability on the defensive end. If you have ever played basketball at a serious level, you would understand this. Here are the stats you need to be aware of: free throw attempts, offensive rebounding percentage, and turnovers. Look at all the data you want, but I would make a completely uneducated guess that 65% of the teams that win these categories win the game.
I strongly disagree. There’s never been an elite defensive team that I know of that hasn’t had defensive stars. The team/coaching component is important, as I’ve mentioned, but even just with this basic approach, the correlations are good. Individuals still need to guard, rotate, use size, etc.
“And yet all this information is still useless. Defense isn’t based solely on individual ability.”
A team is a group of individuals, so clearly, there should be some correlation between individual defensive talent and team-level outcomes. We can argue how much is due to the players vs. the system, but to completely dismiss individual stats would be as foolish as to completely dismiss the notions that there might be interaction between teammates and/or a system-level effect (i.e. coach). (Then again, I like the idea that I might have a shot at playing with the Bulls – whose coach could mold me into a defensive sensation.)
The hard part, of course, is to figure out which stats correlate more highly with the individual than the system (i.e. those that might stay with the player even when switching teams). And create new stats, whenever possible, that improve that capability.
Useless? That’s an absurd statement.
Like so many people who reject advanced statistics, you’re just approaching this wrong. You look for one thing that’s unaccounted for, find it, and then throw away the stat. The useful way to approach any stat is to see what it can and cannot give you, and then use it for the appropriate question while giving it the appropriate level of authority.
While it’s possible for a stat to be so flawed it’s unusable, what ElGee is doing here is simply compiling the obvious atoms of defensive play and putting them into some reasonable equations. If you don’t think NBA coaches are having trackers do analogous things, then you haven’t been paying attention to the developments of the past decade.
Great work!
I’m surprised the difference between a turnover and a regular shot is that low. I would have guessed a turnover is FAR more detrimental than any shot. A turnover is a 0% chance of scoring plus a 0% chance of a rebound, while a bad shot might be 0.40 eFG% with a 25% offensive rebound chance, and a good shot might be 0.60 eFG% with a 25% offensive rebound chance. The gap between a turnover and everything else is huge. But the biggest difference between teams in determining overall ranking still comes down to eFG% more than TOV%, just because teams are bunched closer together in TOV%.
Thanks Julien and welcome!
You are correct — as I’ve mentioned, there is a simplicity to this approach, especially with regard to turnovers. For future iterations, I’d love to include more data on turnovers, since we know that, depending on the type of turnover, the opponent’s offensive efficiency going the other way increases due to transition chances.
Then again, turnovers at the end of the shot clock (or shot clock violations, or 3-second violations, or charges…) don’t give the opponent transition chances, they just end the possession without a shot opportunity.
Maybe I’m missing something, but why do helped layups, helped 2FGs, helped 3FGs, and helped FTAs produce negative values? What does a “helped” attempt mean?
It’s when a teammate creates that open opportunity. A “helped” layup is when Nash draws a double team and gives it to a now open Robin Lopez for an open layup. The negative value is an adjustment — Lopez still gets positive credit when we include the 2 points he scored on the layup.
Good question — I’ll have a post detailing the idea soon.
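One way to read the tables in the meantime — and this is my assumption about how the credits add up, not necessarily the model’s exact bookkeeping:

```python
# Values from the offensive table above.
made_2 = 0.93               # full credit for a made 2-point FG
helped_layup = -0.70        # adjustment when the layup was created by a teammate
opportunity_created = 0.50  # credit to the creator of the open look

# Finisher nets only part of the made-2 value on a helped layup;
# the creator banks the Opportunity Created.
finisher_credit = made_2 + helped_layup
creator_credit = opportunity_created

print(round(finisher_credit, 2))  # 0.23
print(round(creator_credit, 2))   # 0.5
```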
Ahhh, now I get it. Thanks for the response. That makes a lot of sense: you shouldn’t reap the full credit when all you’ve done is catch a pass and score a layup.