Archive for the ‘Theory’ Category

I’m a big fan of On/Off data, which compares a team’s point differential with a player on the court versus when he’s off the court. I’ve referenced it frequently in the past and think it’s one of the more telling reflections of a player’s value to his team in the advanced stat family.

The nice part about On/Off is that it represents what actually happened. The problem with On/Off is that it ignores the reasons why it happened, and sometimes that creates a fuzzy picture.

For example, let’s suppose Kobe Bryant plays the first 40 minutes of a game and injures his ankle with the score tied at 80. LA wins the game 98-90. The Lakers were dead even when he was in the game and +8 with him out of the game, so Bryant’s on/off would be -8.

In this case, sample size is an issue. But that becomes less of a problem over the course of an entire season. The real concern is the normal variance involved in everyone else’s game. Practically speaking, it takes little outside the norm for Kobe to have played 40 brilliant minutes while his teammates missed a few open shots, and for the opponent to miss a few open shots down the stretch while Kobe’s teammates start hitting them.

The tendency is to look at a result like that and conclude that Kobe hurt his teammates’ shooting and when he left the game it helped their shooting. He very well may have by not creating good looks for them.

Then again, players hit unguarded 3-pointers about 38% of the time. Which means if the average shooter attempts five open 3-pointers, he will miss all five about 10% of the time, simply based on the probabilistic nature of shooting. A fact that has little to do with Kobe or any of the other players on the court.
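That 10% figure is a plain binomial calculation. Here is a minimal Python sketch, assuming each open attempt is an independent 38% shot (real shots aren't perfectly independent, so treat it as a ballpark):

```python
from math import comb


def prob_makes(attempts: int, makes: int, pct: float) -> float:
    """Binomial probability of exactly `makes` hits in `attempts` independent shots."""
    return comb(attempts, makes) * pct**makes * (1 - pct) ** (attempts - makes)


OPEN_3PT_PCT = 0.38  # open 3-point accuracy cited in the text

p_zero = prob_makes(5, 0, OPEN_3PT_PCT)   # all five open threes miss
p_three = prob_makes(5, 3, OPEN_3PT_PCT)  # a 3-for-5 stretch
print(f"0-for-5: {p_zero:.1%}")   # → 0-for-5: 9.2%
print(f"3-for-5: {p_three:.1%}")  # → 3-for-5: 21.1%
```

Going 0-for-5 comes out a little over 9%, which rounds to the "about 10%" above, and the same function puts a 3-for-5 stretch at roughly 21%.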

In our hypothetical situation, all it takes is a 0-for-5 stretch from the opponent and a 3-for-5 stretch from LA to produce Kobe’s ugly -8 differential. The great college basketball statistician Ken Pomeroy ran some illuminating experiments on the natural variance in such numbers. His treatise is worth reading, but the gist of it is that his average player, by definition, produced a -57 on/off after 28 games (-5.7 per game) due to standard variance in a basketball game outside of that player’s control. Think about that.

For fun, I just ran the same simulation, and my average player posted a +5.6 rating over his college season:

Average Player Simulation

So in two simulations, the average player’s On/Off ranged from -5.7 to +5.6. One guy looks like an All-Star, the other like an NBDL player.
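To get a feel for how much noise is baked into On/Off, here is a rough Monte Carlo sketch in Python. It is not Pomeroy's model, and every input is an assumption: a made-up points-per-possession distribution (mean 1.08), 70 possessions per game, and a player on the floor for 49 of them (28 of 40 minutes) over a 28-game season. Both teams draw from the same distribution, so the player's true impact is exactly zero and any nonzero On/Off is pure variance.

```python
import random
import statistics

# Hypothetical points-per-possession distribution (mean 1.08).
# Illustrative only, not measured from real games.
OUTCOMES = (0, 1, 2, 3)
WEIGHTS = (0.50, 0.06, 0.30, 0.14)


def net_per_100(rng: random.Random, possessions: int) -> float:
    """Point differential per 100 possessions when both teams are identical."""
    ours = rng.choices(OUTCOMES, weights=WEIGHTS, k=possessions)
    theirs = rng.choices(OUTCOMES, weights=WEIGHTS, k=possessions)
    return 100 * (sum(ours) - sum(theirs)) / possessions


def season_on_off(rng: random.Random, games: int = 28,
                  poss_per_game: int = 70, on_per_game: int = 49) -> float:
    """On/off for a player with zero true impact: any gap is noise."""
    on = net_per_100(rng, games * on_per_game)
    off = net_per_100(rng, games * (poss_per_game - on_per_game))
    return on - off


rng = random.Random(2011)  # fixed seed so the run is repeatable
seasons = [season_on_off(rng) for _ in range(1000)]
print(f"mean {statistics.mean(seasons):+.1f}, sd {statistics.stdev(seasons):.1f}")
print(f"range {min(seasons):+.1f} to {max(seasons):+.1f}")
```

Under these assumptions the standard deviation of a season-long On/Off should land around 8 points per 100 possessions, so swings of five or six points in either direction are entirely ordinary for a player who, by construction, changes nothing.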

“The Team Fell Apart When Player X Was Injured”

This is a common argument for MVP candidates: Look at how the team fared when he missed a few games and conclude the difference is the actual value a player provides to his team. Only this line of thinking runs into the same problems we saw above with on/off data.

Let’s take Dirk Nowitzki and this year’s Dallas Mavericks. In 62 games with Dirk, Dallas has a +4.9 differential (standard deviation 7.8). In nine games without Dirk, it has a -5.9 differential (standard deviation 7.5).

Which means, with a basic calculation, we can say with 95% confidence that without Nowitzki, Dallas is somewhere between a -1.0 and -10.8 differential team. Not exactly definitive, but in all likelihood they are much worse without Dirk. OK…but we can’t definitively say how much worse they are.
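The "basic calculation" is a normal-approximation confidence interval: the mean plus or minus 1.96 standard errors, where the standard error is the game-to-game standard deviation divided by the square root of the number of games. A short Python sketch that reproduces the numbers above (it treats games as independent draws, which is itself a simplification):

```python
from math import sqrt


def ci95(mean: float, sd: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a per-game average."""
    half_width = z * sd / sqrt(n)  # z times the standard error
    return mean - half_width, mean + half_width


with_lo, with_hi = ci95(4.9, 7.8, 62)
without_lo, without_hi = ci95(-5.9, 7.5, 9)
print(f"with Dirk:    {with_lo:+.1f} to {with_hi:+.1f}")        # → with Dirk:    +3.0 to +6.8
print(f"without Dirk: {without_lo:+.1f} to {without_hi:+.1f}")  # → without Dirk: -10.8 to -1.0
```

With only nine games, a Student's t interval (critical value about 2.31 for 8 degrees of freedom, e.g. `ci95(-5.9, 7.5, 9, z=2.306)`) would be wider still, roughly -11.7 to -0.1, which only strengthens the point.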

In a small sample, we just can’t be extremely conclusive. In this case, nine games doesn’t tell us a whole lot. New Orleans started the season 8-0…they aren’t an 82-win team.

We can perform the same thought-experiment with Dirk’s nine games that we did with Kobe’s eight minutes to display how unstable these results are. Let’s say Dallas makes three more open 3’s against Cleveland and the Cavs miss three open 3’s. What would happen to the differential numbers?

  1. That alone would improve Dallas’s differential without Dirk by two points per game (six shots at three points apiece is an 18-point swing, spread over nine games).
  2. Our 95% confidence interval now becomes -12.1 points to +4.4 points.

That’s from adjusting just six open shots in a nine game sample.

Jason Terry, a player who has historically benefited from playing with Dirk Nowitzki, had games of 3-16, 3-15 and 3-14 shooting without Dirk. He shot 39% from the floor in the nine games. By any reasonable account, Terry is better than a 39% shooter without Nowitzki. He shot 26% from 3 in those games. Let’s use his Atlanta averages instead, from when he was younger and probably not as skilled a shooter: how would that change the way Dallas looks sans Dirk?

Well, suddenly Terry alone provides an extra 1.7 points per game with his (still) subpar shooting. The team differential is down to -2.2 with a 95% confidence interval of -10.4 to +6.1. Just by gingerly tweaking a variable or two, the picture grows hazier and hazier.

Making Sense of it All

So, what can we say using On/Off data? It’s likely Dallas is a good deal better with Dirk Nowitzki. But, hopefully, we knew that already.

To definitively point to a small sample and say, “well, this is how Dallas actually played without Dirk, so that’s his value for this year,” ignores normally fluctuating variables, like Jason Terry or an open Cleveland shooter, that have little to do with Dirk Nowitzki’s value. So while such data reinforces how valuable Dirk is, we can’t say that’s how valuable he is.

We can’t ignore randomness and basic variance as part of the story.

Some time ago, I began wondering what the value of certain actions in a game was. How do they translate to points, since points determine who wins under the rules of the game?

So much goes into scoring points: first, a team needs possession of the ball. Well, stop right there. How much is possession of the ball worth? On average, about 1.07 points. How much is a turnover worth, then? If it’s the loss of a possession, it should be worth about -1.07. Then what is the value of a missed field goal? It fails to score on the possession, but on average the offense still recovers the rebound 26% of the time, so it’s not quite as bad as a turnover. Multiplying 1.07 by 0.74 (the share of misses the defense rebounds), a missed field goal is worth approximately -0.79 points. And so on…
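The possession arithmetic is easy to write down. The sketch below uses only the two numbers from the text (1.07 points per possession, 26% offensive rebound rate). Treating a make as its points minus the 1.07 the possession was already expected to produce, and a rebound as the swing in possession value, is my reading rather than something spelled out here, but it reproduces several entries in the EV tables.

```python
POSSESSION_VALUE = 1.07  # average points per possession, from the text
OREB_RATE = 0.26         # share of missed shots the offense rebounds, from the text

# A turnover forfeits the entire possession.
turnover = -POSSESSION_VALUE
# A miss forfeits the possession only when the defense gets the rebound.
missed_fg = -POSSESSION_VALUE * (1 - OREB_RATE)
# Assumption: a make is worth its points minus the possession's expected value.
made_2 = 2 - POSSESSION_VALUE
made_3 = 3 - POSSESSION_VALUE
# Assumption: rebounds are worth the change in possession value after a miss.
offensive_rebound = POSSESSION_VALUE * (1 - OREB_RATE)
defensive_rebound = POSSESSION_VALUE * OREB_RATE

for name, value in [("turnover", turnover), ("missed FG", missed_fg),
                    ("made 2", made_2), ("made 3", made_3),
                    ("off. reb", offensive_rebound), ("def. reb", defensive_rebound)]:
    print(f"{name:>9}: {value:+.2f}")
```

Rounded to two decimals these give -1.07, -0.79, +0.93, +1.93, +0.79 and +0.28, matching the Turnover, Missed FGA, Made 2-pt FG, Made 3-pt FG, Offensive Rebound and Defensive Rebound values in the tables.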

I call this model “Expected Value,” as a tip of the cap to my poker background. Viewing actions this way isn’t new — PER uses value of possession concepts — but there are stats in play here that haven’t been used before.

A huge component of this rating is introducing defensive statistics in order to ballpark players’ defensive performances. On the offensive end, its major novel component is distributing credit for offensive scoring between creators and the players they create for. The “helped” elements in the table below are an estimate of how much credit should go to the creator of an open layup, shot or layup foul (one of those fouls in which the player is intentionally fouled from behind to prevent a dunk).

Since EV uses novel stats from my stat-tracking (linked to in the tables below), its database isn’t even 400 games deep (stat-tracking from the 2010 playoffs and 2011 regular season). Nonetheless, on the last round of correlations I tested, the correlation coefficients were as follows:

  • Offensive EV to ORtg: 0.97
  • Defensive EV to DRtg: -0.80
  • Expected Value to Overall Efficiency: 0.91

Interesting correlations given that the model is built around causality. And the correlations are even stronger when including Help Needed, the defensive counterpart to Opportunities Created. Without further ado, here are the marginal values used for EV. First, the defensive values:

Event: Marginal Value (points)
3-point FG Against: -1.93
2-point FG Against: -0.93
Defensive Error: -0.65
Shooting Free Throw: -0.47
Charge: 1.20
Forced Turnover: 1.07
Missed FGA Against: 0.79
Defensive Rebound: 0.28
Block: 0.15

Offensive Values:

Event: Marginal Value (points)
Made 3-pt FG: 1.93
Made 2-pt FG: 0.93
Offensive Rebound: 0.79
Opportunity Created: 0.50
Made FT: 0.47
Foul Drawn: 0.30
Assist: 0.30
Turnover: -1.07
Missed FGA: -0.79
Missed FTA: -0.40
Helped Layup: -0.70
Helped 2-pt FG: -0.37
Helped 3-pt FG: -0.35
Helped FTA: -0.27


Event: Marginal Value (points)
Technical Foul: -0.76
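At its simplest, a player's EV is a tally: count each tracked event, multiply by its marginal value and sum. The Python sketch below assumes exactly that linear tally, using the values from the tables above and a made-up stat line; it does not attempt the "helped" split between creator and shooter, nor any per-100 scaling the full model may apply.

```python
# Marginal values (points) copied from the tables above.
OFFENSE = {
    "made_3pt_fg": 1.93, "made_2pt_fg": 0.93, "offensive_rebound": 0.79,
    "opportunity_created": 0.50, "made_ft": 0.47, "foul_drawn": 0.30,
    "assist": 0.30, "turnover": -1.07, "missed_fga": -0.79,
    "missed_fta": -0.40, "helped_layup": -0.70, "helped_2pt_fg": -0.37,
    "helped_3pt_fg": -0.35, "helped_fta": -0.27,
}
DEFENSE = {
    "3pt_fg_against": -1.93, "2pt_fg_against": -0.93, "defensive_error": -0.65,
    "shooting_free_throw": -0.47, "charge": 1.20, "forced_turnover": 1.07,
    "missed_fga_against": 0.79, "defensive_rebound": 0.28, "block": 0.15,
}


def expected_value(counts: dict[str, int], values: dict[str, float]) -> float:
    """Linear tally: each event count times its marginal value (KeyError on unknown events)."""
    return sum(values[event] * n for event, n in counts.items())


# Hypothetical single-game stat line (not real data).
offense_counts = {"made_2pt_fg": 6, "made_3pt_fg": 2, "missed_fga": 8,
                  "assist": 4, "turnover": 3, "made_ft": 4, "missed_fta": 1}
defense_counts = {"defensive_rebound": 5, "forced_turnover": 1,
                  "missed_fga_against": 6, "2pt_fg_against": 5,
                  "defensive_error": 1, "block": 1}

off_ev = expected_value(offense_counts, OFFENSE)
def_ev = expected_value(defense_counts, DEFENSE)
print(f"offensive EV {off_ev:+.2f}, defensive EV {def_ev:+.2f}")  # → offensive EV +2.59, defensive EV +2.06
```

How "against" events get assigned to individual defenders is the hard part of the real model; this sketch simply assumes the counts are already attributed.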

What’s Missing

Astute observers will notice there is absolutely no accounting for screens. Most players set similar screens, with a few outliers on both ends, generally determined by size. Likewise, most defensive players handle screens similarly (I’ve yet to see the player who can run through a screen), although some outliers hedge them better than others.

Which leads to another difficult-to-quantify issue: spacing. This is also a small issue in most cases, but great shooters prevent defenses from collapsing too far into the lane. It’s extremely hard to quantify how often a defender is reluctant to sag off a shooter, especially since most players will double and rotate appropriately even when they are guarding Ray Allen.

Some other major elements still missing that might be possible to quantify in the future with devices like optical tracking:

  1. Shots deterred
  2. Quality of “closeouts”
  3. How long a player holds the ball and mucks up an offensive possession

A final note: Player ratings and comprehensive metrics are often polarizing. Fans tend to cling to metrics they intuitively like or ones their favorite players do well in, or they ignore metrics for the opposite reasons. Both extremes often miss what the metric actually represents. I’ve written about the major players in the advanced stat community before, and my hope is that people keep that in mind when viewing Expected Value. It is still a work in progress. (For example, an obviously superior model would be “Dynamic EV,” which incorporates how the values of events change with the shot clock and with the opponent.)

In the next post I will look at the defensive leaders in this metric from the 2010 playoffs.

Offenses end up with unguarded shots in a few ways: in transition, off screens and when defenses make errors. The thinking here is (deliberately) quite simple: defenders should be guarding someone. If they aren’t, they should be rotating or hedging a screen to resume guarding someone. The goal is to not give up open shots, which is always trumped by the goal to not give up open layups.

I categorize defensive errors in two ways:

  1. A blow-by
  2. A missed rotation

The first is an error in man defense, when a defender’s man beats him to the rim (“blows by him,” in hoops vernacular), either with the ball or off the ball. It’s a “blow-by” when the defender is no longer engaged in guarding the player. The second is an error in team defense, when players fail to logically rotate to an open man, or even worse, don’t rotate to the rim to prevent an open layup (the majority of “missed rotations”).

In either case, two players can receive half an error each if they were equally involved in the error. For a blow-by, this is simply when a double-team is split and left in the dust; for a missed rotation, it means blaming equally two nearby defenders who could have rotated to an open man and did not. Half missed rotations commonly occur when two players incorrectly rotate to the same shooter on the perimeter and fail to collectively protect the rim against an open layup. (They usually then start pointing at each other or looking around in befuddlement. The lesson: defensive communication is important!)

Defensive errors essentially create a power play — similar to the power play we saw in opportunities created — that spikes the opponent’s offensive efficiency by nearly 50%. In the simplest terms, tracking defensive errors is assigning responsibility to players who give up open shots when they otherwise shouldn’t.

As is the case with assigning credit for an assist, there is a gray area (mostly involving whether a player could have rotated and didn’t).

Some examples of defensive errors:

  • Being beaten off the dribble and no longer being within reach of the dribbler (BB)
  • Being backdoor cut on the wing without staying with the cutter (BB)
  • Being run by on the way down the court by your man (BB)
  • Failing to rotate to the basket when the screener rolls free as his defender double-teams the dribbler (MR)
  • Failing to rotate to a man to box out after a similar defensive scramble (MR)
  • Staying in the backcourt (cherry-picking) and not running back in a reasonable time to defend anyone (MR)
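The bookkeeping for these errors, including half-errors and the per-100 rates with a possession minimum used in the leaderboards, can be sketched in Python as follows. The event-log format, player names and possession counts are hypothetical, chosen only for illustration:

```python
from collections import defaultdict

MIN_POSSESSIONS = 300  # qualifier used for the leaderboards


def tally_errors(events):
    """Sum Blow-Bys (BB) and Missed Rotations (MR) per defender.

    Each event is (kind, defenders). One defender takes a full error;
    two equally involved defenders take half an error each.
    """
    totals = defaultdict(lambda: {"BB": 0.0, "MR": 0.0})
    for kind, defenders in events:
        if kind not in ("BB", "MR") or len(defenders) not in (1, 2):
            raise ValueError(f"unexpected event: {kind!r}, {defenders!r}")
        share = 1.0 / len(defenders)
        for player in defenders:
            totals[player][kind] += share
    return totals


def errors_per_100(totals, possessions):
    """Per-100-possession rates for qualifying players, including error-free ones."""
    rates = {}
    for player, poss in possessions.items():
        if poss < MIN_POSSESSIONS:
            continue
        counts = totals.get(player, {"BB": 0.0, "MR": 0.0})
        bb = 100 * counts["BB"] / poss
        mr = 100 * counts["MR"] / poss
        rates[player] = {"BB": bb, "MR": mr, "total": bb + mr}
    return rates


# Hypothetical tracking log and possession counts (not real data).
log = [("BB", ["A"]), ("MR", ["A", "B"]), ("MR", ["B"]), ("BB", ["A", "C"])]
possessions = {"A": 600, "B": 400, "C": 250, "D": 500}
rates = errors_per_100(tally_errors(log), possessions)
for player, r in sorted(rates.items(), key=lambda kv: kv[1]["total"]):
    print(f"{player}: BB {r['BB']:.2f}  MR {r['MR']:.2f}  total {r['total']:.2f}")
```

Looping over possession counts rather than over recorded errors keeps error-free players in the pool, which matters for a "fewest errors" list.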

Through March 15 (162 team games tracked), here are the players with the most defensive errors in the 2011 regular season (minimum 300 possessions played):

Statistics are per 100 possessions; BB = Blow-By; MR = Missed Rotation

And the players with the fewest defensive errors:

Statistics are per 100 possessions; BB = Blow-By; MR = Missed Rotation

*Qualifying players among the leaders listed in this post play for: Atlanta, Boston, Chicago, Dallas, Denver, Indiana, LA Clippers, LA Lakers, Miami, New Orleans, New York, Oklahoma City, Orlando, Phoenix, Portland, San Antonio, Utah. The remaining teams don’t have 300+ possessions in the 2011 database.

For those wondering, the correlation coefficient between defensive errors and team defensive rating this year is about 0.35.

In 1974, the NBA started tracking steals. And apparently, they thought that was a sufficient measure of forcing turnovers on defense, because they haven’t added anything related in their box score since.

The easiest measure of forcing turnovers is to track offensive fouls drawn. Hoopdata provides charges taken, although nothing is listed for the 2011 season. In my stat-tracking, I note any offensive foul drawn, excluding the moving screen.

In last year’s playoffs, the average player drew 0.31 offensive fouls per 100 possessions. In tracking games this year, that number is 0.49/100 possessions (there was a lot of good offense in the 2010 postseason, despite the NBA Finals). Here are the leaders in “charge” rates — offensive fouls taken per 100 possessions — from last year’s playoffs:

Nick Collison was hitting the deck like a sailor in the Thunder’s six games against LA. Derek Fisher drew the most total charges, with 14. A familiar name for those who are abreast of charge-related statistics is Glen Davis, who was this year’s NBA leader at the last unofficial count I heard.


There are other ways to force turnovers on defense that don’t show up in the box score. I track two in particular, and both involve deflecting the ball in a way that isn’t registered as a steal:

  1. Knocking the ball off an offensive player and out of bounds
  2. Knocking the ball away so as to force a shot clock violation

The second method is inherently less valuable because it has to happen near the end of the shot clock, when the value of the possession is already reduced. Nonetheless, both are quite easy to keep track of and add to the overall picture of a player’s defensive ability to force turnovers. These kinds of forced turnovers occur at a rate of about 0.30 per 100 possessions.

Along with steals, we can combine all of these into one defensive measure for “forcing turnovers.” Below are the leaders from the 2010 playoffs (league average was 2.09/100):

Stats are per 100 possessions

Obviously, this is quite a different list than the one portrayed by looking only at steals. Here are the complete leaders from the 2010 playoffs in non-traditional turnovers and total forced turnovers.
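Combining the three sources into one rate is simple addition per 100 possessions. A Python sketch with a hypothetical stat line (the function and field names are mine for illustration, not from any official source):

```python
def forced_turnovers_per_100(steals: int, offensive_fouls_drawn: int,
                             deflection_turnovers: int,
                             possessions: int) -> dict[str, float]:
    """Combine box-score steals with hand-tracked forced turnovers, per 100 possessions.

    offensive_fouls_drawn: excludes moving screens, as in the tracking described above.
    deflection_turnovers: balls knocked off an opponent out of bounds plus
    deflections that force a shot clock violation.
    """
    scale = 100 / possessions
    parts = {
        "steals": steals * scale,
        "offensive_fouls_drawn": offensive_fouls_drawn * scale,
        "deflection_turnovers": deflection_turnovers * scale,
    }
    parts["total"] = sum(parts.values())
    return parts


# Hypothetical playoff line (not real data): 5 steals, 2 charges, 1 deflection, 400 possessions.
line = forced_turnovers_per_100(5, 2, 1, 400)
print(line)
```

The hypothetical line comes out to 2.0 per 100, just under the 2.09 playoff average cited above.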

I was excited to see the topic of fouls receive some attention at last week’s Sloan Conference in Boston, although I’m not sure how I feel about the methodology (confusing) and conclusions (potentially confounded) of that paper. Nonetheless, fouls are a small part of the game that is often overlooked in analysis.

Turns out that drawing fouls is a really good thing. And committing fouls — specifically, shooting fouls — is really bad. Nothing revolutionary there.

On offense, drawing a foul has two effects:

  1. Brings a team closer to the penalty
  2. Causes foul trouble for opposing starters

When a player is in foul trouble, he loses minutes he would otherwise be on the floor (unless he plays for Don Nelson, apparently). Usually, this is on the order of 5-10 minutes, as a player sits for a period until he is no longer in “foul trouble.” Occasionally, extreme cases render a player inactive for longer, like Dwight Howard in last year’s first round against Charlotte. Howard averaged around 26 minutes a game when he otherwise would have been playing closer to 40. When starters sit, they are replaced by bench players, who (theoretically) represent a downgrade.

The penalty represents a larger advantage for teams. Every foul before the penalty is 25% of the way to the automatic bonus for a team. Once in the penalty, any foul on the court produces two free throws for a team, which is the most efficient form of offense: The value of an average possession in the NBA is about 1.07 points. The value of two free throws is about 1.52 points.
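The free throw numbers imply a couple of useful quantities. A Python sketch using the 1.07 and 1.52 figures from the text; it ignores offensive rebounds off a missed second free throw, so treat the outputs as approximations:

```python
POSSESSION_VALUE = 1.07  # average NBA possession, from the text
LEAGUE_TRIP = 1.52       # value of two free throws, from the text


def two_shot_value(ft_pct: float) -> float:
    """Expected points from a two-shot trip (ignores rebounds of a missed second FT)."""
    return 2 * ft_pct


implied_ft_pct = LEAGUE_TRIP / 2                 # free throw accuracy implied by 1.52
penalty_gain = LEAGUE_TRIP - POSSESSION_VALUE    # rough gain over an average possession
poor_shooter = two_shot_value(0.50)              # hypothetical 50% free throw shooter

print(f"implied FT%: {implied_ft_pct:.2f}, gain per penalty trip: {penalty_gain:+.2f}")
print(f"50% shooter's trip: {poor_shooter:.2f} vs possession {POSSESSION_VALUE:.2f}")
```

A trip to the line is worth about 0.45 points more than an average possession and implies roughly 76% free throw shooting, but a 50% shooter's trip is worth only 1.00, below an average possession.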

On the team level, the correlation between fouls drawn per 100 possessions and ORtg is quite strong: 0.56 for last year’s playoffs. For this metric, a “foul drawn” (FD) is only counted when a player is fouled on offense; fouls drawn while setting screens and intentional fouls are excluded.

Here are the leaders from the 2010 playoffs in fouls drawn per 100 possessions, with free throw attempts/100 included as reference:

Clearly, there is a strong correlation between free throw attempts and fouls drawn. This allows a fairly accurate estimate of FD using FTA. However, as is the case with Opportunities Created and assists, it is the outliers who are often the most interesting. Someone like Dwight Howard shoots far fewer free throws than expected based on the number of fouls he draws, because he’s constantly being banged around before the act of shooting, to prevent lobs or in offensive rebounding situations. Here is the full list of players ranked by FD from the 2010 playoffs who played at least 150 possessions.


On defense, committing personal fouls isn’t terribly detrimental to the team. For one, there’s a limit of six per game, and as discussed above, players will simply head to the bench if they foul too much. There is almost no correlation between personal fouls and team defensive rating.

However, there is a correlation between Shooting Free Throws (SF) and defensive rating (0.44 after 104 team games of tracking this year). This information can be extracted from the play-by-play for comprehensive analysis by noting how many free throws a player gave to the other team by fouling (e.g., fouling a shooter on a 3-pointer results in 3 SF).

Again, free throws are the most efficient mode of scoring, so sending a player to the line is spiking the opponent’s offensive efficiency as described above. In short, shooting fouls are bad.* Using the 150 possession qualifier, here is the complete list of players from last year’s playoffs who caused the most free throws for the opposition (SF per 100 possessions).

*The obvious exception is “intentional” fouls to prevent layups or easy attempts around the goal from horribly inefficient free throw shooters.

As discussed before, True Shooting percentage is an estimate of points per shot. But it’s not exact, counting a free throw attempt as 0.44 shots. Why isn’t a free throw 1/2 a shot, you ask? Because of “And One” opportunities, when someone scores and is fouled for one extra bonus free throw. In Marv Albert’s language, it’s known as “Yes, and it counts.”

These are bonus chances after a successful conversion, so to count these free throws as half an attempt would actually be penalizing players for drawing an And One and missing compared to players who never drew the foul at all. To obtain a precise measurement of points per shot (PPS), we’d have to differentiate between And One free throw attempts and the conventional trips to the line. Without doing that, the 0.44 coefficient minimizes error across the league between points per shot and True Shooting percentage.

So how far off can TS% be as a measure of PPS? Mathematically speaking, we can observe the following:

  • Free Throw percentage essentially does not affect TS% accuracy.
  • The ratio of And One FTA to total FTA affects TS% accuracy. 12% is perfect accuracy. The smaller the ratio, the more TS% will overestimate PPS. The larger the ratio, the more TS% will underestimate PPS.
  • The ratio of FTA to FGA compounds TS% errors. The more free throws taken relative to field goals, the more TS% errors are magnified (whether TS% is overestimating or underestimating PPS).
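These three observations fall out of a little algebra. If every non-And-One trip is a two-shot trip, true shot attempts are FGA plus half the non-And-One FTA, so TS% matches PPS/2 exactly when (1 - r)/2 = 0.44, i.e. when the And One share r is 12%. A Python sketch of both measures for a hypothetical high-volume player (ignoring three-shot fouls and technicals, which is an assumption):

```python
def ts_pct(pts: float, fga: int, fta: int) -> float:
    """True Shooting percentage with the standard 0.44 free throw coefficient."""
    return pts / (2 * (fga + 0.44 * fta))


def pps_half(pts: float, fga: int, fta: int, and1_fta: int) -> float:
    """Points per shot / 2, counting only non-And-One trips as shots.

    Assumes every non-And-One trip is a two-shot trip (ignores three-shot
    fouls and technicals), so trips = non-And-One FTA / 2.
    """
    trips = (fta - and1_fta) / 2
    return pts / (2 * (fga + trips))


# Hypothetical season: 2000 points, 1500 FGA, 600 FTA.
for and1_share in (0.05, 0.12, 0.20):
    and1 = round(600 * and1_share)
    error = ts_pct(2000, 1500, 600) - pps_half(2000, 1500, 600, and1)
    print(f"And One share {and1_share:.0%}: TS% - PPS/2 = {100 * error:+.2f} points")
```

Because points appear in both numerators, they cancel in the ratio TS% / (PPS/2) = (FGA + (1 - r) * FTA / 2) / (FGA + 0.44 * FTA), which is why free throw percentage barely matters, while a larger FTA-to-FGA ratio scales the gap up.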

The 0.44 coefficient used for FTs in the TS formula is designed to minimize these errors as much as possible. It does that well across the league, but obviously not all players have the same frequency of 3-point play opportunities.

The way to generate a truly accurate percentage would be to comb through play-by-play data and separate And Ones from other free throw attempts. In 2005 and 2006, 82games provided some And One data we can look at for an idea of how accurate TS% is among high-volume players:

% of And1s is the percentage of total FTA that are And One FTA. PPS/2 is points per shot divided by 2, which is what TS% is trying to approximate. Error is the difference between the player’s actual efficiency (PPS/2) and his listed TS% using 0.44 for free throw attempts.

As we can see, TS errors are generally small. In the games I’ve tracked this year, Wade and Bryant have And One ratios of around 12% (0.1% error for both) and James is just over 8% (for an overestimation of 0.3%). It would be nice to add And Ones to box scores, or completely track them in play-by-play, but in the meantime, TS% does a great job approximating points per shot.

The traditional stat for offensive usage looks at how often a player either shoots or turns the ball over, which means passing and creation aren’t factored into a player’s “usage.” Using Opportunities Created, a measure of offensive creation, it’s possible to estimate a player’s contribution, or “offensive load”: the percentage of possessions in which a player is directly or indirectly involved in a true shooting attempt, or commits a turnover.

In other words, the higher the offensive load, the greater the role in the offense. It’s a good way to see who really is “carrying” an offense, so to speak. Here’s an example breakdown of a team’s offensive load, using last year’s Los Angeles Lakers in the playoffs:

The mathematically inclined might be asking: “Doesn’t this mean a team’s total load can exceed 100%?” Absolutely, which is a reflection of what is being measured. (The league average per team is about 130 per 100 possessions.) Basketball is a team game, and this method allots credit not only to the shooter but also to the creator.
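The double counting is easy to see with numbers. The sketch below is a hypothetical approximation of offensive load (true shot attempts plus Opportunities Created plus turnovers, per 100 possessions), not necessarily the exact formula behind the tables, applied to a made-up lineup over 100 possessions:

```python
def offensive_load(tsa: float, created: float, turnovers: float,
                   possessions: float) -> float:
    """Hypothetical approximation: % of possessions where the player took a
    true shot attempt, created a shot for a teammate, or turned it over."""
    return 100 * (tsa + created + turnovers) / possessions


# Hypothetical lineup over 100 possessions (not real data).
# Tuple order: true shot attempts, opportunities created, turnovers.
lineup = {
    "PG": (18, 14, 4),
    "SG": (22, 6, 3),
    "SF": (20, 4, 2),
    "PF": (15, 3, 2),
    "C": (12, 1, 2),
}
loads = {pos: offensive_load(*stats, possessions=100) for pos, stats in lineup.items()}
for pos, load in loads.items():
    print(f"{pos:>2}: {load:.0f}%")
print(f"team total: {sum(loads.values()):.0f}%")  # → team total: 128%
```

The lineup's own shots and turnovers account for exactly 100 possessions; the 28 created shots are counted a second time for the creators, which pushes the team total to 128, in the neighborhood of the roughly 130 league average.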

Here are the breakdowns by position from last year’s playoffs:

As we might expect, guards carry the greatest load (they have the ball the most), and centers the smallest share. Not all that different from the traditional usage metric. Below are last year’s playoff leaders compared to their usage rate:

It seems that the most active offensive players make something happen on about half of the possessions they play. “Something,” in this case, being a shot attempt, creating a shot attempt or turning the ball over.

Here is the complete list of load leaders from the 2010 Playoffs of players who logged over 150 possessions.
