Posts Tagged ‘Statistical analysis’

I’m a big fan of On/Off data, which compares a team’s point differential with a player on the court versus when he’s off the court. I’ve referenced it frequently in the past and think it’s one of the more telling reflections of a player’s value to his team in the advanced stat family.

The nice part about On/Off is that it represents what actually happened. The problem with On/Off is it ignores the reasons why it happened. And sometimes, it creates a fuzzy picture because of it.

For example, let’s suppose Kobe Bryant plays the first 40 minutes of a game and injures his ankle with the score tied at 80. LA wins the game 98-90. The Lakers were dead even when he was in the game and +8 with him out of it, so Bryant’s on/off would be -8.
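The arithmetic behind that -8 can be sketched in a few lines. This is a minimal illustration of raw on/off for a single game; the function name and argument layout are my own, and published on/off figures are usually scaled per 48 minutes or per 100 possessions rather than left as raw points.

```python
def on_off(points_for_on, points_against_on, points_for_off, points_against_off):
    """Raw on/off: team margin with the player on the court minus the margin with him off."""
    on_margin = points_for_on - points_against_on
    off_margin = points_for_off - points_against_off
    return on_margin - off_margin

# Tied 80-80 while Kobe plays; the final 98-90 means an 18-10 run with him off.
kobe = on_off(80, 80, 98 - 80, 90 - 80)
print(kobe)  # -8
```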

In this case, sample size is an issue. But that becomes less of a problem over the course of an entire season. The real concern is the normal variance involved in everyone else’s game. Practically speaking, it takes little outside the norm for Kobe to have played 40 brilliant minutes while his teammates missed a few open shots, and for the opponent to miss a few open shots down the stretch while Kobe’s teammates start hitting them.

The tendency is to look at a result like that and conclude that Kobe hurt his teammates’ shooting and when he left the game it helped their shooting. He very well may have by not creating good looks for them.

Then again, players hit unguarded 3-pointers about 38% of the time. That means an average shooter who attempts five open 3-pointers will miss all five about 9% of the time, simply because of the probabilistic nature of shooting. That fact has little to do with Kobe or any of the other players on the court.
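As a quick check on that figure, here is the calculation, assuming each open attempt is independent and made with the same 38% probability (both are simplifications, and the function name is just for illustration):

```python
def prob_miss_all(make_prob, attempts):
    """Chance of missing every attempt when shots are independent."""
    return (1 - make_prob) ** attempts

p = prob_miss_all(0.38, 5)
print(f"{p:.3f}")  # 0.092
```

So an 0-for-5 stretch on open looks shows up a little over 9% of the time from chance alone.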

In our hypothetical situation, all it takes is an 0-5 stretch from the opponent and a 3-5 stretch from LA to produce Kobe’s ugly -8 differential. The great college basketball statistician Ken Pomeroy ran some illuminating experiments on the natural variance in such numbers. His treatise is worth the read, but the gist of it is that his average player — by definition — produced a -57 on/off after 28 games (-5.7 per game) due to standard variance in a basketball game outside of that player’s control. Think about that.

For fun, I just ran the same simulation and my average player posted a +5.6 rating over his college season:

Average Player Simulation

So in two simulations, the average player’s On/Off ranged from -5.7 to +5.6. One guy looks like an All-Star, the other like an NBDL player.
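To get a feel for how wide that spread can be, here is a rough Monte Carlo sketch. It is not Pomeroy’s setup or mine from above: the possession count, the share of possessions the player is on the court, and the coin-flip scoring model are all assumptions chosen for illustration. The player has zero real effect by construction, so any on/off he posts is pure noise.

```python
import random
import statistics

def season_on_off(rng, games=28, possessions=70, on_share=0.75, score_prob=0.5):
    """On/off per 100 possessions for a player who changes nothing (all parameters hypothetical)."""
    on_margin = off_margin = 0
    on_poss = off_poss = 0
    for _ in range(games * possessions):
        # Both teams score 2 points with the same probability on every trip,
        # whether or not the player is on the floor.
        team_pts = 2 if rng.random() < score_prob else 0
        opp_pts = 2 if rng.random() < score_prob else 0
        if rng.random() < on_share:
            on_margin += team_pts - opp_pts
            on_poss += 1
        else:
            off_margin += team_pts - opp_pts
            off_poss += 1
    return 100 * (on_margin / on_poss - off_margin / off_poss)

rng = random.Random(0)  # fixed seed so the run is repeatable
seasons = [season_on_off(rng) for _ in range(300)]
print("mean on/off:", round(statistics.mean(seasons), 2))
print("spread (SD):", round(statistics.pstdev(seasons), 2))
```

Under these assumptions the average across seasons sits near zero, but individual seasons routinely land several points either side of it, which is the same effect the two simulations above show.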

“The Team Fell Apart When Player X Was Injured”

This is a common argument for MVP candidates: Look at how the team fared when he missed a few games and conclude the difference is the actual value a player provides to his team. Only this line of thinking runs into the same problems we saw above with on/off data.

Let’s take Dirk Nowitzki and this year’s Dallas Mavericks. In 62 games with Dirk, Dallas has a +4.9 differential (7.8 standard deviation). In nine games without Dirk, a -5.9 differential (7.5 standard deviation).

Which means, with a basic calculation, we can say with 95% confidence that without Nowitzki, Dallas is somewhere between a -1.0 and -10.8 differential team. Not exactly definitive, but in all likelihood they are much worse without Dirk. OK…but we can’t definitively say how much worse they are.
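The “basic calculation” here appears to be a normal-approximation interval: the mean plus or minus 1.96 standard errors. A minimal sketch, with the function name being my own:

```python
import math

def normal_ci(mean, sd, n, z=1.96):
    """Mean +/- z standard errors; z=1.96 gives a 95% interval under a normal approximation."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = normal_ci(-5.9, 7.5, 9)
print(f"{low:.1f} to {high:.1f}")  # -10.8 to -1.0
```

With only nine games, a t-based interval (critical value of about 2.306 for 8 degrees of freedom, passed as `z=2.306`) would be somewhat wider, roughly -11.7 to -0.1, which only strengthens the point about small samples.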

In a small sample, we just can’t be extremely conclusive. In this case, nine games doesn’t tell us a whole lot. New Orleans started the season 8-0…they aren’t an 82-win team.

We can perform the same thought-experiment with Dirk’s nine games that we did with Kobe’s eight minutes to display how unstable these results are. Let’s say Dallas makes three more open 3’s against Cleveland and the Cavs miss three open 3’s. What would happen to the differential numbers?

  1. That alone would improve the point differential by two points per game (an 18-point swing spread over nine games).
  2. Our 95% confidence interval now becomes -12.1 points to +4.4 points.

That’s from adjusting just six open shots in a nine-game sample.

Jason Terry — a player who benefits from playing with Dirk Nowitzki historically — had games of 3-16, 3-15 and 3-14 shooting without Dirk. He shot 39% from the floor in the nine games. By all possible accounts, Terry is better than a 39% shooter without Nowitzki. He shot 26% from 3 in those games. Let’s use his Atlanta averages instead, from when he was younger and probably not as skilled as a shooter: How would that change the way Dallas looks sans-Dirk?

Well, suddenly Terry alone provides an extra 1.7 points per game with his (still) subpar shooting. The team differential is down to -2.2 with a 95% confidence interval of -10.4 to +6.1. Just by gingerly tweaking a variable or two, the picture grows hazier and hazier.
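The two tweaks move the average as sketched below. The wider intervals quoted above depend on game-by-game margins that aren’t reproduced in this post, so this only tracks the mean; the variable names are mine.

```python
GAMES = 9
base_diff = -5.9                  # per-game differential without Dirk (from the text)

# Six open threes flipped: three extra makes for Dallas, three fewer for Cleveland.
swing_points = 3 * 3 + 3 * 3      # 18 points in total
after_shots = base_diff + swing_points / GAMES

terry_boost = 1.7                 # extra points per game from Terry, as quoted in the text
after_terry = after_shots + terry_boost

print(round(after_shots, 1), round(after_terry, 1))  # -3.9 -2.2
```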

Making Sense of it All

So, what can we say using On/Off data? It’s likely Dallas is a good deal better with Dirk Nowitzki. But, hopefully, we knew that already.

To definitively point to a small sample and say, “well this is how Dallas actually played without Dirk, so that’s his value for this year” ignores normally fluctuating variables (like Jason Terry or an open Cleveland shooter) that have little to do with Dirk Nowitzki’s value. So while such data reinforces that Dirk is valuable, we can’t say it tells us exactly how valuable he is.

We can’t ignore randomness and basic variance as part of the story.


One common trend in basketball discussions is the misuse of statistics. Since most people lack any formal education in statistics, and since humans fall prey to all sorts of statistical phenomena — Gambler’s Fallacy, for example — it never hurts to reiterate that statistics only capture what they are trying to capture. Nothing more, nothing less.

So how should we interpret the box score in basketball?

The first thing to remember is that basketball is a game of possessions. Adjusting for pace and minutes played is important in normalizing statistics. We want to compare numbers on a level playing field. Without normalizing, comparing raw statistics would be like comparing the speed of two runners in kilometers/hr versus miles/hr, or the averages of an NBA player after halftime versus per game.
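As a small illustration of pace adjustment, here is a per-100-possessions conversion. The two players and their possession counts are hypothetical, chosen only to show how identical raw totals can hide different rates.

```python
def per_100_possessions(stat_total, possessions_played):
    """Scale a raw total to a rate per 100 possessions."""
    return 100 * stat_total / possessions_played

# Hypothetical: both players average 25 points, but on different numbers of possessions.
fast_pace = per_100_possessions(25, 80)
slow_pace = per_100_possessions(25, 65)
print(round(fast_pace, 2), round(slow_pace, 2))  # 31.25 38.46
```

Same raw scoring, yet the slower-paced player is producing about seven more points per 100 possessions.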

Playing more minutes/possessions is indeed meaningful, but not that meaningful. While it’s better to have a star on the court for 90% of the game than 80% of the game, remember that fatigue is an issue and that in some situations, 2 mpg can be explained away simply because one player sat during garbage time of more blowouts.

Furthermore, if two players are both 10 points better per 100 possessions than their backups, the 2 mpg difference will result in a 0.25 increase in team efficiency at today’s pace (~93 possessions per game). In other words, with two superstars of the same value with the same quality backup, one would need to play eight minutes more per game to raise his team’s efficiency by a single point.

As for the specific figures from the box:


Points

Points are not the be-all end-all that some make them out to be, but more importantly they aren’t even a true measure of “scoring ability.” They are, actually, just a measure of points. A number of variables go into a player scoring a basket, primarily who his teammates are and how successful the opposing defense is in defending him.

Think of the multitude of ways to score that don’t represent the same skill set or level of difficulty:

  • Making a contested perimeter jumper off the dribble
  • Scoring in the post against a double team
  • Driving by one’s man — with or without a screen — and finishing at the rim

All of these are ways individuals create offense. They don’t really have the help of their teammates. But then again, sometimes the above actions don’t result in scores, and we see players:

  • Shooting open jumpers or 3-pointers
  • Scoring on an uncontested putback
  • Getting an open layup or dunk

These three situations were assisted by the creation of other people’s offense. The creator drives and kicks to the open shooter because the defense had to sag. The offensive rebounder is suddenly alone because his man went to double a post threat. And if someone gets to play with a great creator who draws defensive attention, sometimes just by magically running to the hoop he will have an open layup or dunk.

Knowing how people generate their shot attempts (and subsequent points) is important in understanding them. Which leads us to…

FGA’s and True Shooting%

Points can’t be properly interpreted in a vacuum. A 30 point game isn’t too sexy if a player took 40 shots to get there. (However it might be more understandable if the player’s teammates shoot poorly as well and are unable to generate any offense. Both of these apply to understanding Allen Iverson’s statistics in Philadelphia.)

It’s standard to evaluate volume (FGA’s) in conjunction with efficiency (TS%), the thinking being that generating more good shot attempts is challenging and we should expect to see a corresponding dip in efficiency. Similarly, we would not expect a player shooting six times a game at high efficiency to maintain his shooting percentage if he were suddenly asked to generate more shots outside of the easy layups his teammates create for him.
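For reference, here is a sketch of the commonly used True Shooting formula, points divided by twice the estimated shooting possessions. The 0.44 free-throw weight is the usual convention rather than something specified in this post, and the second stat line is hypothetical.

```python
def true_shooting_pct(points, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)), using the conventional 0.44 free-throw weight."""
    return points / (2 * (fga + 0.44 * fta))

# The 30-points-on-40-shots game from above, assuming no free throws.
print(f"{true_shooting_pct(30, 40, 0):.3f}")  # 0.375

# Hypothetical efficient night: 30 points on 18 shots and 8 free throws.
print(f"{true_shooting_pct(30, 18, 8):.3f}")  # 0.697
```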


Rebounds

Rebounding % is the best statistic here, although it’s not in the box. Once we adjust for pace or calculate rebound % (an estimate of the share of available rebounds a player grabs while he is on the court), one of the more difficult tasks is to determine impact rebounding. That is, whose rebounding is truly helping their team more.
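One widely used way to estimate rebound % scales the rebounds available in the game by the share of the game the player was on the floor. The sketch below follows that convention; the argument names and the example stat line are hypothetical.

```python
def rebound_pct(player_reb, player_min, team_min, team_reb, opp_reb):
    """Estimated share of available rebounds grabbed while on the court, as a percentage."""
    available = team_reb + opp_reb                 # every rebound in the game
    share_of_game = player_min / (team_min / 5)    # team minutes are split across five players
    return 100 * player_reb / (share_of_game * available)

# Hypothetical line: 10 rebounds in 36 minutes of a regulation game (240 team minutes).
pct = rebound_pct(10, 36, 240, team_reb=42, opp_reb=40)
print(round(pct, 1))  # 16.3
```

Note that this says nothing about impact rebounding: it counts a board taken from a teammate exactly the same as one taken from the opponent.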

Taking boards from teammates instead of from the opposition isn’t terribly helpful. If all six of a guard’s rebounds are from misses that his teammates were in a position to grab, then his rebounding isn’t helping much. Then again, if all six of his rebounds were boards that the opponent otherwise would have grabbed, he’s contributing quite a bit to the differential. Measuring this is a difficult task.

The final note about rebounds is that positional adjustments and scheme are important to know. This matters quite a bit, as we want players under the hoop to grab more rebounds than players on the perimeter. And if teams don’t crash the glass on offense, we need to account for that in comparing offensive rebounding numbers to teams who do.


Assists

One of my least favorite stats. Ideally, we want to understand who creates more offense for teammates. Assists can often be simply passing the ball to a great shooter, a great scorer, or feeding a great post player. These passes border on rudimentary and can result in huge assist totals despite doing very little.

However, assists do a decent job of approximating creation. Some players are examples that swing one way or the other. Rajon Rondo has huge assist numbers this season, but many of those come from making the “correct pass” rather than creating for teammates. True creators often have many “hockey assists” from the extra passes that follow the double teams they draw, and as a result are creating more offense than their assist totals would suggest.

Opposite of Rondo, we have someone like Deron Williams, who creates more than the numbers would suggest. In last year’s playoffs, Williams tallied 12% more assists/75 possessions than Rondo and had a 2% higher assist rate. However, based on my tracking, he created 248% more offense for his teammates than Rondo!


Turnovers

An important stat. Turnovers are worse than missed shots for two reasons: there is no available offensive rebounding opportunity (which results in a second chance 26% of the time) and turnovers often lead to decreased defensive efficiency because of the resulting fast breaks for opponents.

It’s important to know an individual’s role here again. We would expect more turnovers from someone who has the ball more. Turnover % attempts to capture this, and it’s a pretty good estimator from what I can gather.
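For reference, the usual Turnover % estimate divides turnovers by the possessions a player personally ends (shots, trips to the line, and turnovers). As with True Shooting, the 0.44 free-throw weight is the common convention and not something defined in this post; the stat line is hypothetical.

```python
def turnover_pct(tov, fga, fta):
    """Turnovers per 100 plays the player finishes: 100 * TOV / (FGA + 0.44 * FTA + TOV)."""
    return 100 * tov / (fga + 0.44 * fta + tov)

# Hypothetical line: 3 turnovers, 18 field goal attempts, 8 free throw attempts.
pct = turnover_pct(3, 18, 8)
print(round(pct, 1))  # 12.2
```

Because the denominator grows with usage, a high-volume player can commit more raw turnovers than a role player and still post the lower rate.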


Steals

Another good stat. These are almost the inverse of turnovers, because we know the defender pilfered a possession from the opposition and likely started an odd-man break for his team. Steals are not a measure of being a good defender, though: players sometimes cheat or reach a lot to generate steals, and there is no current statistic for the number of times they are burned on failed steal attempts.

But steals are important and they are generally well-tracked. For instance, forcing a jump ball and winning it counts as a steal. There are times, though, when deflections that lead to turnovers are missed.


Blocks

Blocks are the final major stat in the box, but I find them somewhat deceiving. For bigs, blocks are, yes, just a measure of blocks. They are a moderate estimate of a player’s presence at the rim (usually), but they are not an indicator that someone is a defensive superstar. Tim Duncan has never finished higher than 3rd in blocks per game, but he certainly was the best post defender in the NBA on more than one occasion. For perimeter players, a small boost in blocks can often indicate some defensive prowess, but again, it’s not a major stat for determining great defensive play, and there isn’t too much inherent value in a block to begin with.

I will probably follow up with another post on some of the advanced statistics and how to treat +/- numbers. But the final piece of the puzzle is always glancing at the team’s stats and using them as context. How good is a team’s offensive or defensive rating? How well do they rebound? How many other scorers do they have? All of these must be factored in when understanding a player’s contributions. More on this later.
