BuzzerBeater Forums

BB Australia > Draftee's Box Score Analysis RESULTS

Draftee's Box Score Analysis RESULTS

This Post:
33
257837.1
Date: 4/12/2014 10:18:19 AM
Overall Posts Rated:
766766
Hi everyone

As you know I performed the Draft Box Score analysis recently. For a summary and discussion of what I did, see this thread

(256102.1)

The purpose was essentially to find out whether there is a correlation between a draftee's box score and their final player stats, regardless of how good or bad they were.

A big, big, big thanks to everyone who contributed. This was a group effort from a lot of people in the Australian community, and I truly thank you for giving me something to do over the last month haha

Also, I've got a raw set of data that I'm going to publish somewhere and let everyone else play with. I will keep you notified of where this ends up. Also note that at this stage I'm not taking in any more data. I'm finished with this activity for this season.

So I've got some results. The format of the results will be as follows:
1) A bit of a blog of all the successes/problems that I encountered, and why some data was excluded/included.
2) Initial counts/box score averages, player stat averages, etc. Basically just some basic stats that don't delve into correlation.
3) A simple Salary-Star correlation result.
4) Then I will present a matrix of the box score results vs player stats results, as a Correlation Coefficient matrix (for those of you who don't understand Correlation Coefficient, back to year 10 maths for you!)
Essentially, things that have a high positive correlation have a value close to 1.0, and those that have no correlation tend to be close to 0.
For example - the correlation between Kim Kardashian and men reacting with 'hooly dooly' is roughly a value of 0.9.
Conversely, the correlation between Tony Abbott's Liberal government and sound economic policy is around the 0.1 value.

5) There is a further note about the correlation values, which I've made in the correlation matrix post.
6) Then I do a bit of interpretation. This is simply the things I've noticed in the data, and really they are discussion starters.

7) Then I flattened the box score data by converting it into box score per 36 min values (i.e. divide the value by minutes played, then multiply by 36). That way we get more representative data, which should help the data give accurate results.
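For anyone who skipped year 10 maths, here is a minimal sketch of how the Correlation Coefficient from item 4 can be calculated. It assumes the standard Pearson formula (the same thing Excel's CORREL gives you). The function name and the sample numbers are made up for illustration and are not from the collected data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two lists move together, and how much each one moves on its own
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list never changes")
    return cov / math.sqrt(var_x * var_y)

# Made-up example: assists in a scouting box score vs the player's final passing level
assists = [1, 2, 4, 5, 7]
passing = [2, 3, 4, 6, 7]
print(round(pearson(assists, passing), 2))
```

A value near 1.0 means the two lists rise together, a value near 0 means no linear relationship, and a value near -1.0 means one goes up as the other goes down.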

Anyways that's it! Hope you all get something out of this. I've tried to be as 'neutral' as possible with my presentation. The correlation scores are the most 'interpretive' part of this analysis, and everyone should really make up their own mind about them.

Please bear with me as I complete this thread - there will be quite a few posts coming in the next hour or so.

This Post:
33
257837.2 in reply to 257837.1
Date: 4/12/2014 10:28:26 AM
Overall Posts Rated:
766766
Summary Blog
OK a bit of a summary of stuff.

All up I had 45 records that I was able to comfortably use.
That’s 45 sets of complete box score and player stats data, where I was able to 100% confirm that the draftee data matched the player data.

This in itself was an issue, because I had examples where there were 6 guys in the draft who were all 18 and 6'5. I had their box scores, but which one was which? This data unfortunately had to be excluded from the correlation, but I included it in the other basic stats (like average box score values and average player stat values).

Even when more info was provided, I had a few examples where the age, height, position, potential AND star rating were all the same. Impossible to match unless the guy who drafted him ALSO supplied that box score, which never happened.
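The matching problem above can be sketched roughly like this: keep a draftee only if nobody else in the same draft shares the same age, height, position, potential and star rating. This is only an illustration, not the actual Excel process used here, and the field names and `draft_id` values are hypothetical.

```python
from collections import Counter

def unambiguous(draftees, key_fields=("age", "height", "position", "potential", "stars")):
    """Keep only draftees whose combination of key fields is unique within the draft."""
    def key(d):
        return tuple(d[f] for f in key_fields)
    counts = Counter(key(d) for d in draftees)
    return [d for d in draftees if counts[key(d)] == 1]

# Hypothetical draft list: the first two prospects cannot be told apart
draft = [
    {"draft_id": 1, "age": 18, "height": "6'5", "position": "SG", "potential": 5, "stars": 3},
    {"draft_id": 2, "age": 18, "height": "6'5", "position": "SG", "potential": 5, "stars": 3},
    {"draft_id": 3, "age": 19, "height": "6'9", "position": "PF", "potential": 6, "stars": 4},
]
usable = unambiguous(draft)
print([d["draft_id"] for d in usable])
```

Only records that survive this filter can be safely paired with a final player for the correlation. The ambiguous ones can still feed the plain averages.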

So I was aiming for 50, and 45 is pretty good. It's basically an entire draft. Is this reliable? Yer, I think so. It gives a rough picture at least. I think more records would simply have fine-tuned the correlations rather than produced drastic changes.

What was I hoping for? I think the thing I was hoping for was a clear-cut message. Either it is or it isn't. Something that surprised me early was that the same player gets given a different box score for every team that scouts him. So his box score is different for EVERY single team. This gave me TWO box scores for some players, but the thing I noticed was that, even when the box scores were different, they were proportional. So the player shot 50% whether he took 10 shots in game 1 or 20 in game 2.

Another thing that surprised me was that people didn't really understand what I was trying to achieve. Users sent in player data, but incomplete. Other people said sorry, they didn't capture the box score, so they didn't see the point in sending in player stat data. I was genuinely surprised that people didn't actually get what I was doing! haha. Maybe they just didn't read the whole thread. I do tend to dribble!

OK so something that worked REALLY well was when 2 users did a lot of scouting from the same league. And by a lot of scouting, I'm talking more than 25 of the guys scouted. With fewer than this, the data becomes too hard to work with. One of my data suppliers from one league had scouted about 18 guys, and literally because of the nature of the draft, I couldn't use any of his data. That, and people didn't give me player data, but anyways.

I also worked out that I could capture the box score data in an easy manner. I won't go into this here, but if you want to know, just PM me.
Also on this, I realised afterwards that when people sent me this data, I essentially received other information about them as well. Nothing bad, just things like who they had as buddies. But it's probably worth noting if you are going to send off data in this format in the future.

MS Excel - i never want to use you again.


This Post:
22
257837.3 in reply to 257837.2
Date: 4/12/2014 10:56:46 AM
Overall Posts Rated:
766766
Summary Data
As I said, 45 complete, fully identifiable records were able to be used for the correlation.

But I was still able to get other things out of all the other data, which had some 150+ box scores and around 100 sets of player stats.

(http://i980.photobucket.com/albums/ae282/regann0/Star-sal...)
OK so here we have just a listing of each star rating, the average end salary of the player and the ACTUAL potential averages. (Note - I should do a star rating vs salary range table. Remind me, would you!) I think the interesting part of this table is the high average potential of a 4-star player.

(http://i980.photobucket.com/albums/ae282/regann0/Star-ave...)
This next picture shows the box score averages. In general, the higher the star rating, the more points, rebounds, etc. the player gets. This basic summary is something I did early on to justify spending more time on the project, because if the box score averages showed an increasing trend with star rating, that would suggest the player stats would trend as well.

(http://i980.photobucket.com/albums/ae282/regann0/PlayerSt...)
This last picture is just an average of the end player stats I received. Pretty even, except that ID and SB tend to be lower than the rest, suggesting that on average the players have weaker inside defensive skills as draftees.


This Post:
33
257837.4 in reply to 257837.3
Date: 4/12/2014 11:03:05 AM
Overall Posts Rated:
766766
Correlation

Notes about Correlation

So when I started to get some of the correlation values, I was surprised at how varied and low they were for some things. But then I remembered that just because it's a weak relationship doesn't mean it's wrong, right?

So I needed some kind of baseline, some kind of comparison correlation, to see what the actual game treats as correlated data. So I took a very small sample of my team's stats and last year's player averages.
Now, I know this isn't 100% right, because I did training and it depends on quite a few things like game shape, tactics, etc. I just needed a guide really.

So I basically came up with the following guide for the Correlation Coefficient values that I felt meant something, and this is what I've used with the draftees to highlight relationships.

0 - 0.2 Pretty much no relationship
0.2 - 0.5 A decent relationship, but obviously nothing set in stone.
0.5 - 0.7 OK now we are talking about a consistent relationship
0.7 - 1.0 OUT OF THIS WORLD!
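As a rough sketch, the guide above could be turned into a small lookup. The guide lists shared edges (0.2, 0.5, 0.7) and only positive values, so two assumptions are made here that are not in the original: a boundary value falls into the higher band, and a negative coefficient is judged by its size, ignoring the sign.

```python
def describe(r):
    """Map a correlation coefficient onto the guide's four bands.

    Assumptions (not from the guide itself): boundary values such as 0.2
    go into the higher band, and the sign of r is ignored.
    """
    strength = abs(r)
    if strength > 1 + 1e-9:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if strength < 0.2:
        return "pretty much no relationship"
    if strength < 0.5:
        return "decent relationship"
    if strength < 0.7:
        return "consistent relationship"
    return "out of this world"

for value in (0.05, 0.35, 0.62, 0.9):
    print(value, "->", describe(value))
```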

OK so this is just the basic correlation matrix: every single box score item vs every single stat.

Obviously SB and minutes played have no correlation, but it's all there. I've bolded the items I think are relevant as per the above guideline.
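For anyone who wants to rebuild a matrix like this from the raw data later, here is a minimal sketch of the "every box score item vs every stat" idea. The records, the column names (`pts`, `ast`, `JS`, `PA`) and the numbers are all made up for illustration. The real matrix was built in Excel.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either list is constant."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(records, box_fields, skill_fields):
    """Return {box_field: {skill_field: r}} for every pairing of the two field lists."""
    matrix = {}
    for box in box_fields:
        xs = [rec[box] for rec in records]
        matrix[box] = {skill: pearson(xs, [rec[skill] for rec in records])
                       for skill in skill_fields}
    return matrix

# Made-up records: one row per matched draftee (box score items plus final skill levels)
records = [
    {"pts": 10, "ast": 1, "JS": 5, "PA": 2},
    {"pts": 14, "ast": 3, "JS": 6, "PA": 4},
    {"pts": 6, "ast": 5, "JS": 3, "PA": 6},
    {"pts": 18, "ast": 2, "JS": 7, "PA": 3},
]
matrix = correlation_matrix(records, ["pts", "ast"], ["JS", "PA"])
for box, row in matrix.items():
    print(box, {skill: round(r, 2) for skill, r in row.items()})
```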

(http://i980.photobucket.com/albums/ae282/regann0/Correlat...)

Note - I've just noticed that the picture is coming through really small. I'll split it into two and re-post later on.

This Post:
11
257837.5 in reply to 257837.4
Date: 4/12/2014 11:04:45 AM
Overall Posts Rated:
766766
Per36min Correlation

So I started thinking that comparing stats for a guy who played 20 mins against a guy who played 40 mins is unfair, and I need normalised data. So I converted the box scores into 'per 36 mins' values and ran the correlations again on that data. Here are the results.
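The per-36 conversion is just "value divided by minutes played, times 36". A minimal sketch is below. The dictionary keys are hypothetical, and returning None for a player with zero minutes is an assumption made here to avoid a divide-by-zero.

```python
def per_36(value, minutes):
    """Scale a counting stat to a 36-minute basis; None if no minutes were played."""
    if minutes <= 0:
        return None
    return value / minutes * 36

def normalise_box(box, minutes_key="min"):
    """Return a copy of a box score with every counting stat converted to per-36."""
    minutes = box[minutes_key]
    return {k: (v if k == minutes_key else per_36(v, minutes)) for k, v in box.items()}

# Two made-up box scores with the same production rate but different minutes
short_game = {"min": 20, "pts": 10, "reb": 5}
long_game = {"min": 40, "pts": 20, "reb": 10}
print(normalise_box(short_game))
print(normalise_box(long_game))
```

Both players come out at the same per-36 numbers, which is the point: the 20-minute guy is no longer penalised for playing less. Note that percentage stats should not be scaled this way, only counting stats.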

(http://i980.photobucket.com/albums/ae282/regann0/Correlat...)

Again, small picture, will update later.

This Post:
22
257837.6 in reply to 257837.5
Date: 4/12/2014 11:15:37 AM
Overall Posts Rated:
766766
Interpretation

So, I've made my own conclusions, which I would like to share as some discussion starters.
Taking a Myth-busters approach.

The following statement:
Box scores don't mean anything

I personally think I've busted this myth. Box scores DO mean something... it's not concrete, it's not 100%, but they do mean something. Remember that the highest level any draftee stat will have is Respectable, so it's a scale of only 7 values. And maybe not every element of the box score means something, but certainly, looking at some of those correlations, I will be taking the box score into account with future decisions.

Assists
Really high correlation. Obvious. No need for further discussion. EXCEPT FOR...

Turnovers
So we have this high correlation between passing and assists, yet why not a high correlation between passing and turnovers?

OK that's all I'm going to talk about now because I'm tired :) DISCUSS PEOPLE!

Last edited by Coach Regan at 4/12/2014 11:15:58 AM

This Post:
00
257837.7 in reply to 257837.6
Date: 4/12/2014 7:08:53 PM
rimmers
III.2
Overall Posts Rated:
463463
Second Team:
Redbacks
It's just a shame I can't +1 more than one of your posts per 12 hrs. Such a great post to wake up and read!

I'll add that the high stls to OD correlation is exciting to see, since OD is such an important skill to get.

From: Mr J

This Post:
00
257837.8 in reply to 257837.7
Date: 4/12/2014 9:32:36 PM
Overall Posts Rated:
441441
It's just a shame I can't +1 more than one of your posts per 12 hrs. Such a great post to wake up and read!

I'll add that the high stls to OD correlation is exciting to see, since OD is such an important skill to get.


Couldn't agree more. Great work coach.

This Post:
00
257837.9 in reply to 257837.6
Date: 4/13/2014 12:14:37 AM
Overall Posts Rated:
326326
It all looks pretty interesting. Have you considered running correlations for a few statistic combinations, like:
Assist to turnover ratio (may be a better indication of passing)
FG% and 3P% (may be an indication of shooting - though since you get higher correlations for FGA than FGM, perhaps not).
Percentage of shots attempted which are 3 point attempts (could be a good indicator of JR).

One other thing you could consider is running the correlations over groups of players rather than the whole field. For example, there is a correlation between ID and rebounds, which makes sense, because you would expect high ID players to be bigs with more of a chance to get rebounds. Maybe this correlation would disappear if you ran the correlations for PF/Cs only (if this makes any sense!)

This Post:
00
257837.10 in reply to 257837.9
Date: 4/13/2014 5:37:02 AM
Overall Posts Rated:
766766
Yes, no, yes yes, no, sometimes and maybe.

When I ran the 3PT% and FT% in Excel, I kept getting divide-by-0 errors, so I parked those stats. But yes, you are right about these things, and they are on my to-do list.

Assist to turnover ratio - hadn't thought of that, and it's easy to do, so I'll add that to my list.
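Both the percentage stats and the assist to turnover ratio divide by something that can be zero (no attempts, no turnovers). One possible way around the divide-by-0 problem, sketched here with made-up data and hypothetical field names, is to mark those cases as missing and leave them out of the correlation rather than forcing a number in.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator

def drop_missing(xs, ys):
    """Remove pairs where either value is None, so the lists stay aligned."""
    kept = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [x for x, _ in kept], [y for _, y in kept]

# Made-up box scores: the second player took no threes and had no turnovers
box_scores = [
    {"3pm": 1, "3pa": 4, "ast": 6, "to": 2},
    {"3pm": 0, "3pa": 0, "ast": 3, "to": 0},
    {"3pm": 2, "3pa": 5, "ast": 2, "to": 4},
]
jump_range = [4, 2, 6]  # hypothetical final JR levels for the same three players

three_pct = [safe_ratio(b["3pm"], b["3pa"]) for b in box_scores]
ast_to = [safe_ratio(b["ast"], b["to"]) for b in box_scores]
pct_clean, jr_clean = drop_missing(three_pct, jump_range)
print(three_pct, ast_to)
print(pct_clean, jr_clean)
```

The cleaned lists can then go into whatever correlation function is being used. The trade-off is a smaller sample for those particular stats.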

I can run the correlations based on the player position. That's also easy to do, and it would be interesting to see if the players classified as C/PFs have a higher correlation for things like ID and RB.
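A minimal sketch of the per-position idea: split the records by position first, then pull out the columns for just the bigs and run the correlation on those. The records and field names below are invented for illustration.

```python
from collections import defaultdict

def group_by_position(records, field="position"):
    """Split a list of player records into {position: [records]}."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec)
    return dict(groups)

# Made-up records: box score rebounds plus the final ID level
records = [
    {"position": "C", "reb": 11, "ID": 6},
    {"position": "PF", "reb": 9, "ID": 5},
    {"position": "PG", "reb": 2, "ID": 2},
    {"position": "SG", "reb": 4, "ID": 3},
    {"position": "C", "reb": 7, "ID": 4},
]
groups = group_by_position(records)

# Combine the PF and C groups, then extract the two columns to correlate
bigs = groups.get("PF", []) + groups.get("C", [])
big_rebounds = [rec["reb"] for rec in bigs]
big_inside_def = [rec["ID"] for rec in bigs]
print(big_rebounds, big_inside_def)
```

With only 45 matched records in total, each position group will be small, so any per-position correlation should be read with extra caution.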

I can also run the basic stats broken down by position as well, but I don't want to go into too much. I think once the data is released into the free world, people can run their own things. :)


This Post:
00
257837.11 in reply to 257837.10
Date: 4/13/2014 8:40:09 PM
Overall Posts Rated:
1717
+1 Good job mate, taking a keen interest in your findings as I keep a close eye on box scores come draft time. Sorry I couldn't contribute this season but I didn't spend any points... Keep up the good work.