04-13-2008, 02:45 AM   #1
Dr. Giganto
Formerly "Tom Dogg"


Fun with baseball stats

Being the nerd that I am, I get a kick out of playing with Microsoft Excel, so I decided to put some baseball stats into a spreadsheet, and look at some correlations.

Basically, I considered team stats and how they are correlated with the number of runs a team scores.

For those who don't really know what correlation is, or how it's measured, here's a quick tutorial.

Correlation is the measure of the linear relationship between two variables/groups of data. Basically, if you put all the data points into graph form and put a line through it, how close to the line would those points be?

Correlation can range in value from -1 to 1. A correlation of 1 means that each time you increase one of the variables, the other variable increases by a fixed, proportional amount (not necessarily the same amount). Thus, all the data points would lie exactly on a straight line, with an upward slope.

A correlation of -1 means that each increase in one variable goes with a decrease in the other variable, always in the same proportion, so all the data points would lie exactly on a straight line with a downward slope.

A correlation of 0 means that there is no linear relationship whatsoever between the two variables. If you looked at a graph of the data points, they would be scattered all over the page.

So, the closer you are to 1, the more highly correlated two sets of data are. The closer you are to 0, the less correlated. The closer you are to -1, the more the two data sets are negatively correlated.
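For anyone who wants to see the number being computed outside of Excel, here is a minimal Python sketch of the standard Pearson correlation formula (the same quantity Excel's CORREL function reports). The function name pearson_r and the toy data are made up for illustration; they are not baseball numbers.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data, invented purely to illustrate the three reference values.
x = [1, 2, 3, 4, 5]
up = [10 + 3 * v for v in x]        # points exactly on an upward line
down = [50 - 2 * v for v in x]      # points exactly on a downward line
curve = [(v - 3) ** 2 for v in x]   # symmetric U shape: related, but not linearly

r_up = pearson_r(x, up)        # 1 (up to floating-point rounding)
r_down = pearson_r(x, down)    # -1 (up to floating-point rounding)
r_curve = pearson_r(x, curve)  # 0: no linear relationship

print(r_up, r_down, r_curve)
```

Note that the upward line climbs 3 for every 1 in x and still gets a correlation of 1, and that the U-shaped data gets 0 even though the two variables are clearly related. Correlation only picks up straight-line relationships.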

So, first I considered the correlation between batting average and runs scored for all AL teams for the last 4 seasons. The correlation was .704, which is quite strong. That means that a team with a higher batting average is pretty likely to score more runs.

The next stat I looked at was OBP. The correlation was .844. Thus, OBP tracks runs scored more closely than batting average does.

Next was slugging percentage: .879. This was the best so far, until I considered OPS. The correlation between OPS and runs scored was .948!!! That's almost PERFECT correlation. That means that a higher OPS almost always goes along with a proportionately higher number of runs scored.
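One way to compare these four numbers is to square them. In a simple straight-line fit, the squared correlation is the share of the variation in runs scored that the stat accounts for. A quick sketch using only the correlations quoted above (the variable names are just for illustration):

```python
# Correlations with runs scored, as quoted in the post.
correlations = {"AVG": 0.704, "OBP": 0.844, "SLG": 0.879, "OPS": 0.948}

# r squared: share of the variation in runs scored accounted for
# by a straight-line fit on that one stat.
r_squared = {stat: r ** 2 for stat, r in correlations.items()}

for stat, r2 in r_squared.items():
    print(f"{stat}: r = {correlations[stat]:.3f}, r^2 = {r2:.3f}")
```

By this measure batting average accounts for roughly half of the variation in runs scored, while OPS accounts for roughly 90 percent.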

Next, I looked at some "smallball" strategies, namely stolen bases and sacrifice bunts.

The correlation between stolen bases and runs scored was -.145, which taken at face value suggests that the more you steal, the fewer runs you're expected to score. However, this number is quite close to zero, and a test of the "significance" of this result suggests that the correlation is not materially different from zero. Thus, stolen bases show virtually no relationship with a team's runs scored.

Sacrifice bunts had a correlation of -.310, thus the more you sacrifice, the fewer runs you're likely to score!!!

Last, I considered strikeouts. The correlation was .115, which is slightly positive, but was probably not materially different from zero. Thus, a team that strikes out a lot is not really expected to score many more or many fewer runs. Strikeouts have virtually no relationship with runs scored.
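For the curious, here is a rough sketch of one common way to check whether a correlation is distinguishable from zero: the Fisher z-transformation with a normal approximation. This is not necessarily the test used above, and it rests on an assumption the post does not state, namely a sample of 56 team-seasons (14 AL teams times 4 seasons). The function name fisher_p_value is made up for illustration.

```python
import math
from statistics import NormalDist

# ASSUMPTION: 14 AL teams x 4 seasons = 56 data points.
# The post does not give the sample size.
N_TEAM_SEASONS = 56

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: true correlation is zero,
    using the Fisher z-transformation and a normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Correlations with runs scored, as quoted in the post.
posted = {"stolen bases": -0.145, "sacrifice bunts": -0.310, "strikeouts": 0.115}

p_values = {name: fisher_p_value(r, N_TEAM_SEASONS) for name, r in posted.items()}

for name, p in p_values.items():
    verdict = "distinguishable from zero" if p < 0.05 else "not distinguishable from zero"
    print(f"{name}: r = {posted[name]:+.3f}, p = {p:.3f} ({verdict} at the 5% level)")
```

Under that sample-size assumption, the stolen base and strikeout correlations are not distinguishable from zero at the usual 5% level, while the sacrifice bunt correlation is. A different sample size would shift the p-values.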
