Sunday, February 22, 2015

Guest Post: Modeling Rays' Attendance: A Restricted Model

(This is our first guest post. We would like to welcome Josh Simmons, a master’s student in economics at the University of South Florida. Josh contacted me and offered to do some statistical modeling on Rays attendance. Since I am far from an expert at that level of research, I gladly accepted.)

Do you remember when everyone thought that the (Devil) Rays’ attendance problems would disappear when the team started winning? It’s been shown over and over again that attendance is correlated with winning, so the Rays shouldn’t be any different, right? Well, the team is winning (and has been for most of the last seven years) and the Rays are continually close to the bottom of the league in overall attendance. So, if Tampa Bay residents don’t care about a winning team, what do they care about? What draws them to a game?

I. Literature

There are numerous studies that aim to answer just that question, for both major and minor league baseball. The papers that most closely resemble this one include Siegfried and Eisenberg (1980), Paul, Toma, and Weinbach (2009) and McDonald and Rascher (2000), although there are some obvious differences in how the studies are constructed. The major difference between the journal articles I’ve come across and this post is that they all focus on an entire league, major or minor, while I have chosen to study just the Rays. It will be interesting to see whether conclusions common to those studies hold when examining a single team. In other words: does the demand for Rays baseball mirror the demand for baseball across the rest of the major leagues, or is there something unique about the Rays organization and/or Tampa Bay?

Within the literature (and this post) you can combine the independent variables used in the statistical models into groups. One group is temporal variables such as the start time of the game, day of the week, month, and year. Next are demographic variables (which I don’t include in my model because I’m focusing on just one team) controlling for population, income, race, etc. for a certain region surrounding the team(s). Another group consists of promotions (either split into categories or aggregated), the opponent, and the starting pitcher(s). Lastly, I would group performance variables together. These include win percentage for each game (or for the year, if studying attendance in the aggregate rather than game by game), net total runs (to capture the relative excitement of a team’s games), the pitchers’ respective records, and whether the team is in playoff contention (I only include win percentage in this analysis, but will explore others later). This list is not exhaustive and you could, if desired, break certain variables into pieces to study a topic of interest.

For the temporal variables, Lemke, Leonard and Tlhokwane (2010) found that, at the major league level, Monday through Friday games draw fewer fans than Saturday and Sunday afternoon games. Games played in April, May, and September are expected to draw worse than games played in July or August. At the minor league level, Paul, Paul, Toma and Brennan (2007) found that Tuesday, Thursday, and Friday all drew better than the omitted Wednesday, while Monday and, curiously enough, Saturday and Sunday did not differ significantly in attendance from a game played on Wednesday.

Lemke, Leonard and Tlhokwane (2010) determined that, again at the major league level, whether or not a promotion or giveaway occurred during a game does have a positive and significant effect on attendance. McDonald and Rascher (2000) and Hill, Madura and Zuber (1982) came to the same conclusion.

As for performance variables, I have included only Win Percentage (WINPCT). Hill, Madura, and Zuber (1982), Rascher (1996), Bruggink and Eaton (1996), McDonald and Rascher (2000), and Coates and Harrison (2005) all found that a higher win percentage led to an increase in attendance at the major league level.

II. The Model

I will try to be thorough in reporting the statistical model while keeping in mind that some (if not most) readers might not have much experience interpreting the results. So, if something sounds complicated, just keep reading - I will reword it. On the other hand, if you do have a background in quantitative analysis, please stay patient as I try to make things easy to understand.

(For help interpreting SPSS regression results, I suggest reading this. If you need to brush up on dummy variables, try this.)

I am using ordinary least squares (OLS) as my regression method. Because of the Rays’, let’s say, “less than stellar” attendance over the years sampled, I did not need to worry about using a limited dependent variable model (such as a censored regression) to account for the capacity constraints of the Trop (the only sellouts were typically Opening Day and playoff games).

During testing, I became worried about serial correlation in the data. After I added the previous game’s attendance (a one-game lag of the dependent variable) as an independent variable, the evidence of serial correlation disappeared.
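To make the lag idea concrete, here is a minimal sketch in Python. The post does not say which serial correlation test was run in SPSS, so the Durbin-Watson statistic below is an assumption chosen for illustration, and the function names and numbers are hypothetical. Note also that Durbin-Watson is known to be unreliable once a lagged dependent variable is in the model, so treat this only as a picture of what "serial correlation in the residuals" means.

```python
def add_lag(attendance_by_game):
    """Pair each game's attendance with the previous game's attendance.

    The first game has no previous game, so it is dropped.
    Whether the lag should reset at the start of each season is not
    stated in the post; this sketch simply uses one continuous series.
    """
    return [
        (current, previous)
        for previous, current in zip(attendance_by_game, attendance_by_game[1:])
    ]


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order serial correlation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical attendance figures, for illustration only.
attendance = [18000, 17500, 21000, 30000, 16000]
pairs = add_lag(attendance)  # (this game's attendance, previous game's attendance)

# Hypothetical residuals that drift slowly: positive serial correlation, statistic below 2.
drifting = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Hypothetical residuals that flip sign every game: negative serial correlation, statistic above 2.
alternating = [1.0, -1.0, 1.0, -1.0]

print(durbin_watson(drifting), durbin_watson(alternating))
```

Each pair produced by `add_lag` gives one row of the regression: the current game’s attendance as the dependent variable and the previous game’s attendance as an extra independent variable.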

The data set used in this post was (generously) provided by Mike Lortz. He did all of the hard and extremely tedious work of compiling and maintaining data for the years included in this analysis, the 2009-2014 seasons.

In my model, I have included the following independent variables:
  • Win Pct.
  • Promo (Y/N)
  • Opening Day (Y/N)
  • Opponent (Y/N)
  • Day of the Week
  • Month of the Year
  • Year
  • Lag of Previous Game’s Attendance

All are dummy variables except Win Pct. and the Lag variable. For each set of dummy variables, I have omitted one category to serve as the baseline against which the others are compared:
  • Toronto
  • May
  • Thursday
  • 2013
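For readers new to dummy variables, the following Python sketch shows how a category such as day of the week becomes a set of 0/1 indicators with one category left out. The function name `dummy_code` and the `is_` prefix are hypothetical and are not taken from the SPSS setup used for this post; only the choice of Thursday as the omitted day comes from the list above.

```python
def dummy_code(value, categories, baseline):
    """Return a 0/1 indicator for every category except the baseline."""
    return {
        f"is_{category}": int(value == category)
        for category in categories
        if category != baseline
    }


days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# A Saturday game switches on exactly one indicator.
saturday_game = dummy_code("Saturday", days, baseline="Thursday")

# A Thursday game has every indicator at 0, which is why the other
# coefficients are read as differences relative to Thursday.
thursday_game = dummy_code("Thursday", days, baseline="Thursday")

print(saturday_game)
print(thursday_game)
```

The same pattern applies to the opponent (baseline Toronto), the month (baseline May), and the year (baseline 2013).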

I could have gone into much deeper detail, for example breaking out the promotions category into its individual parts, but I have decided to start my analysis from as high a vantage point as possible. I plan on expanding upon this model in subsequent posts, exploring the variables more thoroughly and adding new ones to further understand the demand for baseball in Tampa Bay. It should be noted, then, that as more independent variables are added to the model, some of the results and conclusions in this post could change. But I wouldn’t expect many large changes in the results of this regression: some independent variables that are borderline statistically significant could be pushed one way or the other, but variables that are highly significant should not change.

Okay, that’s a lot of technical mumbo-jumbo, so, congrats if you have made it this far – hopefully it gets a little bit more interesting from here on out.

Below are the results of the regression.





The first thing that should jump out is that the independent variable WINPCT (which, again, is a running win percentage for each season) is not statistically significant, and it’s not even close, really. That is, the model finds no statistical evidence that how much the Rays win over a season affects their attendance game-to-game. This is contrary to what most other studies on major league attendance find: Hill, Madura, and Zuber (1982), Rascher (1996), Bruggink and Eaton (1996), McDonald and Rascher (2000), and Coates and Harrison (2005). Our result is supported anecdotally, though. As stated in the introduction, despite consistent winning over the last few years, the Rays have struggled to draw a crowd that matches their winning ways. This is a potentially huge confirmation of our intuition, and I look forward to further confirming this in later posts.

Next, we see that PROMO (a dummy variable that indicates whether there was a promotion at a particular game) is statistically significant and holds the expected sign. The result is interpreted as follows: if a promotion is held, we would expect about 3,452 more people to attend that game, all else equal. This represents a roughly 17% increase in attendance over the average from 2009-2014. This is generally supported by the literature (Lemke, Leonard and Tlhokwane (2010)). We also see that OPEN (a dummy variable that indicates whether the game occurred on Opening Day) is statistically significant and holds the expected sign.
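As a quick check on how the PROMO coefficient and the roughly 17% figure relate, the short Python calculation below backs out the average attendance they imply. Both inputs come from the paragraph above; the variable names are hypothetical, and because 17% is a rounded figure the result is only approximate (a little over 20,000 per game).

```python
promo_effect = 3452       # PROMO coefficient reported in the post (extra fans per promotion game)
relative_increase = 0.17  # "roughly 17%" as reported; rounded, so the result is approximate

# If 3,452 extra fans is about 17% of the average, the average is 3,452 / 0.17.
implied_average = promo_effect / relative_increase

print(round(implied_average))
```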

Now we come to the variables controlling for the opponent. The variables for Boston, the Chicago White Sox, Cincinnati, Cleveland, Detroit, the Yankees, San Francisco, and Texas were all found to be statistically significant with a positive sign. In other words, attendance at a game featuring one of these teams as the visiting opponent is expected to be greater than at a game featuring Toronto (the omitted variable). Milwaukee, Minnesota, the Mets, Pittsburgh, and St. Louis were found to be significant only at the 10% level. I would expect further testing to push their p-values one way or the other: either further from significance, or below the 5% threshold met by the teams listed above.
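For readers unfamiliar with the 5% and 10% thresholds, the sketch below shows how a p-value maps onto the labels used in this section. The function name is hypothetical and the p-values are made up for illustration; they are not taken from the regression output.

```python
def significance_label(p_value):
    """Classify a p-value using the 5% and 10% thresholds discussed in the post."""
    if p_value < 0.05:
        return "significant at the 5% level"
    if p_value < 0.10:
        return "significant at the 10% level only"
    return "not statistically significant"


# Hypothetical p-values, NOT taken from the regression results.
examples = {"Team A": 0.003, "Team B": 0.07, "Team C": 0.42}
labels = {team: significance_label(p) for team, p in examples.items()}

for team, label in labels.items():
    print(team, "->", label)
```

A borderline team like the hypothetical "Team B" is the kind of case that could move to either side of the 5% line as more variables are added.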

We now come to the temporal variables. The variables controlling for Monday and Tuesday are statistically significant and their coefficients have a negative sign; that is, Monday and Tuesday games are expected to draw less than a similar game on Thursday (the omitted variable, if you recall). Saturday and Sunday are also statistically significant, but their coefficients have a positive sign, so we would expect a similar game held on a Saturday or Sunday to draw more than a game on a Thursday. There is no statistical evidence, though, that a game played on a Wednesday or Friday draws differently than a Thursday game, all else equal. This is consistent with what Lemke, Leonard and Tlhokwane (2010) found in their study.

Of the months, only the variable controlling for October was found to be statistically significant. It should be obvious that we would expect October to draw well because that is when playoff games are played. It is a bit surprising, though, that no other month is expected to draw differently than May.

Finally, we come to the variables that control for the year in which a game was played. Only 2009 and 2010 are statistically different from the omitted 2013. For the other three years (2011, 2012, and 2014), there is no statistical evidence to reject the null hypothesis of no difference from 2013. This means that there was a drop in attendance after the 2010 season that is not related to the Rays’ winning percentage, opponents played, or any other independent variable included in this model. What caused a statistically noticeable drop in attendance after 2010 that has persisted since? Another question for another post.

III. Conclusion

The regression results for the Rays from 2009-2014 match some, but not all, of the results of the other studies. Fans respond positively to promotions and always come out en masse for Opening Day. Monday and Tuesday games draw worse than Thursday games, while Saturday and Sunday games draw better. There isn’t a noticeable effect from the month in which a game is played except for October (but that’s because playoff games are in October). Most curiously, there is a noticeable drop in attendance after the 2010 season.

But the biggest surprise is the deviation from the literature in the effect of winning on attendance. My results show no evidence that fans in the Tampa Bay area respond to winning on a game-to-game basis. This would seem to indicate that the Rays have maximized the return on winning to attendance. In other words, if the Rays wish to increase their attendance, winning more is unlikely to be the answer.

Please email me with questions or comments!

jrsimmon(@)mail.usf.edu

References
  • Bruggink, Thomas H. and James W. Eaton. 1996. “Rebuilding Attendance in Major League Baseball: The Demand for Individual Games.” Baseball Economics: Current Research pp: 9-31
  • Coates, Dennis and Thane Harrison. 2005. “Baseball Strikes and the Demand for Attendance.” Journal of Sports Economics 6: (3) pp. 282-302
  • Hill, James Richard, Jeff Madura and Richard A. Zuber. 1982. “The Short Run Demand for Major League Baseball.” Atlantic Economic Journal 10:2 pp. 31-35
  • McDonald, M., & Rascher, D. 2000. “Does bat day make cents? The effect of promotions on the demand for Major League Baseball.” Journal of Sport Management, 14, pp. 8-27.
  • Paul, R.J., Weinbach, A.P., and Melvin, P. 2004. “The Yankee Effect: The Impact of Interleague Play and the Unbalanced Schedule on Major League Baseball Attendance.” New York Economic Review, 35.
  • Paul, Rodney J., Kristin K. Paul, Michael Toma and Andrew Brennan. 2007. “Attendance in the NY-Penn Baseball League: Effects of Performance, Demographics, and Promotions.” New York Economic Review.
  • Rascher, Daniel. “A Test of the Optimal Positive Production Network Externality in Major League Baseball.” Sports Economics: Current Research: pp. 33-45
  • Lemke, Robert J., Matthew Leonard and Kelebogile Tlhokwane. 2010. “Estimating Attendance at Major League Baseball Games for the 2007 Season.” Journal of Sports Economics 1: pp. 316-348
  • Siegfried, John J. and Jeff D. Eisenberg. 1980. “The Demand for Minor League Baseball.” Atlantic Economic Journal 8: (2) pp: 59-69