Some Details on How MIT Evaluated the Proposed Assignment Plans
By now you may have read my posts about the MIT report evaluating the three proposed student assignment plans for Boston. I know that people are often interested in how the numbers are calculated, so here's my attempt to clarify the data and methods used as much as possible. Note that I am using information from the various MIT reports, some discussions I've had about this with Peng, and my own intuition to describe this process. It's possible I have some things wrong.
First, it's important to understand exactly what information the researchers looked at. They used Round 1 school choice data for K1 and K2 for the school years starting in 2011 and 2012. They started with the 2011 choices only and tried to use various variables to predict them. One obvious variable is distance. They tried both distance and the square root of distance as factors and found that the square root of distance was much more strongly correlated with school preference than distance alone. The reason is that differences among relatively close schools (say 0.5 miles vs. 1 mile) are likely more significant to families than similar differences between farther schools (say 4.5 vs. 5 miles). They also checked whether families seemed to prefer schools with a higher percentage of students of the same race as their child, and whether families at different socio-economic levels (measured by eligibility for free or reduced-price lunch programs) chose differently.
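To see why the square root works better, here's a quick back-of-the-envelope illustration (my own, not from the report): the same half-mile gap produces a much bigger change in the square root of distance for nearby schools than for faraway ones.

```python
import math

# The same half-mile gap matters more between nearby schools than between
# faraway ones, and sqrt(distance) captures that compression.
near_gap = math.sqrt(1.0) - math.sqrt(0.5)   # 0.5 mi vs. 1 mi  -> ~0.29
far_gap = math.sqrt(5.0) - math.sqrt(4.5)    # 4.5 mi vs. 5 mi  -> ~0.11
print(f"near gap: {near_gap:.2f}, far gap: {far_gap:.2f}")
```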
They did find that families tend to choose based on the racial make-up of a school. There was also a correlation with socio-economic level, but it was fairly weak, so they decided not to use it. The other variables they added to the model were the schools themselves: each school gets its own term capturing how often it is chosen independently of the other variables. This is really a better measure of popularity than we've had in the past because it accounts for distance and population density. In other words, the model won't give a school credit for being more desirable just because a lot of children live near the school and choose it because it's nearby. This has been a problem with BPS's measures of popularity.
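The report doesn't spell out the exact functional form, but my understanding is that it's a discrete-choice model along roughly these lines. The variable names and coefficients below are purely illustrative guesses on my part, not numbers from the MIT analysis.

```python
import numpy as np

# Hypothetical sketch of the "score" a family assigns to a school. All names
# and coefficient values are illustrative, not taken from the MIT report.
def utility(sqrt_distance, same_race_share, school_effect,
            beta_dist=-1.0, beta_race=0.5, rng=np.random.default_rng()):
    taste_shock = rng.gumbel()             # the "randomness" in the demand model
    return (beta_dist * sqrt_distance      # closer schools are more attractive
            + beta_race * same_race_share  # preference for same-race peers
            + school_effect                # per-school term = adjusted popularity
            + taste_shock)
```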
Peng took all this information and used it to create the demand model, which uses all the variables above plus some randomness to predict each child's ranked list of schools. At this point, they had to decide how many choices to list for each family. While families tend to pick around 5 schools, they decided to have each simulated family select 10. One reason is that most kids who didn't get an assignment to one of their listed schools still ended up attending BPS. That means that, in reality, these families had more than 5 schools they would send their child to; they just hadn't listed them initially. They also wanted an acceptability cut-off: the point after which they assume a family will find a school completely unacceptable. This is used in some of the measurements to avoid saying a family got access to a quality school when they would not find the choice acceptable. The cut-off is somewhat arbitrary, but it seems like a fairly reasonable decision.
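Putting that together, simulating one family's ranked list might look something like this. Again, this is my own sketch of the idea, not Peng's code, and the function and parameter names are made up.

```python
import numpy as np

# Sketch: turn one family's utilities for every school into a ranked list of
# ten schools, which is the list length the researchers chose to simulate.
def simulate_choice_list(base_utilities, school_ids, list_length=10, seed=None):
    rng = np.random.default_rng(seed)
    noisy = np.asarray(base_utilities) + rng.gumbel(size=len(base_utilities))
    order = np.argsort(-noisy)                    # highest utility first
    return [school_ids[k] for k in order[:list_length]]
```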
The next step was to apply the demand model developed with 2011 data to the 2012 registration period. If the model is sound, Peng should be able to predict families' 2012 choices reasonably well in the aggregate. Note that the chance of predicting an individual family's choices is fairly low, and that is not what the demand model is expected to do. Instead, Peng looked at each school to see whether the model came close to predicting demand for that school, and then at how well it predicted demand by race and socio-economic status. The model performed well for the overall student population and for all sub-groups except those with fairly small numbers of students in the data (e.g., students whose race is listed as "other").
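The kind of aggregate check I understand him to have done looks roughly like this; the column names are my own invention for illustration, and a similar groupby on race or lunch status would give the subgroup comparisons.

```python
import pandas as pd

# Compare actual vs. simulated demand per school for the 2012 registrations.
# Column names are hypothetical.
def demand_by_school(actual: pd.DataFrame, simulated: pd.DataFrame) -> pd.DataFrame:
    a = actual.groupby("first_choice_school").size().rename("actual_2012")
    p = simulated.groupby("first_choice_school").size().rename("predicted_2012")
    return pd.concat([a, p], axis=1).fillna(0).astype(int)
```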
In order to evaluate the proposed plans, MIT created a new version of the demand model using the same variables but fit to the 2012 choice data, the most recent available. They then ran simulated lotteries for the current plan and each of the proposed plans. One key difference is that they used the current method of applying walk zone priority for the current plan, but the new "compromise" method for the proposed plans, because BPS has proposed using the "compromise" method going forward. It's clear that walk zone priority can decrease equity, but it does not appear that a 50% priority using the compromise method makes much difference. If the priority were increased beyond 50% of seats, it would start to significantly decrease the equity of any of the plans. Unfortunately, there doesn't seem to be any analysis in the report of what effect eliminating walk zone priority would have.
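To give a flavor of what a 50% walk-zone priority means inside a simulated lottery, here is a very rough, single-school sketch. This is not BPS's actual "compromise" algorithm, whose processing order is more subtle; all names are illustrative.

```python
import random

# Rough sketch: half of a school's seats carry walk-zone priority. Walk-zone
# applicants compete for that half first; everyone left competes for the
# remaining seats by lottery number.
def fill_school(applicants, walk_zone, capacity, seed=0):
    rng = random.Random(seed)
    order = list(applicants)
    rng.shuffle(order)                            # random lottery order
    walk_half = capacity // 2
    admitted = [s for s in order if s in walk_zone][:walk_half]
    rest = [s for s in order if s not in admitted]
    admitted += rest[:capacity - len(admitted)]
    return admitted
```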
When you look at the various comparisons in the reports, most of them are for K2 only. If you're interested in seeing data for K1, take a look at the graphical appendix, which has charts comparing the plans for K1 and quite a bit of other information.
Comments
Apply the demand model to different school configurations
Other than the reasonable point that there hasn't been time, I'm wondering why they wouldn't run a simulation using the current configuration of schools, with the various capacity expansions currently in place mentioned in the recent BPS memo (the neighborhood-by-neighborhood discussion).
I realize there are some technical arguments against doing that, but in the big picture it would seem the benefits far outweigh the issues.
What's the deal with that graphical appendix link? It chokes every computer I try to load it on.