Flu prediction for EW11

Flu prediction isn’t getting easier as the season winds down (or fails to). The incidence profiles (technically, %ILI) are still showing relatively complex variations for at least some of the regions.

Here we’ll focus on our two principal models: one incorporating both specific humidity and schools (the SH+school model), and a second, based on our recent PLoS Computational Biology paper, which allows R(t) to take two values, with the change in R(t) fit by the model (the paper 2 model). (Additionally, we ran a relatively large ensemble of other model types, such as ones incorporating priors, but we’ll discuss those results another time.)
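To make the two-value R(t) idea concrete, here is a minimal Python sketch. It is not the model from the paper: it assumes a plain SIR model, a sharp step in R(t), a 3-day infectious period, and hypothetical values (1.3, 0.9, a change at day 70). The names two_value_r and simulate_sir are illustrative only.

```python
def two_value_r(t, r_early, r_late, t_change):
    """Step-function R(t): r_early before t_change (days), r_late from then on."""
    return r_early if t < t_change else r_late


def simulate_sir(r_of_t, weeks=30, infectious_days=3.0, i0=1e-4, dt=0.1):
    """Euler-integrate a normalized SIR model.

    Returns the fraction of the population newly infected in each week.
    All default values are assumptions chosen for illustration.
    """
    gamma = 1.0 / infectious_days          # recovery rate per day
    s, i = 1.0 - i0, i0                    # susceptible and infectious fractions
    steps_per_week = int(round(7 / dt))
    weekly = []
    for week in range(weeks):
        new_cases = 0.0
        for step in range(steps_per_week):
            t = week * 7 + step * dt
            beta = r_of_t(t) * gamma       # transmission rate implied by R(t)
            infections = beta * s * i * dt
            recoveries = gamma * i * dt
            s -= infections
            i += infections - recoveries
            new_cases += infections
        weekly.append(new_cases)
    return weekly


# Hypothetical comparison: constant R versus a drop in R at day 70.
constant = simulate_sir(lambda t: 1.3)
switched = simulate_sir(lambda t: two_value_r(t, 1.3, 0.9, t_change=70))
```

In a fit, r_early, r_late and t_change would all be free parameters, which is what lets this kind of model follow a sharp change in incidence wherever it happens to occur.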

Generally, both models do reasonably well in fitting the data up to the present point in time. Typically, when strong changes in specific humidity or school holidays occur at a crucial point in the profile, the SH+school model performs well, and better than the paper 2 model. In those cases, if we believe that the historical variations in specific humidity are likely to hold for the remainder of this season, these regions can also be predicted well forward in time. In most cases, these profiles suggest that the season either has peaked or will peak during the next week or two.

The paper 2 model, on the other hand, is not constrained in the same way. It is allowed to place changes in R(t) at points where large changes in incidence occur, but this can mean that future extrapolation (“prediction”) is not well constrained. For region 2, for example, the paper 2 model results show incidence continuing to climb through week 26, which is very unlikely (although if it does, we will certainly learn something from that). In other cases, such as region 7, the paper 2 model does a markedly better job of capturing the variability in %ILI up through the present time. If we use region 7’s paper 2 results, we forecast a continued increase for the next four weeks (the four-week CDC prediction interval is marked by the grey shaded box in each panel). Both because the AICc scores are significantly lower for the paper 2 model and because it seems to have captured the strong increase over the last six weeks, this is the model we will use to make our four-week prediction for region 7. We are aware, however, that the most recent data point, while falling close to the paper 2 curve, may be significantly revised next week; in which direction, we do not know.
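For readers unfamiliar with AICc, the comparison can be sketched in a few lines of Python. The formula is the standard small-sample correction to AIC; the function name and the numbers in the usage lines are hypothetical and are not our fitted values.

```python
def aicc(log_likelihood, n_params, n_points):
    """Corrected Akaike Information Criterion; lower is better.

    AIC  = 2k - 2 ln(L)
    AICc = AIC + 2k(k + 1) / (n - k - 1)
    """
    if n_points - n_params - 1 <= 0:
        raise ValueError("AICc needs n_points > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_points - n_params - 1)


# Hypothetical numbers, for illustration only: two models fit to the
# same 23 weekly data points, the second with more free parameters.
score_simple = aicc(log_likelihood=-40.0, n_params=4, n_points=23)
score_flexible = aicc(log_likelihood=-30.0, n_params=6, n_points=23)
preferred = "flexible" if score_flexible < score_simple else "simple"
```

The correction term grows quickly when the number of parameters approaches the number of data points, so the extra freedom of a more flexible model has to buy a real improvement in likelihood before it is preferred.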

We are keeping records of all predictions as well as the data provided by the CDC, which are adjusted from week to week, so that we can perform a retrospective analysis of our effort after the season is over. By mining the model predictions this way, we should be able to better gauge our confidence in each model’s predictions, and how that confidence evolves through the course of the season.

Flu prediction for EW10

We’ve been quietly making predictions each week and submitting them to the CDC’s Flu challenge. Here’s a brief update on some of the lessons learned.

First, the data are continually updated (at least the previous week, and occasionally further back). The changes can be modest or significant, but they can’t be anticipated (both the direction and the magnitude of the change vary, even within the same region). This should be factored into the predictions, either by weighting the most recent data less or by performing sensitivity studies with some representative confidence intervals.
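One simple way to weight the most recent data less is sketched below in Python. This is an assumption-laden illustration, not our fitting code: it uses a weighted sum of squared errors as a stand-in objective, and the choice of down-weighting the last two weeks by a factor of 0.5 is arbitrary.

```python
def recency_weights(n_weeks, n_recent=2, recent_weight=0.5):
    """Weight 1.0 for settled weeks and recent_weight for the last n_recent weeks.

    n_recent and recent_weight are hypothetical tuning choices.
    """
    n_recent = min(n_recent, n_weeks)
    return [1.0] * (n_weeks - n_recent) + [recent_weight] * n_recent


def weighted_sse(observed, predicted, weights):
    """Weighted sum of squared errors between observed and modeled %ILI."""
    return sum(w * (o - p) ** 2 for o, p, w in zip(observed, predicted, weights))
```

For example, `weighted_sse(obs, model, recency_weights(len(obs)))` would let a late revision to the final two weeks move the fit only half as much as the same revision to an earlier week.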

Second, the profiles this year are generally considerably more complicated than in recent years. This may be due, in part, to El Niño. If so, the historical specific humidity values we rely on cannot accurately account for this season’s humidity variations.

Third, incorporating priors into the forecast appears to produce more accurate forecasts, although not in all cases. This changes as we move through the season: initially, the forecasts using priors are consistently better, but by the end of the season the priors will have little value, and solutions not using them will perform better. (This week, three of the regions performed better without priors.)

Fourth, we also incorporated a timing delay to account for the lag between the time that an individual becomes infectious and the time they present themselves at a clinic. The shape of the prediction was not measurably altered, but the likelihood was significantly improved, due to the shifting of the profile. A more robust approach would be to add an “Exposed” category to the compartmental model, which is currently being done.
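As a rough picture of what adding an “Exposed” category means, here is a minimal SEIR sketch in Python. It is a generic textbook SEIR model with simple Euler integration, not our implementation; the parameter values (R0 of 1.4, 2-day latent period, 3-day infectious period) and the name simulate_seir are assumptions for illustration.

```python
def simulate_seir(r0=1.4, latent_days=2.0, infectious_days=3.0,
                  days=300, i0=1e-4, dt=0.1):
    """Euler-integrate a normalized SEIR model.

    Returns (daily_onsets, final_state), where daily_onsets[d] is the
    fraction of the population moving from Exposed to Infectious on day d
    and final_state is the (S, E, I, R) tuple at the end of the run.
    """
    sigma = 1.0 / latent_days        # E -> I rate per day
    gamma = 1.0 / infectious_days    # I -> R rate per day
    beta = r0 * gamma                # transmission rate per day
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    steps_per_day = int(round(1 / dt))
    daily_onsets = []
    for _ in range(days):
        onsets = 0.0
        for _ in range(steps_per_day):
            new_exposed = beta * s * i * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
            onsets += new_infectious
        daily_onsets.append(onsets)
    return daily_onsets, (s, e, i, r)


def peak_day(series):
    """Index of the largest value in the series."""
    return max(range(len(series)), key=lambda d: series[d])
```

Unlike a fixed shift of the profile, the latent period here changes the dynamics themselves: lengthening latent_days slows the early growth and moves the peak later, rather than simply sliding the same curve along the time axis.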

Fifth, we found that school holidays are often an important driver of transmission, although, because of the limited number of parameters allowed by the model, they can sometimes produce strong future predictions that may not occur. In particular, to account for the decrease over the Christmas period, the model introduces a significant decrease in R0. A decrease of the same magnitude is then applied in late March during the spring break, resulting in a dramatic decrease in %ILI (see Regions 4 and 7 in the accompanying Figure). Although this may yet occur, it is quite dramatic, and is more likely the result of the model over-fitting smaller fluctuations during the early rise portion of the season.
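The coupling between Christmas and spring break can be seen in a toy Python sketch. It assumes, purely for illustration, that a single shared parameter scales R0 down during every holiday week; the function name, the holiday calendar, and the numbers are hypothetical and not taken from our model.

```python
def school_r(week, r_base, holiday_drop, holiday_weeks):
    """R for a given epidemic week.

    During any week listed in holiday_weeks, R is reduced by the one
    shared fraction holiday_drop; otherwise it stays at r_base.
    """
    if week in holiday_weeks:
        return r_base * (1.0 - holiday_drop)
    return r_base


# Hypothetical calendar: Christmas break (weeks 51-52) and spring break (week 12).
holidays = {51, 52, 12}

# If the fit needs a 30% drop to reproduce the Christmas dip, the same
# 30% drop is automatically applied again at spring break.
r_christmas = school_r(52, r_base=1.4, holiday_drop=0.3, holiday_weeks=holidays)
r_spring = school_r(12, r_base=1.4, holiday_drop=0.3, holiday_weeks=holidays)
r_term = school_r(5, r_base=1.4, holiday_drop=0.3, holiday_weeks=holidays)
```

Because one parameter serves every holiday, whatever value best explains the observed Christmas data also dictates the forecast for a spring break that has not yet been observed.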

For almost all regions, we predict that the peak has already occurred. For any remaining ones, it will likely occur this week or next.