Predicting the spread of pandemics in urban environments

Agent-based simulation models (where people are agents) emerge as a viable tool to address a complex problem.

By Dionne M. Aleman

Predicting the spread of disease is complicated by the unpredictable behavior of humans.

In the 2011 film “Contagion” (Warner Bros. Pictures), a new and unknown disease spreads like wildfire, starting in Hong Kong and eventually decimating the world’s population before a vaccine can be developed and disseminated. The film focuses on the U.S. Centers for Disease Control and Prevention’s (CDC) efforts to track individuals exposed to the disease and to determine the best mitigation strategies to prevent further spread. “Contagion” has received praise from the CDC for being much more realistic than similarly themed movies, but like most Hollywood tales, it takes some liberties, notably with the CDC’s ease and accuracy in predicting disease spread.

Intuitively, predicting the spread of a disease is the first, and arguably most important, step in designing effective and efficient mitigation strategies. For many years, the most common form of disease spread model was the homogeneous mixing model, which is based on the premise that all members of a population can be treated identically and that everyone is equally likely to come into contact with anyone else. Each infected person in the population will infect the same number of people; this number is called the basic reproduction number (R0), and it is a simple function of a person’s number of contacts, probability of disease transmission per contact and duration of contagious period. See Table 1 for recent pandemics and their reproduction numbers.

Pandemic            Year            Worldwide deaths    R0
Smallpox            20th century    300-500 million     5-7
Spanish Flu         1918            50-100 million      2-3
Asian Flu           1954-58         1-4 million         1.8
Swine Flu (H1N1)    2009            14,000              1.4-1.75

Table 1: Summary of recent pandemics
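To make the homogeneous mixing idea concrete, the following minimal sketch computes R0 as the product of the three factors named above (a common textbook form, assumed here rather than taken from any specific model) and steps a simple daily susceptible-infectious-removed recursion in which everyone mixes equally. The function names and any parameter values passed to them are illustrative assumptions.

```python
def basic_reproduction_number(contacts_per_day, p_transmit, days_contagious):
    """R0 as the product of contacts, per-contact transmission probability
    and contagious duration (assumed textbook form)."""
    return contacts_per_day * p_transmit * days_contagious


def homogeneous_sir(population, initial_infected, r0, days_contagious, n_days):
    """Daily-step SIR model with homogeneous mixing.

    Returns a list of (susceptible, infectious, removed) tuples, one per day,
    starting with day 0.
    """
    gamma = 1.0 / days_contagious  # fraction of infectious people removed per day
    beta = r0 * gamma              # daily transmission rate implied by R0
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _day in range(n_days):
        # Everyone is equally likely to meet anyone else, so new infections
        # depend only on the population-wide counts.
        new_infections = min(s, beta * s * i / population)
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history
```

Because every person is interchangeable in this sketch, there is no way to express that one person rides the subway while another stays home, which is the limitation the rest of the article addresses.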

In “Contagion,” scientists at one point observe that the disease has mutated and then calculate that R0 has increased from 2 to 4. Unfortunately for the CDC, R0 can currently only be calculated retrospectively. Further, R0 is known to differ between rural and urban settings, where individuals’ behaviors and contacts are different, so even if R0 were known for densely populated Hong Kong, that value might not be useful for Minneapolis. Although public health agencies have largely moved away from homogeneous mixing models because of concerns that they abstract away important differences among individuals [3, 4], many non-homogeneous mixing models consider only a small number of unique population subgroups and continue to rely on R0 to determine disease transmission between groups.

Another concern with traditional non-homogeneous models is that while closed-form calculations of some metrics (e.g., the number of infections on a particular day) are available, they are based on complex differential equations and usually cannot address intervention measures set out by public agencies. For example, if a vaccine is not available until three months into the outbreak, and 10 percent of the population can be vaccinated every two weeks, how quickly will the pandemic die out? What if certain schools were closed at certain points in the pandemic? Models based on differential equations cannot readily answer these questions, but simulation-based approaches can.

Agent-based simulation models (where people are agents) have recently become popular tools to address population heterogeneities in predicting pandemic disease spread. Simulations are especially adept at handling the uncertainties inherently present in modeling disease transmission, individual behaviors and public health interventions. For instance, a person recommended to receive a vaccination may choose not to receive the vaccine; a vaccine may only be 95 percent effective; one person may spend all day with an infected family member and not get sick, while another person becomes ill after only a short exposure; one person may choose to stay home when sick while another continues to use public transportation and go to work, while yet another seeks medical attention.

Although agent-based simulations have reputations for being slow and computer-resource intensive, advances in computing technology have greatly increased the speed of simulations and the size of populations that can be modeled. Recent models built in utilitarian C++ and run on distributed computing systems have been able to tackle disease spread simulations involving millions of individuals [1, 5].

The EpiSimS model [5] is perhaps the best known of the agent-based simulations for disease spread and is capable of simulating millions of individuals. EpiSimS is distinguished by the exceptional level of detail in the individual behavior incorporated into population movements within the model, as well as by 14 stages of disease progression in each infected person. As with all simulations, however, additional details come at a cost of increased computation time: EpiSimS can only simulate about 1,000 days overnight on a large computing cluster, translating to about 19 hours for one processor to simulate one day (assuming an 8-hour overnight computation window).

In contrast, a higher-level agent-based simulation model called morPOP (Medical Operations Research Pandemic Outbreak Planner) [1] has been used by the Ontario Agency of Health Protection and Promotion (OAHPP) to understand pandemic influenza disease spread. The morPOP model is similar to EpiSimS in concept and implementation, but takes a more macroscopic view of individuals’ movements throughout a metropolitan center. For example, EpiSimS models individuals’ patronage of specific businesses identified by Dun & Bradstreet, while morPOP more simply assumes a number of random community contacts for each individual each day; EpiSimS has 14 states of health, while morPOP has only four. The difference in detail level is in part due to the level of population and commercial information available in the targeted regions (Southern California for EpiSimS, and the Greater Toronto Area, Ontario, Canada for morPOP).

Due to the simplifications of population movement, one processor in the morPOP model can simulate one day of the 5-million-person Greater Toronto Area in seconds, while also accounting for use of public transportation and healthcare facilities (not considered in EpiSimS). The speed of the morPOP model allows for the easy exploration of many “what-if” scenarios, and morPOP is therefore the focus of this article.

An Agent-based Simulation Model

According to the traditional epidemiological SIR model (susceptible, infectious, removed), each individual is classified as being either susceptible (that is, not infected), infectious or removed/recovered. In the morPOP model, each individual can transition to another state in each time period; a fourth state, death, is also incorporated. This can be thought of as a unique Markov chain (Figure 1) for each individual. Each susceptible individual has a unique probability of transitioning to an infected state or staying susceptible. Similarly, each infected individual has a probability of transitioning to a removed/recovered state or of staying infected; these probabilities are determined by the individual’s age, health and vaccination status. Alternatively, in the case of a rapidly mutating virus, individuals could be allowed to transition back to susceptible after recovering from infection. Infected individuals who die are removed from the simulation since their state is fixed and they no longer have contact with other members of the population.

Figure 1: An individual’s state of susceptible, infected, recovered/removed, or dead can be thought of as a Markov chain.

As in a Markov chain, an individual has a certain probability of transitioning from one state to another in each unit of time. In real life, these probabilities are determined by a number of factors: age, vaccination status, contact with infected individuals, the rate of disease transmission per unit time of contact, etc. These probabilities also change on a daily basis as the number of infected individuals with whom a person comes into contact fluctuates. In the morPOP model, transitions occur at the start of each day. In contrast, the EpiSimS model has hourly transitions.
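The per-individual Markov chain can be sketched in a few lines. The following is a minimal illustration, not the morPOP implementation: the four states follow the text, while the function name, the single random draw per day and the way the daily probabilities are passed in are assumptions made for the example.

```python
import random
from enum import Enum


class State(Enum):
    SUSCEPTIBLE = "S"
    INFECTED = "I"
    REMOVED = "R"
    DEAD = "D"


def next_state(state, p_infect, p_recover, p_die, rng=random):
    """Draw one daily transition for a single individual.

    p_infect, p_recover and p_die are that individual's probabilities for
    today; in an agent-based model they would be recomputed every day.
    rng is any object with a random() method (hypothetical parameter).
    """
    u = rng.random()  # uniform draw in [0, 1)
    if state is State.SUSCEPTIBLE:
        return State.INFECTED if u < p_infect else State.SUSCEPTIBLE
    if state is State.INFECTED:
        if u < p_die:
            return State.DEAD
        if u < p_die + p_recover:
            return State.REMOVED
        return State.INFECTED
    # REMOVED and DEAD are absorbing: once entered, they are never left.
    return state
```

Calling next_state once per person per simulated day corresponds to the daily transitions described for morPOP; an hourly model would call it with hourly probabilities instead.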

To account for these changing and inter-related probabilities, the agent-based simulation model individually monitors the interactions of each member of the population. Each individual is an object in the simulation with various characteristics including age, vaccination status, home location, work location and household membership. Other behavioral characteristics include public transportation usage, likelihood of seeking medical attention (at a hospital, family doctor or flu center) and likelihood of staying home when sick. Once infected, each individual will be contagious for a randomly generated number of days, which is drawn as a function of age. Similarly, each infected person has a probability of dying once infected, as determined by age. The removed and death states are absorbing, so the only unknown transition probability is that of moving from the susceptible state to the infected state.
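As a rough picture of what “each individual is an object” might look like, the sketch below collects the characteristics listed above in one record. The field names are hypothetical, and the age-dependent contagious-period ranges are placeholder values chosen only so the example runs; they are not taken from morPOP or from any data.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Characteristics named in the text (field names are assumptions).
    age: int
    vaccinated: bool
    home_id: int
    household_id: int
    work_id: Optional[int] = None
    uses_public_transit: bool = False
    p_seek_care: float = 0.0   # likelihood of seeking medical attention
    p_stay_home: float = 0.0   # likelihood of staying home when sick
    state: str = "susceptible"
    days_contagious_left: int = 0


def contagious_days(age, rng):
    """Random contagious period as a function of age.

    The ranges below are illustrative placeholders, not real parameters.
    """
    low, high = (3, 7) if age < 65 else (4, 9)
    return rng.randint(low, high)  # inclusive on both ends


def infect(person, rng):
    """Move a person to the infected state and draw a contagious period."""
    person.state = "infected"
    person.days_contagious_left = contagious_days(person.age, rng)
```

For example, infect(Person(age=30, vaccinated=False, home_id=1, household_id=1), random.Random(0)) marks that person infected with a contagious period drawn from the placeholder range.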

A susceptible person’s probability of becoming infected is based on that person’s interactions with infected members of the population. Interactions occur through direct contact, that is, being in the same place at the same time (e.g., being in the house together for 60 minutes), or through indirect contact, that is, being in the same place but at a different time (e.g., picking up bacteria left on a handle on the subway). The time and type of contact between two individuals can vary to represent different behaviors, for example, parent-to-child, colleague-to-colleague and nurse-to-patient. For every contact with an infected individual, the time of contact and the disease transmission rate per unit time for that type of contact are incorporated into the individual’s overall probability of transitioning to the infected state at the start of the next day.
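One plausible way to combine contact times and per-unit-time transmission rates into a single daily probability is sketched below. The exponential form is an assumption made for illustration (it treats each minute of contact as an independent, small chance of transmission); the article does not state the exact formula used in morPOP, and the function name is hypothetical.

```python
import math


def infection_probability(contacts):
    """Probability of becoming infected at the start of the next day.

    contacts: iterable of (minutes_of_contact, transmission_rate_per_minute)
    pairs, one per contact with an infected individual during the day. The
    rate can differ by contact type (e.g., household vs. workplace).
    """
    # Sum the exposure over all contacts, then convert to a probability.
    total_exposure = sum(minutes * rate for minutes, rate in contacts)
    return 1.0 - math.exp(-total_exposure)
```

With this form, a person with no infected contacts has probability zero, and each additional contact or additional minute of contact can only raise the probability, which matches the qualitative behavior described above.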

One of the main advantages of a simulation model is the ability to change parameters as time passes. In particular, modeling individuals’ travel patterns is crucial to mimicking real-world disease transmission possibilities. In the morPOP model, once a person contracts the disease, he or she may choose to stay home in an attempt to rest and recover, seek medical treatment at a hospital or from a family doctor, or continue with his or her usual daily activities. Each of these possibilities is assigned a probability, and the resulting actions of the individual may result in increased exposure for other members of the population.

For instance, say a newly infected person decides to visit the hospital for treatment. Other individuals in the emergency department waiting room, as well as doctors and nurses, will have contact (direct or indirect) with the person, and therefore will have an increased chance of becoming infected. After the first day in the hospital, the model can assume that the patient will be properly isolated and therefore not infect anyone else. While in the hospital, the patient is unable to transmit disease to family members, subway riders or anyone else. By counting the number of doctors and nurses who become infected, absenteeism rates can be anticipated and appropriate additional medical help can be planned in advance.
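The choice among staying home, seeking care and carrying on as usual can be sketched as a single weighted draw, and the hospital example as a simple isolation rule. The function names, action labels and the convention that day 0 is the first day in the hospital are assumptions for illustration only.

```python
import random


def choose_action(p_stay_home, p_seek_care, rng):
    """Pick what a newly infected person does.

    The two probabilities must sum to at most 1; the remainder is the
    probability of continuing usual daily activities.
    """
    u = rng.random()
    if u < p_stay_home:
        return "stay_home"
    if u < p_stay_home + p_seek_care:
        return "seek_care"
    return "usual_activities"


def is_isolated(action, days_in_hospital):
    """A hospital patient is assumed isolated after the first day (day 0)."""
    return action == "seek_care" and days_in_hospital >= 1
```

In a fuller model, a person for whom is_isolated returns True would generate no further contacts, while on day 0 the same person would still expose waiting-room occupants and staff.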

Alternatively, say the public health agency plans to deliver vaccines at designated flu centers to priority groups (e.g., children under the age of 5 and seniors over the age of 65). Based on anticipated compliance rates and the anticipated length of time to deliver all the prioritized vaccines, the model can determine how many people in each prioritization group will choose to be vaccinated each day. Those who get vaccinated will travel to a flu center, where they may come into contact with infected individuals. They may become infected from that contact, or the vaccine may be successful, which occurs with a pre-determined success rate. If the vaccine is successful, the person will have zero probability of becoming infected. If the vaccine is not successful, the person’s probability of transitioning to the infected state will be determined as usual.
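The vaccination logic described above might be sketched as follows. Spreading vaccinations evenly over the delivery window is an assumption made for simplicity, and the function names are hypothetical; the rule that a successful vaccine sets the infection probability to zero follows the text.

```python
import random


def daily_vaccinations(group_size, compliance_rate, delivery_days):
    """People in a priority group vaccinated per day, assuming the compliant
    share of the group is spread evenly over the delivery window."""
    return round(group_size * compliance_rate / delivery_days)


def visit_flu_center(p_infect, vaccine_success_rate, rng):
    """Outcome for one person vaccinated today.

    p_infect is the person's infection probability for the day, already
    including any contacts made at the flu center. Returns a pair
    (immune, probability of infection after vaccination).
    """
    immune = rng.random() < vaccine_success_rate
    # A successful vaccine gives zero probability of infection; otherwise
    # the probability is determined as usual.
    return immune, (0.0 if immune else p_infect)
```

For instance, a group of 10,000 with 60 percent compliance and a 30-day delivery window yields 200 vaccinations per day under the even-spread assumption.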

In this way, agent-based models can transparently control any and all individual behaviors, and by extension, disease transmission probabilities. By running numerous scenarios and evaluating the differences in infections and deaths, the model can help public health officials assess which mitigation strategies are likely to be most effective.

Answering Public Health Questions

The morPOP model has been used by the OAHPP to specifically understand how changes in social distancing behavior (that is, staying home when sick) and vaccine uptake rates (that is, how quickly the population gets vaccinated) affect the spread of pandemic influenza. Because the Canadian-centered morPOP model accounts for healthcare usage, which is not considered in other disease spread simulations (generally American-centered), the effect of adding hospitals, family doctors and potential flu centers was also examined [1, 2].

Although social distancing is not a behavior that can be directly influenced by public health officials, governments can spend money on advertising campaigns encouraging the population to stay home when sick. The expectation is that more money spent on a campaign will result in higher compliance. Therefore, each social distancing scenario is considered a specific mitigation strategy. The percent of the population engaging in social distancing was tested from 0 percent to 100 percent in increments of 10 percent. As expected, increased social distancing resulted in fewer infections and fewer deaths. Generally, strategies had to differ by 20 percentage points in compliance for the difference in outcomes to be statistically significant at the 90 percent confidence level, indicating to public health officials that extra funds spent to gain less than a 20 percent change in population behavior may not be cost effective. Interestingly, social distancing rates from 0 percent to 80 percent all showed similar incremental improvements, but a huge improvement occurred at 90 percent, followed, unsurprisingly, by nearly instant remission of the disease at a 100-percent social distancing rate.
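A scenario sweep of this kind can be sketched with a deliberately tiny toy model. Everything below is an illustrative assumption rather than morPOP: the population size, contact counts and transmission probability are made-up values, contacts are drawn uniformly at random, and a person who practices social distancing is assumed to make no contacts at all once infected.

```python
import random
from statistics import mean


def run_outbreak(compliance, population=500, seeds=5, contacts_per_day=8,
                 p_transmit=0.05, days_contagious=4, n_days=60, rng=None):
    """Toy outbreak; returns the total number of people ever infected."""
    rng = rng or random.Random()
    days_left = [0] * population           # > 0 means currently contagious
    ever_infected = [False] * population
    stays_home = [rng.random() < compliance for _ in range(population)]
    for i in range(seeds):
        days_left[i] = days_contagious
        ever_infected[i] = True
    for _day in range(n_days):
        newly_infected = []
        for i in range(population):
            if days_left[i] == 0:
                continue
            if not stays_home[i]:
                # Random community contacts for a person who keeps going out.
                for _contact in range(contacts_per_day):
                    j = rng.randrange(population)
                    if not ever_infected[j] and rng.random() < p_transmit:
                        newly_infected.append(j)
            days_left[i] -= 1
        for j in newly_infected:
            if not ever_infected[j]:
                ever_infected[j] = True
                days_left[j] = days_contagious
    return sum(ever_infected)


def sweep(levels, trials=5, seed=0):
    """Mean total infections for each social distancing level."""
    results = {}
    for level in levels:
        rng = random.Random(seed)  # same random stream for every scenario
        results[level] = mean(run_outbreak(level, rng=rng)
                              for _ in range(trials))
    return results


if __name__ == "__main__":
    for level, infections in sweep([x / 10 for x in range(11)]).items():
        print(f"{level:.0%} social distancing: {infections:.1f} infections")
```

A real study would run many more trials per scenario and apply a statistical test to the per-trial results before calling any two strategies different, as was done for the 90 percent confidence level reported above.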

The vaccine uptake scenarios showed that faster vaccine dissemination yields statistically better outcomes than slower dissemination. The two specific scenarios tested in the morPOP model both considered 60 percent of the population receiving vaccinations without prioritization, but in one scenario, 15 percent of the population was vaccinated every two weeks, and in the other scenario, 10 percent of the population was vaccinated every two weeks. The 15-percent vaccination rate scenario resulted in about 20 percent fewer infections and deaths. This result should encourage public health officials to emphasize the importance of early vaccinations to the population, and to ensure the availability of vaccines as soon as possible.
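The difference between the two schedules is easy to quantify with a little arithmetic. The sketch below assumes vaccination proceeds at a constant rate from week 0 and that each two-week round is counted once it has finished; the function names are hypothetical.

```python
def weeks_to_coverage(target_pct, pct_per_round, weeks_per_round=2):
    """Weeks needed to vaccinate target_pct of the population."""
    rounds = -(-target_pct // pct_per_round)  # ceiling division on integers
    return rounds * weeks_per_round


def coverage_by_week(week, target_pct, pct_per_round, weeks_per_round=2):
    """Percent of the population vaccinated after a given number of weeks."""
    rounds_done = week // weeks_per_round
    return min(target_pct, rounds_done * pct_per_round)


if __name__ == "__main__":
    print(weeks_to_coverage(60, 15))    # 8 weeks at 15 percent per round
    print(weeks_to_coverage(60, 10))    # 12 weeks at 10 percent per round
    print(coverage_by_week(8, 60, 10))  # 40 percent covered at week 8
```

Under these assumptions, the faster schedule reaches the 60 percent target four weeks earlier, and at the eight-week mark the slower schedule has covered only 40 percent of the population.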

Finally, the morPOP model tested healthcare scenarios that (1) had no healthcare, (2) had hospitals and family doctors, and (3) had hospitals, family doctors and government-run flu centers created to address pandemic-specific illnesses. Flu centers did in fact reduce the number of infections and deaths in over 96 percent of trials. Interestingly, however, the difference from the scenario without flu centers was not statistically significant because, in a small number of trials, the disease died out almost instantly without the extra contact created by flu centers. Both scenarios with healthcare facilities resulted in an order of magnitude more infections and deaths than the no-healthcare scenario because of the additional environments where disease could spread. However, at the time the healthcare scenarios were examined, workplaces were not implemented in morPOP, and so the new healthcare environment for an individual did not come with a balancing removal of a workplace environment.

Although the additional exposure in healthcare scenarios indicated to public policy officials that disease spread models neglecting healthcare usage will likely underestimate disease spread in Canadian cities (as Canada has very low barriers to obtaining medical care, and so many infected individuals will present at hospitals and other facilities), the lack of realism in workplaces made this result not directly useful from a public policy perspective. However, the result is useful as a reminder of the mechanical functions of a simulation model, and the importance of addressing all of the high-level activities of the population.

Reality Check

As with all simulations, it is important to remember that disease spread models are not crystal balls, in large part due to the uncertainty of the parameter data. The purpose of these models is to help policy-makers prepare for outbreaks by providing evidence-based assessments of the effects of interventions in real time, in isolation or in combination, to lend plausibility to mitigation strategies. Like most predictions of future pandemic disease spread, simulation models cannot be reliably validated (validating a hypothetical 2012 outbreak against the 1918 Spanish Flu would ignore important differences in population movements and medical care), and therefore should be used as “what-if” machines to make comparisons among different scenarios. These scenarios can include various disease parameters and interventions to assist scientists in forecasting the relative effects of recommendations and interventions, enabling further planning even during a pandemic event.

Moreover, despite the clear benefits of using simulations based on physical virus shedding rather than the more abstract reproduction number, it is important to keep the reality of a pandemic in mind. One clear shortcoming of both the EpiSimS and morPOP models is that rates of transmission per unit time, as well as rates of virus shedding per unit time, are not readily available for most diseases and would be difficult to quantify quickly in an actual outbreak. However, models relying on R0 suffer from the same shortcoming because R0 is also not readily available during an outbreak. Even if it is assumed that R0 is an acceptable one-size-fits-all measurement of disease spread, that number is very difficult to calculate, as demonstrated by the fact that R0 for the 2009 H1N1 pandemic required more than two months to determine. Whether virus shedding or transmission rates could be determined in less time is unknown, though approximations could be made by estimating a new disease to be a certain factor as virulent as a known disease.

What to Expect for the Next Outbreak

Should an outbreak occur in the near future, expect public health agencies to continue relying on R0-based models. Models based on physical transmission suffer from a lack of comprehensive data that calls into question the accuracy of predictions, yet models based on R0 suffer from reliance on an abstract and often unreliable estimation of disease spread. In short, there is no “right” solution to disease spread modeling yet. But, as data collection becomes increasingly automated (e.g., use of cell phone tracking to understand movements), the scales will tip in favor of physical transmission models that can capture unique population behaviors.

Dionne M. Aleman is an assistant professor in the Department of Mechanical and Industrial Engineering, University of Toronto, with a cross appointment with the university’s Institute of Health Policy, Management & Evaluation. She is also a faculty affiliate of the Centre for Research in Healthcare Engineering (CRHE), and director of the Medical Operations Research Laboratory (morLAB) within CRHE, a group of academicians at the University of Toronto dedicated to improving the quality of medical procedures using operations research techniques.

References

  1. D.M. Aleman, T.G. Wibisono, and B. Schwartz, 2011, “A non-homogenous agent-based simulation approach to modeling the spread of disease in a pandemic outbreak,” Interfaces (Special Issue on Humanitarian Applications: Doing Good with O.R.), 41(3).
  2. N.E. Lizon, D.M. Aleman, and B. Schwartz, 2010, “Incorporating healthcare systems in pandemic models,” Proceedings of the Winter Simulation Annual Conference, December 2010.
  3. L.A. Meyers, B. Pourbohloul, M.E.J. Newman, D.M. Skowronski, and R.C. Brunham, 2005, “Network theory and SARS: Predicting outbreak diversity,” Journal of Theoretical Biology, 232:71–81.
  4. M.E.J. Newman, 2002, “Spread of epidemic disease on networks,” Physical Review E, 66(1):016128.
  5. P. Stroud, S. Del Valle, S. Sydoriak, J. Riese, and S. Mniszewski, 2007, “Spatial dynamics of pandemic influenza in a massive artificial society,” Journal of Artificial Societies and Social Simulation, 10(4):9.