Fallacy of epidemiological modelling

by AM Zakir Hussain | Published: 00:00, May 23, 2020


A laboratory assistant manipulates samples at a COVID-19 screening centre of Saint Andre Hospital in Bordeaux, France on May 20. — Agence France-Presse/Georges Gobet

MODELLING in Bangladesh, recently discussed in public, took into account factors such as shutdown, physical distancing, awareness, susceptibility, infectiousness and recovery. The optimistic and worst-case scenarios gave 40,000 and 100,000 incident cases respectively. The optimistic scenario gave 800 to 1,000 deaths, a case fatality rate of roughly 2 per cent.


A layman’s introduction to modelling techniques

MODELS use spatio-temporal and other related factors and algorithms. Disease structures and local intervention policies are also included. The flexibility of modelling allows the latest developments to be included in order to predict disease transmission at any given point in time. The number of new infections, the times of disease arrival and the travel of infection carriers are taken into consideration. Bayesian computation is applied to estimate the posterior distribution of the basic parameters of the model. Mitigation policies and their timings are also entered into the model. To calculate the number of deaths, the models use estimates of disease severity. These models suffer from uncertainties because the number and type of parameters change and because correct information to feed into the model is lacking.

The stochastic epidemic computational model is used when cases are fewer. It estimates probability distributions of outcomes, such as recovery or death, when inputs show random variation over time, using standard time-series techniques on data for a selected period. A time series operates in a continuous (ie non-static) state space over a discrete set of times and relates to moving averages, which are statistical processes. As most biological processes are intrinsically stochastic (random), stochastic models are usually more realistic than deterministic models. Properties unique to stochastic models are the probability of disease extinction, the probability of disease outbreak, the quasi-stationary probability distribution, the final-size distribution and the expected duration of an epidemic.
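As a rough illustration of what a stochastic model produces, the sketch below runs a simple stochastic SIR epidemic many times and summarises two of the properties listed above: how often the disease dies out early and how long an epidemic lasts on average. The population size, the rates `beta` and `gamma`, and the 10 per cent cut-off for "died out early" are assumptions chosen for illustration, not values from any published model.

```python
import random


def simulate_sir(population, initial_infected, beta, gamma, rng):
    """Run one stochastic SIR epidemic until nobody is infectious.

    Returns (final_size, duration): everyone ever infected, and elapsed time.
    """
    s = population - initial_infected
    i = initial_infected
    r = 0
    t = 0.0
    while i > 0:
        infection_rate = beta * s * i / population
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        # Choose which event happens in proportion to its rate.
        if rng.random() * total_rate < infection_rate:
            s -= 1
            i += 1
        else:
            i -= 1
            r += 1
    return r, t


def summarise(runs=2000, population=200, beta=0.3, gamma=0.2, seed=1):
    """Estimate the early-extinction share and mean duration from repeated runs."""
    rng = random.Random(seed)
    results = [simulate_sir(population, 1, beta, gamma, rng) for _ in range(runs)]
    # Assumption: an outbreak reaching under 10% of the population "died out early".
    died_out = [size for size, _ in results if size < 0.1 * population]
    mean_duration = sum(duration for _, duration in results) / runs
    return len(died_out) / runs, mean_duration


if __name__ == "__main__":
    p_extinct, mean_duration = summarise()
    print(f"share of runs that died out early: {p_extinct:.2f}")
    print(f"mean duration of an epidemic: {mean_duration:.1f} days")
```

Every run starts from the same inputs yet ends differently, which is why such a model reports distributions and probabilities rather than a single figure.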

The deterministic model seems to have been used for prediction of COVID-19 related estimates globally. In the deterministic SIS (susceptible, infected, susceptible) and SIR (susceptible, infected, recovered) epidemic models, individuals in the population are classified according to disease status: susceptible, infectious, or immune (recovered, or removed from spreading the disease). The variables considered for deterministic modelling are the total population size at time t; the number of exposed individuals at time t; the number of susceptible individuals at time t; the number of infected individuals at time t; the number of recovered (or removed) individuals at time t; the contact (or transmission) rate; the recovery rate; the average latent period (infection to appearance of clinical features); the crude death rate; the basic reproduction number; the equilibrium point of infection; the equilibrium values of S, I, R and E (awareness); the disease-free equilibrium; the endemic equilibrium; and the eigenvalue, an algebraic or geometric expression that measures linear transformation.

These models assume that all individuals are equally susceptible to the disease and that complete immunity is obtained after recovery from infection. They also assume that the duration of the disease is the same as the duration of infection, with constant transmission and recovery rates, which is not exactly what happens in practice.
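The deterministic SIR model described above can be sketched in a few lines. This is a minimal illustration using simple Euler time steps; the values for `beta` (transmission rate), `gamma` (recovery rate), the population and the step size `dt` are assumptions for the example only. It builds in the same simplifications: equal susceptibility, full immunity after recovery, and constant rates.

```python
def run_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the deterministic SIR equations with Euler steps.

    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # Constant transmission and recovery rates, as the model assumes.
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    beta, gamma = 0.5, 0.2  # assumed rates per day
    history = run_sir(1_000_000, 10, beta, gamma, days=160)
    peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
    print(f"basic reproduction number: {beta / gamma:.1f}")
    print(f"peak of {peak_infected:,.0f} infected around day {peak_time:.0f}")
    print(f"share ever infected: {history[-1][3] / 1_000_000:.0%}")
```

Given the same inputs, this model always returns the same curve, so every error in `beta` or `gamma` passes straight through to the projection.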

Unlike the susceptible-infected-recovered models, the susceptible-exposed-infected model assumes that a susceptible individual first undergoes an incubation period before becoming infectious. Other variables entered into the model are control measures, eg isolation, quarantine and vaccination; time delay (the effect of the previous state on the current state of the disease dynamics, eg the number previously infected); host age structure; genetic variation among susceptible individuals; variation in infectiousness and disease spread; and migration.

Other variations of the deterministic models are also in use, eg SEIR (susceptible, exposed, infected and removed or recovered). SIRD is another extension, in which the compartments are susceptible, infected, recovered and dead.
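To show how these extensions differ from plain SIR, the sketch below combines the two ideas: an exposed compartment as in SEIR and a separate count of deaths as in SIRD. The combination and all rates are assumptions made for illustration: `sigma` is one over the latent period, and `mu` is the death rate among the infected. For simplicity the contact term keeps using the initial population even as deaths accumulate.

```python
def run_seird(population, initial_exposed, beta, sigma, gamma, mu, days, dt=0.1):
    """Euler integration of a susceptible-exposed-infected-recovered-dead model.

    Returns the compartment sizes at the end of the run.
    """
    s = float(population - initial_exposed)
    e = float(initial_exposed)
    i = r = d = 0.0
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / population * dt   # S -> E
        new_infectious = sigma * e * dt                # E -> I after latency
        new_recovered = gamma * i * dt                 # I -> R
        new_dead = mu * i * dt                         # I -> D
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_dead
        r += new_recovered
        d += new_dead
    return {"S": s, "E": e, "I": i, "R": r, "D": d}


if __name__ == "__main__":
    out = run_seird(100_000, 10, beta=0.5, sigma=0.2, gamma=0.196, mu=0.004, days=300)
    for name, value in out.items():
        print(f"{name}: {value:,.0f}")
    print(f"deaths among resolved cases: {out['D'] / (out['R'] + out['D']):.1%}")
```

With these assumed rates the share of deaths among resolved cases is fixed at `mu / (gamma + mu)`, so the fatality figure that comes out is the one that was put in.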


Accuracy of epidemiological models

IN ONE meta-analysis, published in the British Medical Journal, 2,696 titles were screened, and 27 studies describing 31 prediction models were examined. Three models were identified for predicting hospital admission from pneumonia and other events in the general population; 18 diagnostic models for detecting COVID-19 infection; and 10 prognostic models for predicting mortality risk, progression to severe disease, or length of hospital stay. All studies were rated at a high risk of bias, mostly because of non-representative selection of control patients, the exclusion of patients who had not experienced the event of interest by the end of the study and high risk of model over-fitting. Reporting quality varied substantially between studies. The reviewers noted that most reports did not calibrate predictions.

Among the six studies that developed prognostic models to predict mortality risk in people with confirmed or suspected COVID-19 infection, the percentage of deaths varied between 8 and 59. This wide variation was partly due to severe sampling bias caused by studies excluding participants who had neither recovered nor died by the end of the study. Additionally, there might be local and temporal variation in how people were diagnosed as having COVID-19 or were admitted to hospital. Of all prognostic models, two predicted mortality. When applied to new patients, these models yielded probabilities of mortality that were too high for low-risk patients and too low for high-risk patients.
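A calibration check of the kind the reviewers found missing can be sketched as follows: patients are grouped by predicted risk, and the average predicted probability of death in each group is compared with the share who actually died. The patient figures below are invented purely to reproduce the pattern described above; they are not taken from any of the reviewed studies.

```python
def calibration_table(predicted, died, n_groups=2):
    """Compare mean predicted risk with the observed death rate per risk group.

    predicted: predicted probabilities of death, one per patient.
    died: 1 if the patient died, 0 otherwise.
    Returns a list of (mean_predicted, observed_rate) from lowest to highest risk.
    """
    pairs = sorted(zip(predicted, died))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        end = len(pairs) if g == n_groups - 1 else start + size
        chunk = pairs[start:end]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed_rate))
    return rows


# Hypothetical patients: ten predicted at 20% risk, ten predicted at 50% risk.
PREDICTED = [0.2] * 10 + [0.5] * 10
DIED = [1] + [0] * 9 + [1] * 7 + [0] * 3

if __name__ == "__main__":
    for mean_predicted, observed_rate in calibration_table(PREDICTED, DIED):
        print(f"predicted {mean_predicted:.0%}, observed {observed_rate:.0%}")
```

In this made-up example the low-risk group is predicted at 20 per cent but only 10 per cent die, while the high-risk group is predicted at 50 per cent and 70 per cent die.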

Based on the official counts of confirmed cases in China, a simulation model suggested that the cumulative number of infected would reach 180,000 (with a lower bound of 45,000) by February 29, while the number of confirmed cases was 84,388 according to the latest report. The researchers assumed that the actual cumulative numbers of infected and recovered cases in the population were most likely much higher than the reported ones. Thus, they took 20 times the confirmed number of infected and 40 times the confirmed number of recovered cases. Accordingly, they found a case fatality ratio of about 0.15 per cent in the total population. Based on this scenario, they suggested a slowdown of the outbreak in Hubei at the end of February.

Models in the United States predicted that the country is nearing, or has reached, the peak in daily deaths. In early April, Washington, DC, mayor Muriel Bowser said that modelling projects a surge in DC area hospitals during the summer. ‘Like all models, we hope this one will be proved wrong,’ she said. The prediction of the Institute for Health Metrics and Evaluation model in the United States was found to vary, when examined state-wise ‘between 43 and 73 per cent of the time (depending on the date the model was assessed).’

A total of 89,795 COVID-19 deaths (ranging from 63,719 to 127,002) through May 18 was projected for the United States on April 8. By April 29, the United States had 67,067 deaths. The total number of deaths predicted was 451,174 (ranging from 274,989 to 770,979), with an attack rate of 49.2 per cent (ranging from 36.4 to 65.1 per cent), if no mitigation measures were taken. With mitigation measures in place, the number of deaths would be 71,015 (ranging from 49,090 to 99,201), with an attack rate of 4.6 per cent (ranging from 3.3 to 6.2 per cent).

These projections were based on the Global Epidemic and Mobility Model, an individual-based, stochastic and spatial epidemic model. It uses the number of newly generated infections, the times of disease arrival in different regions, the number of travelling infection carriers and the timeline of mitigation interventions. Approximate Bayesian computation was used to estimate the posterior distribution of the basic parameters of the model for onward projection. To calculate the number of deaths, the model used estimates of COVID-19 severity from available data.
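The idea behind approximate Bayesian computation can be shown with a toy example. It is far simpler than the model described above and is not that model's actual procedure: candidate values of the reproduction number are drawn from an assumed prior, a small outbreak is simulated for each, and only the values whose simulated case count lands close to the observed count are kept. The outbreak simulator, the flat prior from 0.5 to 3.0, the tolerance and the observed figure of 31 cases are all hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed integer (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_total_cases(r_number, generations, rng):
    """Toy outbreak: each case infects a Poisson(r_number) number of others."""
    current, total = 1, 1
    for _ in range(generations):
        current = sum(poisson(rng, r_number) for _ in range(current))
        total += current
    return total


def abc_rejection(observed_total, generations, draws, tolerance, seed=0):
    """Keep prior draws whose simulated total lands close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        r_number = rng.uniform(0.5, 3.0)  # assumed flat prior
        simulated = simulate_total_cases(r_number, generations, rng)
        if abs(simulated - observed_total) <= tolerance:
            accepted.append(r_number)
    return accepted


if __name__ == "__main__":
    # Hypothetical observation: 31 cumulative cases after 4 generations.
    posterior = abc_rejection(observed_total=31, generations=4, draws=5000, tolerance=3)
    mean_r = sum(posterior) / len(posterior)
    print(f"accepted {len(posterior)} of 5000 draws, posterior mean R about {mean_r:.2f}")
```

The accepted values form an approximate posterior distribution. If the observed count fed in is wrong, the whole distribution shifts with it.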

The Centers for Disease Control and Prevention model predicted a total of 200,000 deaths in the United States from COVID-19 as the best-case scenario, while an Imperial College London model predicted about 2.2 million US deaths if nobody changed behaviour.

An overall symptomatic case fatality risk (the probability of dying after developing symptoms) from COVID-19 in Wuhan was found to be 1.4 per cent (0.9–2.1 per cent), while the actual confirmed case fatality risk was 4.5 per cent based on February 29 data.


Sources of inaccuracy in models

AVAILABLE input data vary in accuracy. Even in the United States, doctors are reportedly under-reporting. Who is tested also varies haphazardly. In some situations, anyone asking for a test can get tested; in others, only distinctly identified suspected patients are tested. This is what is happening in Bangladesh too. Such variations do not yield the actual attack rate when the denominator varies so much or is incomplete. On the Diamond Princess, the luxury cruise ship, the case fatality rate was 2.3 per cent, but when asymptomatic cases were also put in the denominator it was 1.2 per cent, as half of the infected people on board were asymptomatic. In Iceland, a company, deCODE Genetics, had screened 8,694 asymptomatic people by March 29 and found 71 of them infected with COVID-19. Such infected cases are not usually entered into the models. The outcome of COVID-19 also depends on the capacity of hospitals, which may vary widely and which is also not entered into the models.
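The effect of the denominator can be checked with the figures quoted above. The sketch below uses two small helper functions, named here only for illustration: one rescales a fatality rate when a known share of infected people had been left out of the count, and the other computes the share of screened people found infected.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def rate_over_all_infected(rate_among_counted, share_uncounted):
    """Rescale a rate (in per cent) to cover infected people left out of the count.

    share_uncounted is the fraction of all infected people missing from
    the original denominator, eg 0.5 if half were asymptomatic.
    """
    if not 0.0 <= share_uncounted < 1.0:
        raise ValueError("share_uncounted must be in [0, 1)")
    return rate_among_counted * (1.0 - share_uncounted)


if __name__ == "__main__":
    # Diamond Princess: 2.3 per cent, with half of the infected asymptomatic.
    adjusted = rate_over_all_infected(2.3, 0.5)
    print(f"fatality rate over all infected: {adjusted:.2f} per cent")
    # Iceland: 71 infected among 8,694 asymptomatic people screened.
    print(f"infected among those screened: {percentage(71, 8694):.2f} per cent")
```

Halving the counted share of cases halves the rate, which gives a figure close to the 1.2 per cent quoted for the ship. The same deaths can therefore yield very different rates depending on who is counted as a case.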

The other factors that need to be included in the model are the rate of reinfection; the groups at higher risk and their size; super-spreaders by type and size; the survival period of the pathogen in the air and on fomites under different atmospheric conditions or in different weather; variations in the transmission rate under different conditions; the severity of the disease (bed-ridden or hospitalised patients have limited mobility and so transmit less); comorbidities and their types; lifestyle (use of handkerchiefs, hand towels and sanitiser, hand washing, etc) and health behaviour (smoking, drinking, obesity); the occupations of people and their working or resting locations; the duration of infection or disease and its outcome (immune or dead); the epidemic doubling time; and the test procedure used and its validity.

The definition used may result in different rates. The sampling process, the validity of data and sample collection, and their variability across data collectors, sample collectors and sample testers are also important for the accuracy of data, which may vary from location to location. There is hardly any model that takes all of these into account.


AM Zakir Hussain is a former director, Primary Health Care and Disease Control, former director of IEDCR, DGHS, former regional adviser of SEARO, WHO and former staff consultant, Asian Development Bank, Bangladesh.
