AUTHORS: Steven C. Quay, MD, PhD (1) and Martin L. Lee, PhD (2)
ABSTRACT. The earliest death in the United States currently documented to be related to SARS-CoV-2 was that of a 57-year-old woman in California on February 6, 2020. She had no travel history suggesting a source of infection. As in the US, until recently the COVID-19 epidemic in France was thought to have begun no earlier than late January 2020. However, the recent finding of a French citizen with no travel history who presented to the emergency room on 27 Dec 2019 with documented SARS-CoV-2-related pneumonia raises the question of whether the virus was also spreading earlier in the United States. Because a novel virus is expected to be detected initially as excess deaths attributed to pneumonia or influenza (P&I), not otherwise specified, until a diagnostic test becomes available, we looked for the earliest signal of such “excess P&I deaths” in the United States. We developed a metric to test for early outbreaks: the actual number of deaths from P&I per 100,000 people divided by the mean of the CDC-defined epidemic threshold and the expected P&I incidence. For the entirety of calendar 2019, this metric was 0.976 ± 0.043 (mean ± SD). An analysis of CDC deaths recorded as pneumonia and/or influenza in the United States for the four weeks of January 2020 indicated a significant increase, with Z-scores based on this metric of 2.8 to 1.8 for weeks 1 to 4, respectively. These Z-scores had associated probabilities of 0.002 to 0.03, respectively. The Z-score peaked in week 1 of 2020 and declined over the month, which is consistent with a “burst” of infection during the last two weeks of December, a major holiday travel time, followed by only localized spread. A total of 983 excess P&I deaths nationwide, and 442 (95% CI, 204-680) in California, were identified by this statistic as potentially COVID-19 related. Distinguishing clinical characteristics of COVID-19 not shared with influenza are changes in taste (ageusia) and/or smell (anosmia).
An analysis of an internet search engine showed no searches in the State of California for these terms in December, a spike in the weeks of January 5 and 12, and then nothing until March, when searches for these terms again became common. This is strong evidence for community spread in CA of SARS-CoV-2 during late December with both symptomatic patients and excess P&I deaths appearing after an incubation period in early January.
In conclusion, this hypothesis-generating report provides population-based internet search behavior data as well as a statistical approach to P&I death patterns which together strongly suggest a late December seeding of SARS-CoV-2 in at least the State of California. An examination of retained blood samples from early January deaths in California attributed to P&I should find that SARS-CoV-2 was present in the US during the weeks following the typical year-end international travel season and significantly sooner than previously thought.
This study has significant policy ramifications for the containment of novel infectious diseases, including frontier/border control, and especially for international air travel.
INTRODUCTION. As of 26 May 2020 a total of 5.7 million people have been infected with the SARS-CoV-2 virus since the first documented case on 1 Dec 2019 in Wuhan, China. The death toll stands at over 350,000 worldwide. In the United States there have been 1,723,888 cases with 100,497 deaths.
The earliest currently known case in the United States occurred on 19 Jan 2020 in a man returning to Seattle, WA from a family visit to Wuhan, China. The earliest currently known death in the United States occurred on 6 Feb 2020 in a 57-year-old woman in Santa Clara County, CA who had no travel history outside of the US in the immediate period before her death. Her family documented that she experienced an influenza-like illness in early January, so she was most likely infected in the community as early as late December 2019.
There are other anecdotal data of COVID-19 in patients outside of China in December 2019, as shown in this Text-Table below.
The earliest date at which a detailed paper (3) on county-level emergence data, titled “Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters,” was able to find statistical evidence of COVID-19 in the US was February 29, 2020. The two counties that emerged in a western US cluster in that analysis were King County, WA, where the first US case was documented, and Santa Clara County, CA, where the earliest death was documented (both as shown in the above Table).
Given this evidence of international circulation of SARS-CoV-2 in December 2019 and cluster analysis pointing to a west coast initiation, we sought to determine if there was evidence of COVID-19 symptomatic patients and/or mortality in aggregated data of pneumonia and/or influenza deaths during January 2020 in the United States.
MATERIALS AND METHODS. All raw data was taken from public sources and used directly. The primary source of data is the Centers for Disease Control’s (CDC) website on weekly surveillance of pneumonia and influenza: https://www.cdc.gov/flu/weekly/index.htm.
After the first report (4) of anosmia as a likely symptom of COVID-19 on March 20, 2020 and the previously documented utility of using Google Trends (GT) keyword analysis for epidemiology, specifically, COVID-19 country-level outbreak identification, (5) we used a GT keyword analysis at the state level. We established the statistical relationship of GT keyword searches to country-level COVID-19 outbreaks.
National Center for Health Statistics (NCHS) mortality surveillance data. NCHS collects death certificate data from state vital statistics offices for all deaths occurring in the United States. Pneumonia and influenza (P&I) deaths are identified based on ICD-10 multiple cause of death codes. NCHS surveillance data are aggregated by the week of death occurrence. To allow for collection of enough data to produce a stable P&I percentage, NCHS surveillance data are released one week after the week of death. The NCHS surveillance data are used to calculate the percent of all deaths occurring in a given week that had pneumonia and/or influenza listed as a cause of death. The P&I percentage for earlier weeks are continually revised and may increase or decrease as new and updated death certificate data are received from the states by NCHS. The P&I percentage is compared to a seasonal baseline of P&I deaths that is calculated using a periodic regression model incorporating a robust regression procedure applied to data from the previous five years. An increase of 1.645 standard deviations above the seasonal baseline of P&I deaths is considered the “epidemic threshold,” i.e., the point at which the observed proportion of deaths attributed to pneumonia or influenza is significantly higher than would be expected at that time of the year in the absence of substantial influenza-related mortality.
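The threshold rule described above can be sketched in a few lines. This is a minimal illustration only, not the CDC's implementation: the function names and the example baseline and standard deviation are hypothetical, and the periodic regression that produces the seasonal baseline is not reproduced here.

```python
from statistics import NormalDist

# 1.645 is (to three decimals) the one-sided 95% quantile of the standard normal.
Z_95_ONE_SIDED = NormalDist().inv_cdf(0.95)


def epidemic_threshold(baseline_pct, sd_pct, z=1.645):
    """Return the epidemic threshold as a percent of all deaths due to P&I.

    baseline_pct: seasonal baseline P&I percentage for the week
    sd_pct:       standard deviation around that baseline (same units)
    """
    return baseline_pct + z * sd_pct


def exceeds_threshold(observed_pct, baseline_pct, sd_pct):
    """True if the observed P&I percentage is above the epidemic threshold."""
    return observed_pct > epidemic_threshold(baseline_pct, sd_pct)


# Hypothetical week: baseline 6.5% of deaths, SD 0.3 percentage points.
threshold = epidemic_threshold(baseline_pct=6.5, sd_pct=0.3)
print(f"threshold = {threshold:.4f}% of all deaths")
print("7.2% observed exceeds threshold:", exceeds_threshold(7.2, 6.5, 0.3))
```

The multiplier 1.645 corresponds to a one-sided 5% tail of a Gaussian distribution, which is why exceeding the threshold is read as "significantly higher than expected."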
Statistical Methods. To evaluate the weekly results from the CDC mortality data, we developed a simple metric to test for early outbreaks: the actual number of deaths from P&I per 100,000 people divided by the mean of the CDC-defined epidemic threshold and the expected baseline P&I incidence. This statistic was calculated for each week of the 2019 calendar year, and its mean and standard deviation were determined. From these, we computed the Z-score for the first 5 weeks of 2020 ([actual value of metric – mean]/standard deviation). Assuming a Gaussian distribution for the metric, which an evaluation of the 2019 data supported, the probability of observing a Z-score at least this extreme could be calculated.
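The calculation described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the 2019 mean and SD are the values reported in the abstract, and the probability is assumed to be a one-sided (upper-tail) Gaussian probability, which the text does not state explicitly.

```python
from statistics import NormalDist

# 2019 reference values for the metric, as reported in the abstract.
MEAN_2019 = 0.976
SD_2019 = 0.043


def outbreak_metric(actual, threshold, baseline):
    """Actual P&I value divided by the mean of threshold and baseline.

    All three arguments must be in the same units (e.g. deaths per 100,000).
    """
    return actual / ((threshold + baseline) / 2.0)


def z_score(metric, mean=MEAN_2019, sd=SD_2019):
    """Standardize a weekly metric against the 2019 distribution."""
    return (metric - mean) / sd


def upper_tail_probability(z):
    """One-sided probability of a Z-score at least this large (assumption)."""
    return 1.0 - NormalDist().cdf(z)


# Tail probabilities for the Z-scores reported for weeks 1 and 4 of 2020.
p_week1 = upper_tail_probability(2.8)
p_week4 = upper_tail_probability(1.8)
print(f"Z = 2.8 -> p = {p_week1:.4f}")
print(f"Z = 1.8 -> p = {p_week4:.4f}")
```

Under the one-sided assumption, the tail probabilities for Z = 2.8 and Z = 1.8 come out near 0.0026 and 0.036, which match the reported 0.002 and 0.03 to the precision given.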
RESULTS AND DISCUSSION. The initial observation the authors made was by visual inspection of the small “excess” deaths attributed to pneumonia and/or influenza (P&I) during January 2020 in this Figure from the CDC’s website.
This Figure can be read as the baseline weekly deaths from P&I (light black) as a percent of all deaths, epidemic threshold weekly deaths (heavy black), and actual deaths (red), beginning with the 2015-2016 influenza season. The summer and winter episodic changes are noted.
URL for above Figure: https://gis.cdc.gov/grasp/fluview/mortality.html
The small excursion noted for P&I deaths in January 2020 is the subject of this analysis.
CDC criteria for an epidemic were met in January 2020 and the first week of February. The following Text-Table was obtained from the CDC website. For the first five weeks of 2020 the actual percentage of deaths due to pneumonia and influenza was above the predefined threshold for declaring an epidemic. The Table also contains the actual deaths from P&I. The final column is the “excess P&I deaths,” that is, the number of pneumonia and/or influenza deaths above the epidemic threshold level.
Some of these 983 deaths are hypothesized to be attributed to COVID-19.
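The "excess P&I deaths" column can be reproduced from three weekly quantities: P&I deaths, total deaths, and the threshold percentage. The sketch below assumes that definition; the function name and the weekly numbers are hypothetical placeholders, not the CDC figures from the Table.

```python
def excess_pi_deaths(pi_deaths, total_deaths, threshold_pct):
    """P&I deaths above the count expected at exactly the epidemic threshold.

    threshold_pct is the epidemic threshold as a percent of all deaths.
    A negative result means the week was below the threshold.
    """
    deaths_at_threshold = total_deaths * threshold_pct / 100.0
    return pi_deaths - deaths_at_threshold


# Hypothetical weekly inputs, for illustration only.
weeks = [
    {"pi_deaths": 4200, "total_deaths": 58000, "threshold_pct": 7.0},
    {"pi_deaths": 4100, "total_deaths": 57000, "threshold_pct": 7.0},
]

total_excess = sum(excess_pi_deaths(**week) for week in weeks)
print(f"total excess P&I deaths = {total_excess:.0f}")
```

Summing the weekly values over the weeks in the Table gives a total analogous to the reported 983. Weeks below the threshold are not clipped to zero in this sketch, since the text does not say how such weeks were handled.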
The Z-statistic defined herein shows low probability results for P&I deaths compared to the 2019 norm for January but not for the first week of February. The Z-statistic, essentially an outlier test, was calculated for each week of January and into February. The following Text-Table contains the results as well as the calculated probability for each week.
As can be seen, there was a highly significant increase in P&I deaths for the weeks of January. However, the first week of February, although it met the CDC criteria for an epidemic discussed above, was not significantly out of the norm by this metric.
The chronology of the 2017-2018 influenza epidemic is different from that of 2019-2020. The influenza epidemic of 2017-2018 was the last major epidemic before COVID-19, so its pattern of development is worth comparing to this year's for similarities and differences. The following Figure shows this analysis. As can be seen in the Figure below, the 2017-2018 influenza epidemic (brown line) first appeared in the third week of December and continued above epidemic levels thereafter. For the 2019-2020 season (blue line), activity in December was actually below average. Only beginning in the first week of January were epidemic levels of deaths occurring.
The 2019-2020 pattern of infection is different from that of the documented 2017-2018 influenza season.
International travel patterns suggest the State of California should be examined for early COVID-19 patients. Based on estimates of inbound China to US travel in January 2020, (6) the State of California received approximately 40% of all such travel. Data for December 2019 was not available but was assumed to be similar, leading to a suggestion to examine data from California for early symptomatic patients.
GT keyword search in California for loss of taste or smell. It is well established that one of the distinguishing features of SARS-CoV-2 infection is a change in smell or taste, which is not generally seen with influenza. A previous study examined the use of GT keyword search trends for loss of smell and/or taste at the country-wide level and determined that in the US, the earliest trend for GT searches was 18 Mar 2020. Because the data for excess P&I deaths showed state-level differences, this GT keyword search approach was applied to the State of California. The Figure below contains the data by week in California for searches of ‘why can’t I smell or taste’ or ‘can’t taste or smell’ combined.
As can be seen, there is a spike in Google searches the first two weeks of January related to this key finding in COVID-19 in the State of California. No such spike was seen for the states of NY, CT, TX, WA, or IL, all states with either substantial travel from China or large populations of recent Chinese immigrants. (7)
Using the term ‘can’t smell’ and a five-year search window in California beginning on 31 May 2015, the peak week for the keyword search was the week of 29 Dec 2019.
Excess P&I deaths in California during January 2020. To identify the “excess P&I deaths” in the state of CA in January the actual number of P&I deaths was recorded and compared to the P&I deaths that would have occurred if the frequency was at, but not above, the CDC determined threshold rate for identifying an epidemic. The following Text-Table contains this data for the first five weeks of 2020.
These data suggest that approximately 442 deaths in the State of California during January 2020 could be potentially attributed to COVID-19 (approximately 95% confidence interval of 204 to 680 based on usual Gaussian distribution theory). This is about 45% of the total excess P&I deaths for the entire US during January and is similar to the estimated 40% of in-bound flights from China landing first in California.
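The arithmetic behind these figures can be checked with a short sketch. The standard error is not reported in the text; the value below is back-calculated from the stated interval under the assumption of a symmetric two-sided 95% Gaussian interval, and the helper name is hypothetical.

```python
from statistics import NormalDist

# Values reported in the text.
CA_EXCESS = 442
US_EXCESS = 983
CI_LOW, CI_HIGH = 204, 680

# Two-sided 95% Gaussian multiplier (about 1.96).
z975 = NormalDist().inv_cdf(0.975)

# Back-calculated standard error implied by the reported interval (assumption).
half_width = (CI_HIGH - CI_LOW) / 2
implied_se = half_width / z975

# California's share of the nationwide excess P&I deaths.
ca_share = CA_EXCESS / US_EXCESS


def gaussian_ci(estimate, se, level=0.95):
    """Symmetric Gaussian confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


low, high = gaussian_ci(CA_EXCESS, implied_se)
print(f"implied SE = {implied_se:.1f}")
print(f"95% CI = ({low:.0f}, {high:.0f})")
print(f"CA share of US excess = {ca_share:.1%}")
```

The interval is symmetric about 442 with a half-width of 238, which implies a standard error of roughly 121 deaths, and 442 of 983 is about 45%, as stated.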
CONCLUSION. The GT data suggest that symptomatic patients were present in the State of CA as early as the last week of December 2019, and the excess P&I deaths in CA suggest that there may have been over 440 COVID-19 deaths there during January 2020.
IMPLICATIONS. With the advent of widespread international travel, the rapid dissemination of novel infectious diseases is to be expected. In fact, one of us (S.C.Q.) has written a blog (8) in which he created a parlor game to demonstrate this phenomenon. The game’s purpose is to demonstrate that there appear to be no two cities in the world that are more than 72 hours apart by air travel.
In the case reported here with COVID-19, the time from the first documented case in Wuhan, China, 01 Dec 2019 to symptomatic patients in California, the week of 29 Dec 2019, and to excess P&I deaths in California, the week of 07 January 2020, is literally a matter of weeks. The average flight time for Hainan Airlines HU 451 from Wuhan to Los Angeles is 13 hours 18 minutes and there are five flights per week. (9)
The authors offer no simple solution to this new, unprecedented public health challenge.
LIMITATIONS OF THE STUDY. This study tested the hypothesis that international travel from China to the United States during the typical year-end holiday season, approximately December 15, 2019 to January 2, 2020, could have started the COVID-19 outbreak in the US in late December to early January.
While the study makes observations related to symptoms typically seen in COVID-19 patients and identifies excess P&I deaths, it does not contain any patient blood or tissue testing for SARS-CoV-2 that would be necessary to affirmatively define the disease presence at these early times. The study also did not examine death records for documentation of laboratory-based influenza or other pneumonia diagnoses, which should be done in a more detailed study. It is hoped that this report will stimulate a retrospective search in retained samples from clinics or retained tissue from autopsies to provide the definitive answer to the early spread of SARS-CoV-2.
FINANCIAL DISCLOSURE. The authors received no third-party funding for this study. S.C.Q. is an employee of Atossa Therapeutics and receives cash and equity compensation. M.L.L. is a consultant to Atossa Therapeutics and receives cash compensation for his work. Atossa Therapeutics (NASDAQ: ATOS) is developing pharmaceuticals to treat COVID-19 and breast cancer.
(2) UCLA Fielding School of Public Health, UCLA, Los Angeles, CA; ORCID: 0000-0002-8421-7295
(4) Loss of sense of smell as marker of COVID-19 infection [press release], March 20, 2020.