By Vince Giuliano and Steve Buss
Intro by Vince
With the COVID-19 pandemic, the human species faces the greatest crisis it has faced in 80 years. Since my own possible encounter with COVID-19, as outlined in the previous blog entry, I have been puzzled about how to orient my thinking, writing or research so as to make the largest difference I could. Is there any way I could rise to the challenge of the situation? The more I learned about the virus and its spread, the more I appeared impotent to do anything significant about it. Then about 12 days ago, Steve Buss called me. There emerged the bright spark of a new and different societal and scientific approach. We have interacted furiously since, and I now join Steve in believing that a possible breakthrough avenue exists, the Third Way that we outline here. We think it is different from almost everything we have seen yet in public discourse. Because vast numbers of people are getting sick and facing death every day, Steve and I want to get our ideas out to others ASAP. So I am publishing updated versions to my blog day by day, at least for a while. Our first version was published 4-28-2020. This updated version includes a short section linking to introductory videos on Machine Learning and Deep Learning. The new section is for readers unfamiliar with these artificial intelligence techniques, which in turn are key to the Third Way we are proposing. And we intend to expand on and further our ideas in a series of additional blog entries very soon to follow.
THE CURRENT SITUATION
Our public information channels are consistently telling us that the COVID-19 pandemic situation is dire, that it will likely get worse, and that it is not likely to go away for years. As of April 28 the official confirmed number of people in the United States killed by COVID-19 has reached about 60,000, more than 210,000 people have died from the coronavirus globally, and case totals have reached over 3.0 million. Rest assured that by the time you read this these numbers will have worsened considerably. This blog entry is concerned with what we could do about it beyond the options being hotly contested in public.
Society desperately needs a better alternative than the two main currently-advocated approaches to the Covid-19 Pandemic. Both these approaches have compelling reasons both for and against them.
The two approaches are basically: 1. Open society and the economy up again now and go back to normal; get people back out there working and playing again so the economy can recover. We can close society back down again in the future if we need to. 2. Keep the society mainly shut down and slowly let it open up smartly region by region and function by function, and perhaps individual by individual. Use massive testing to assess safety at each point and contact tracing to identify infectious hot spots.
The first, “start re-opening our society now” approach is advocated by a number of southern and rural-state Governors: encourage businesses to resume operations, shops to re-open and healthy people to go out and start participating again like they used to. Some of our States never shut down, others are being encouraged by their leaders to re-open; people should be free to go out and participate, whether at work, at church, in a bar or restaurant or in the grandstand at a demolition derby. They say failing to do so puts the country at risk of businesses failing, tens of millions unemployed, vulnerable marginal subpopulations suffering incredibly, society collapsing, and a worldwide depression that could last half a century. And the worst that could happen is that 1% to 5% of the population would get sick enough to die while the population in general slowly gains herd immunity. We think they may be correct about all this. Most medical people, scientists and liberals say NO, NOT YET. Up to 30 million Americans could get so sick as to require hospitalization. Our health care system would collapse under this weight and perhaps 10 million or more Americans would die as a result. We think they are probably right too.
A CNN screenshot on April 28.
The second approach, “be smart about re-opening society carefully,” is advocated by most health professionals. It involves using massive testing to assess safety at each point and contact tracing to identify infectious hot spots. It may be that parts of the country don’t fully open up till mid or late summer or even winter. Proponents say this is necessary to minimize deaths and keep the health care system operational. We think they are right. Opponents say the economic and personal damage could be worse than damage from the virus, and institutions and businesses and families could be destroyed. Further, not opening up now is likely to result in subsequent even-worse waves of the pandemic. We think they might be right too. And “massive testing” does not exist for the US and won’t for some time. There ain’t no such thing as a standardized and definitive test for the virus. There are multiple kinds and brands of PCR tests for presence of the virus used in the US. And the tests we have been using are crummy, giving high percentages of false positives and false negatives. The tests are so bad that many emergency room doctors view them as only secondary evidence when deciding whether a very sick patient actually has COVID-19 or not. As Michael Osterholm and Mark Olshaker write in a new Op-Ed in the New York Times: “Far too few tests are available in the United States. Some are shoddy. Even the ones that are precise aren’t designed to produce the kind of definitive yes-no results that people expect.” ... “Governments throughout the world and the research, medical-supply and clinical-lab industries must unite to vastly increase global production of reagents and sampling equipment. Achieving this will take months and require building new capacity, presumably with public subsidies. The time and costs involved will be considerable, but such an effort is the only way to test large populations for this infection (and for others in the future).”
Both approaches have the same fundamental problem. The virus reappears in multiple contexts. It is out in our free society and the genie can’t be stuffed back in the bottle. We won’t have a reliable way of knowing who has it for some time. We could open the society up some and then shut it down again, so the pandemic is experienced in waves. It doesn’t matter. “Flattening the curve” of infections means postponing cases, not averting them. The virus is clever and unstoppable, and is likely to lie in wait until it affects everybody. There is no normal to go back to. And by itself government can’t solve the problem. A governor can force a major shopping mall or racetrack or meat processing plant to re-open and tell people it’s safe to go and hang out or work there. But shops in the mall or the racetrack can’t stay in business unless a lot of people decide to go there, which is unlikely. Same for big sporting events, scientific meetings, restaurants, salons, religious services and bowling and knitting clubs. And meat-plant workers and prison guards can decide it’s just too dangerous to go to work.
The 1918 viral pandemic
We think the Third Way outlined here may well be the best way to go from here. We need a much faster way of implementing the Immunity Passports idea, not requiring a vaccine. It requires us to be willing to think outside of the box a bit and use new technical and knowledge resources in new ways. You never thought of Machine Learning in connection with health? Well, please fasten your seat belt and read on.
THE THIRD WAY PROPOSAL
We desperately need a way that gets the society and economy running at full speed again as soon as possible. And at the same time we must act so that those most vulnerable face minimal risk of serious sicknesses and death from COVID-19 infection.
We think a key part of the solution is to find a way for individuals to know their personal vulnerability if they are infected by the virus. If people are to start again hanging out with relative strangers, it is important that they are confident about their actual level of safety from COVID-19 harm. Most in our society are unlikely to accept a government-authority proclamation that contradicts their personal experience. A significant percentage of us now know of a relative, colleague or friend who has been or is now struggling for life in an ICU with a COVID-19 infection. Or who is now hospitalized and might soon be put on a ventilator. Or who is already dead. And we follow the local news. Therefore we are unlikely to accept a government spokesperson assuring us now that “All is now clear; it is now safe for you to go back out and work and play.”
We believe a COVID-19 Hazard Score (CoHS) can probably be calculated for individuals (for everyone) with a high degree of accuracy, indicating the probability of getting seriously sick or dying upon coronavirus infection. That is the essential novelty of our proposal.
A CoHS would range from 0 to 100 and would be read as a percentage. If yours was 0.2 or less, you would know with high certainty that if you contracted the virus, you would have less than one chance in 500 of getting seriously sick or dying. From a personal safety point of view, you could go back out into active participation at work, church, meetings, restaurants, bowling, etc. with that confidence. On the other hand, if you had a score of 30, you would know that you had a 30% chance of getting seriously sick or dying if you contracted the virus, so you would probably want to keep yourself safe closeted up at home. And if government authorities and businesspersons have the same data, they will know who can safely be encouraged to go back to work and participate with others to re-boot the economy.
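The arithmetic behind reading a score as odds can be sketched in a few lines of Python. This is only an illustration: the function names and the 0.2 cutoff used to label a score are our assumptions for the example, not part of any existing CoHS system.

```python
def chance_one_in(cohs_score: float) -> float:
    """Return N such that a CoHS percentage means 'one chance in N'."""
    if not 0 < cohs_score <= 100:
        raise ValueError("score must be greater than 0 and at most 100")
    return 100.0 / cohs_score


def suggested_posture(cohs_score: float, low_cutoff: float = 0.2) -> str:
    """Label a score; the cutoff value is a hypothetical choice."""
    if cohs_score <= low_cutoff:
        return "resume activities"
    return "take precautions"


# The two scores discussed in the text, plus one in between.
for score in (0.2, 5.0, 30.0):
    odds = chance_one_in(score)
    print(f"CoHS {score}: about 1 chance in {odds:.0f}, {suggested_posture(score)}")
```

Where to put a real cutoff, and whether there should be more than two tiers, is a policy question that the data would have to inform.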
If we had a good CoHS system, existing data suggests that something greater than 95% of the population would be found invulnerable to serious sickness from COVID-19, and could go back out and re-open the society. Medical and social efforts would focus then on the 5% or so of the population tested to be probably vulnerable. Instead of sequestering everybody, only these vulnerable people in the society would have to be sequestered to protect them, standing the present approach where everybody is sequestered on its head.
Note that our focus in CoHS is on safety from serious illness or death, not on immunity which is extremely difficult to measure reliably and which might come and go.
The stakes for an approach like this to be realized are enormous – hundreds of thousands or millions of deaths averted, tens of millions of jobs recovered, trillions of dollars in economic benefits, years cut off the time for world recovery from the pandemic, and probably avoiding a worldwide depression or worse.
We can imagine a CoHS for individuals to evolve to yield summarized information that would look something like the data in the table below in some future phase. (Data shown is invented, not to be taken for real.)
This kind of CoHS system does not yet exist, but we think:
- It’s inevitable that something like it will eventually be created,
- We know how to go about creating it, and
- It can be created relatively quickly. (See the brief discussion below.)
We think the suggested CoHS system can be based on analytical studies and machine learning applied to existing actual-data sets. We further think the CoHS system could be made “hard” and statistically reliable by incorporating actual experience data for tens and hundreds of thousands of individuals.
We think the approach will work because we already know that there is great variability of consequences for people who contract the COVID-19 virus. Several studies have indicated that around 95% of infected people don’t get seriously sick and that around 50% don’t have symptoms or know they are infected. It is a highly personal difference that we think can be much more precisely nailed down by studying personal health characteristics of infected people. That is how we will get to the CoHS scoring system. Look at the real data.
A prime source is the set of databases worldwide specifically related to COVID-19. We suspect hundreds of such collections are being developed worldwide now. Many of these will result in publications.
- A good example is a Chinese study published on March 26, 2020, of 323 COVID-19 patients, entitled Risk Factors Associated with Clinical Outcomes in 323 COVID-19 Patients in Wuhan, China.
- Data typically gathered for COVID-19 patients on hospital admission and in the course of treatment. Many such data collections exist, but names and patient-identifying data must be stripped out for them to be used.
Also, among the kinds of data we are thinking about are ones routinely included in individual clinical records, such as
- Disease history data (like kidney, heart, pneumonia, hypertension, obesity, etc.)
- Demographic data (like age, gender, and race)
- Biological markers in circulating blood (IL-6, TNF, IL-1B, etc.)
- Individual blood type – known to be influential for assessing susceptibility to Noroviruses
- Individual inflammatory indices – such as CRP and levels of some cytokines like IL-6 and IL-1beta
- Markers of obesity
- Blood profiles on file including lipids
- History of other chronic viral infections, HPV, HIV, EBV, CMV, etc.
- History of certain diseases, such as pneumonia or kidney diseases
- Smoking history
- Plasma antibody COVID-19 test results if available
- Previous COVID-19 test results if available
- Allergy history
- Measures of frailty, such as loss of body mass in old age
There is also 24/7 data that individuals gather using smart-watch and smart-ring wearables, such as body temperature, resting heart rate, and heart rate variability (HRV).
Our approach to data heterogeneity is to start out being very inclusive, that is, to start out including a number of data types, as exemplified above, that we think could possibly be relevant to a CoHS. Then we narrow the list down to data types established to be relevant based on data analysis. And finally we compute CoHS scores based on those data types for extremely large numbers of people. Major issues are identifying relevant databases, respecting personal confidentiality and integrating data from different studies.
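To make the "start inclusive, then narrow down" step concrete, here is a minimal Python sketch. It assumes a very crude screening rule (keep a variable if its correlation with a severe outcome is at least 0.5 in absolute value), and the records, field names and threshold are all invented for illustration. A real analysis would use multi-variable methods and would have to deal with missing values and confounding.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_variables(records, candidates, outcome="severe", min_abs_r=0.5):
    """Keep candidate variables whose |r| with the outcome meets the threshold."""
    outcomes = [rec[outcome] for rec in records]
    kept = []
    for name in candidates:
        values = [rec[name] for rec in records]
        if abs(pearson_r(values, outcomes)) >= min_abs_r:
            kept.append(name)
    return kept


# Invented toy data: eight patients, 1 = severe outcome, 0 = not severe.
ages = [30, 35, 40, 45, 70, 75, 80, 85]
crps = [2, 3, 1, 4, 20, 15, 30, 25]
shoes = [40, 44, 42, 38, 42, 38, 40, 44]  # deliberately irrelevant variable
severe = [0, 0, 0, 0, 1, 1, 1, 1]

records = [
    {"age": a, "crp": c, "shoe_size": s, "severe": y}
    for a, c, s, y in zip(ages, crps, shoes, severe)
]

kept = screen_variables(records, ["age", "crp", "shoe_size"])
print("Variables kept for scoring:", kept)
```

The point of the sketch is only the shape of the workflow: a wide list of candidates goes in, and a shorter, data-justified list comes out to feed the scoring step.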
We will go into these further in blog posts to follow. Among the Data Science Methodologies we see as potentially applicable are:
- Multi-variable regression analysis
- Principal-axis data analysis
- Data clustering techniques
- In particular, many new approaches to machine learning
Ready-to-use software is available for each of these techniques, and they are well known to those who practise them.
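As a rough illustration of the first technique on the list, the sketch below fits a small multi-variable logistic regression and turns its output into a 0-to-100 score. It is written in plain Python only so that it is self-contained; in practice one would use the ready-made software mentioned above. The two input variables (age in decades relative to 50, and smoker yes/no), the ten data rows and the function names are all invented for the example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by simple batch gradient descent."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def hazard_score(w, b, x):
    """Predicted probability of a severe outcome, as a percentage."""
    return 100.0 * sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Invented rows: [(age - 50) / 10, smoker (0 or 1)]; y = 1 means severe outcome.
X = [[-2.0, 0], [-1.5, 0], [-1.0, 1], [-0.5, 1], [0.0, 0],
     [0.5, 1], [1.0, 0], [1.5, 1], [2.0, 0], [2.5, 1]]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
young = hazard_score(w, b, [-2.0, 0])  # a 30-year-old non-smoker
old = hazard_score(w, b, [2.0, 0])     # a 70-year-old non-smoker
print(f"Toy score at 30: {young:.1f}   Toy score at 70: {old:.1f}")
```

With only ten made-up rows the numbers mean nothing; the value of the approach would come from fitting the same kind of model to the records of tens or hundreds of thousands of real patients.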
We think there is strong epidemiological evidence to support the third-way approach outlined here
A significant example is the Chinese study cited above, Risk Factors Associated with Clinical Outcomes in 323 COVID-19 Patients in Wuhan, China. Many biological variables were measured and documented in the study; some we had never heard of but are now intrigued by. It is a study we are extremely interested in, and we would like to include it in our Phase 1 analysis. We think you will find it important too once you see a couple of their study images.
Background – With evidence of sustained transmission in more than 190 countries, coronavirus disease 2019 (COVID-19) has been declared a global pandemic. As such, data are urgently needed about risk factors associated with clinical outcomes.
Methods – A retrospective chart review of 323 hospitalized patients with COVID-19 in Wuhan was conducted. Patients were classified into three disease severity groups (non-severe, severe, and critical), based on their initial clinical presentation. Clinical outcomes were designated as favorable and unfavorable, based on disease progression and response to treatments. Logistic regression models were performed to identify factors associated with clinical outcomes, and logrank test was conducted for the association with clinical progression.
Results – Current standard treatments did not show significant improvement on patient outcomes in the study. By univariate logistic regression model, 27 risk factors were significantly associated with clinical outcomes. Further, multivariate regression indicated that age over 65 years, smoking, critical disease status, diabetes, high hypersensitive troponin I (>0.04 pg/mL), leukocytosis (>10 x 10^9/L) and neutrophilia (>75 x 10^9/L) predicted unfavorable clinical outcomes. By contrast, the use of hypnotics was significantly associated with favorable outcomes. Survival analysis also confirmed that patients receiving hypnotics had significantly better survival. (Steve and Vince to comment on this point in a later blog entry)
Conclusions – To our knowledge, this is the first indication that hypnotics could be an effective ancillary treatment for COVID-19. We also found that novel risk factors, such as higher hypersensitive troponin I, predicted poor clinical outcomes. Overall, our study provides useful data to guide early clinical decision making to reduce mortality and improve clinical outcomes of COVID-19.
Most people will have relatively low vulnerability scores
We think the majority will score low enough to get back into work and social participation. Studies of people who test positive for the virus indicate that:
- A large portion of them – 50% is a usual estimate – have no symptoms and don’t know they are sick.
- 10% to 30% will feel sick but not sick enough to seek medical attention.
- 5% to 20% will get sick and seek medical attention.
- A tiny percentage – estimates range between 0.5% and 3% — will get so ill that they die despite medical attention, depending on the population and the study.
We think the higher percentages mentioned for the first two groups and the lower percentages mentioned for the last two groups are probably the more accurate, because of existing shortcomings of testing scope and bias in testing studies. Most people tested so far were tested because they were already sick to start with. This suggests that we start out knowing that the average vulnerability score for everybody is not above the low single digits. Hopefully, machine learning applied to large data collections will indicate far more accurate low numbers for most individuals.
DEVELOPING THE COVID-19 HAZARD SCORING APPLICATION, Phase 1
Developing a high-accuracy scoring system is important because people need to have confidence in it to bet their lives on it. Several Phases will be required to develop the processes and software required for the vision we have expressed above.
Several key process steps are implicated in the Phase 1 application, and these steps are summarized in the text and image below.
- Identify, contact, and gather anonymous COVID-19 patient data from scientific teams and health institutions.
- Transform and integrate that anonymous patient data into the CoHS application database. Rationalizing inputs from multiple sources which use different criteria for the same measurement is likely to be the biggest challenge associated with the project.
- Leverage Kaggle.com’s Machine/Deep Learning Competitions feature to get many smart and skilled data scientists to participate. Support them while they compete to create the best Machine/Deep Learning Agents to analyze the CoHS application data and provide COVID-19 CoHS Risk Evaluations by significant scientific study variables. (Follow this link for info about Kaggle and this link about Kaggle Machine Learning Competitions.)
- In Phase 1, we would write a report that summarizes the results of the Kaggle.com COVID-19 CoHS application competition.
- Our hope is that future studies will leverage what we learn as they create new COVID-19-related studies.
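The "transform and integrate" step above is where differing criteria for the same measurement have to be reconciled. The Python sketch below shows the idea for two imaginary contributing sites. Everything about the sites is an assumption made up for the example: the field names, the outcome categories counted as "severe", and the common target layout.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Standard temperature conversion."""
    return (temp_f - 32.0) * 5.0 / 9.0


def harmonize_site_a(rec: dict) -> dict:
    """Hypothetical Site A: age in years, temperature in Fahrenheit, outcome text."""
    return {
        "age": rec["age_years"],
        "temp_c": round(fahrenheit_to_celsius(rec["temp_f"]), 1),
        "sex": rec["sex"].lower()[0],
        "severe": rec["outcome"] in {"icu", "died"},
    }


def harmonize_site_b(rec: dict) -> dict:
    """Hypothetical Site B: birth year, temperature in Celsius, severity group."""
    return {
        "age": rec["admit_year"] - rec["birth_year"],  # approximate age
        "temp_c": round(float(rec["temperature"]), 1),
        "sex": rec["gender"].lower()[0],
        "severe": rec["severity"] in {"severe", "critical"},
    }


# Invented anonymous records, one from each imaginary site.
site_a_record = {"age_years": 71, "temp_f": 100.4, "sex": "M", "outcome": "died"}
site_b_record = {"birth_year": 1980, "admit_year": 2020, "temperature": 37.2,
                 "gender": "female", "severity": "non-severe"}

combined = [harmonize_site_a(site_a_record), harmonize_site_b(site_b_record)]
for row in combined:
    print(row)
```

Note that no names or identifying fields appear anywhere in the records. Deciding which source categories count as the same outcome is a judgment that clinicians, not programmers, would need to make for each contributing data set.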
We estimate that it will take at least two months to deliver the Results and Interpretation Report from the date we have enough scientific study data sets to begin. So if we have data by the end of May, we might be finished with Phase 1 by August 1st.
Who does all this? As it forms, a tentative name for the group could be The COVID-19 Data Science and Machine Learning Consortium. Steve and Vince would want to count ourselves IN this group.
ON LEARNING SCIENCE AND DEEP LEARNING
The Third Way approach we are suggesting depends on tools and approaches well developed and commonly practiced in the fields of Learning Science, Machine Learning and what is known as Deep Learning. A computer can analyze a large database of factual data not only to test hypotheses about how variables relate, but also to formulate new hypotheses. It can also assign statistical significance to its findings. If these kinds of techniques are unfamiliar to you, you can get the gist of them in only minutes by viewing a few YouTube videos. We suggest:
- A 5 minute overview
- Try these two short videos for a hands-on tutorial
- Interpretable Machine Learning
- https://www.youtube.com/watch?v=BmVUb72PluE (9 minutes)
These links will lead to dozens of other links showing example applications, providing usable tools and tutorials, and giving access to learning science community resources.
The 2020 COVID-19 virus has unprecedented capabilities to bring humanity to its knees. But 2020 humanity has tools of unprecedented power to use against the virus. Are we willing to put them to work?
SIMILAR PROPOSALS THAT HAVE BEEN MADE
A vaccine and Immunity Passports?
There is the idea of developing a vaccine against the virus, which would protect people, and of issuing COVID-19 “Immunity Passports” to people who have antibodies to the virus and are safe to send back out into the society. It is a terrific idea in that it has less-vulnerable people moving back into re-opening the society. But as it has stood so far it can’t have a big impact for years, because it requires a good test for such immunity and a plurality of people to have such immunity, which in turn requires availability of a reliable vaccine. At least a year and a half wait, we are told. And perhaps never. We have been trying to get an effective vaccine for the HIV virus for three decades now, and success at that still eludes us. We need a much faster way of implementing the Immunity Passports idea, not requiring a vaccine. We think the Third Way as we outline here might be that.
We think our suggested machine learning approach is a good way to start now. We could have our first rough Hazard Score system up and running in a couple of months, and this could reveal very large numbers of individuals who are safe to go back out and reboot V2 of US society. And the system could be continuously improved after that. And modifications of the Third Way could be used to protect us from pandemics yet to come.
Dr. David Katz has proposed an approach very aligned to what we are proposing in terms of opening up the society. We like what he has said and written about the pandemic. He rightly points out “If all we do is flatten the curve, you don’t prevent deaths, you just change the dates.” We arrived at the ideas presented here independently. The further contributions of our proposal are that we are suggesting: 1. the massive application of machine learning approaches to existing and emerging databases to significantly speed up the process of deciding who is safe to go back out in society; 2. computation of Hazard Scores which indicate probabilities of becoming seriously sick if people get the virus; 3. development of a personal app for cell phones and computers which tells individuals their hazard scores and possibly counsels them on how to reduce their hazard scores; and 4. use of aggregated Hazard Score data to drive public policy as well as inform businesses and private institutions as to how and when it is safe to re-open.
We would like to join with Dr. Katz and with the Immunity Passport advocates in moving forward at this point. We need to build on each other’s contributions if we are going to effectively take on the challenge of this and future viral pandemics.
More detail and discussion is yet to come in future blog entries as well as updates of this one.