I introduced the PPPPM concept a year ago in the blog post Harnessing the engines of finance and commerce for life-extension. I wrote that post just after attending the 2010 Bio-IT World Conference & Expo. I speculated that PPPPM will create a basic shift that profoundly affects our health and longevity. In that blog entry I characterized PPPPM this way:
“1. The objective of PPPPM is not so much to cure diseases as it is to detect and predict disease susceptibilities before a disease starts or at very early stages of disease progression and initiate personalized interventions to prevent the progression of the disease before it becomes symptomatic or does damage.”
I was excited when writing this a year ago because I saw PPPPM as an important emerging paradigm for medicine. I am even more excited now because a) I see the emerging PPPPM paradigm as being much more overarching, applying to universal health, wellbeing and longevity and not just to the body-repair business that most medicine is engaged in today, b) the eventual components of PPPPM are becoming better identified, c) concrete things are happening on a significant scale that are helping PPPPM become reality, and d) the pace of these developments is accelerating. Things are happening faster than I thought they would.
This year’s Bio-IT World Conference & Expo was a hybrid event: a major professional conference with six simultaneous tracks of presentations and 43 poster presentations, together with an industry trade show featuring some 111 exhibitors. The title of the conference suggests that it is about where information technology meets life sciences and biological research, and the front of the conference program suggests that it is about “Enabling Technology, Leveraging Data, Transforming Medicine,” but these only begin to suggest what the conference is ultimately about. I believe that, taken together, the conference is about developments that will lead to an expanded vision of PPPPM.
Here is my take on that expanded version of the PPPPM vision:
2) Suppose that we could multidimensionally structure and organize that information and data in flexible ways into databases that reflect our evolving best scientific models of health and disease states, such as according to molecular pathways and gene activation networks as well as cell and organ system impacts. It is important that the data structuring and organization be flexible and allow constant modification as our state of knowledge expands.
3) Simultaneous with this, suppose we develop increasingly sophisticated computer models of healthy body states, aging and disease states – models that relate the mountains of general and individual information to human health outcomes, aging and specific diseases. Identification of disease biomarkers becomes just part of the modeling effort that is required.
4) Then, so the hypothesis goes, we could do amazing things like:
a. generate individualized health and disease susceptibility predictions,
b. identify individualized lifestyle regimens and interventions for optimizing health and assuring longevity,
c. create a situation where health maintenance is the main thing to be concerned with in life and where attention to diseases becomes the exception rather than the rule,
d. vastly speed up the process of drug discovery and approval,
e. customize medical and drug treatments to the specifics of an individual when a disease does occur, and
f. even allow drugs to be custom-compounded so that a drug is tailored to meet the particular needs of an individual in a particular condition at a particular moment.
Emerging concepts and trends pertaining to PPPPM
The big pharma crunch
Major pharmaceutical companies have been sitting on large cash reserves but are facing a crunch caused by the declining productivity of traditional approaches to new drug discovery and development. While more and more money is being spent on drug discovery and R&D and more and more drugs are in the pipeline, fewer and fewer new drugs are making it through the development process and the FDA approval pipeline. At the same time, more and more of the traditional money-making blockbuster drugs are going off patent and becoming low-margin generics. As the 2009 opinion piece Crunch time for pharma put it: “To illustrate my concerns, let’s look at the treatment of heart disease. Many important cardiovascular drugs have been invented: statins, ACE inhibitors, beta blockers, fibrinolytics. But in the last 10 years, few of significance have emerged, even though the pharmaceutical industry has spent unprecedented amounts of money on research and development: in each year of that decade, Pfizer spent about $6 billion, Eli Lilly $3bn, and GlaxoSmithKline $2.5bn.” “I believe that there is a real risk that the big pharma industry might collapse.” The 2010 Pharma R&D Annual Review said “Outside of the cancer arena, in a striking and concerning trend for 2009, innovation was conspicuous by its absence…” Acknowledging this situation, the pharma industry appears increasingly open to new approaches to drug discovery, to sharing data and collaborating with research groups and other parties, to forming and joining health research consortia, and even to sharing earlier-stage basic research information with competing pharma companies. In other words, big pharma companies are to some extent joining the overall collaborative game that will create PPPPM.
Diversity of participating parties
Diversity in the parties working to create PPPPM can be seen in size as well as type. Included are international agencies, large and small pharma and biotech companies, health and research government agencies on the national, state, regional and community levels, world-scale computer, telecom and networking companies, software companies, data storage companies, supercomputer makers, chip companies, medical schools and schools of public health, university, private and hospital research labs, computational chemistry groups, trade associations, healthcare associations, clinics, ambulatory care centers, physicians’ offices, long-term care facilities, insurance companies, HMOs, PPOs, genome scanning companies, data mining organizations, document management companies, regulatory compliance companies, small specialized companies of many additional kinds, big and small consulting companies and individual consultants and even writers like myself.
Collaborative networks and consortia that are precursors of PPPPM include many different kinds of participants. I do not mean to suggest that there is a giant orgy of all of the kinds of organizations I mentioned above working with all of the others. However, many of the relevant emerging networks and consortia are rich in the kinds of organizations that are participating in them. An example is the PACeR consortium in New York State that combines hospitals, medical centers, healthcare networks, medical schools, pharma companies and healthcare associations, with the purpose of significantly accelerating the clinical trials process. I discussed another example of a multi-institutional health network last year in the blog entry The PROOF Centre of Excellence. “The PROOF Centre is a cross-disciplinary engine of devoted partners including industry, academia, health care, government, patients and the public focused on reducing the enormous socioeconomic burdens of heart, lung and kidney failure and on improving health.”
Biological R&D has traditionally taken place in-vitro (in the laboratory) and in-vivo (in living organisms). In-silico narrowly refers to R&D that takes place in computers. For example, a great deal about the safety of a proposed new drug can be learned through applying computer models of toxicology and molecular biology to drug molecular structures, thus offering the possibility of significantly abbreviating Phase I clinical trials. In-silico research can be used to correlate clinical outcomes information from numerous databases with analyzed gene and chromosomal information and gene expression data. It can be used to “Examine mutation, copy number, expression and DNA methylation data from a wide variety of projects” and for a wide variety of other research purposes(ref).
More broadly, much of what will make PPPPM possible will take place in-silico, including database creation, collaborative R&D, technical information-sharing among parties, distributed databases, online conferencing, etc. The information storage and processing challenges of PPPPM are mind-boggling. Moore’s Law is still in operation after more than 45 years and should be good for at least another 10-15 years. In its popular form, the law holds that the computer power available at any given price point doubles roughly every two years. Roughly the same is true for data storage capacity. These two underlying factors are fundamental driving forces that are making PPPPM feasible. The task is absolutely daunting and formidable, but the constant increase in silicon power that is being brought to bear on the task is equally formidable.
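To put rough numbers on that doubling, here is a minimal sketch in Python. It assumes the idealized form of the law (a clean doubling every two years), which real hardware only approximates, and the function name `growth_factor` is my own, chosen for illustration.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplier on computer power per dollar after `years`,
    assuming an idealized doubling every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# The 10-15 year horizon mentioned in the text.
for horizon in (10, 15):
    print(f"{horizon} years -> about {growth_factor(horizon):,.0f}x")
```

Under that assumption, another 10 to 15 years of doubling would deliver somewhere between about 32 and about 181 times today's computer power for the same money.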
Next generation sequencing
Next generation sequencing refers to a set of different high-throughput technologies for genome sequencing, recently extending to sequencing of transcriptomes, proteomes and epigenomes. Although the concept of next-gen sequencing technology has been around for some 5 years now, it continues to evolve through new generations. This video provides a nice background on sequencing technologies and applications. The Illumina platforms are a current standard for much genome scanning. CLC bio offers a turnkey system which sequences 32 full human genomes or 600 full human transcriptomes per week.
Petabyte storage requirements
The data volumes required to realize PPPPM are staggering, unthinkable by traditional standards, and today measured in petabytes. “A petabyte (derived from the SI prefix peta- ) is a unit of information equal to one quadrillion (short scale) bytes, or 1000 terabytes.” To hold a petabyte you would need 250,000 4-gigabyte thumb drives. The text in all the 33.32 million books in the Library of Congress is only a small fraction of a petabyte. Yet a major genome-scanning center can easily generate a petabyte of data every week or so. A typical genome scan may generate 15 terabytes of raw data. However, “in research facilities, raw sequence data is commonly kept for reinterpretation, and often includes redundant sets of data for the same genome (“fold coverage”). This increases the data storage and manipulation hardware needed for the already considerable output of a single sequencing run from the newest machines(ref).” In a few years large data collections relevant to PPPPM will probably be measured in exabytes, where an exabyte is 1,000 petabytes. The very large amounts of data storage required for genomics data have interested companies that specialize in providing superscale storage solutions. For example, a news release a few days ago related “DataDirect Networks (DDN), the world’s largest privately-held information storage company, today announced the Stanford University Center for Genomics and Personalized Medicine selected DDN technology to provide massive scientific workflow scalability to its gene sequencing research.” Big storage companies like EMC Corporation and smaller ones like Teradata participated in the Bio-IT World Expo.
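The arithmetic behind these figures is easy to check. The sketch below uses decimal (SI) units, as in the definition quoted above. The one figure that does not come from this post is the size of a book: I assume roughly one megabyte of plain text per book purely for illustration, so treat the Library of Congress result as an order-of-magnitude guess.

```python
# Decimal (SI) units, matching the petabyte definition quoted in the text.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

# How many 4-gigabyte thumb drives hold one petabyte?
thumb_drives_per_pb = PB // (4 * GB)

# How many genome scans, at 15 TB of raw data each, add up to one petabyte?
scans_per_pb = PB / (15 * TB)

# How many petabytes are in one exabyte?
pb_per_eb = EB // PB

# ASSUMPTION (not from the post): about 1 MB of plain text per book.
ASSUMED_BYTES_PER_BOOK = 10**6
loc_fraction = 33_320_000 * ASSUMED_BYTES_PER_BOOK / PB

print(f"thumb drives per petabyte: {thumb_drives_per_pb:,}")
print(f"15 TB scans per petabyte:  {scans_per_pb:.1f}")
print(f"petabytes per exabyte:     {pb_per_eb:,}")
print(f"Library of Congress text:  {loc_fraction:.1%} of a petabyte (assumed book size)")
```

At 15 terabytes per scan, a center producing a petabyte a week would be running on the order of 67 scans' worth of raw data through its storage every week.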
Processing the immense data streams required to realize PPPPM requires application of unprecedented computer power, particularly to crunch genomic and other omic data. Popular approaches to getting the job done include massively parallel processing, networked computing and cloud computing.
Massively parallel processing
The supercomputer manufacturers are ready to step up to the plate. For example, SGI had an exhibit at the trade show promoting the use of its massively parallel supercomputers, which involve very large numbers of processors. “Altix® UV scales to extraordinary levels-up to 256 sockets (2,048 cores, 4096 threads) with architectural support to 262,144 cores (32,768 sockets). Support for up to 16TB of global shared memory in a single system image, enables Altix UV to remain highly efficient at scale for applications ranging from in-memory databases, to a diverse set of data and compute-intensive HPC applications(ref).”
Networked computing, sometimes called distributed computing, involves splitting a major computing task over many machines in a network, possibly up to thousands of such machines. Many distributed computing approaches have been applied to biological data, a commercial example of which is Digipede.
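The basic pattern of splitting a task, working on the pieces independently and merging the partial results can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: local threads stand in for the networked machines, the task (measuring the fraction of G and C bases in a non-empty DNA sequence) is a toy one, and all function names are mine. It says nothing about how Digipede or any other product works.

```python
from concurrent.futures import ThreadPoolExecutor


def count_gc(chunk: str) -> int:
    """Work done independently on one 'machine': count G and C bases."""
    return sum(1 for base in chunk if base in "GC")


def split(sequence: str, n_parts: int) -> list[str]:
    """Cut a non-empty sequence into at most n_parts non-overlapping chunks."""
    size = -(-len(sequence) // n_parts)  # ceiling division
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def distributed_gc_fraction(sequence: str, n_workers: int = 4) -> float:
    """Split the job, process chunks in parallel, then merge the counts."""
    chunks = split(sequence, n_workers)
    # Local threads stand in for remote machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_gc, chunks))
    return sum(partial_counts) / len(sequence)


sequence = "ATGCGCGTATTAGCCG" * 1000  # made-up test sequence
print(distributed_gc_fraction(sequence))
```

The approach works here because each chunk can be counted without knowing anything about the others. Tasks whose pieces depend on one another are much harder to spread over a network.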
Although still fuzzily defined, cloud computing involves computing where either or both of data resources and processing software may lie out somewhere on “the cloud,” that is, on the Internet or on a private network. Only a browser or proprietary interface may be required on a user’s own computer. As more and more data is being generated, it becomes less and less feasible to have it all on a user’s own computer or even on the computers in his or her own organization. Also, it becomes more economical to have the processing software in a center where a great part of the data is located. Advantages of cloud computing can include not having to invest in your own massive databases and elaborate software, improved all-around economics, and easier collaboration. The US Office of Management and Budget has mandated that federal agencies go to the cloud first whenever possible as an alternative to installing a new system of their own. A great many biomedical databases and software applications are already available on the cloud, and genomic data is increasingly being analyzed on the cloud.
From medical records to life-pattern records
Medical records have traditionally been gathered as a result of hospital stays or HMO participation and are largely mute with respect to what happens between hospital stays or visits. There is very little information documenting the longer-term health effects of prolonged drug treatments or hospital procedures or medical radiation exposure. A true individual health record would start with a whole-genome and epigenomic characterization at birth and would include periodic rescanning of the epigenome and careful documentation of the consequences of every major medical or health intervention and important personal event. There is increasing recognition of this necessity, but implementation remains largely in the future.
Multidimensional scalability becomes increasingly important as PPPPM precursor systems expand in amounts of data (from gigabytes to terabytes to petabytes), in varieties of data, and in the number and kinds of users. For example, a research consortium might start out with a requirement for linking to the patient record systems in two or three hospitals, even if the patient record systems are not compatible in data organization or scope of content. If the approach is scalable it should allow incorporation of more and more hospitals and additional varieties of patient record systems. True scalability would allow incorporation of thousands of hospitals and clinics with millions of patients. Considerations that affect scalability are discussed below and include data curation, data ontology, and semantic normalization of data elements.
Integrating data resources
There are hundreds if not thousands of specialized electronic genomic and gene association study databases out there. A challenge for the PPPPM vision is identifying how to bring the information in all of these databases together within a single data architecture so comparative analyses can become possible across databases. The challenge is outlined in the 2008 publication Genomic Data Resources: Challenges and Promises. Approaches to this problem were described at the Conference & Expo in a number of presentations by researchers. And several vendors featured software products and interface tools that can be used to tackle facets of the problem. For example, “NextBio Enterprise is a secure web-based solution for integrating corporate and public data from next-gen sequencing and microarray technologies. Our unique “correlation engine” pre-computes billions of significant data connections and enables researchers to intelligently mine this data in real-time. With NextBio Enterprise, corporate experimental data can be easily integrated with public data and explored within relevant biological and clinical contexts(ref).”
A few key concepts related to integrating data resources are data mining, data curation, data ontology, semantic normalization, metadata, desktop virtualization, and computational knowledge engines.
Data mining “is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management.” Effective mining of biomedical data often requires an ontological approach and use of semantic data normalization. See for example Ontology-assisted database integration to support natural language processing and biomedical data-mining. “Successful biomedical data mining and information extraction require a complete picture of biological phenomena such as genes, biological processes and diseases as these exist on different levels of granularity. To realize this goal, several freely available heterogeneous databases as well as proprietary structured datasets have to be integrated into a single global customizable scheme.”
There was much mention of data curation at the Conference. While the idea of curation seems to have been borrowed from the world of museums, it characterizes well what has to be done if we are going to see the PPPPM vision realized. “Digital curation is the selection, preservation, maintenance, collection and archiving of digital assets. Digital curation is the process of establishing and developing long term repositories of digital assets for current and future reference by researchers, scientists, and historians, and scholars generally(ref).”
Ontology is a recognized branch of philosophy and a wonderful abstract concept. I found it amazing to hear so much mention of it among both scientists and vendors in booths at a trade show. “Ontology (from the Greek ὄν, genitive ὄντος: ‘of that which is’, and -λογία, -logia: science, study, theory) is the philosophical study of the nature of being, existence or reality as such, as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences(ref).” The term is an excellent one: what kinds of entities are represented by data elements in an epigenomic database, genomic disease association database or other component of an eventual vast PPPPM system? How do we know those entities really exist? What is their nature? How do they relate?
The idea of ontology-based data management for biomedical research databases goes back a number of years. For background, you could have a look at the 2007 publication Ontology based data management systems for post-genomic clinical trials within a European Grid Infrastructure for Cancer Research and the 2010 publication The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.
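In data terms, the simplest piece of an ontology is a hierarchy of terms linked by “is a” relationships, which lets software recognize that a record filed under a narrow term also belongs under every broader one. Here is a minimal Python sketch of that idea. The handful of disease terms and the names `IS_A`, `ancestors` and `is_a` are invented for illustration and are not taken from any real biomedical ontology.

```python
# Hypothetical toy ontology: each term maps to its direct "is a" parents.
IS_A = {
    "myocardial infarction": ["heart disease"],
    "heart failure": ["heart disease"],
    "heart disease": ["cardiovascular disease"],
    "cardiovascular disease": ["disease"],
    "disease": [],
}


def ancestors(term: str) -> set[str]:
    """All broader terms reachable from `term` by following is-a links."""
    found: set[str] = set()
    stack = list(IS_A[term])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A[parent])
    return found


def is_a(term: str, broader: str) -> bool:
    """True if `term` is the same as, or a kind of, `broader`."""
    return term == broader or broader in ancestors(term)


print(sorted(ancestors("heart failure")))
print(is_a("myocardial infarction", "cardiovascular disease"))
```

With such a structure, a query for all cardiovascular disease records can also pick up records coded only as heart failure or myocardial infarction. Real ontologies carry many more kinds of relationship than this.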
Semantic normalization and metadata
Closely related to the data ontology issue is that of semantic data normalization, that is, establishing equivalencies of meaning of data elements in different databases so meaningful analyses can be conducted across them. Again closely associated is the concept of using metadata to establish such equivalencies, metadata being data about data. From the previous citation: “Data management in post-genomic clinical trials is the process of collecting and validating clinical and genomic data with the goal to answer research questions and to preserve it for future scientific investigation. Comprehensive metadata describing the semantics of the data are needed to leverage it for further research like cross-trial analysis. Current clinical trial management systems mostly lack sufficient metadata and are not semantically interoperable. This paper outlines our approach to develop an application that allows trial chairmen to design their trial and especially the required data management system with comprehensive metadata according to their needs, integrating a clinical trial ontology into the design process.”
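A small example may make the idea of metadata-driven normalization concrete. In the Python sketch below, two imaginary hospitals record the same facts under different field names, codes and units, and a metadata table describes how each source field maps onto one shared schema. Everything here (the hospital names, field names, the `METADATA` table and the `normalize` function) is hypothetical and is not drawn from the cited paper or any real system.

```python
# Hypothetical metadata ("data about data"): for each source, how its field
# names, codes and units map onto one shared, canonical schema.
METADATA = {
    "hospital_a": {
        "pt_sex": ("sex", lambda v: {"M": "male", "F": "female"}[v]),
        # Pounds to kilograms (1 lb = 0.45359237 kg).
        "wt_lb": ("weight_kg", lambda v: round(v * 0.45359237, 1)),
    },
    "hospital_b": {
        "gender": ("sex", lambda v: v.lower()),
        "weight_kg": ("weight_kg", lambda v: round(float(v), 1)),
    },
}


def normalize(source: str, record: dict) -> dict:
    """Translate one source record into the canonical schema."""
    mapping = METADATA[source]
    out = {}
    for field, value in record.items():
        canonical_name, convert = mapping[field]
        out[canonical_name] = convert(value)
    return out


a = normalize("hospital_a", {"pt_sex": "F", "wt_lb": 150})
b = normalize("hospital_b", {"gender": "Female", "weight_kg": "68.0"})
print(a)
print(b)
```

Once both records are expressed in the same terms they can be compared or pooled directly. Adding a third hospital means adding a metadata entry rather than rewriting the analysis, which is why this kind of normalization matters for scalability.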
Desktop virtualization separates a user’s desktop environment and applications from the physical device in front of him or her: the desktop runs on a central server and is delivered over the network to whatever screen is at hand. In practice it can also be a way of rapidly shifting between views of information, data or data models when effective analysis requires much more to be displayed on your computer “desktop” screen than can fit at any one time. Desktop virtualization is of increasing importance in health applications and is often combined with cloud computing, and it is another component of the IT systems that are predecessors of PPPPM. For example, an announcement last week was headlined Dell Launches Meditech Desktop Virtualization. “The mobile clinical computing system uses VMware technology to allow desktop healthcare applications to be accessed in the cloud and delivered as a managed service. Dell has unveiled a mobile clinical computing (MCC) program that will enable healthcare organizations using the Meditech Health Care Information System (HCIS) to more easily and securely retrieve clinical information on their virtual desktops, as well as reduce desktop management, support, and deployment costs. Announced Monday, the Meditech MCC program relies on a virtual desktop infrastructure (VDI) that is cost efficient, improves data management, and provides flexible deployment of virtual desktops in hospitals and their extended communities. The program uses VMware vSphere, a cloud operating system, and VMware View, which allows desktop applications to be accessed in the cloud and delivered as a managed service. Additionally the solution includes Imprivata’s OneSign application and Forward Advantage’s API integration with the Meditech HCIS for using advanced authentication devices for e-signature actions. In providing a technology solution for Meditech users, which include more than 2,300 hospitals, ambulatory care centers, physicians’ offices, and long-term care facilities, the MCC program is Dell’s latest attempt to explore new ways to offer technology to the healthcare sector.” Desktop virtualization is a particularly important issue for mobile applications where interfaces are displayed on small smartphone or tablet screens.
Movement from text searching to use of “computational knowledge engines”
Google and free text searching have done a wonderful job of bringing the world’s literature to every desktop and soon to every smartphone and other mobile device. A new paradigm of knowledge retrieval (in contrast to information retrieval) may be emerging, as exemplified by the WolframAlpha service now on the web. WolframAlpha links a retrieval interface to the Wolfram Mathematica automated mathematical resources to provide answers to quantitative queries. There is an attempt to decode the meaning of a query and arrive at an answer through analysis. Right now WolframAlpha’s capabilities are severely limited. It will, however, answer some questions like “How many base pairs in the human genome?” and “Number of deaths from malignant neoplasms?” and “Average age of Parkinson’s Disease patients?” Try it! As time goes on we can probably expect to see more and more sophisticated “knowledge engines” applied to biomedical data. Some commercial and academic engines of this kind exist today.
Linguamatics’ text mining software provides an example. Linguamatics I2E text mining software “provides that missing link – presenting the user with a text mining solution which enables very specific searches to be created on the fly (e.g. MAPK interacts with which targets?) and tabulated results to be returned very quickly. As well as generating data for analysis, this functionality enables I2E to be used in real time data-gathering tasks such as ontology creation which are becoming increasingly important in this sector.”
Everything I have discussed here is furthering the materialization of the expanded vision of PPPPM. I believe that we are in the initial stages of creating a tectonic shift that profoundly enhances our health and longevity. That shift probably won’t be called PPPPM, but whatever it is called it will propel us beyond the current stage where health maintenance is equated with taking drugs or medical treatment. The shift will be made feasible by advances in information technology and come about through massive collaboration among multiple sectors of our society.
Note that I have no formal relationship with Cambridge Healthtech Institute, the organization that sponsors the Bio-IT World Conference & Expo. I have no relationship with any vendor or service-provider organization mentioned here and I receive no form of compensation for any such mention.