Human Genome Project

2013, interview with Jill Trewhella

“An Insider’s Account of the Human Genome Project”

…The project was stimulated, in part, by large-scale initiatives to win the second world war. Its initial proponents felt themselves to be the scientific descendants of the Manhattan Project – a research and development project that produced the first atomic bombs.

As with the Manhattan Project, Los Alamos Laboratory in New Mexico – constructed for the development of the atomic bomb – played a strong role in development of the Human Genome Project. Having been recruited to Los Alamos in 1984, and ultimately leading its bioscience division from 1999 to 2004, I was witness to this exciting time in the history of biological research.

Early days

The HGP was officially founded in 1990 by the US Department of Energy’s Office of Health and Environmental Research – led for a period in the 1980s by the Boston University scientist Charles DeLisi – and was the culmination of many years of work and debate.

DeLisi had come to the Department of Energy from the National Institutes of Health (NIH) and had the view that understanding human susceptibility to environmental energy emissions could benefit from knowledge of the genome and genetic mutations linked to such susceptibility.

DeLisi asked the then-head of life science at Los Alamos, Mark Bitensky (who gave me my job at the laboratory), to convene one of the landmark early workshops, in 1986, leading up to the project.

Following the workshop, the Office of Health and Environmental Research’s advisory committee recommended a major project to map the entire genome and identify all gene sequences.


HGP origins, 1985 and earlier. From “Santa Fe 1986: Human genome baby-steps” (Nature):

[Charles DeLisi writes] “The delegates, who gathered in Santa Fe on 3–4 March 1986, included many who had been involved in the earlier Office of Technology Assessment report, as well as several who had attended the Alta [Utah] summit the previous year. Other prominent geneticists were also present, as were representatives from industry…

I also developed a rapport with the Republican senator from New Mexico, Pete Domenici. Domenici, being from a state that housed two major national laboratories, Sandia and Los Alamos, was accustomed to dealing with abstruse physics projects, and was pleased to have before him a project whose relevance could easily be explained to his constituency. As a member of the Senate’s budget committee and a ranking member of the powerful appropriations subcommittee on energy and water development, Domenici was the ally we needed to obtain the support of Congress and the Administration to move the project forwards.

With support from the Secretary of Energy and the OMB, a $13-million line item initiating the genome project appeared in President Reagan’s budget submission to Congress in January 1987. It subsequently passed both Houses, and 1988 saw the first official expenditures on the Human Genome Project…

I left the DoE in the summer of 1987, feeling naively certain that the project was in safe harbour, and that a complete sequence would be ready by our target date of 2001. We got the date right, but for the wrong reasons. The 2001 date was based on an assumption that the economy would be relatively normal. In fact, the mid-1990s was an incredibly vibrant period economically and stimulated investments from venture capitalists, some of whom made possible the formation of Celera Genomics, the company led by Craig Venter that was at the head of a private-sector sequencing effort. Without the ensuing public versus private competition, it is unlikely that the complete sequence would have been ready by 2001 because the target date was reset to 2006 after I left Washington.

Mark Wolfe Bitensky, 1982

“We must learn enough about the brain and spinal cord to be able to replace damaged parts of the nervous system with prostheses, to be able to reconnect isolated neuronal components, to be able effectively to replace sense receptors. We must attempt to learn enough about data processing by the brain to use such knowledge for computer science… Instead of having perpetually to depend on the dinosaur era for our fuels, we could relax energy needs somewhat by innovative combinations of living organisms and biochemical reactions. Suppose we begin with a photosynthetic organism that efficiently utilizes solar energy. And suppose that organism releases amino acids or other nutrients. A second organism, perhaps a genetically engineered one, might take the metabolites made by the first and convert them into something we need. Imagine combining microorganisms and sunlight with sewage effluent, as a nitrogen source, and obtaining starting materials for the synthesis of plastics and fertilizers or amino acids for cattle feed. Japanese scientists are working effectively with microorganisms to produce amino acids by fermentation technology.
At Los Alamos we combine expertise in recombinant DNA technology, genetic engineering, bacterial fermentation, solar ponds, and waste management in a fledgling effort to explore these possibilities.”


Charles DeLisi did pioneering work in theoretical and mathematical immunology. He received his Ph.D. in physics and did postdoctoral studies in the chemistry department at Yale University researching RNA structure. He became a theoretical physicist at Los Alamos National Laboratory and then moved to the National Institutes of Health, where he worked on molecular and cell immunology for ten years.

DeLisi is currently director of the Biomolecular Systems Laboratory, Chair of the Bioinformatics Program, Metcalf Professor of Science and Engineering and Dean Emeritus of the College of Engineering at Boston University.

Charles DeLisi develops computational methods for high-throughput genomic and proteomic analysis. His laboratory is helping to develop technologies for fingerprinting the complete molecular state of a cell. He is interested in finding computational methods for determining protein function and researches the structural basis of signal transduction by membrane-bound receptors, the structural basis of voltage gating, and the docking of peptide hormones and neurotransmitters at their sites of action.

In 1986, DeLisi and Watson met at a CSHL meeting and spoke about their interests in sequencing the human genome.

The animated presentation below seems very occulted to me:

The Los Alamos Center for Human Genome Studies
by Larry L. Deaven and Robert K. Moyzis




Oh my. That image is highly disturbing. These people are gross. I had noticed the vagina dentata imprinting, but this image blew by me.


ENCODE: Deciphering Function in the Human Genome

Genome Advance of the Month


Roseanne F. Zhao, Ph.D.

NIH Medical Scientist Training Program Track 3 Scholar


From uncovering the double helix of DNA to sequencing the roughly 3 billion letters of code that make up the complete genetic blueprint of humans, our inward journey of discovery has been filled with historic milestones. Achieving an understanding of the human genome, for example what information it encodes and how it functions and interacts with the environment, is an exciting scientific undertaking because of its potential to reveal key insights into how our DNA gives rise to all of the proteins required for building a human being. Such knowledge would have broad implications for a myriad of cutting-edge questions in biology and medicine, including gene regulation, natural variation between individuals, disease susceptibility, and human evolution.

However, reading and interpreting the human genome sequence has proven to be very challenging. Scientists have been able to identify approximately 21,000 protein-coding genes, in large part by using the long-ago established genetic code. But these protein-coding regions make up only approximately 1 percent of the human genome, and no similar code exists for the other functional parts of the genome. Evidence has accumulated over the years that at least some of the remaining 99 percent of the genome is important for regulating gene expression, yet we lacked a global view of how much of the genome was functional, where these other functional regions were located, and in what cell types they were active.
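The "approximately 1 percent" figure can be roughly cross-checked from the numbers in the paragraph above. The average coding length per gene used here (~1,300 bases) is an illustrative assumption, not a figure from the text:

```python
# Back-of-the-envelope check of the "~1 percent protein-coding" figure.
# ASSUMPTION (illustrative): an average protein-coding sequence of roughly
# 1,300 bases per gene, a commonly quoted ballpark; the text itself gives
# only the gene count and the genome size.
genome_size = 3_000_000_000      # ~3 billion base pairs
gene_count = 21_000              # protein-coding genes identified
avg_coding_bases = 1_300         # assumed average coding length per gene

coding_fraction = gene_count * avg_coding_bases / genome_size
print(f"{coding_fraction:.2%}")  # on the order of 1 percent
```

Any reasonable average coding length in the 1-2 kb range lands near the same order of magnitude, which is why the "1 percent" shorthand is robust.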

To address this gap in our knowledge, the Encyclopedia of DNA Elements (ENCODE) was launched in 2003 as one of the next steps toward understanding how to interpret the information locked within our genomes. Funded by the National Human Genome Research Institute (NHGRI), the ENCODE Project set out to systematically identify and catalog all functional elements — parts of the genetic blueprint that may be crucial in directing how our cells function — present in our DNA. Initially established as a pilot project focused on 1 percent of the human genome, ENCODE was scaled to whole-genome analysis in 2007; that same year, a related project named modENCODE was initiated to map all of the functional regions in the worm (C. elegans) and fly (D. melanogaster) genomes. In its scale-up phase, the ENCODE Project was a massive collaborative effort by a consortium of 32 research groups, comprising more than 400 scientists.

The main results of this ambitious effort have now been reported in 30 coordinated papers published in the September 6, 2012, issues of Nature, Genome Research and Genome Biology, along with additional ENCODE-funded papers in Science, Cell and Nucleic Acids Research. Together, they present an initial analysis of 15 trillion bytes of raw data, generated from 1,640 datasets spanning 147 cell types.

Within this treasure trove of data, researchers found that more than 80 percent of the human genome has at least one biochemical activity. Although it is not yet known whether all of this DNA contributes to cellular function, the majority of it can be transcribed into RNA. Furthermore, nearly 20 percent of the genome is associated with DNase hypersensitivity or transcription factor binding, two common features used to identify regulatory regions. Both figures are far higher than previous estimates that only 5-10 percent of the genome was functional.

Significantly, more than 4 million regions that appear to act as regulatory regions, or “switches,” were identified. These switches are important because they can be used in different combinations to control which genes are turned on and off, as well as when, where and how strongly they are expressed. In effect, they provide precise instructions for determining the characteristics and functions of different cell types in the body. Changes in these regulatory switches, especially those governing critical biological processes, can thus influence the development of disease. The amount of gene-regulatory activity uncovered in the human genome is striking: more of the genome encodes regulatory instructions than protein, prompting an assortment of complex questions about how the genome is involved in health and disease.

As a foundational information resource for biomedical research, the data put forth by the ENCODE Project is openly accessible and available through the ENCODE portal. More than double the amount of data used in these analyses has now been generated and made available through this portal.

In addition to the individual papers, results have also been organized along “threads” that explore specific scientific themes. This new approach of incorporating, organizing and presenting data from relevant sections of different papers, in different journals, helps users navigate the immense amount of data and analyses generated.

The ENCODE results are already influencing the way scientists think about both new and existing data. For example, Thread #12 on the Nature ENCODE site focuses on the impact of functional information in understanding genetic variation within the human genome. Genome-wide association studies (GWAS) have previously been used to comb the genome for regions associated with specific human diseases or other traits. By comparing DNA sequences from hundreds to thousands of people with or without a given disease, researchers have been able to identify regions containing variants that are associated with disease. Interestingly, more than 90 percent of these variants have been found in non-coding regions. However, because genetic variants within a given region may be linked to many other variants in the same region, it has been difficult to determine which variants causally contribute to increased disease risk.

But when researchers compared the locations of non-coding functional elements identified by ENCODE with disease-associated genetic variants previously identified by GWAS, they detected a striking correlation between the two: genetic variants associated with diseases or other traits were enriched in regulatory switches within the genome. This is exciting because it provides an overarching framework for looking at many different diseases (including Alzheimer’s, diabetes, heart disease, and cancer) — and identifying the numerous genetic variants that cause them — beyond the context of DNA that codes for proteins.
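The core of this kind of enrichment analysis is checking whether GWAS variants fall inside annotated regulatory intervals more often than chance coverage would predict. The intervals and variant positions below are toy values invented for illustration; real analyses use genome-wide annotation files (e.g. BED format) and carefully matched null models:

```python
# Toy enrichment sketch: regulatory regions as (start, end) intervals on a
# hypothetical 1 kb chromosome, plus GWAS-associated variant positions.
regulatory_regions = [(100, 200), (500, 650), (900, 1000)]
gwas_variants = [120, 130, 560, 610, 950, 300, 720]
chromosome_length = 1000

def in_regulatory_region(pos, regions):
    """True if a variant position falls inside any regulatory interval."""
    return any(start <= pos <= end for start, end in regions)

# Observed fraction of disease-associated variants inside regulatory switches.
hits = sum(in_regulatory_region(v, regulatory_regions) for v in gwas_variants)
observed = hits / len(gwas_variants)

# Null expectation: fraction of the chromosome covered by regulatory regions,
# i.e. what uniformly random variants would hit by chance.
covered = sum(end - start + 1 for start, end in regulatory_regions)
expected = covered / chromosome_length

enrichment = observed / expected
print(f"observed={observed:.2f} expected={expected:.2f} "
      f"enrichment={enrichment:.2f}x")
```

An enrichment well above 1x, assessed against a proper permutation or matched-background null rather than this naive uniform one, is what signaled that GWAS hits cluster in ENCODE regulatory elements.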

Even outside its extraordinary scientific contributions, the structural model of the ENCODE Project is fundamentally changing the way large-scale scientific projects are conducted. Resources such as the ENCODE analysis virtual machines provide access to various stages of analysis, including input data sets, methods of analysis and code bundles. ENCODE software tools, data standards, experimental guidelines and quality metrics are all freely available at the ENCODE portal. This allows other researchers to independently assess and reproduce the data and the analyses, with a focus on scientific access, transparency and reproducibility, or to use similar methods to analyze their own data.

To date, 170 publications from labs outside of ENCODE have used ENCODE data in their work on human disease, basic biology, and methods development. Through the establishment of a basic reference data set, along with accompanying analytical resources, scientists expect further breakthroughs in the coming years.

However, this is just the beginning, and much work remains before we are able to extract all of the functional and disease-related readouts from a genomic sequence. A glance at the various threads shows that the future challenges are numerous, ranging from computational and analytical hurdles to uncovering the complex mechanisms of gene regulation. Understanding how the linear sequence of the DNA code relates to the intricate 3D fractal patterns of folded DNA, which can be important in shaping regulatory network interactions, will also be essential.

The foundations laid down by ENCODE will be invaluable in helping us figure out how genetic variation influences gene regulation, human health, and disease. To build a more comprehensive understanding of the human genome, NHGRI has renewed funding for the ENCODE Project for an additional four years to deepen the catalog of functional elements through the study of additional cell types and factors; this build-out phase will also focus on new methods of data analysis. By achieving an improved understanding of genetics in normal and diseased conditions, we will eventually be able to realize the full potential of individualized genome sequencing and bring personalized genomic medicine into the clinic.

Further reading and resources

Posted: November 8, 2012


My relevant comments from April 2020 at POM:


April 7, 2020 at 6:45 am

This is a history of the Human Genome Project


April 7, 2020 at 6:47 am

Defining “mutation” and “polymorphism” in the era of personal genomics
Roshan Karki, Deep Pandya, Robert C. Elston, and Cristiano Ferlini
Hopefully this helps to explain further the research into SNPs…


April 7, 2020 at 2:54 pm

HapMap Haplotypes are very pertinent to this, as it would explain the reason why the US “decided” to use a different PCR test than China (hint: they “needed” a different test – one that can detect SNPs in our population). Hope this link provides enough background info to explain.


April 7, 2020 at 3:13 pm

One more link for background on the HapMap Project: the International HapMap Project.
To reiterate, my theory only makes sense if you take the “virus” narrative out of the equation, since it was there to obscure all along. If the PCR test is testing for SNPs encoded in DNA/chromosomes…then this is all an extension of a massive epigenomics R&D operation (albeit highly classified and involving those with high security clearances). Kary Mullis, himself, set forth that the PCR test should only ever be used in R&D, and NOT as a diagnostic tool. Hence, I am suggesting that this is precisely how it is being utilized – even though it appears to be for diagnosis of said viral infection. With that said, the entire lot of “frontline” doctors and nurses shown all over the mockingbird media are in reality part of a massive, global simulation (functional pandemic preparedness exercise)…


Is this sort of transplant procedure an indication of where this human genome project will ultimately lead, with the real potential of actually building a trans-human? Not sure if they can actually do this, but it seems like that is their goal.

Yes, sounds very plausible: the PCR test was the means to get the samples needed to detect SNPs. Now what? How will Empire use this information, and what are the negative consequences to us as a people?

My very short answer, for now: Precision Medicine.

(Note involvement of Seattle, WA; that is where the first American “sample virus” was extracted/mined and uploaded to the online database for all use cases/frame of reference in the U.S.)

From the standpoint of biomedical research, a number of large, data-intensive collaborative projects, such as the International HapMap Project (7, 8), The Cancer Genome Atlas (TCGA) (9-12), the 1000 Genomes (1000G) study (13-16), the GTEx consortium (17-19), and the Human Cell Atlas (HCA) (20, 21), among others, are establishing novel frameworks for the molecular study of health and disease. Such frameworks are firmly supported by robust database management and integration strategies that are allowing them to develop into central tools for basic and translational biomedical research.

To elaborate a bit on my short answer – RE: Precision Medicine . . .

Next up is Gene Blockchain Smart Contracts (digital global gene blockchain platform; tokenized genomics)

My supposition is that the COVID project entailed going GLOBAL with a precision medicine R&D initiative, with aims to blockchain all of this genomic data for all of the Empire’s ‘omics’ R&D projects – which I have spoken to in my writing at POM (and Biden’s new Executive Order speaks to this as well).

I have a highly pertinent PDF that I do not know how to upload (?), so I had to copy/paste it here for now – I suggest this is super crucial, as it potentially relates DIRECTLY to the Ethereum Merge presumably occurring some time tonight or tomorrow (see the Ethereum references below):





Abstract

Background Introduction
  2.1 Background of Human Genome Project
  2.2 The Development of Gene Sequencing Technologies

Our Vision

Business Model
  4.1 Gene Blockchain Smart Contracts
  4.2 Gene BTC
  4.3 GeneLab
  4.4 GeneNetwork

Our Technology
  5.1 Advanced genome sequencing technology
  5.2 Powerful cloud analysis platform for gene data
  5.3 Elliptic curve cryptography

Our Products
  6.1 Digital genome distributed data warehouse
  6.2 Healthcare platform
  6.3 Gene entertainment platform
  6.4 Gene social platform

Development Plan

Team Introduction

1 Abstract

The 21st century is the century of the life sciences. Human understanding has now reached the deepest secret of biology: the gene sequence. The 23 pairs of human chromosomes, comprising some 3 billion base pairs, underlie our colorful and brilliant human civilization. These gene sequences shape us from birth to death, and account for much of the variety among us. In health care, for instance, we have begun to move towards personalized, precision medicine, a sign that humanity truly recognizes that future civilization will be personalized, built on the genetic differences among individuals.

However, current gene data is highly fragmented, lacks uniform standards, and is difficult to put to practical use. Our team aims to build a standardized, digital, global gene blockchain platform on which ordinary people can properly understand, use and enjoy the benefits and fundamental changes brought by the application of genetic data.

Blockchain / distributed ledger technology (DLT) offers an ideal foundation for our gene blockchain platform. Because a blockchain holds an immutable ledger and assured contracts, it can readily accommodate recording and storing the life-cycle events of every gene-blockchain-based product, track usage of gene data, verify patient identity, assure payments, and more. Gene Blockchain seeks to create a crypto-token built on blockchain smart contracts specifically for a worldwide gene data platform. This addresses the largest problem of gene data: how to standardize genetic information and price it. Gene Blockchain is the first community in the world dedicated to leveraging smart contracts on a blockchain to standardize big gene data and price it correctly. On this platform, genetic data is no longer a government or institutional privilege: it will help ordinary people understand the importance of genetic data and use it to trade, receive medical treatment, enjoy entertainment, make friends and more.

2 Background Introduction
2.1 Background of Human Genome Project

The Human Genome Project, jointly organized by scientists from the USA, the United Kingdom, Germany, France, China and Japan, is ranked alongside the Manhattan atomic bomb program and the Apollo moon landing program as one of the three great projects in the history of the natural sciences, and the impact of the HGP will go far beyond the other two. The human genome, commonly referred to as our DNA, contains approximately three billion chemical base pairs, distributed across the 23 pairs of chromosomes in the cell nucleus. The heart of the Human Genome Project is the construction of a DNA sequence map: determining the order of the base pairs of human genomic DNA molecules and assembling them into a sequence diagram. Three billion is an astronomical figure; the plan has been described as reading the book of life.

The ultimate goal of the program is to determine the roughly 3 billion chemical building blocks of the human genome (called base pairs, or nucleotides), and then to reveal the secrets of the tens of thousands of genes that shape human life from birth to death.

The Importance of the Human Genome Project and Its Impact on the Bio-economy and Human Life

The Human Genome Project, together with the Manhattan atomic bomb program and the Apollo moon landing program, is known as one of the three major scientific projects of the twentieth century. Broader in scope and more influential than the other two, it will be the greatest scientific project running through the 21st century. The Human Genome Project is a plan for humanity to understand itself, bringing mankind into a new era: from the perspective of the life sciences, an era of bioscience and biotechnology grounded in DNA sequences and led by bioinformatics.

The Human Genome Project aims to address problems of human health and to promote the development of the bioinformatics industry. By interpreting the “bible” of the human genome, it has driven rapid development in medical science, medicine and the health industry, brought fundamental change to the life sciences and the biology industry, spurred innovation in sequencing technology, accelerated scientific discovery and industrial transformation, and achieved breakthroughs in a large, data-oriented science industry, forming a super “Moore’s Law.”

The completed Human Genome Project has had an enormous impact on the development of the biological economy. In the United States, for example, federal investment in HGP-related industries since the project’s start is equivalent to each US citizen investing $2 a year, yet it has generated about $1 trillion in economic benefits. Counting only the federal government’s direct investment in the HGP, the return on investment was 178:1; counting subsequent investments as well, the return is 65:1. That is, each $1 the federal government invested in the Human Genome Project and related genomics research yields $65 in economic benefits, enhancing public health, creating jobs and fundamentally transforming the health care industry.
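The stated returns can be cross-checked with simple arithmetic: inverting the figures quoted above (roughly $1 trillion in benefits against 178:1 and 65:1 returns) recovers the implied investment totals. This is only a sketch using the text's own round numbers, not independent data:

```python
# Implied investment totals, working backwards from the quoted figures.
total_benefits = 1_000_000_000_000   # ~$1 trillion in economic benefits

for label, roi in [("HGP only", 178), ("incl. subsequent investments", 65)]:
    implied_investment = total_benefits / roi
    print(f"{label}: ROI {roi}:1 implies ~${implied_investment / 1e9:.1f}B invested")
```

The implied totals (a few billion dollars of direct HGP funding, roughly fifteen billion including follow-on investment) are consistent in order of magnitude with the "per citizen, per year" framing in the paragraph.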

The Human Genome Project revealed the framework of the human genome, whose significance extends far beyond the genes themselves. It holds great potential in medicine, agriculture, industry, the environment and energy, triggering new technological revolutions that may fundamentally address the major problems affecting the survival and development of mankind: population, food, environment and energy. In the medical field, enormous breakthroughs are foreseeable. Building on core technologies such as genetic engineering, cell engineering and biochips, the widespread application of emerging biotechnology will not only greatly transform the traditional pharmaceutical industry, but also shift medical practice from disease treatment towards gene-based diagnosis and prevention, opening the door to personalized and precision medicine.

Following the Human Genome Project, human genetic research has moved towards genetic testing for diseases related to reproductive health, personalized cancer treatment, pathogenic microorganisms, hereditary diseases, and blood diseases. In the future, medical technology will move from disease treatment towards gene-based diagnosis, disease prevention, and personalized and precision care. By using genetic testing to predict the risk of potential diseases through personalized diagnosis, we will be able to provide more effective and targeted treatment and prevent disease before it occurs.

We will rely on the world’s leading genomics technologies to help tens of millions of families avoid hereditary birth defects, to detect and diagnose tumors early, and to monitor personal health dynamics regularly and panoramically. We hope everyone will be able to know their own genes and take control of their own health. In the first phase, our research will focus mainly on hereditary birth defects, cancer, cardiovascular and cerebrovascular diseases, and precision medicine.

2.2 The Development of Gene Sequencing Technologies

With the advancement of the Human Genome Project, gene sequencing technology has also developed rapidly. Gene sequencing is no longer a mysterious technology; it is gradually entering the lives of ordinary people. Over recent decades, several giant sequencing companies have grown up, such as Roche, Illumina and ABI.

At the same time, the cost of gene sequencing has declined rapidly, outpacing Moore’s Law; this decline is the basis of our project’s operation. According to NIH statistics, whole-genome sequencing cost up to 100 million US dollars in 2001; in 2011, after the successful launch of second-generation sequencing technology, the price of whole-genome sequencing dropped to 10,000 US dollars; in 2014, with the launch of the Illumina HiSeq X Ten, the price fell again, to 1,000 dollars. In January 2017, at the J.P. Morgan Healthcare Conference, the Illumina NovaSeq 6000 stood out in the biotech gene sequencing market. Illumina’s president said: “The launch of NovaSeq is one of the most important turning points in Illumina’s innovation history. Just as the HiSeq X, released in 2014, reduced the cost of each genome to $1,000 through the HiSeq architecture, we believe that the NovaSeq architecture will one day reduce the price of a single genome to $100.”
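The claim that sequencing costs fell faster than Moore's Law can be checked directly against the milestones quoted above. The sketch below compares the actual fold-drop between milestones with a Moore's-law baseline of cost halving every two years; the milestone figures come from the text, the baseline is the standard rule of thumb:

```python
# Cost-per-genome milestones quoted in the text: (year, approximate USD).
milestones = [(2001, 100_000_000), (2011, 10_000), (2014, 1_000)]

for (y0, c0), (y1, c1) in zip(milestones, milestones[1:]):
    actual_fold = c0 / c1                 # observed cost reduction
    moore_fold = 2 ** ((y1 - y0) / 2)     # halving every 2 years baseline
    print(f"{y0}->{y1}: actual {actual_fold:,.0f}x drop "
          f"vs Moore's-law {moore_fold:.0f}x")
```

Between 2001 and 2011 the actual drop is 10,000-fold, against a Moore's-law expectation of only about 32-fold, which is exactly the "super Moore's Law" behavior the whitepaper describes.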

Figure 1 The development of genome sequencing

3 Our Vision

Building on current blockchain technology, we are constructing a human genome blockchain: a new platform for exploration, application, contribution and equal sharing that will bring benefit to every participant. Genetic data will no longer be a government or institutional privilege; the platform will help ordinary people understand the importance of genetic data and enjoy the benefits of big gene data.

Our genetic background is the mark we carry from birth: 3 billion base pairs that shape us from birth to death. At the same time, our living habits, our choice of spouse and our choice of work are all related to our underlying gene sequences. Deepening our understanding of the human gene sequence will help us understand ourselves better and avoid trouble.

The Gene Blockchain Platform is committed to the following:

  1. We are dedicated to deepening the understanding of gene sequences: we will establish the most abundant and standardized gene functional databases, laying a foundation for all activities on our platform, such as health plans, daily activities and social networking.
  2. We support the gene blockchain community. We will use as many GeneBTC tokens and as few fiat currencies as possible while implementing our ideas.
  3. We are standardizing and unifying a fragmented industry. We will build blockchain smart-contract solutions for many of the industry’s weaknesses. The industry needs help with lab testing, supply chain, ID verification, compliance, etc. All of these respond well to smart contracts.
  4. We seek to create an open and equal platform for all participants. More participants will bring more ideas and value.
  5. We are transparent. Funds will be escrowed and our books will be audited by reputable firms, such as Deloitte, Grant Thornton, or other well-recognized international accounting firms experienced with the nascent blockchain industry.
  6. We are responsible. We limit our own benefits, putting the cause first. This is our passion. We are committed to the legalization efforts and to the community.
  7. We give our members a voice. Pitch your ideas, help others, get funded, support other ideas, and decide our future through a voting system.
  8. We are groundbreaking. We will take advantage of the media’s interest in the hot topic of gene applications, and become the first and largest platform for standardized genetic data and its applications.

The gene blockchain platform stands at the edge of a new century. Gene Blockchain’s passion, committed community, strong technology, and experienced leadership make this cryptocurrency the natural choice for the genome industry.

4 Business model

We started Gene Blockchain to resolve many issues currently faced by the gene data industry. Blockchain-based smart contract technology is ideally suited to organize, systematize, and bring verification and stability to a traditionally unchecked industry. Genomic data today, for example, is highly fragmented and non-standardized, and pricing of genomic data remains a key problem. At present, only governments and large companies hold gene data; ordinary people lack understanding of it and are completely deprived of ownership and usage rights over their genomic data.

Gene blockchain will use blockchain smart contracts to:

  • Create an immutable ledger for all industry-related data via GeneChain
  • Offer payment for industry-related services and supplies through GeneBTC
  • Establish advanced labs for human genome data analysis via GeneLab
  • Organize and unite a global platform for health, entertainment, social networking, etc. through GeneNetwork

The Gene Blockchain model starts with a crypto-token and blockchain technology. We will create an open, equal, shared gene community (GeneNetwork). In this community, members can explore the meaning of genetic data, share genetic data resources, develop personalized products, and create disease treatment options or health programs. This will address the problems of gene data standardization, large-scale data analysis, application, promotion, and so on.

On the GeneNetwork, participants can also quickly find personalized products of various types, such as personalized disease treatment products, personalized health management methods, suitable entertainment experiences, better-matched dating patterns, and so on. Ultimately, all products and services on GeneNetwork are personalized.

4.1 Gene Blockchain Smart Contracts

Using Blockchain Smart Contracts to Innovate the Gene Data Industry.

Gene Blockchain takes advantage of the digital and verifiable nature of blockchain to address the fragmentation, lack of standardization, and mispricing of the genetic data industry. Blockchain smart contracts are ideal for recording and facilitating the exchange of value, goods, services, and private data. Putting genome data and transactions on blockchain smart contracts will also increase the speed of service and save hundreds of thousands in reduced paperwork.

The Gene Blockchain smart contracts can immediately serve a number of businesses within the genome data industry. With legal changes, increased community regulations and acceptance, other business opportunities may emerge. Blockchain smart contracts can instantly and accurately register and record these events:

  • Blockchain-based smart contracts provide accountability in a way no other technology can offer.
  • They provide an immutable ledger that offers permanent verification of every past transaction, which builds trust.
  • They store all product lifecycle events forever in an easy-to-retrieve system.
  • They let multiple apps simultaneously interact with any piece of information stored in the blockchain.
  • They offer anonymous patient identification.
  • They facilitate peer-to-peer transactions across the globe.

The combination of decentralized encryption, anonymity, immutability, and global scale turns Gene Blockchain into the ultimate online community for the legalization of genome data across borders.

Gene Blockchain is built with smart contracts on the Ethereum blockchain, an advanced, open, and completely decentralized application platform. Ethereum uses all the strengths of Bitcoin’s original technology. Blockchain was first established as a digital currency for use in financial systems, but second-generation blockchains and their associated smart contract technology can be used for much more.

Ethereum builds on Bitcoin to offer contracts and other kinds of verified transactions. Gene Blockchain adds another layer to Ethereum, letting it focus on solving problems unique to genome data growers, dispensaries, labs, doctors, and customers. Gene Blockchain builds on the strength of a well-established system to offer applications, perform financial services, create a new cryptocurrency, and form a messaging system.

As such, Gene Blockchain provides not only the groundbreaking Ethereum based crypto-currency called Gene BTC (GBC) but also a powerful, modular toolset to build applications that can track shipments, verify potency, identify medical patients and their prescriptions, and a host of applications not yet imagined. Ethereum gives users complete freedom to create their own applications on the Gene Blockchain platform.
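The whitepaper includes no contract code, so as a rough illustration only, here is a toy Python sketch of the balance-and-transfer bookkeeping that an ERC20-style token contract performs on-chain. All names here are hypothetical; a real GBC contract would be written in Solidity and executed by the Ethereum Virtual Machine.

```python
# Toy sketch of the ledger an ERC20-style token contract maintains.
# Illustrative only: a real contract lives on Ethereum, not in Python.

class ToyToken:
    def __init__(self, total_supply: int, creator: str):
        # All tokens start in the creator's balance, as in a
        # typical ERC20 constructor.
        self.total_supply = total_supply
        self.balances = {creator: total_supply}

    def balance_of(self, owner: str) -> int:
        return self.balances.get(owner, 0)

    def transfer(self, sender: str, to: str, amount: int) -> bool:
        # ERC20 transfers fail rather than overdraw an account.
        if self.balance_of(sender) < amount or amount < 0:
            return False
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[to] = self.balance_of(to) + amount
        return True

gbc = ToyToken(total_supply=200_000_000, creator="crowdsale")
gbc.transfer("crowdsale", "alice", 1_000)
print(gbc.balance_of("alice"))       # 1000
print(gbc.balance_of("crowdsale"))   # 199999000
```

A production ERC20 contract also exposes `approve`/`transferFrom` allowances and emits `Transfer` events; those are omitted from this sketch.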

4.2 Gene BTC

GeneBTC (GBC) can create a global ecosystem where businesses and consumers can quickly, easily, and verifiably transfer funds — business to business, business to consumer, and/or consumer to consumer. A correctly implemented cryptocurrency is the logical solution to an ongoing issue with an unregulated marketplace. It addresses the biggest impediment to the standardization and pricing of genome data, laying a foundation for the industry.

Gene Blockchain Distribution and Supply

Gene Blockchain will be the digital token that powers and incentivizes the Gene Blockchain user community and content platform. This digital token can be exchanged for fiat currency (dollars, euros, yen), in jurisdictions where such exchange is legal, or for other cryptocurrencies (e.g., Bitcoin, Ether) on various cryptocurrency exchanges after the initial ICO. The projected value is expected to be about USD $1.00 per GBC (fiat currency prices throughout this White Paper are provided for illustrative purposes only; no fiat currency will be accepted during the GBC token crowd sale, only cryptocurrencies such as BTC, ETH, LTC, IOTA and others).

Name: GeneBTC
Ticker: GBC
Based on: Ethereum

Technical data:

  • A total of 200,000,000 GBC will be generated. There will be no further production of tokens, so over time the tokens in circulation will fall in number while demand increases.

  • Desktop wallets for Mac OS, Windows, and Linux keep the cryptocurrency safe while allowing for easy transfers, balance viewing, and simple use.

  • Tokens are created with an ERC20 token smart contract. The integrity of the system is built on the self-interest of token owners. Owners of more tokens may have more say in the Gene Blockchain community and the direction Gene Blockchain takes.

  • Fast network speeds, with transactions settling in a minute or so.

  • Ethereum-backed voting feature: Gene Blockchain uses blockchain-based voting for a true democracy.

  • Expanded notary and/or oracle services are possible on the Ethereum blockchain with GeneBTC.

  • Multi-signature accounts can be implemented in just two clicks.

  • Fixed fees make it easy to know your costs. You don’t need to calculate how much each transaction will cost.

  • Best Application Program Interface (API) and smart contract documentation available on blockchain.

  • A user-friendly Graphical User Interface (GUI) makes it easy to go from idea to implementation without many technical skills or developers needed.

Token distribution is an important part of a token crowdsale. The distributed value and frequency of token production influence token price. 200,000,000 total tokens will be generated. These tokens will be introduced in two ways. The token crowdsale will be conducted as follows.

  • 100,000,000 tokens for sale, valued at $1.00 USD each, at stage 1

  • 50,000,000 tokens for sale at stage 2, no sooner than 2022, at market price (not the initial $1 of the first crowdsale)

  • 40,000,000 tokens allotted to a Gene Blockchain-controlled reserve to maintain price support for the Gene Blockchain tokens. Tokens can be bought or sold to keep the token circulation stable

  • 10,000,000 tokens in a community-controlled reserve to be used for the best startup ideas as voted on by the community

Gene Blockchain is committed to a fully transparent process, even beyond the open-source coding. Here are other ways we will work for transparency and community control.

  • Engage one of the “Big Four” accounting companies for annual third-party audits.

  • Founders and team members who own Gene Blockchain will be prohibited from liquidating more than 20% of their position within the first calendar year. This is to prevent dumping and to keep a stable token price, and it keeps their interests aligned with the Gene Blockchain community.

  • Reserved token crowdsale funds will be inaccessible for any purpose other than future token crowdsale events. The Gene Blockchain price of the second and third token crowdsale events will be determined based on, but not limited to, the Gene Blockchain exchange price prior to the crowdsale event in question.

  • Community approval will be exercised via smart contract voting. The voting may approve coin reserve unlocking, club membership policy changes, and other changes that affect the Gene Blockchain community.

  • A minimal threshold amount will be required for a completed token crowdsale. The token offering will have a series of cap levels. If the token crowdsale does not reach its minimum cap of 10 million, any funds received during the token crowdsale will be returned to the originating wallets automatically. If the minimum threshold is exceeded but the maximum cap of 100 million tokens is not met, any unsold tokens will be burned. Any funds received after the maximum cap of 100 million tokens has been reached will be automatically returned to the senders’ wallets.

  • A recognized third-party escrow agent will ensure that funds deposited for a token crowdsale are kept secure until the token crowdsale is finalized and the tokens generated.

  • Each token crowdsale will be designed to reduce the number of large buyers (whales) who may want to dump tokens. Instead, the token crowdsale will favor smaller investors who are committed to the genome data cause and plan on participating in the community.

How to Get Gene BTC

The initial token crowdsale will take place between September 18, 2017 and October 17, 2017, and can be accessed via our website. Please register for the token crowdsale so you are notified of the opening of the event. Recent token crowdsales (also commonly referred to as ICOs) have sold out in minutes. Be sure to take advantage of notification and prior registration so you do not miss out.

After the initial token crowdsale, opening a coin account with Gene Blockchain will be easy and free using existing Ethereum wallets. Because Gene Blockchain is truly decentralized, it uses peer-to-peer technology to operate with no central authority; the network collectively carries out the issuing of Gene BTC. It works everywhere, anytime, so business can be transacted 24/7 in any part of the world.

Post-crowdsale, interested people will also be able to purchase and sell Gene Blockchain on exchanges, subject to applicable regulations in their country of residence.

All transactions will be secured with state-of-the-art cryptography, and the blockchain’s integrity will be protected by CPU-efficient, ASIC-resistant proof of stake. This model will allow us to speed transactions and satisfy banking needs for gene data businesses. In particular, it will offer genome data businesses and consumers a legal alternative to the current regulatory restrictions.

Gene Blockchain is committed to the cryptocurrency community. We want to keep the value of Gene Blockchain strong and growing. We also do not want our token crowdsale to affect the Bitcoin price. To that end, we will be very careful as we convert the token crowdsale proceeds to fiat currency to pay for expenses.
We will stage the conversion of the token crowdsale proceeds over time and through multiple cryptocurrencies and exchanges. This will dilute any impact that volume might have on either Gene Blockchain or other cryptocurrencies.

4.3 GeneLab

We are creating high-tech, advanced gene sequencing and analysis laboratories. Part of the work of these labs is the constant accumulation and refinement of the human genome database. Based on genome data from users, volunteers, paid anonymous donors, and others, we aim to establish the world’s largest human genome database, with sample distribution proportional to the population coverage of each global region. These will be our core data. This part of the work will continue over the next five years, ultimately reaching 10 million copies of human whole-genome data. The second part of the work is to explore these human genome data in depth. Our knowledge of genomic data is still very limited: the genetic information each person carries is closely related to disease, habits, hobbies, height, appearance, and more. In the laboratory, we will gradually establish and improve the mining of these relationships.

Figure 2 Business model of GeneLab

4.4 GeneNetwork

We will build a gene blockchain platform based on the human genome database. As a gene coin owner, by uploading your own genomic data you can obtain a detailed personalized report and gain real-time access to the latest personalized products and services. At the same time, companies can analyze the anonymous genomic data to develop personalized product solutions and services. Specifically, for example:

  1. Achieve real precision medicine. In the medical field, pharmaceutical companies can provide personalized drugs and treatment products tailored to the characteristics of a person’s genetic information, and users on our platform can obtain real-time information about the drugs or products relevant to them. On one hand, this can help prevent some potential diseases; on the other, it can deliver the most effective treatment for existing ones;

  2. In the field of health, users can choose the healthiest lifestyle for themselves according to their genetic background. These personalized health programs are likewise developed by a variety of health companies based on genetic data analysis;

  3. Moreover, for entertainment and social needs, users can also get customized entertainment and dating products.

Our gene blockchain platform will understand our users best, and all trade and payment for products and services on the platform will be completed through Gene BTC.

Figure 3 Business model of GeneNetwork

5 Our Technology
5.1 Advanced genome sequencing technology

We will work with the world’s strongest gene sequencing companies and use the most advanced gene sequencing technology to rapidly accumulate human genetic data and complete the first phase of the human gene database construction.

The HiSeq X Ten, one of the most powerful sequencing systems available, will be the basic platform for our human genome database construction. The system consists of 10 HiSeq X sequencers and is suited to population-scale sequencing projects. It uses advanced design features to produce ultra-high throughput: a flow cell containing billions of nanowells, together with a new cluster-generation reagent, significantly increases data density, and with advanced optics and faster reagents the HiSeq X Ten can sequence faster than ever before. Each HiSeq X instrument can generate 1.8 Tb of data in a three-day run, i.e., 600 Gb per day. Running 10 instruments simultaneously, more than 18,000 human genomes can be sequenced per year.

Figure 4 The most advanced whole genome sequencing platform
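As a sanity check, the throughput figures quoted above are mutually consistent. The short Python sketch below redoes the arithmetic; the roughly 120 Gb of data per 30x-coverage human genome is our assumption, not a figure from the text.

```python
# Back-of-the-envelope check of the HiSeq X Ten throughput figures.
GB_PER_RUN = 1800        # 1.8 Tb of data per instrument per 3-day run
RUN_DAYS = 3
INSTRUMENTS = 10
GB_PER_GENOME = 120      # assumed data volume for one ~30x human genome

gb_per_day = GB_PER_RUN // RUN_DAYS              # per-instrument daily output
gb_per_year = gb_per_day * INSTRUMENTS * 365     # whole fleet, one year
genomes_per_year = gb_per_year // GB_PER_GENOME

print(gb_per_day)        # 600, matching the "600 Gb per day" in the text
print(genomes_per_year)  # 18250, consistent with "more than 18,000"
```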

Most importantly, the HiSeq X Ten brings the cost of sequencing a human genome down from roughly $1 billion at the outset to about $1,000 today. This allows us to sequence thousands or even millions of genomes, laying a solid foundation for establishing the human genome database and for an in-depth understanding of how genes relate to our habits, diseases, and other traits. Our genome sequencing platform is also among the world’s largest sequencing technology platforms.

Figure 5 Develop the most appropriate scheme for human whole genome sequencing

5.2 Powerful cloud analysis platform for gene data

We have initially developed a series of software tools for human genome data analysis and construction, along with the data modules for future product analysis. We also work with the US National Institutes of Health’s NCBI data platform to establish patented genetic analysis algorithms for human genome data. We will build a complete data set covering all 23 pairs of human chromosomes for every sample genome; the sequences on each chromosome will undergo genotype analysis and be mapped to specific characterizations such as body shape, disease, and habits.

Figure 6 Genome Data Viewer
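As a minimal illustration of the genotype-to-characterization mapping described above, the sketch below looks up (variant, allele) calls in a toy annotation table. The variant IDs and trait labels here are invented for illustration; a real pipeline would draw on curated resources such as NCBI’s databases.

```python
# Toy genotype annotation lookup. The table below is fabricated for
# illustration and carries no real biological meaning.
ANNOTATIONS = {
    ("rs0000001", "A"): "illustrative height-associated allele",
    ("rs0000002", "T"): "illustrative disease-risk allele",
}

def annotate(genotype_calls):
    """Return annotations for any (variant, allele) calls we recognize."""
    return {
        call: ANNOTATIONS[call]
        for call in genotype_calls
        if call in ANNOTATIONS
    }

calls = [("rs0000001", "A"), ("rs0000003", "G")]
print(annotate(calls))
# only the first call matches the (toy) annotation table
```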

We hold more than one hundred thousand gene analysis data sets and dozens of analysis software packages and platforms. Together, these tools can conduct a comprehensive analysis and evaluation of a human genome within a week. For example:

  1. BLAST software can compare and analyze the original sequencing data, completing the construction of the human genome while eliminating errors and repeated sequences.

Figure 7 BLAST – gene data comparative analysis
  2. Genome Workbench software can conduct a preliminary analysis of human genome data. The software can integrate, view, and analyze gene sequences, and can incorporate public gene data and information into the integrated analysis of human genome data, which greatly speeds up the analysis. Genome Workbench is a toolkit built in C++ that allows you to view and consolidate data across platforms; it can be used on Windows, macOS, and various versions of Linux.

Figure 8 Genome Workbench – cross-platform genome analysis

  3. Genome ProtMap software can complete the genetic analysis of each individual’s characteristics, such as appearance, height, genetic disease, lifestyle, physical fitness, and so on. The foundation of this work is that we can link gene data to proteins: proteins are the basic components of our bodies and the characterization of all structures and behaviors. Understanding the gene expression map is a key step in completing our gene database.

Figure 9 Genome ProtMap – disease-related analysis

We will then use ProSplin, Genome Remapping, gene tree, VecScreen, and a series of other software tools to complete accurate analysis of all sequenced genomes. The final genome map for each person will include all information from the gene sequence, to the gene, to the gene tree, and then to the related proteins.
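None of these packages can be reproduced in a few lines, and BLAST in particular performs heuristic local alignment. As a much simpler stand-in, the following sketch computes percent identity between two equal-length reads, the elementary comparison that underlies this kind of sequence error checking.

```python
# Percent identity between two equal-length sequences: a toy stand-in
# for the far more sophisticated comparisons BLAST performs.
def percent_identity(a: str, b: str) -> float:
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 87.5 (7 of 8 bases match)
```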

5.3 Elliptic curve cryptography

Elliptic curve cryptography (ECC) is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields. ECC requires smaller keys than non-ECC cryptography (based on plain Galois fields) to provide equivalent security. Elliptic curves are applicable to key agreement, digital signatures, pseudo-random generators, and other tasks. Indirectly, they can be used for encryption by combining the key agreement with a symmetric encryption scheme. They are also used in several integer factorization algorithms based on elliptic curves that have applications in cryptography, such as Lenstra elliptic curve factorization.
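To make the key-agreement use of ECC concrete, here is a toy Diffie-Hellman exchange over the small textbook curve y² = x³ + 2x + 2 (mod 17) with generator G = (5, 1). Production ECC uses curves over fields of around 256 bits; this miniature example only illustrates the mechanics.

```python
# Toy elliptic-curve Diffie-Hellman over y^2 = x^3 + 2x + 2 (mod 17).
# The group generated by G = (5, 1) has order 19.
P, A = 17, 2                # field prime and curve coefficient a
G = (5, 1)                  # generator point

def add(p1, p2):
    """Add two curve points (None represents the point at infinity)."""
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                   # inverse points
    if p1 == p2:                                      # point doubling
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:                                             # distinct points
        s = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)

def mul(k, point):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

alice_secret, bob_secret = 3, 7          # private scalars
alice_pub = mul(alice_secret, G)         # public keys, exchanged openly
bob_pub = mul(bob_secret, G)
shared_a = mul(alice_secret, bob_pub)    # each side combines its secret
shared_b = mul(bob_secret, alice_pub)    # with the other's public key
print(shared_a == shared_b)  # True: both derive the same shared point
```

Requires Python 3.8+ for the three-argument `pow` modular inverse. The shared point would then be fed into a symmetric scheme, as the paragraph above notes.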

6 Our products
6.1 Digital genome distributed data warehouse

All future product modules are based on this core database. The second part is digitizing genome data by issuing the gene blockchain currency. Each gene currency will represent one anonymous human whole-genome data set, including all 3 billion base pairs across its 23 chromosome pairs. The owner holds all rights to the profits generated when any person, organization, institute, or enterprise uses these genome data.

The Digital Human Genome Database is our core product. It is divided into two parts. The first part is the genome-wide data, which will include 10 million samples of the human genome; each person’s genome includes information on all 3 billion base pairs, and this part will be completed in several phases. The basic data will use compression, encryption, and fast-storage technology. In the redesigned distributed storage, each block contains a SHA256 hash value in its first part, polymorphism labels, a fixed length of base pairs, and polymorphism contribution signatures. All future product models are built on these core data. The second part focuses on digital assets for genetic polymorphism. The digital gene currency will represent the entire genome data holding in our databases: when the currency is issued, each gene coin will represent one copy of an anonymous human genome from our database of 10 million, and each copy will include all sequence information for the 23 pairs of human chromosomes. This is the reward for its distributed storage.

  1. The first part, 10 million copies of genome-wide data: our goal is to build an encyclopedia of human genomes. In addition to building genomic maps, we have also developed a method to describe genomic content at the sequence level, including sequence variation and other descriptions of function and phenotype. Our human genome database provides genomic data in a tabular format, including gene units, PCR loci, cytogenetic markers, ESTs, contigs, repeat fragments, etc.; genomic maps, including cytogenetic maps, linkage maps, radiation hybrid maps, contig maps, transcript maps, etc.; and polymorphism databases such as allelic genes. In addition, the database includes hypertext links to other network information resources, such as the GenBank and EMBL nucleic acid sequence databases, the genetic disease database OMIM, and the document summary database MedLine. All these data are distributed on the blockchain.

Figure 10 Construction of human genome core database

Figure 11 Genome Core Database - Gene sequence data model

Figure 12 Genome Core Databases – gene and protein analysis database

Figure 13 Genome Core Databases – gene and diseases analysis databases

  2. Digitize our genome data via GeneBTC

We will issue a gene currency to digitize the human genome data in the database; each gene currency represents the rights to one copy of human genome data and entitles its holder to the benefits derived from those genetic data.
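The block layout sketched in this section (a SHA256 hash plus a fixed-length run of base pairs) can be illustrated in a few lines of Python. This is not the actual GeneChain format: the chunk length is arbitrary, and the polymorphism labels and contribution signatures mentioned above are omitted.

```python
# Sketch of hash-chained storage for a (toy) sequence: each block's
# hash covers the previous block's hash plus its own chunk, so
# tampering with any chunk invalidates every block after it.
import hashlib

def build_chain(sequence: str, chunk_len: int = 8):
    blocks, prev_hash = [], "0" * 64   # genesis "previous hash"
    for i in range(0, len(sequence), chunk_len):
        chunk = sequence[i:i + chunk_len]
        digest = hashlib.sha256((prev_hash + chunk).encode()).hexdigest()
        blocks.append({"chunk": chunk, "prev": prev_hash, "hash": digest})
        prev_hash = digest
    return blocks

chain = build_chain("ACGTACGTTTGACCGA")
print(len(chain))                            # 2 blocks of 8 bases each
print(chain[1]["prev"] == chain[0]["hash"])  # True: the blocks are linked
```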

6.2 Healthcare platform

Personalized medicine and health programs are the general direction of development. Every person carries unique variations of the human genome, and these differences call for different treatment and healthcare plans when people fall ill or want to maintain their health. Based on the human genome database described above, we will build a platform for disease and health, because human genes are associated with the health of each of us and with all diseases. Meanwhile, aiming to provide customized treatment plans, pharmaceutical and health management companies are eager to collect large amounts of human genetic data for the development of drugs and health products.

Figure 14 Demonstration of gene healthcare platform

6.3 Gene entertainment platform

With the improvement of living standards, people’s demands for leisure and entertainment keep rising, and people have begun to focus on the entertainment experience itself. After opening the healthcare platform, we will further open the entertainment platform. All kinds of leisure and entertainment companies can use the genetic data on our platform to develop personalized entertainment products based on this “big data”. These products include personalized diet plans, personalized leisure products, and so on. Our platform also enables more refined promotion services, reaching target customers more accurately and effectively.

Figure 15 Demonstration of gene entertainment platform

6.4 Gene social platform

As society progresses, the connections among people grow closer and more complex, forming the concept of a global village. Yet among all these “potential friends”, finding companions with matching hobbies and temperaments, or even finding a date, remains an unmet need for all of us. A gene dating platform can solve this problem in the future. Life is short; with our platform, you can reduce the cost of making friends, identify your own group, and find your date faster and more efficiently.

After we have accumulated enough gene data, we will open the data platform to all types of social networking companies. These companies can build a variety of personalized dating applications and communities based on the massive genetic data, making it easier for people to meet each other.

Figure 16 Demonstration of gene social network

7 Development plan

8 Team introduction

Andrew Dewar: Founder & CEO. Engaged in the medical and health industry for more than ten years, with a wealth of experience in medical industry investment; has successfully invested in three listed medical companies and more than 10 medical start-ups, achieving an annual investment return of more than 150%.

Ashley Beleny: Chief Operating Officer. Has worked at the frontier of the medical internet industry for over ten years, with very rich experience in areas including bioinformatics and medical data construction. Participated in a series of medical information projects initiated by large listed information technology companies, and has worked with many finance and insurance companies to develop products for medical services.

Caleb Cody: Chief Technology Officer. Experienced in large-scale collection, analysis, and mining of human genome data; has operated and managed genomics centers for years. Has participated in a number of international cooperation projects, such as the thousand human genomes, million vertebrates, five thousand insect genomes, global underground mouse, cucumber genome, “Yanhuang one”, panda genome, rhesus genome, ant genome, oyster genome, and sea snail–seaweed symbiosis projects.

Michael Chou: Senior Technology Director. Engaged in genomics and bioinformatics research; has been involved in a series of major research projects, including the international thousand-person genome consortium, the Diabetes Genome Project, the Human Pan-Genome Atlas, the Plateau Genome Project, the Disease Genome Plan, and others. Currently focused on bioinformatics cloud computing and genetic data.

Jeniffer Deasy: Vice President of Business Development. Has long been engaged in developing applications and solutions for the medical information industry. Participated in the development of hospital information systems, regional medical information systems, and public health information systems. Familiar with workflow technology, XML technology, and database technology; skilled at system analysis, design, and technical architecture. Now focused on the design and development of cross-regional interoperable EHR platforms and hospital information integration platforms.


Connor Olson: A founding member of the National Human Genome Center, where he established the first high-throughput DNA sequencing facility to lead genomics research. During his Ph.D. studies he discovered the RIG-E gene, a newly cloned human gene. He has published more than 90 academic papers in SCI journals, including Nature, Cell, Nature Genetics, Nature Biotechnology, PNAS, and others, and has been involved in compiling the new edition of “Medical Genetics”.

Joshwa Mathai: Focused on research into the human genome, disease, and lifespan, especially the correlation between gene sequences and telomerase changes. Found that the unique repeats on chromosome telomeres are a key factor protecting telomeres from degradation, providing a series of theoretical guidance for genetic research on the human genome as it relates to aging.

Abner Medina: Engaged in cancer research. Has researched human genes and cancer for more than 30 years, discovering a number of cancer-related genes and their pathogenesis. Also discovered the mechanism by which cells destroy, degrade, and reuse proteins: by labeling a target protein, the cell can activate the targeted protein degradation pathway, providing a theoretical basis for a large body of cancer development research.

RE: Elliptic Curve Cryptography



Ok, I think I was able to upload the PDF (that I copied/pasted above) . . .

This circles back to Eugenics and the Race Betterment Foundation:

> In addition to the Eugenics Record Office (ERO), several national organizations promoted eugenics at professional and popular levels. The American Breeders Association (ABA) was established in 1903 as an outgrowth of the American Agricultural Colleges and Experiment Stations. It was one of the first scientific organizations in the United States that recognized the importance of Mendel’s laws, and its Section on Eugenics was the first scientific body to support eugenic research.
> With a membership of about 1,000 established scientists and agricultural breeders, the ABA played a major role in legitimizing the American eugenics movement but avoided popular campaigns and legislative lobbying. However, it shared members and officers with several other organizations that had wider social agendas - notably the Race Betterment Foundation, the Galton Society, and the American Eugenics Society (AES).
> The Race Betterment Foundation was founded in 1911 in Battle Creek, Michigan with money from the Kellogg cereal fortune. The Foundation sponsored three national conferences on race betterment (1914, 1915, and 1928) and started its own eugenics registry in cooperation with the ERO. The Galton Society, founded in New York City in 1918, was the most overtly racist of the American eugenics organizations. Its members used physical anthropology to confirm their bigoted notions about the supposed superiority of the Nordic race.
> Formed in 1923, AES quickly gave rise to 28 state committees that worked to bring eugenics into the mainstream of American life. Under the direction of Mary T. Watts, the AES education committee used state fairs to popularize eugenics. Exhibits illustrated Mendel’s laws and calculated the societal costs of continued breeding by “hereditary defectives,” while the Fitter Families Contests showed the results of breeding good human stock. AES also lobbied for broader use of intelligence tests on immigrants and students. For many years, the AES co-sponsored Eugenical News with the ERO.

Part 1 of 3 (too long to share in one reply as per this platform) . . .

Sharing Data to Build a Medical Information Commons: From Bermuda to the Global Alliance

Robert Cook-Deegan, Rachel A. Ankeny, and Kathryn Maxson Jones


The publisher’s final edited version of this article is available at Annu Rev Genomics Hum Genet



Genomics has data in its DNA. The term “genomics” took root in 1987 when Frank Ruddle and Victor McKusick borrowed Tom Roderick’s neologism to launch a new journal (92). Genomics has since become a field, or at least an approach to biology and biomedical research. It generally describes a style of science and application that features measuring many genes at once, rather than one gene at a time; intensive use of instruments for mapping and sequencing nucleic acids; generation and utilization of large data-sets, including DNA sequences, their underlying mapping markers, and functional analyses of genes and proteins; and computation. The extension from more traditional genetics entailed greater scale and implied a need to share data, because no one laboratory could fully make sense of the deluge of data being generated while mapping and sequencing the genomes of humans and other organisms.

The Human Genome Project (HGP) was conceived in 1985, when Robert Sinsheimer, Renato Dulbecco and Charles DeLisi independently realized that it would be useful to have a reference sequence of the human genome as a tool for research and application (35; 103). DeLisi had the authority to fund a research initiative within the U.S. Department of Energy (DOE). As vigorous debate proceeded through 1988, the goals were broadened to include, in addition to humans, mapping and sequencing the genomes of model organisms: Escherichia coli (bacterium), Arabidopsis thaliana (plant), yeast (initially Saccharomyces cerevisiae, and then others), the nematode Caenorhabditis elegans (although not formally included in the HGP at first), the fruit fly (Drosophila melanogaster), and finally Mus musculus (mouse) as the mammalian model. The imagined project also included research to advance—by increased speed, lower cost, and improved accuracy—the instruments and methods of DNA sequencing and mapping, as well as computational methods and algorithms. Finally, the project incorporated research on ethical, legal and social implications (ELSI). The ELSI program was novel and distinctive, setting a precedent for initiatives that had foreseeable impacts well beyond the technical community (155). It appears today that the greatest challenges of data sharing are indeed in law, ethics, and policy. The technical challenges are daunting; the social and legal complexities are even more so.

The HGP left many legacies. The most obvious is a reference sequence of the human genome that continues to be refined; but the Project also drove the creation of new instruments. A major avenue has been the Advanced Sequencing Technology Program, directed by Jeff Schloss of the National Human Genome Research Institute (NHGRI), which helped conceive and develop most of the technologies that gave rise to the hyper-Moore’s curve acceleration of DNA sequencing speed and plummeting costs from 2004 to 2014 (44; 68). The Program supported new mathematical approaches to analyzing data in bioinformatics, and the NHGRI became known as an institution that could manage large-scale, technology- and data-intensive research programs requiring more coordination than the distribution of peer-reviewed grants could provide. One of the signature features of NHGRI has always been an open science ethos associated with the pre-publication sharing of data (33; 60).

The HGP reached its goal of a human reference sequence earlier than predicted, in the period from 2000 to 2003. The initial milestone toward completion was achieved on June 26, 2000, when a draft human sequence was first announced by President Bill Clinton and Prime Minister Tony Blair at the White House and 10 Downing Street in London (97). Articles revealing the HGP’s “public” assembly of a genome and that produced by the private company, Celera Genomics, were then published one day apart in Science and Nature in February 2001 (83; 152). This initiated a cascade of publications on sequences for chromosomes that culminated in April 2003, marking the 50th anniversary of the canonical April 25th, 1953 publication of the double helical structure of DNA by James Watson and Francis Crick (30; 77; 98; 156).

The 2001 publications represented two scientific strategies and two modes of managing data. The publicly funded HGP sequence—authored by a group of laboratories from the U.S., the U.K., France, Japan, Germany, and China under the banner “International Human Genome Sequencing Consortium” (IHGSC)—represented the work of a global coalition. It entailed extensive and systematic data sharing, characterized perhaps most distinctly by the daily release of publicly funded DNA sequences into the public domain. Data from the HGP came primarily from high-throughput sequencing laboratories in the six partner countries, with leadership on assembly centered at the University of California, Santa Cruz and databases at the U.S. National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). This public “hierarchical” sequence depended on genetic and physical maps, which situated DNA markers based on their chromosomal locations, and sequencing coordinated by chromosomal region, assigned amongst the global partners (23; 148; 153). Science, in contrast, published a sequence assembled by “shotgun” methods, characterized by a vertical organization that integrated data generation and computation within a single laboratory and also incorporated data from public sources (151; 154).

Access to the HGP's data was open and free; access at Celera was restricted. Celera’s data were available to subscribers for a fee, free for noncommercial use in packets up to 1 megabase (Mb), or by access agreement with the company (158). Divergent access policies were thus present even during the gestational phase of the human reference sequence, and controversy over access was intense (2; 65; 78; 95; 106). It is not an exaggeration to say that debates about sharing have been woven into the fabric of genomics itself, an inextricable part of the new field from its very beginning. In 2003, a National Research Council (NRC) committee was appointed to make recommendations about responsibilities accompanying publication, partly in reaction to the HGP-Celera race. Chaired by Nobel Laureate Tom Cech, this committee laid out the Uniform Principle for Sharing Integral Data and Materials Expeditiously (UPSIDE), and various other committees and reports on sharing have followed since (25; 38; 97–99). The strongest embodiment of the HGP’s open science ethos, however, was not sharing sequences at publication, but rather long before, even as the data were being generated.

Emergence of the Bermuda Principles for Prepublication Data Sharing

By early 1996, the HGP had been proceeding for six years. A workable human genetic linkage map was available, and physical maps of cloned DNA (and the bacterial and yeast cell lines that housed them) were available for most regions of the chromosomes (76; 85; 94; 107; 128). Reference sequences for several yeast chromosomes were published, and a complete genomic sequence was in the offing, with a network of European laboratories leading the way (5; 57). Progress toward a complete 19,000-gene, 97-Mb sequence of C. elegans bred confidence that it would soon be completed, as indeed it was in 1998 (12).

The initial goal of the HGP, however, a reference sequence of the human genome, was more distant, although it was visible over the horizon. The Wellcome Trust—the British biomedical research charity that by 2003 funded one-third of the public human genome project—had already increased its funding substantially, focusing on human sequencing and basing its efforts at the new Sanger Centre (founded in 1993) near the University of Cambridge (41; 51). The U.S. National Center for Human Genome Research (NCHGR, which would become NHGRI in 1997) announced in 1995 a competition for high-throughput sequencing of parts of the human genome, and reviewed center-grant proposals later that year (85). The DOE, moreover, was constructing a Joint Genome Institute (JGI) to house its sequencing operations, which would open in 1997 (81). By early 1996, the HGP, with contributions from five national partners (China would join in 1999), was transitioning from mapping to large-scale human genomic sequencing (120). The stage was set for a hard push for the human reference sequence.

It was quickly becoming apparent that detailed coordination was required. Even after earlier mapping efforts, sequencing—of both the human and other large model genomes—would generate more data than any other discrete effort in the history of biology (103). This was predicted in the HGP’s founding reports, and was becoming a tangible reality by 1996. More specifically, the pragmatic tasks at hand included: (a) deciding which chromosomal regions to assign to each team, lest several centers sequence the same juicy bits and waste time and effort; (b) specifying targets for data quality, as the sequencing goals were also set; and (c) verifying sequencing outputs from individual laboratories, to measure progress and to help justify the Project’s large public investment.

As the NIH large-scale sequencing grants were about to commence, Michael Morgan at the Wellcome Trust and Francis Collins, then NCHGR director, decided to hold an organizational meeting amongst those funded (or expected to be funded) to plan the launch phase of large-scale sequencing (57). They looked for a relatively neutral venue, one that would not be perceived as U.S.-dominated (138). They settled on Bermuda, which was accessible from all the HGP centers and appropriately located in the mid-Atlantic between Europe and North America. The meeting took place at the “Pink Palace,” the Princess Hotel in Bermuda, February 26th through 28th, 1996.

The weather was dreary, but the conference was not (90). The new NIH sequencing centers had been announced, but had yet to receive their funding. The attendees—including the Hinxton group funded by the Wellcome Trust; the leaders of the DOE’s efforts, working mostly on mapping and technology development; the leaders of the NIH centers; and mappers, technology developers, and administrators from Japan, Germany, and France—were eager to get started, yet anxious about the long journey ahead. They were, frankly, not sure a human reference genome was yet attainable (19; 27; 138).

In addition to the practical needs to allocate work and measure progress toward a sequence, two intimately related issues loomed as the 1996 meeting was planned: patenting and data-sharing. The patent issue surfaced when a scientist at the National Institute for Neurological Disorders and Stroke, J. Craig Venter, filed patent applications on short segments of DNA that allowed for unique identification of sequences coding for genes in the human brain (1; 22; 123). “Expressed sequence tags” (ESTs) could be used to fish genes out of the genome by identifying unique sequences that are translated into protein. Genentech patent lawyer Max Hensley advised Reid Adler, the lawyer at the NIH’s Office of Technology Transfer, that the NIH should file for patents on Venter’s ESTs, to ensure they could be licensed for further development (35). Such patents could “protect” the DNA fragments, and might help preserve private investments in characterizing the corresponding full-length genes and related proteins. Such incentives were thought important for developing drugs and biologics as treatments, as well as genetic tests for neurological diseases. Controversy erupted in July of 1991, when Venter disclosed at a Senate hearing that the NIH had filed its first application and was planning others, making claims on more ESTs and the method for obtaining them. The application on the EST method was later converted to a Statutory Invention Registration, effectively preventing anyone from patenting it.

Patenting DNA molecules and methods had become common at universities and biotechnology and pharmaceutical companies, a practice that in the 1980s and 1990s often conflicted with more traditionally minded academic biologists not used to patenting their work (17; 36; 130). One of the foremost patent scholars, Rebecca Eisenberg of the University of Michigan Law School, observed:

The patent system rests on the premise that scientific progress will best be promoted by conferring exclusive rights in new discoveries, while the research scientific community has traditionally proceeded on the opposite assumption that science will advance most rapidly if the community enjoys free access to prior discoveries. (46)

According to some, the NIH patent applications—which claimed both ESTs (gene fragments) and corresponding full-length genes (in the form of complementary DNAs, or cDNAs)—could block downstream research and development requiring the use of many genes in tandem. The NIH Director, Bernadine Healy, supported the patents, though the then-NCHGR Director James Watson, of DNA structure fame, vigorously disagreed with them (69; 155). This became one of several bones of contention between Watson and Healy that culminated in his resignation as head of NCHGR in spring 1992, leaving the door open for Healy to recruit Collins, who assumed the leadership of the NIH’s genome efforts in 1993 (139).

Reflecting contemporary commercialization trends, the debate over gene patents echoed between the public and the private sectors. As the NIH’s applications were pending, Venter left the NIH to direct The Institute for Genomic Research (TIGR), a private nonprofit research institute (35; 131). Some of the patent rights from TIGR would be assigned to a for-profit corporation, Human Genome Sciences (HGS), which itself began sequencing genes and fragments and filing for its own patents while also drawing on TIGR’s output. Meanwhile, another small firm, Incyte, had also become interested in sequencing ESTs and full-length cDNAs. HGS, Incyte, and several other companies were building business models around discovering and sequencing genes and fragments, and patenting parts of the genome likely to contain sequences of keen biological interest and commercial value (104).

Concerns about patent impediments to research, created by thickets of broad EST and cDNA patents on genes of unknown function, haunted debates about genomics and patent policy (47). In 1993, the pharmaceutical giant Merck initiated a partnership with the head of the HGP center at Washington University in St. Louis, the C. elegans expert Robert Waterston (42; 159). The goal of the Merck Gene Index Project was to sequence human ESTs and release them with a minimal delay, usually of only 48 hours, into the public domain, with the logic that “making the EST data freely available should stimulate patentable inventions stemming from subsequent elucidation of the entire sequence, function and utility of each gene” (159). One reason for this policy was a spirit of open science; another was to thwart patents on short DNA fragments by companies like HGS and Incyte; a third was that the Gene Index was funded by a nonprofit unit of Merck, so Merck had to demonstrate it did not have privileged access. Meanwhile, Harold Varmus became NIH Director, appointed by Bill Clinton to replace Healy. Varmus sought expert advice on the NIH’s EST patent applications, which had been initially rejected by the USPTO in 1992 (5; 87). Eisenberg and another legal scholar, Robert Merges of the University of California, Berkeley, drafted a detailed memo for Varmus noting that the NIH’s EST patent strategy made little sense, given that ESTs were primarily research tools (49). Varmus abandoned the applications. This was yet another turn along the tortuous path whereby genomics, and the rules about how and when genomic data should be shared and commercialized, were developing in tandem (71; 72; 79; 80).

The brouhaha over ESTs, coupled with the evolving controversies and worldwide negative press over the patenting of full-length genes like BRCA, colored the Bermuda meeting in 1996 (11; 58). The Bermuda attendees, who hailed from five nations, the European Molecular Biology Laboratory (EMBL) in Heidelberg, and the Brussels-based European Commission (EC), had to contend both with the pragmatic, scientific problems of getting the human genome sequenced on time and accurately, and the more principled problem of how the HGP’s soon-to-be-produced deluge of sequence data should be shared, utilized, and commercialized. To the majority of attendees at the 1996 meeting, it was not surprising that developing a clear-cut data-sharing policy was a chief agenda item from the start (57).

The specific historical roots of the daily, online release of HGP-funded DNA sequences—under the policy that came to be known as the “Bermuda Principles”—are complex and contingent. So, too, was the process by which this radical policy, which ran against the norm in most biomedical research of releasing data at publication, was ratified within the HGP and justified as the Project proceeded. In a forthcoming historical article, the authors enumerate and assess these details at length (89). Several points about precedents and the measured agreement that HGP participants reached are relevant here.

The Bermuda Principles filled a policy lacuna, replacing a set of guidelines that had previously applied only to the NIH and DOE. The HGP’s founding reports, produced in 1988 by the NRC and the congressional Office of Technology Assessment (OTA), were notoriously vague about data sharing, noting only that data and materials must be shared rapidly for coordination and quality control, and admitting that this might create conflicts with commercialization (21; 101; 103). The 1990 joint plan for the NIH and DOE echoed this message, but similarly failed to provide a timeline for sharing amongst collaborators (23; 46; 62; 114; 133; 135; 148; 155). By the early 1990s, mapping and sequencing technology development were proceeding impressively around the globe, not just in the U.S. but also in Britain, France, Japan, and several other nations (21; 101; 103). Yet despite the founding of the Genome Data Base (GDB, an electronic medium for sharing mapping data) in 1993, international coordination of human mapping was disorganized, nucleated around annual meetings and single chromosomes, and complicated by competition and secrecy (18). A 1992 NIH and DOE policy required deposit of data from mapping (to GDB) and sequencing (to GenBank) within six months of generation (18). Aside from the 48-hour sharing policy of the Merck Gene Index Project, this was probably the most specific policy precedent for data sharing in genomics. Standard practice for GenBank, for instance, was to share unpublished sequences concurrently with accompanying journal articles, but not before (9; 133; 137).

The Bermuda Principles extended the 1992 policy, strongly recommending the daily sharing of all HGP-funded DNA sequences of 1 kilobase (Kb) or longer to GenBank, the databank of the EMBL, or the DNA Databank of Japan (DDBJ) (57). For the first time, the HGP had a Project-wide policy, designed to unite all the international contributors and not just those funded by the NIH or the DOE. This facilitated the development of the Human Sequencing and Mapping Index, a website linking laboratory webpages to GenBank and allowing globally distributed centers to “declare” regions for sequencing and avoid duplication (16). Especially in the U.S., moreover, the Principles helped to enforce quality standards (set in 1996 at 99.99% sequence accuracy) and output commitments, providing a means of checking whether the heavily funded centers were delivering on sequencing promises. In 2001, Eliot Marshall called the Principles “community spirit, with teeth,” and for good reason: those centers not producing their sequences, or failing to meet quality standards, could lose their competitive funding and potentially their places within the prestigious HGP (86).

Daily sharing, however, was not necessarily required for these tasks, and the clearest historical test case for this policy came from a perhaps unlikely source: the C. elegans research community. John Sulston and Robert Waterston, amongst many others in this network, had adopted daily (or as close to daily as possible) sharing in the mapping and early sequencing of the worm genome, beginning in the 1970s and facilitated, by the 1980s, by the rise of networked computing (9; 136; 138). By 1995, Sulston and Waterston (funded by the NIH, the Wellcome, and the U.K. Medical Research Council) had done more large-scale sequencing (first in C. elegans, and later in early human efforts) than anyone, and had become HGP leaders via connections to Collins, Watson, Maynard Olson, and other power players in the field (85; 160). At the Bermuda meeting, Sulston and Waterston co-chaired the final session, on data release policies, which led to the first draft of the Bermuda Principles (57). “It was agreed that all human genomic sequence information,” the statement from that session and a later NCHGR press release read, “generated by centres for large-scale human sequencing, should be freely available and in the public domain in order to encourage further research and development and to maximise its benefit to society” (96).

Daily release, however, was for some a bitter pill to swallow. The adoption of this policy in the HGP was driven by Waterston and Sulston as the C. elegans sequencing leaders, and strongly supported by leaders of the two foremost funders: the NIH and the Wellcome Trust. Debates raged about whether daily release would hurt HGP data quality (3; 15; 128). Perhaps the most significant hurdle for daily data release, however, was its apparent incompatibility with commercialization. The U.S. Bayh-Dole Act allowed universities and businesses the first right to title on inventions funded by government grants, and a German policy allowed HGP investigators three months’ time, before data release, to apply for patents drawing on HGP sequences, including patents on genes (36; 147). In the U.S., daily release did not prevent patents on genes of known function, but it did block patents of the kind HGS and Incyte were seeking. In Europe and other patent jurisdictions, the Bermuda Principles indeed endangered gene patents, because, unlike in the U.S., no grace period separated the release of data into the public domain from the deadline for filing patents drawing upon it. (Until the America Invents Act somewhat changed the rules, the grace period in the U.S. was one year (54).) Heidi Williams has since argued convincingly that the patent and database restrictions placed on Celera Genomics’ sequence data led to a 20–30 percent reduction in downstream innovation and development in diagnostics, relative to genes sequenced first by the open and public HGP (158). But in 1996, as today, the HGP policy of daily sharing was just one stance among many, in a spectrum of uncertainty about how best to foster progress in genomics and its applications.

By early 1998, the Bermuda Principles were the official data sharing policies of the HGP, a condition of the large-scale sequencing grants in all participating countries. In 1996, the NIH had put out a statement that while it opposed patents on “raw human genomic DNA sequence, in the absence of additional demonstrated biological information,” because of the Bayh-Dole Act it could only discourage this practice—as it surely did, with the suggestion of removing funding if such patents were filed—but not prevent it outright (96). A series of warning letters from the NIH, the DOE, and the Wellcome Trust helped shift the conflicting policy in Germany and another incompatible policy in Japan, moves which prevented a sharing delay from also materializing in France (12; 28; 29). American and British leaders were flexing their muscles here. The threat of removing other national contributors from the HGP, if they did not agree to daily sharing, held real weight, as the NIH and the Wellcome Trust could effectively sequence the genome without collaborators.

The Bermuda Principles continued to exert considerable influence, especially as scientists and administrators amended them to reflect changes in data and practice (21). Two more meetings were convened in Bermuda, in February of 1997 and 1998. Like the 1996 summit, these were intended to address evolving scientific issues in the HGP, but also to revisit data sharing policies and enact policy shifts as attendees saw fit. By the 1998 meeting, the Principles were revised to include a new daily release trigger, of 2,000 (2 Kb) rather than 1,000 (1 Kb) base-pair stretches of DNA, and extended first to mouse and later to all model organism sequences produced under the aegis of the HGP (61; 157). In 2000, the NHGRI extended the Principles again to include several new kinds of data, including those generated during the finishing phase of the draft sequence and through whole genome “shotgun” sequencing (97).

Part 2 of 3:

Sharing Data to Build a Medical Information Commons: From Bermuda to the Global Alliance

After Bermuda: Broadening Open Science

The Bermuda Principles applied to a small group of laboratories, pulling together to produce a human reference sequence. Even as progress toward this goal accelerated in the late-1990s, it became apparent that studying variations—that is, individuals’ deviations from the reference sequence—and linking genomic data to other kinds of data was essential to making sequences meaningful: in terms of health outcomes, environmental exposures, genealogy, and family history. By 2000, it was clear that linkage amongst databases, through systematic data sharing, would drive the science and its applications.

The data and users were far more heterogeneous than for high-throughput sequencing, yet sharing was still paramount. As microarray technology became pervasive after 1996, identifying and cataloguing single-nucleotide polymorphisms (SNPs) became possible. Fears of hoarding and patent thickets led to a novel effort to prevent SNP patents, the SNP Consortium (73; 140). This was a public-private partnership that applied for patents on SNPs in profusion, to establish formal legal and scientific priority. It then abandoned the applications, releasing data to the public domain unencumbered by patent rights. Because SNPs were tools in agriculture, pharmaceuticals, biotechnology, and elsewhere, industry supported the Consortium. Academics were involved because they wanted to use the tools, but commercial firms like Affymetrix and Illumina produced most SNP chips. These manufacturers, while certainly believing in and holding many patents, nonetheless foresaw a hopelessly dense patent thicket based on DNA sequences and hoped to avoid it. Microarrays routinely included hundreds of thousands of DNA fragments as probes. If SNPs and ESTs were patented, would it not take hundreds of licenses to build a chip? Would anyone be able to afford it? Access to data, including the ability to use DNA molecules that included SNPs, was essential.

Ft. Lauderdale and Beyond

In 2003, as the HGP was nearing completion, the Wellcome Trust convened a meeting in Ft. Lauderdale to define rules for biological infrastructure projects. The meeting produced the first of several statements supporting the open science ethos for “community resource projects,” the likes of which dominate genomics today (98). The meeting focused on the value of prepublication data sharing, but relaxed some of the Bermuda Principles’ provisions. As at the 1998 Bermuda meeting, Ft. Lauderdale attendees stipulated daily sharing for sequences longer than 2 Kb for any organism. This was very much in the spirit of Bermuda, but their statement also mandated that data generators be credited when data were used. Finally, it acknowledged that hypothesis-driven research did not have the same sharing obligations as projects focused on producing “community resources,” like sequencing consortia, the Merck Gene Index Project, or the SNP Consortium. This recognition, of a need for scientific attribution and credit, grew from the diversity of data-generators and users. Arias, Pham-Kanter and Campbell have beautifully summarized the data-sharing policies that grew out of the Bermuda Principles (10).

Open science continued to infuse the many projects developed at NHGRI, or in which NHGRI was a partner. These were highly diverse, but many were indeed “community resource” efforts. The HapMap aimed to identify common genomic variations and characterize them from global populations, enabling elucidation of co-inheritance of DNA markers in detail. HapMap policies vigorously discouraged patenting, to a degree that Rebecca Eisenberg questioned their wisdom and enforceability (32; 48; 52). The 1000 Genomes project expanded on the HapMap project, including sequences from a larger sample of populations and intended to identify less common genomic variants (26). Another contemporaneous project, ENCODE, was intended to understand the function of DNA elements, including regulatory sequences, enhancers, promoters and operators (77). Since these elements were by definition functional, however, they might also have practical utility and be proper subjects for patenting. The 2003 ENCODE pilot policy explicitly acknowledged this, another deviation from Bermuda driven by the nature of the work.

As new genomics tools enabled genome-wide studies, the Genetic Association Information Network (GAIN) was formed (50). GAIN built on tools made possible by the HapMap and SNP chips, probing hundreds of thousands of common variants in large population studies to identify genomic regions associated with diseases and specific traits. It was intended to ensure rigor, facilitate collaboration, forge academic-industry partnerships, encourage sharing, promote publication, and prevent premature intellectual property claims. The Foundation for the NIH, a nonprofit, nongovernment organization, led the effort and invited applications for genome-wide studies. An elaborate peer review process assessed the technical merits of the proposals, but also ensured that study design, computational methods, and data sharing complied with guidelines. GAIN had a set of principles, the first of which was to make results “immediately available for research use by any interested and qualified investigator or organization.” Yet because the data were associated with specific individuals, this was qualified “within the limits of providing appropriate protection of research participants.” The GAIN principles imposed a duty on users to respect the confidentiality of study participants, and also to ensure that uses fell within the terms of their informed consent. Another principle was to acknowledge data sources for any re-use. Finally, the original contributors would have a nine-month period during which they alone could submit abstracts or papers based on their data. This “embargo” period was a marked divergence from Bermuda, driven again by the needs of a different community of producers and users. Labs generating the data were disparate in size, type, geography, and purpose, and those differences needed to be accommodated.

Database structures also had to reflect the new realities. The NCBI’s Database of Genotypes and Phenotypes (dbGaP) was established in 2006 (50). Instead of one unitary bank (i.e., GenBank) for sequences, dbGaP had tiers: a public layer freely available to all, and a large dataset that might contain private or identifiable information, which in turn required a gatekeeper, a Data Access Committee, to make sure users had good reasons for access. Users needed to agree to protect privacy and confidentiality and follow rigorous data security practices. Moreover, many studies needed to be approved by an Institutional Review Board to ensure compliance with ethics rules. The purpose of dbGaP was to enable research through broad access, but the involvement of possibly identifiable people who had not agreed to post their data on the Internet necessitated new layers of review. NIH’s 2007 policy on Genome-Wide Association Studies (GWAS) mandated deposit of data into dbGaP for NIH-funded studies (99). This was open science, but the database had rules that were not as simple as free and open access.

NIH’s policy for data access was formally modified in 2014 (33; 100). The 2007 policy covered only human GWAS; in 2014 it was generalized to all NIH institutes and all organisms. It permitted a “validation” embargo of the data for several months, to ensure quality, and dbGaP could hold data in a protected private status for six months. After six months, however, human data would be freely available, and data on animals would be freely available upon publication. The new policy also made it easier for NIH to tweak parts of the policy in the future, without formal review of the policy in its entirety. Finally, it stipulated that, prospectively, studies should obtain broad consent for deposit and data use, and IRB review for uses of data contributed from past studies, to ensure compliance with the informed consent in force at the time the data were gathered.

Prepublication data sharing was soon broadened beyond genomics, first to proteomics at a meeting in Amsterdam in 2008, and then to other datasets at a Toronto workshop in 2009 (124; 125; 142). The Toronto statement applied to all datasets that were large-scale, broadly useful, infrastructural (i.e., intended to produce reference datasets), and had community buy-in. This meant respecting the interests of data-generators to publish “first global analyses of their data set[s],” citing the sources of data, and contacting the data-generators if re-uses might “scoop” them.

These iterations and adaptations of data-sharing norms and policies were efforts to keep data open while also respecting the rights and interests of those contributing data (participants) and those generating the data (researchers). Different contexts and uses led to databases with layers of process and rules. Some projects, such as the Personal Genome Project and Open Humans, had an informed consent process that enabled sharing of one’s genome on the Internet with no restrictions, but these were for “information altruists” who did not fear misuse of their own data, or who at least believed the benefits of open access were greater than the risks (7; 8; 25; 108; 127). Most data and studies, however, came from projects that entailed narrower informed consent, which generally promised participants efforts to keep the data secure and prevent identification of individual research participants.


Building a Medical Information Commons: Theory

The Bermuda Principles and their successor statements were practical efforts to set rules and build infrastructure for increasingly diverse research uses. Generally, these rules were crafted while projects were being designed and carried out, and thus generated for and by a community—later including both research and clinical care users—hoping to draw on a “Medical Information Commons.” A 2011 report from the NRC and Institute of Medicine, Toward Precision Medicine, laid out a vision of layers of data oriented around individuals, to enable an evolving health care and public health system based in biomedical research (19; 32; 80; 102; 135; 137; 138). Its central recommendations focused on building informational infrastructure, calling for an:

‘Information Commons’ in which data on large populations of patients become broadly available for research use and a ‘Knowledge Network’ that adds value to these data by highlighting their inter-connectedness and integrating them with evolving knowledge of fundamental biological processes. (19; 32; 80; 102; 135; 137; 138)

This language explicitly invoked a “commons,” and thus Garrett Hardin’s classic 1968 essay (63; 64). Hardin’s presidential address to the American Association for the Advancement of Science, on which the essay was based, addressed problems that had no technical solution but instead required collective action. Hardin was mainly concerned with militarization and rising population, but his iconic example was drawn from William Forster Lloyd’s 1833 description of an overgrazed commons: in a pasture open to many herdsmen, each herdsman would want his cattle to feed as much as possible, but if each did so, overgrazing would deplete the pasture. Hardin pointed to the inadequacy of the “invisible hand” of a market to solve such problems, which instead required coercion, preferably according to mutually agreed rules. His examples were laws (against bank robbery), taxes, and other exercises of state power. This left a binary set of solutions: the market (no rules, free choices, and creation of property rights) or the Leviathan (the state).

Elinor Ostrom and others have also studied the tragedy of the commons, yet they expanded the potential solution set by observing that real-world communities often craft rules preventing the depletion of resources or orchestrating services that require collective action. Ostrom’s earlier work focused on natural resource depletion (114; 116). She examined how some communities prevented over-fishing, while others tragically depleted fish stocks for lack of a viable commons. She also studied collective interests in urban policing and in managing water resources. In her later work, Ostrom and others extended this thinking to the knowledge commons, a concept closely aligned with a medical information commons (70). Theoretically, the biggest difference between a natural resource commons and a knowledge (or data) commons is that natural resources, like biorepositories of samples, can be depleted. Data and knowledge, in contrast, are not depleted no matter how many people use them.

Indeed, data and knowledge achieve network effects that expand dramatically as the number of users grows, according to Metcalfe’s Law (150). Ostrom and her collaborators devoted much of their attention to the institutional arrangements that enable a viable knowledge commons to form, and to how it should be governed (112; 113; 115; 119). Trust and reciprocity are central themes, and Institutional Analysis and Development (IAD) was the framework she proposed for addressing problems of collective action. In 2010, she concluded that “rules related to the production of generally accessible data” include:

  1. Who must deposit their data?

  2. How soon after production and authentication of data do researchers have to deposit the data?

  3. How long should the embargo last?

  4. How should conformance to the rules be monitored?

  5. How many researchers are involved in producing and analyzing the particular kind of data?

  6. Should an infraction be made public in order to tarnish the reputation of the infringer? (115)

These questions encompass identifying who is (and is not) a member of the community of contributors and users, how to formulate rules for contribution and use, and procedures for governance and enforcement. The successive statements about data sharing, beginning with the Bermuda Principles in 1996, were all implicit or explicit responses to questions raised by Ostrom. Contreras’ work on genomics draws directly on this theory, and Madison, Frischmann, and Strandburg are extending theories of the commons into the data and knowledge domains (34; 84).

Commons theory is not the only framework for open science, however. The Global Alliance for Genomics and Health appeals to Article 27 of the Universal Declaration of Human Rights, the right “to share in scientific advancement and its benefits” (146). The value of sharing data globally is reiterated in a series of UNESCO declarations, the recommendations of the Council of Europe, and guidelines from the Organization for Economic Cooperation and Development (37; 99; 109–111; 143–145). Much of bioethics has centered on protecting research participants from risk, but the Global Alliance also focuses on the right to benefit, including through data sharing as a means to advance science and improve medicine: the very purpose for which many people contribute samples and information to medical research. Yet the right to benefit and its implementation in the real world are different things; moving from aspiration to realization is a constant struggle. Future statements will no doubt follow as international data sharing continues to take shape.


Emergence of the Global Alliance for Genomics and Health

While the need to link data of many types, housed in many parts of the world, is obvious, structuring a global commons of genomic and other data faces many practical problems. Some are technical: the need for application programming interfaces, standard formats, and interoperable data systems. But the legal and social challenges are even more formidable.

In January 2013, over fifty individuals from eight countries—roughly comparable to the first Bermuda meeting in size and national representation—met in New York City to develop standards and articulate the need for infrastructure. The results were summarized in a June 2013 white paper describing a need for collective international action to avoid “a hodge-podge of balkanized systems—as developed in the U.S. for electronic medical records—a system that inhibits learning and improving health care” (55). Attendees proposed a global alliance, which came to be known as the Global Alliance for Genomics and Health (GA4GH). By 2016, the Global Alliance comprised 800 people, from 400 organizations, in 70 countries (53). This growth alone indicates how much more complicated it will be to achieve data sharing among hundreds of diverse stakeholders than it was when HGP leaders met in Bermuda, involving fewer than 100 people from (at the time) only five countries.

Beacon and Matchmaker

As the Global Alliance was being formed, efforts converged on three pilot projects, and in 2016 a fourth was added. The three initial projects were Beacon, Matchmaker, and BRCA Challenge. The Beacon project queried a network of genomic databases to see if they harbored information about particular genomic variants. Those who came upon a variant in research or clinical practice could send a single query, and find out which participating databases had relevant information, anywhere in the world. As of summer 2016, 25 institutions and 250 datasets were participating in Beacon (53). The Matchmaker Exchange was another data-brokering effort for those studying rare disorders, enabling users to find phenotype and genotype data pertinent to a genomic-clinical profile, again allowing a query of participating databases. The October 2015 issue of Human Mutation featured 16 articles from the Matchmaker demonstration project (118). Both Beacon and Matchmaker point to the places where data can be found, while the data themselves remain where they are, and users can seek access.
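The federated pattern Beacon uses can be sketched in a few lines: a single variant query fans out to many participating “beacons,” each of which answers only yes or no (“exists”), so the underlying data never leave the host database. The sketch below is illustrative only; the beacon names, variant data, and the callable indirection are invented, though the query fields mirror the parameters the Beacon v1 API uses in its HTTP `/query` endpoint.

```python
# Illustrative sketch of a Beacon-style federated variant query.
# Each beacon wraps a private store and answers only {"exists": bool};
# the aggregator learns *where* data exist, not the data themselves.

def query_beacons(variant, beacons):
    """Ask every beacon whether it has seen `variant`; return names of hits.

    variant -- dict with assemblyId, referenceName, start, referenceBases,
               and alternateBases (mirroring Beacon v1 query parameters)
    beacons -- mapping of beacon name -> callable(variant) -> {"exists": bool}
    """
    hits = []
    for name, ask in beacons.items():
        try:
            response = ask(variant)
        except Exception:
            continue  # an unreachable beacon simply contributes no answer
        if response.get("exists"):
            hits.append(name)
    return hits

def make_beacon(known_variants):
    """Build a toy beacon over a private set of observed variants."""
    def ask(variant):
        key = (variant["referenceName"], variant["start"],
               variant["referenceBases"], variant["alternateBases"])
        return {"exists": key in known_variants}
    return ask

# Two hypothetical participating databases with disjoint holdings.
beacons = {
    "beacon-A": make_beacon({("17", 41244936, "G", "A")}),
    "beacon-B": make_beacon({("13", 32907420, "T", "C")}),
}
query = {"assemblyId": "GRCh37", "referenceName": "17", "start": 41244936,
         "referenceBases": "G", "alternateBases": "A"}
print(query_beacons(query, beacons))  # names of beacons holding the variant
```

A real deployment replaces each callable with an HTTP GET to a beacon’s query endpoint, but the privacy property is the same: the user then requests access to the data from the institutions identified.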

Case study: Sharing Data about BRCA Variants

The BRCA Challenge, in contrast to Beacon and Matchmaker, entailed building a new database for variants in two of the most studied and clinically significant human genes, BRCA1 and BRCA2, pooling publicly available data. The intent was to build out from these genes, establishing a precedent for expansion into other genes. The resulting database, BRCA Exchange, has three tiers. The top tier is fully public and lists variants interpreted by the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA), an international consortium founded at a May 2009 meeting in Amsterdam (135). ENIGMA is the expert vetting committee for BRCA Exchange variant interpretation. The next layer is a “research” dataset with links to the evidence base, including conflicting interpretations and un-vetted reports, with pointers to other databases containing further information. The third layer, still under construction as of 2016, will contain case-level data that might be linked to identifiable individuals, thus requiring higher levels of security and a gatekeeper to ensure compliance with informed consent and prevent misuse or unauthorized re-identification. BRCA Exchange shares data extensively with ClinVar, LOVD, and other variant databases.

The need for a BRCA Challenge arose from the anomalous history of testing for inherited risk of breast and ovarian cancer. Mary-Claire King’s team found linkage to a putative risk gene in high-risk families in 1990 (62). This set off an intense race to identify, clone, and sequence the gene likely associated with cancer in such families (6; 10; 40). BRCA1 was identified in 1994 by a team led by Mark Skolnick at the University of Utah, also associated with the genomic startup Myriad Genetics (58; 93). Linkage of a second gene, BRCA2, to chromosome 13 was found in 1994 by a team in Britain, and BRCA2 was cloned and sequenced in 1995 (161; 162). Two decades later, these remain the two genes most commonly mutated in families with inherited risk of breast and ovarian cancer, although BRCA mutations are found in other cancers, and mutations in another two dozen genes are also associated with carcinomas of the ovaries and breasts (but at much lower frequency).

Both genes were patented, and the story is complicated (24; 36; 58). The first BRCA1 patent was granted to OncorMed in summer 1997, and OncorMed sued Myriad. Myriad countersued the day after it received its own first patent that December. The companies settled out of court, with OncorMed agreeing to exit the BRCA-testing market and assign its patents to Myriad. There were other BRCA patents, including one granted to the U.K. team that first published on BRCA2, but that sequence was chimeric, and the Myriad team had filed a patent application on BRCA2 just a day before the U.K. team published in Nature. Myriad cleared the U.S. market of competitors for commercial BRCA testing, first by sending notification letters to laboratories offering the test, and even by suing one such laboratory at the University of Pennsylvania. That laboratory quickly settled and withdrew from testing, except for patients in the University of Pennsylvania healthcare system.

From 1998 through 2013 Myriad had a service monopoly on American BRCA testing, yet this strategy did not work anywhere outside the U.S. European patents were opposed and narrowed (88). In Australia, Myriad was forced to license to Genetic Technologies, Ltd. (GTG) to settle a patent infringement suit over use of intervening sequences, and GTG permitted labs in the Australian regional health system to offer testing as a “gift to the people of Australia.” GTG later threatened to revoke this gift, but a firestorm of criticism and a flurry of Senate activity led to the replacement of GTG’s chief executive, and the “gift” was restored (149). In Canada, Ontario’s Premier and Health Minister refused to recognize Myriad’s rights, and Myriad never sued in Canada, so most provinces continued to offer BRCA testing (58). In Great Britain, the National Health Service (NHS) offered BRCA testing regionally and largely ignored Myriad’s patents (67; 117). The patent monopoly only held in the U.S., but that was sufficient to give Myriad a dominant position until 2013, by which point it had administered over a million BRCA tests.

The monopoly ended at the U.S. Supreme Court, when Myriad lost an epic patent battle against the American Civil Liberties Union (132). The following month, July 2013, Myriad filed the first of several lawsuits against seven competitors, all of which ended when the Court of Appeals for the Federal Circuit invalidated Myriad’s patents in December 2014. Myriad had dismissed the last of its suits by February 2015. Several new laboratories entered the BRCA testing market on June 13, 2013, the day of the Supreme Court decision, and several more entered in the following months.

Myriad shared its data on BRCA genetic variants until November 2004, and allowed academic collaborators selective access to its proprietary database through 2006 (11). But it stopped depositing data in the locus-specific Breast Cancer Information Core at NHGRI, the largest repository of such data. It has since taken further steps to protect its trade secret database through click-through agreements on its website, precluding users from sharing data with third parties and explicitly claiming trade secrecy (data on file with author, R C-D). A decade of testing by the company has revealed thousands of BRCA variants, but Myriad alone knows what these are. While it publishes the names of its interpretive methods, Myriad neither shares them in sufficient detail for replication nor provides the underlying data, publishing in journals that do not require such disclosure (45; 118).

The BRCA Challenge of GA4GH was intended to address the anomalous situation occasioned by Myriad, which both held a patent monopoly for over a decade and decided to treat its data as trade secrets. To interpret BRCA variants, the rest of the world had to catch up, since Myriad neither made its data available nor participated in inter-laboratory comparisons of variants. The cumulative number of BRCA tests administered elsewhere in the world is now comparable to Myriad’s own experience. Yet the company itself has had limited access to variants in Africa, Asia, Latin America, and other places where its pricing precludes use of its tests, where different founder mutations have taken root, and where rare alleles will continue to crop up. In short, the data have not been shared, stored, curated, or interpreted in ways that can be used for clinical decisions.

BRCA Challenge was announced as ClinVar and ClinGen were getting started. The BRCA Exchange regularly shares data with ClinVar, which in turn regularly compares variants with LOVD and ARUP’s BRCA database and is adding more databases and data as they become available. Several of the laboratories that started BRCA testing—Ambry, Invitae, GeneDx, Illumina, and others—promoted the “open science” framework of ClinGen and ClinVar and contributed their data on variants, including varying degrees of clinical phenotype information. Another response to Myriad’s data-hoarding policy was an effort, the Sharing Clinical Results Project (SCRP), to secure the laboratory reports that Myriad sent back to ordering laboratories and physicians, or simply to get the same information from individual women who knew their BRCA status through the Genetic Alliance’s “Free the Data” project (14).

The two largest diagnostic firms in the U.S., Quest and LabCorp, started offering BRCA testing in 2013. In May 2015, Quest announced it was contributing its data to the Universal Mutation Database (UMD) in Paris, where the data would be well curated and interpreted (17; 112). Quest proposed that commercial labs pay for access to the UMD data and contribute to it, while researchers would have free access. LabCorp joined that effort, which came to be known as BRCA Share®. The UMD and ClinVar are now discussing how, and how much, data will flow into freely available databases. ARUP Laboratories also established a BRCA database, which likewise contributes to ClinVar. The spectrum of data-sharing for BRCA variants thus spans from purely proprietary data-hoarding by Myriad to the free and open sharing embodied by ClinVar, SCRP, and Free the Data, with intermediate models of research access and paid commercial storage and curation through BRCA Share®.

Part 3 of 3: Sharing Data to Build a Medical Information Commons: From Bermuda to the Global Alliance

Cancer Gene Trust

The Global Alliance’s newest demonstration project is the Cancer Gene Trust, which will first focus on sharing data about somatic genomic cancer variants (53). It is starting with somatic variants in part for simplicity, until privacy issues with potentially identifiable germline data are resolved. One distinctive feature is packets of analytical software that can move amongst linked servers, leaving the large repositories of data in place while analyses run locally and results are returned to the user. The data stay in place; the analytical software migrates.
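The “code moves, data stay” pattern can be illustrated with a minimal sketch: the analysis function is sent to each site, runs against records that never leave that site, and only an aggregate result travels back. The site names, record contents, and analysis below are invented for illustration and do not depict the Cancer Gene Trust’s actual software.

```python
# Minimal sketch of federated, code-to-data analysis: each site applies
# the visiting analysis to its own records and returns only an aggregate,
# so individual-level data never cross institutional boundaries.

def run_federated(analysis, sites):
    """Apply `analysis` to each site's local records; collect per-site results."""
    return {name: analysis(records) for name, records in sites.items()}

# Hypothetical per-site stores of somatic variant calls (gene symbols).
sites = {
    "hospital-1": ["TP53", "KRAS", "TP53", "BRAF"],
    "hospital-2": ["TP53", "EGFR"],
}

# The migrating analysis: count TP53 mutations per site, an aggregate
# that can be shared without exposing any individual record.
tp53_counts = run_federated(lambda recs: recs.count("TP53"), sites)
print(tp53_counts)
```

In a real deployment the sites are independent servers and the analysis travels as a software packet, but the privacy-preserving division of labor is the same.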

The Global Alliance’s projects entail a substantial technical component, and much of the work has centered on developing application programming interfaces (APIs). One group addresses data security, a highly technical domain. Many of the challenges in building a Global Alliance, however, center on ethical issues, law, and policy, the domains of the Regulatory and Ethics Working Group. Indeed, the legal and social impediments to global data sharing are amongst the most challenging obstacles to a medical information commons. One of the working group’s early efforts was to articulate a “Framework for the Responsible Sharing of Genomic Data,” a high-level agreement that has been translated into 13 languages (56). The framework emphasized the need to preserve the scientific value of data where possible, rather than anonymizing data and diminishing the ability to make inferences amongst diverse, individually oriented data types. Major issues include: (a) the need to protect privacy and confidentiality and to honor informed consent agreements for data generated under diverse national laws; (b) data security; (c) accommodating both clinical and research uses; and (d) a much more complicated and diverse set of commercial firms involved in genomic research and in incorporating the attendant data into clinical care. We review these briefly in turn.


International Data Sharing Must Abide by National Laws

Privacy and return of results to participants have been the subjects of many articles in Annual Review of Genomics and Human Genetics (4; 59; 82; 91). One of the key findings from legal scholarship is that laws in different nations pertain to activities essential to data sharing. Branum and Wolf recently reviewed the international law on return of results from genomic analysis (20). Another relevant body of law concerns privacy, confidentiality, and informed consent. Many nations have passed laws encouraging the engagement of local researchers with research done in their countries, both to foster economic development and prevent “biocolonialism” and “helicopter research,” wherein foreign researchers extract value but leave little, violating the notion of reciprocity.

A symposium and two recent issues of the Journal of Law, Medicine and Ethics resulted from a massive effort, led by Mark Rothstein and Bartha Knoppers, to review laws pertaining to sharing of samples and data (126; 127). Forty authors surveyed the law in twenty countries. The focus was on biobanks, but included data sharing. Many countries have laws governing data security, privacy and confidentiality. Many require governmental approval and/or sanction from a local ethics review board to export genetic data. Dove notes that a full international harmonization of laws is unlikely, recommending, “foundational responsible data sharing principles in an overarching governance framework” (43). Thorogood and Zawati point out that while incompatible national laws can be impediments to sharing, there is also virtue in pluralism of approaches across nations and cultures (141).

National laws must be respected, and yet they are complicated, and require national, regional, and sometimes local legal expertise to identify and interpret. Rothstein and Knoppers conclude:

[R]elevant laws differ widely among countries engaged in biobank-enabled research in terms of substance, procedure, and underlying public policies. The lack of international regulatory harmonization has been shown to impede data sharing for translational research in genomics and related fields. The daunting task is to identify and characterize the biobank structure and applicable standards in each country and then to devise possible ways to harmonize policies and laws to enable international biobank research while still giving effect to essential privacy protections. (126)

Establishing the infrastructure for international data sharing—to reap the benefits of genomic variants around the globe—confronts drastically greater complexity than did organizing the sequences of mapped DNA segments from anonymized samples, the starting material for the human reference sequence. It is a long voyage from Bermuda.

Scholars are working assiduously to navigate these tumultuous waters. Vanderbilt’s Center for Genetic Privacy and Identity in Community Settings, led by Ellen Wright Clayton and Bradley Malin, is a Center of Excellence for Ethical Legal and Social Implications (ELSI) Research (134). Susan Wolf, Ellen Wright Clayton, and Frances Lawrenz are leading LawSeq, a prodigious accumulation of talent turning its attention to the legal foundations of translating genomics into clinical applications, focusing on U.S. federal law (31). Finally, Amy McGuire and Robert Cook-Deegan are co-directing a grant on “Building the Medical Information Commons,” also centered on American efforts (13).


Clinical Laboratories Generate as Much (or More) Information about Genomic Variants as do Research Laboratories

Since Bermuda, the flow of human genome data has shifted decisively from publicly funded research to commercial laboratory testing, to help individuals make better-informed decisions about medical care, ancestry, or other personal issues. The uses of genomic variation data are likewise embedded in both research and clinical care. The hundreds of databases that nucleated around particular genes (e.g., the cystic fibrosis transmembrane conductance regulator (CFTR), or the Huntington’s locus) or medical conditions (e.g., epilepsy, Alzheimer’s, or various cancers) generally started from researchers contributing to the literature and depositing their data: sometimes in locus-specific databases, sometimes in more general databases like the Human Gene Mutation Database in Cardiff, the Universal Mutation Database in Paris, the Leiden Open Variation Database in the Netherlands, or the many databases maintained by the National Center for Biotechnology Information at the National Library of Medicine (e.g., GenBank, dbGaP, RefSeq, OMIM, or ClinVar). These now contain evidence used to make clinical decisions, but (with the exception of ClinVar) were generally not intended for that purpose. The information they contain needs to be validated before clinical use.

Data from commercial testing laboratories are used mainly for clinical inference, yet the flow of data into public databases is highly variable, depending in part on business models, history, and how hard it is to deposit data. ClinVar is unique in having been constructed from the beginning for clinical determinations, although it draws heavily from research databases (66). Some of the major contributors to the ClinVar database, through the panel of collaborators that constitutes ClinGen, are commercial laboratories (118). This trend of data flowing primarily from clinical testing laboratories is bound to accelerate as genomic analysis is integrated into healthcare. The shift to “clinical grade” databases, with systematic vetting of variant calls, storage and curation of the data, quality-control measures, and participation in proficiency testing, is part of the nascent infrastructure for clinical genomics.

In most countries, genetic testing is incorporated into laboratory practices within national health systems. Not so in the U.S., where federal regulation of genetic tests has been the subject of debates, books, and several reports from federal advisory committees since at least 1984 (74; 75; 97; 105; 129). Laboratories in the U.S. are currently regulated under the Clinical Laboratory Improvement Amendments (CLIA) of 1988, through the Centers for Medicare and Medicaid Services. The College of American Pathologists also accredits clinical laboratories. The Food and Drug Administration (FDA) floated draft guidance in 2014 indicating its intent to regulate laboratory-developed genetic tests (LDTs) as medical devices, proposing to phase in such regulation over nine years (56). FDA’s entry into regulation caused a kerfuffle and was opposed by many laboratories, their trade associations, and the Association for Molecular Pathology (121). FDA announced in November 2016 that it would back away from finalizing its guidance (122). But its draft rules on the importance of databases, and on the need for independent verification of genomic interpretations used to guide clinical decisions, are nonetheless a clear statement of some of the challenges ahead in building tools for clinical use, and of the need for “regulatory grade” genomic databases (53). Indeed, clinical use requires far more formal oversight and regulation than does creating and using data in research. The flow of data from commercial testing laboratories is likely to become the main source of new information about human genomic variation.


The Diversity and Importance of Private Firms Are Substantially Greater Than When the Human Reference Sequence Was Generated

Commercial genomics extends well beyond genetic testing. Even within genetic testing, it includes ancestry testing, personal genomic profiling with chips, exome or even whole-genome sequencing, and a panoply of tests ranging from single-gene tests to multi-gene panels. Genomic sequencing has been used to study rare disorders and to profile somatic mutations in cancers, guiding choices about treatment and prevention. Other firms specialize in integrating genomic data into medical records or providing bioinformatics tools for genomic data analysis. Some are dedicated to sequencing as all or part of their business models. In a 2014 editorial, Curnutte documented this diversity of commercial software, hardware, and services (39). A great deal of expertise in informatics and instrument manufacturing resides in the private sector, in companies of wildly different sizes, ages, financial health, and business models. Many are compatible with data sharing; many are not. The evolving pattern of sharing for BRCA1 and BRCA2, and the complexity of the genomic data commons, illustrate the challenges ahead.



The Bermuda Principles for daily pre-publication data release set a strong foundation for open science. The Principles set a salutary precedent that enabled more rapid progress, first toward assembling a human reference sequence and then toward interpreting the meaning of genomic variants in humans and other organisms. The initial community of fifty people in fewer than a dozen laboratories has broadened to a global endeavor involving hundreds of laboratories, spanning from pure research to clinical use, ancestry, and other applications. As the community of data-contributors and data-users has broadened, the idea of daily sharing has had to adapt: to comply with the informed consent of the people to whom the data pertain, to ensure data security, and to address considerable diversity in international laws governing privacy, confidentiality, and trans-border flows of genetic samples and data. In addition to these challenges, commercial interests have both intensified and grown far more diverse. The spirit of open science, however, persists. The Bermuda Principles are the taproot from which a global medical information commons will grow.

Figure 1. Participants at the first Bermuda meeting, February 1996, with Jim Watson at the front. [Accessed 25 November 2016; photo credit: Richard Myers, HudsonAlpha.]

Figure 2. The white board on which John Sulston scribbled the Bermuda Principles at the 1996 meeting’s final session. Robert Waterston was leading the discussion, and there was an (informal) vote to adopt the statement. [Accessed 25 November 2016; photo credit: Richard Myers.]

Figure 3. Variants in the BRCA1/2 genes associated with inherited risk of breast, ovarian, and other cancers. Different databases have data on different variants, and none of these databases includes many cases from Africa, Latin America, Asia, or other populations expected to have different founder mutations and population frequencies of uncommon alleles. [Adapted from (17), p. 1323.]


Contributor Information

Robert Cook-Deegan, Arizona State University.

Rachel A. Ankeny, University of Adelaide.

Kathryn Maxson Jones, Princeton University.


Works Cited (refer to linked PDF above)

Kathryn Maxson Jones is an assistant professor in the Department of History. Her interests span the histories of science, technology, and medicine and related policy issues. Her historical research focuses on 20th-century neuroscience, electronics, and regenerative biology and on intersections between biology and artificial intelligence. Her past historical work has examined the histories of genomics, DNA sequencing, and data-sharing policies. Dr. Maxson Jones also maintains an active research program examining challenges and concerns related to sharing human data within the US BRAIN Initiative.

Dr. Maxson Jones holds a B.S. in Biology from Duke University and an M.A. and Ph.D. in the History of Science from Princeton University. For the academic year 2022-2023, she will be completing a research position in the Center for Medical Ethics and Health Policy at Baylor College of Medicine in Houston, TX, where she helps run a neuroethics study, BRAINshare: Sharing Data in BRAIN Initiative Studies (NIH, R01MH126937). Beginning in August 2023, her teaching in the Department of History will focus on the histories of technology and science.

Robert Cook-Deegan

Professor, School for the Future of Innovation in Society

Senior Global Futures Scientist, Global Futures Scientists and Scholars

Long Bio

Dr. Robert Cook-Deegan is a professor in the School for the Future of Innovation in Society, and with the Consortium for Science, Policy and Outcomes at Arizona State University. He founded and directed the Duke Center for Genome Ethics, Law and Policy (2002-2012). Prior to Duke, he was with the National Academies of Sciences, Engineering, and Medicine (1991-2002); National Center for Human Genome Research (1989-1990); and congressional Office of Technology Assessment (1982-1988). His research interests include science policy, health policy, biomedical research, cancer, and intellectual property. He is the author of “The Gene Wars: Science, Politics, and the Human Genome” and more than 350 other publications.


  • M.D., University of Colorado, 1979
  • B.A., Chemistry, magna cum laude, Harvard College, 1975

Curriculum Vitae

Robert Mullan Cook-Deegan, MD

Barrett & O’Connor Washington Center, Arizona State University
1800 I (Eye) Street, NW, Washington, DC 20006
202-446-0395

Current Position

Professor, School for the Future of Innovation in Society and Consortium for Science, Policy & Outcomes, College of Global Futures, Arizona State University


Education

B.A., Chemistry, magna cum laude, Harvard College, 1975
M.D., University of Colorado, 1979

Academic Positions

Professor, School for the Future of Innovation in Society and Consortium on Science, Policy & Outcomes, College of Global Futures, Arizona State University, 2016-present

Research Professor of Public Policy, Sanford School of Public Policy, Duke University, 2003-2016

Professor, Track V, Division of General Internal Medicine, Department of Internal Medicine, School of Medicine, Duke University, 2003-2016

Research Professor, Department of Biology, Trinity College of Arts & Sciences, Duke University, 2006-2016

Visiting Professor, School for the Future of Innovation in Society, Arizona State University, Fall 2015.

Director, Center for Genome Ethics, Law and Policy, Institute for Genome Sciences & Policy, Duke University, 2002-2012

Faculty Affiliate, Kennedy Institute of Ethics, Georgetown University, 2001-2014; Senior Research Fellow, 1986-2001

Seminar Instructor, Stanford-in-Washington (undergraduate tutorials and seminars on health and biomedical research policy), 1996-2003

Cecil and Ida Green Senior Fellow, Green Center for the Study of Science and Society, University of Texas, Dallas, Spring (January-April) 1996

Associate, Department of Health Policy and Management, School of Hygiene and Public Health, The Johns Hopkins University, 1988-2001

August 10, 2018

Cook-Deegan CV

Nonacademic Positions

National Academy of Sciences, Washington, DC. Director, Division of Biobehavioral Sciences and Mental Disorders, Institute of Medicine, 1991-1994; Senior Program Officer, Committee on Allocation of Federal Funds for Science and Technology (“Press Report”), 1994-1996; Director, National Cancer Policy Board, National Research Council and Institute of Medicine, 1996-2000; Senior Program Officer, Committee on Human Research Protection Programs, 2000; Director, Robert Wood Johnson Foundation Health Policy Fellowship Program, Institute of Medicine, 2001-2002

National Center for Human Genome Research, National Institutes of Health, Expert (consultant to Center Director, James D. Watson), 1989-1990

Biomedical Ethics Board, U.S. Congress, and Biomedical Ethics Advisory Committee, Washington, DC, 1988-1989. Acting Executive Director of a small congressional agency that operated for one year

Office of Technology Assessment, U.S. Congress, Washington, DC. OTA Fellow, 1982-1983; Project Director and Analyst, 1983-1984; Senior Analyst, 1985-1986; Senior Associate, 1987-1988

State, National and International Committees and Boards

Member, Advisory Committee, Australian Open Stem Cell Research Network, 2020-2023

Member, Technical Advisory Committee, Canadian Network for Learning Healthcare Systems and Cost-Effective ‘Omics Innovation (CLEOnet), a project funded by BC Cancer, BC Genomics, and GenomeCanada, 2021-2024

Member, Polaris (advisory committee to the Science, Technology Assessment and Analytics unit of the Government Accountability Office, US Congress), 2020-

Member, Ethics, Legal and Social Issues subcommittee, Earth BioGenome Project (Hank Greely, Stanford, chair; and Melissa Goldstein, Geo. Washington U., co-chair). Project to sequence the full range of biological diversity based at UC Davis (Harris Lewin, overall project chair), May 2020-

Member, Advisory Committee to TRANSGENE, University of Edinburgh project on history and comparative genomics (Miguel Garcia Sancho and James Lowe, PIs), 2019-

Co-chair, Embedded ELSI (ethical, legal and social implications) program, Human Pangenome Reference Consortium (HPRC), and member of the overall HPRC committee, based at University of California, Santa Cruz, and coordinated through Washington University, Saint Louis, 2019-

Member, International Scientific Advisory Board, GenomeQuebec, 2018-

Member, National Conference of Lawyers and Scientists, American Association for the Advancement of Science, 2019-2023

Member, Advisory Committee to the Center for Scientific Evidence in Public Issues, American Association for the Advancement of Science, 2019-

Member, Science and Technology Policy Fellowship Advisory Committee, American Association for the Advancement of Science, 2015-2021

Member, Steering Committee, BRCA Challenge, Global Alliance for Genomics and Health,

Co-chair, Regulatory and Ethics Working Group task force on patient engagement, Global Alliance for Genomics and Health, 2016-2020

Chair, Advisory Committee for “Making Genomic Medicine,” a project funded by the Wellcome Trust and based at the University of Edinburgh (Steven Sturdy, PI), 2014-2018

Member (consultant), National Academies’ Google-funded pilot project to combat misinformation on the Internet related to science and health (Kara Laney, project officer)

Member, Committee on Science, Engineering and Public Policy, American Association for the Advancement of Science (AAAS), 2011-2017

Member, Advisory Committee to MedSeq Project, Brigham & Women’s Hospital, Boston, 2015-


Member, Steering Committee for Free the Data, an effort to establish and maintain a robust commons for scientific and clinical interpretation of human genomic variants, a consortium of nonprofits, non-government organizations, and firms committed to data access and research transparency, organized by the Genetic Alliance, 2013-2017

Member, Third Modality Advisory Group (Chair: Eric Meslin), Genome Canada, 2013-2016

Member, Human Genome Organization (HUGO) Committee on Genomics and Bioeconomy,

Member, The Hinxton Group; in particular, participant in meeting October 2010 at University of Manchester that produced the Statement on Policies and Practices Governing Data and Materials Sharing and Intellectual Property in Stem Cell Science (January 2011)

Chair, External Advisory Committee, REVEAL study (four-site clinical trial of ApoE genetic testing for Alzheimer’s susceptibility), Robert Green, Principal Investigator, Boston University, 2000-2009, and Brigham and Women’s Hospital and Harvard Medical School, 2010-2015

Member, AAAS Committee on Council Affairs, 2009-2011

Member, National Research Council Committee on University Management of Intellectual Property, Board on Science, Technology and Economic Policy and Committee on Science, Technology and the Law, The National Academies, June 2008-February 2010 (Committee produced a report, Managing University Intellectual Property in the Public Interest [Washington, D.C.: National Academies Press, 2010])

Council Delegate and Section X Nominating Committee, AAAS Section on Societal Impacts of Science and Engineering, 2007-2011

Member, NAS-AAAS Committee on Assessing Fundamental Attitudes of Life Scientists as a Basis for Biosecurity Education, National Research Council Committee, 2007-2009 (Committee produced a report, A Survey of Attitudes and Actions on Dual Use Research in the Life Sciences, a collaborative effort of the National Research Council and the American Association for the Advancement of Science [Washington, D.C.: National Academies Press, 2009])

Member, Advisory Committee for the Graduate Student Forum on Science, Technology, and Health Policy, National Academies of Science and Engineering and Institute of Medicine, National Academy of Sciences, 2007-2010

Consultant, Gene Patents and Licensing Practices Task Force, Secretary’s Advisory Committee on Genetics, Health, and Society (SACGHS), Department of Health and Human Services, 2007-2010

Member, Task Force on Patent Reform, American Association of Universities, Association of American Medical Colleges, Council on Government Relations, American Council on Education, and Association of Public and Land-Grant Universities (APLU, formerly National Association of State Universities and Land Grant Colleges), 2005-2009

Member, Committee on Alternative Funding Strategies for Department of Defense’s Peer-Reviewed Medical Research Programs, Institute of Medicine, January-May 2004 (Committee produced a report, Strategies to Leverage Research Funding [Washington, DC: National Academies Press])

Consultant, Comparative Approach to Genomics of Complex Traits, Department of Health and Human Services Grant, 2003-2004

Member, Project Advisory Panel on Reprogenetics: A Blueprint for Meaningful Moral Debate and Responsible Public Policy, The Hastings Center, 2001-2003 (Report: Reprogenetics and Public Policy: Reflections and Recommendations by Erik Parens and Lori P. Knowles, Hastings Center Report Special Supplement, July/Aug 2003)

Member, History and Health Policy Cluster Group, Robert Wood Johnson Foundation, 2003-

Member, Task Force for Genomics and Public Health, North Carolina Department of Health and Human Services Office of Genomics, 2002-2005

Member, Expert Advisory Panel, Accessible Genetics Research Ethics Education (AGREE) grant, US Department of Energy, 2002-2005 (J Sugarman, Principal Investigator, The Johns Hopkins University)

Member, Ethical Issues in the Management of Financial Conflicts of Interest in Research in Health, Medicine, and the Biomedical Sciences, The Hastings Center, 2002-2005

Member, World Health Organization Advisory Group on Genomics, 2000-2001

Member, Working Group on Germ Line Gene Therapy, American Association for the Advancement of Science, 1998-2000
Member, Council for the American Association for the Advancement of Science 1998-1999 (as retiring chair of Section X)
Trustee and Secretary, Foundation for Genetic Medicine, 1997-2003
Chair, National Academy of Sciences’ Committee to Review Studies on Human Subjects (Institutional Review Board for the National Academies of Sciences and Engineering, the National Research Council, and the Institute of Medicine), 1997-2000

Chair, Section X (Societal Impacts of Science and Engineering), American Association for the Advancement of Science, 1997-1998

National Advisory Board and visiting faculty, Dartmouth College genome curriculum project, 1996-1998, 2000-2003

Advisory Committee, Human Genome Education Model, Project II: Collaborative Education for Allied Health Professionals, Alliance of Genetic Support Groups and Georgetown University, 1996-2002

Chair, Royalty Fund Advisory Committee, Alzheimer’s Disease and Related Disorders Association (committee to oversee distribution of research funds derived from tacrine royalties); member 1994, chair 1995-2003

Member, Committee on Science and Human Values, US Conference of Catholic Bishops, 1995-

Advisory Board, Science + Literacy for Health, Human Genome Project (advised on production of Your Genes, Your Choices: Exploring the Issues Raised by Genetic Research by Catherine Baker), American Association for the Advancement of Science, 1995-1997.

Consultant, DNA Patent Database, Kennedy Institute of Ethics, Georgetown University, 1994- 2004.

Chair, workshop on international genomics, Office of Technology Assessment, US Congress, March 1994

Participant, workshop on commercial genomics and patenting DNA sequences, Office of Technology Assessment, US Congress, January 1994

Member, Advisory Panel on Commercializing Emerging Technologies, Office of Technology Assessment, US Congress, 1994-1995

Member, Advisory Panel on The Human Genome Project and DNA Patenting, Office of Technology Assessment, US Congress, 1993-1995

Founding member, the Dana Alliance on Brain Initiatives, Charles A. Dana Foundation, 1992- present



Joint Working Group on Ethical, Legal, and Social Issues, National Institutes of Health and Department of Energy, 1989-1995

Scientific Coordination Committee on the human genome to the Director General of the United Nations Educational, Scientific, and Cultural Organization (UNESCO), 1988-1991

Member, Technical Advisory Committee, Alzheimer’s Medicare Demonstration, Mathematica Policy Research, Inc. (under contract to Health Care Financing Administration), 1987-1988

Member, National Advisory Committee, Robert Wood Johnson Foundation, Dementia Care and Respite Services Program, 1987-1991

Member ex officio, Human Gene Therapy Subcommittee, Recombinant DNA Advisory Committee, National Institutes of Health, Liaison member, 1987-1991

Member, Health Services Research Task Force, Alzheimer’s Disease and Related Disorders Association (also executive committee), 1987-88

Honorary Board Member, Family Respite Center, Falls Church, Virginia, 1987-2000

National Advisory Board Member, Alzheimer’s Research Center, Cleveland, Ohio, 1987-1990

National Coordinator, Health Professionals’ Network, Amnesty International USA, 1985-1988; National Steering Committee, 1988-1998

Board of Directors, Physicians for Human Rights, 1988-1996; Advisory Board, 1987-88; Treasurer and Executive Committee, 1994-1996; Co-chair, Fundraising Committee, 1994-1996

University Service

Arizona State University, 2016-

Director, Masters in Public Interest Technology program, School for the Future of Innovation in Society, College of Global Futures, Arizona State University, 2021-2023

Member, Masters Degree Charter Awards Selection Committee, 2021-

Member, Justice, Equity, Diversity and Inclusion Task Force, School for the Future of Innovation in Society, College of Global Futures, Arizona State University, 2020-2021

Masters in Public Interest Technology, admissions and advisory committees, 2020-2021

Masters in Science and Technology Policy, admissions and advisory committees, 2015-2020

BioDesign Institute Advisory Committee, 2017-

ASU Leadership Academy, Behavioral Genomics initiative (Beate Peter, lead), 2017-2019

ASU Clinical Trials Network, 2019-2020

Duke University, 2002-2016

Member, Stem Cell Research Oversight Committee, Duke University Health System, 2010-2015

Member, Ad Hoc Committee on Genetics, Duke University School of Medicine, 2012-2015

Member, Faculty Advisory Board, Duke Human Rights Center at the John Hope Franklin Humanities Institute, 2011-2015

Chair, Committee to review reappointment of Noah Pickus as Nannerl O. Keohane Director of the Kenan Institute of Ethics, 2012

Member, Committee to Reappoint Anthony So, Professor of the Practice, Sanford School of Public Policy (Phil Cook, chair), 2011

Faculty Committee, Genome Sciences & Policy Certificate Program, 2008-2016

Member, Baldwin Scholars Program Review (Sarah Deutsch, chair), 2008-2009

Member, Executive Committee, Duke Global Health Institute, 2007-2014
Faculty and Executive Committee Member, Ethics Certificate Program, 2007-2010



Group Advisor, Universities Allied for Essential Medicines (UAEM), 2007-2010

Member, Fulbright Scholarship faculty interview panels, 2009-2012

Member, Truman Scholarship faculty interview panels, 2008-2013

Member, Rhodes/Marshall/Mitchell Scholarship faculty interview panels, 2007-2015

Faculty Member, History and Philosophy of Science, Technology and Medicine Program, Departments of Philosophy and History, Duke University, 2006-2010

Member, Faculty Council, Kenan Institute of Ethics, 2006-2009

Faculty Affiliate, Race, Ethnicity, & Gender in the Social Sciences, Social Science Research Institute, 2006-2016

Faculty Advisor, Physicians for Human Rights, Duke Undergraduate and Medical School Chapters, 2004-2009

Faculty Associate, Trent Center for Bioethics, Humanities and History of Medicine, 2003-2015

Faculty-In-Residence, Alspaugh Residence Hall, 2003-2010

Faculty Member, Health Policy Certificate Program, 2006-2010

Faculty Member, Genome Revolution Focus program and selection committee, 2006-2008,

Member, Duke Translational Medicine Institute Leadership Group, 2006-2010

Member, Committee to review Provost Peter Lange for reappointment (Randall A. Kramer, chair), 2007-2008

Chair, Search Committee, Associate Director, Kenan Institute of Ethics, 2007 (separate from similar 2003-2004 search committee below)

Member, Committee to review appointment of Anthony So to Department of Public Policy Studies, Fall 2007

Chair, Committee to review Graham Glenday for reappointment to Department of Public Policy Studies, Spring 2007

Member, Center for Comparative Biology of Vulnerable Populations (Division of Pulmonary Medicine and Nicholas School, D Schwartz and ML Miranda, PIs), 2007-2009

Member, Campus Culture Initiative Steering Committee, 2006-2007

Member, Institute for Genome Sciences & Policy Executive Committee, 2005-2008, 2010-2011

Co-Chair (with Barton Haynes), Global Health Initiative Steering Committee, November 2004-June 2006

Member, Global Health Panel, Inauguration of President Richard Brodhead, September 2004

Member, Search Committee, Director of Center for the Study of Medical Ethics and Humanities,

Chair, Search Committee, Associate Director, Kenan Institute of Ethics, 2003-2004

Member, Working Group, Neurosciences Microarray Center of Duke University Medical Center, June 2002-2006

Member, Steering Committee, Making Meaning of Genomic Information grant, Howard Hughes Medical Institute, September 2002-2005 (R Thompson, PI)


Postdoctoral fellows

Under P50 HG003391 (Center for Public Genomics, Duke University)

Colin Crossman, J.D., “The use of bioinformatics to quantify the reach of patents on DNA,” 2005-2006

Ilse Wiechers, M.D., M.P.P., “The role of intellectual property on development and dissemination of genomic technologies,” March 2005-Summer 2006



Jennifer Reineke Pohlhaus, Ph.D., “Government and nonprofit funding of genomic research worldwide”, June-August 2006

Sapna Kumar, J.D., “Synthetic biology: the intellectual property puzzle,” and “FTC, the ‘other’ patent agency” 2006-2007

Subhashini Chandrasekharan, Ph.D., “The role of intellectual property on the growth and dissemination of genomics research and genomic technologies,” Oct 2006-Oct 2008

Wayne Beyer, Ph.D., “Constructing DNA biobanks to promote research into rare disorders,” Jan 2009-Jan 2010

Britt Rusert, Ph.D., “Patented DNA Sequence Methods” and “Patents and race in BiDil: a case study,” June 2009-Aug 2010 (secondary mentor; primary mentor, Charmaine Royal)

Michele Easter, Ph.D., “Behavioral Genetics and Genomics,” Aug 2010-Aug 2012

Mollie Minear, Ph.D., “Translational genomics and clinical care,” Aug 2012-present

Richard Yamada, Ph.D., “Bioinformatics and genomics technology,” Aug 2012-Jan 2013

Saurabh Vishnubhakat, J.D., LL.M., “Intellectual Property and Genomics,” Jan 2014-March 2015 (secondary mentor; primary mentor, Arti Rai, Duke School of Law)

Under “The Sulston Project: Making the Knowledge Commons for Cancer Genomic Variants More Effective”

Janis Geary, ASU and University of Alberta, April 2019-
Amanda Gutierrez, ASU and Baylor College of Medicine, June 2020-

Postbaccalaureate fellows (research supervisor)

Alessa Colaianni, Duke, 2007-2008
Lane Baldwin, Duke, 2012-2015
Kathryn Maxson, Duke, 2010-2014
Abhi Sanka, ASU (undergraduate from Duke), 2017-2018
Sonia Dermer, ASU (undergraduate from William and Mary), 2018
Kara Hapke, ASU, 2019-

Resident medical fellows, K awards, Greenwall Scholars and other

Hassan Shanawani, M.D., “Race and Genetics in Biomedical Research,” fellow in pulmonary medicine, Department of Internal Medicine, Duke School of Medicine, 2004-2006 (primary mentor; secondary mentor, David Schwartz)

William Copeland, Ph.D., “Gene-Environment Interplay in Mental and Substance Abuse Disorders,” K23 MH080230, Department of Psychiatry, Duke School of Medicine, 2008-2013 (secondary mentor; primary mentor, E. Jane Costello)

Anne D. Lyerly, M.D., Greenwall Scholar, “Ethical Issues Involving Women in Research,” Department of Obstetrics and Gynecology, Duke School of Medicine, 2003-2007 (co-mentor with Ruth Faden, Johns Hopkins Univ.)

Anne D. Lyerly, M.D., K01 Award in “Ethics of Biomedical Research,” 5K01HL072437, Department of Obstetrics and Gynecology, Duke School of Medicine, 2003-2010 (co- mentor)

Charmaine Royal, Ph.D., Greenwall Scholar, “Ethics as a Guide for the Use of ‘Race’ and Ancestry in Research and Clinical Practice” Institute for Genome Sciences & Policy and Department of African and African American Studies, Duke University, 2009-2012 (co- mentor with Huntington Willard)

Jennifer Wagner, J.D., Ph.D., K99/R00 HG006446, “Multidisciplinary Study of Race, Appearance, Ancestry, Discrimination and Prejudice,” 2012-2018 (co-mentor for K99 component, 2012-2014, with Reed Pyeritz, Univ. Pennsylvania). AAAS congressional Science and Technology Policy Fellow 2014-15 with Sen. Edward Markey; now on her mentorship committee for Geisinger Health, 2015-2021


Teaching

Arizona State University (2015-present)

“Malignant: Cancer Politics and Policy,” HON 494/HSD 598, co-taught Fall 2015 with Jennifer Dyck Brian, Arizona State University, for Barrett Honors College and several graduate

“Science and Technology Policy,” HSD 501/POS 571, core course in the Science and Technology Policy Masters program at Arizona State University, Fall 2015-2018

“Health and Science Policy,” FIS 494/HON 494/HSD 598, Spring and Fall 2017, Fall 2018-19

“Technology Assessment,” PIT 503, core in the PIT program, Fall 2020, Spring & Fall 2021

“Science for Policy and Policy for Science,” seminar course to accompany the STEAM Washington DC internship program once it starts (hoping for Spring 2022-)

Duke University (2003-2016)
“Political and Ethical Conflict in Health and Science Policy,” PubPol 390A, Spring 2016, Spring 2015 (Duke in Washington, DC program), a seminar course in addition to the research independent study for each student

“Cancer in Our Lives,” PubPol 641S/Genome 641S, Fall 2014

“Science, Law and Literature,” English 390S/PubPol 290S/Women’s Studies 290S, Fall 2014 (with Priscilla Wald, English and Women’s Studies)

“Cancer and the Genome,” Genome 590S.10/PubPol 590S.10, Fall 2013

“Science, Law and Policy,” PubPol 590S.07/Law 333.01/Genome 590S.07, Fall 2013 (with Nita Farahany, Law)

Duke in Washington program, course listed as research independent study for each student, supervising a project based on internship, senior thesis preparation, or a specific project, Spring 2013, 2014, 2015, 2016

“The Genome and the Internet: Growing Up Together,” GENOME 108FCS/PUBPOL 81FCS, for ‘Genomes in Our Lives: Science and Conscience’ Focus course cluster, Fall 2011

“Responsible Genomics,” Public Policy 240/Computational Biology and Bioinformatics 212, spring 2004, fall 2005, spring 2007-2012 (except 2011)

“Health Policy Capstone,” Public Policy 255S: final course for Health Policy Certificate Program. Undergraduate section, spring 2004. Graduate and professional student section (with Christopher Conover), spring 2007. Undergraduate and graduate/professional students, spring 2010

“Genome Sciences and Policy Capstone,” final course for undergraduate Genome Science and Policy Certificate Program, spring 2010, 2012, 2013 (with Hunt Willard, Director of the Institute for Genome Sciences & Policy, and Professor of Biology and of Molecular Genetics and Microbiology), Spring 2014 (with Lauren Dame, Law and IGSP), Spring 2015 and 2016 (with Misha Angrist)



“Evolution in Science and Culture” (with Priscilla Wald, Professor of English), English 193S/Genome 178S, fall 2009

“Social and Political History of Genomics,” Public Policy 264, fall 2004. Taught a similar course of the same title for the Focus program, ‘The Genome Revolution and Its Impact on Society,’ as Public Policy 196S/History 105S (fall 2006) and Public Policy 81FCS/History89FCS (fall 2007), Public Policy Studies 190FS/History 190FS (fall 2012)

“Health, Science, and Human Rights,” Public Policy 195S, for ‘Humanitarian Challenges at Home and Abroad’ Focus program fall 2003, and for ‘Global Health’ Focus program spring 2006

House course faculty sponsor: “The Physician Activist,” fall and spring 2005

Other Teaching

University of Vienna, Summer graduate course, Univie Summer School—Scientific World Conceptions, “Genomics: Philosophy, Ethics, and Policy,” 3-14 July 2017, with Jennifer Reardon, University of California, Santa Cruz, and Paul Griffiths, University of Sydney

Stanford-in-Washington Seminar and Tutorial director, “How Policy Decisions Get Made about Health Research and Health Policy,” Fall 1996-Winter 2003 (offered 13 quarters)

Summer Intensive Bioethics Course, Georgetown University, June 1987 and 1988

Student theses and supervision

Graduate and Professional Student Supervisor (Arizona State University)

Dina Carpenter-Graffy, MS Public Interest Technology, 2020-2021 (chair for applied project)

Pooja Chitre, PhD candidate in Human and Social Dimensions of Science & Technology, researcher on the Sulston Project, Spring 2021- (chair, Erik Johnston).

Amanda Arnold, PhD in Human and Social Dimensions of Science & Technology, School for the Future of Innovation in Society, ABD starting April 2020 (co-chair with Heather Ross), 2020-

Josh Massad, Masters in Public Interest Technology, second-year project (Katina Michael, chair) 2020-2021

Walter Johnson, Masters in Science and Technology Policy, May 2017
Theora Tiffney, PhD in Biology and Society program, School of Life Sciences, December 2020 (co-chair with Jim Collins)

Nathaniel Wade, PhD candidate in Human and Social Dimensions of Science & Technology, School for the Future of Innovation in Society, 2016-2019 (Barry Bozeman, chair)

Neekta Hamidi, Masters in Science & Technology Policy, December 2015 (Andrew Maynard, chair)
Nicole Frank, Masters in Science and Technology Policy internship, summer 2017

Graduate and Professional Student Supervisor (Duke University)

Jenae Emily Logan, M.Sc. Global Health, 2016 (Subhashini Chandrasekharan, chair)

Julia Carbone Gold, S.J.D., 2009-2016 (Arti Rai, chair)

Guangyangzi (Gwen) Shu, MPP candidate, 2013-2014 (Anthony So, chair)

Andrew Darnell, Masters in Bioethics and Science Policy, 2015 (chair)

Sonya Jooma, Masters in Bioethics and Science Policy, 2015 (chair)

Jessica Ordax, Masters in Bioethics and Science Policy, 2015 (Misha Angrist, chair)

Lisa Warner Pfefferle, Biology PhD, 2008-2013 (Gregory Wray, chair)

Erin McCarthy, MPP candidate, 2009-2010 (Anthony So, chair)

Jeremy Block, Biochemistry PhD candidate, MPP project 2008-2009 (chair of MPP)

Matthew DeCamp, Philosophy PhD, July 2002-May 2007 (MD 2008; co-chair of PhD committee with Allen Buchanan)

Elana Fric-Shamji, MPP (already had her MD), May 2007-May 2008 (chair of MPP)

Deirdre Parsons, MS in Molecular Genetics & Microbiology, May-December 2007 (Hunt Willard, chair)

Eric Hoefer, MPP, MBA, 2004 (chair MPP)

Noah Perin, MPP, MBA, 2005 (chair MPP)

Charles Mathews, MPP, 2004 (chair)

Undergraduate Thesis Supervisor (Arizona State University)

Sanjay Srinivas, Barrett Honors College, May 2017 (Daniel Sarewitz, chair)

Sidney Stoffer, Barrett Honors College, May 2016 (Jennifer Brian, chair)

Undergraduate Thesis Supervisor (Duke University)

Jennifer Zhao, May 2015 (Public Policy Studies)
Camille Peeples, December 2014 (Public Policy Studies); main supervisor was Jenni Owen; I supervised one semester and was a final reader

Alexandra Young, May 2014 (thesis outside the disciplines):

Stephanie Chen, Public Policy Studies, May 2014:

Shreya Prasad, May 2010 (International Comparative Studies)

Swathi Padmanabhan, May 2010 (Public Policy Studies): The Impact of Intellectual Property, University Licensing Practices, and Technology Transfer on Regional Manufacturing of and Access to the HPV Vaccine in Resource-Poor Regions.

Matthew Piehl, May 2008 (Biology, graduation with distinction)

Sarah Wallace, December 2007 (Public Policy Studies)

Catherine Alessandra Colaianni, May 2007 (Biology and Philosophy)

Joe Fore, December 2006 (Public Policy Studies)
Daidree Tofano, May 2006 (Program II)

Undergraduate Research Program supervisor (ASU)

Kelsey Beck, spring 2018-2020
Julianna Smith, spring 2018-2020
Jillian Leaver, summer 2020-
Marina Filipek, summer 2020-
Adriane Inocencio, summer 2020-
Maya Shrikant, spring 2021
Zuzana Skvarkova, 2021-
Imtithal Noor, 2021-
Britney Hill, 2021-
Venus Kapadia, 2021-

Undergraduate Research Program supervisor or mentor (other universities)

Kavyaa Choudhary, University of Texas, 2021-2022 (co-mentor, Amanda Gutierrez, Baylor College of Medicine)

Sudhanvan Iyen, University of Texas, 2021-2023 (co-mentor, Mary Majumder, Baylor College of Medicine)

Independent Study and Research Independent studies at Duke:

2009-2016: Biology (1), Public Policy (53), and Genome Sciences and Policy (3) (total = 57)

Program II supervision (Duke’s curriculum alternative to a disciplinary major)

Sanjay Kishore, graduated 2013
Daidree Tofano, graduated 2006

Thesis supervision outside Duke and ASU

Knut Jorgen Egelie, May 2019 Ph.D. in Department of Biology, Norwegian University of Science and Technology (NTNU), “Access to knowledge - university management of intellectual property to govern knowledge dispersion” Trondheim, Norway

(refer to his CV for more . . . ran out of space here in this reply)

This language reminds me of telephone line switches, Claude Shannon and information theory.
