Scientists Are on the Cusp of Finally Deciphering the Entire Human Genome
After 20 years of work, the pursuit is nearly complete, but the team still has to sequence a Y chromosome
A human DNA sequence is written in four chemical bases, each represented by its first letter: adenine (A), thymine (T), guanine (G) and cytosine (C). Altogether, a list 3.055 billion letters long across 23 chromosomes makes up the human genome. Nearly two decades ago, the Human Genome Project set out to map the genetic makeup of the human species. In 2000, scientists completed the first draft of the human genome, but about eight percent remained unsequenced, reports Matthew Herper for STAT.
The remaining unsequenced portion was a dizzying array of repeating letters. These gaps were almost impossible to decipher with the technology available at the time. Now, in a preprint published on May 27, a group of scientists describes the first "nearly" complete sequence of the human genome, reports Sarah Zhang for the Atlantic.
The feat was completed by scientists in the Telomere to Telomere (T2T) Consortium, a collaboration of about 30 institutions, reports Sara Reardon for Nature. Together, they found 115 new genes and added 200 million base pairs to a version of the human genome released in 2013. They named the newly deciphered genome T2T-CHM13.
Among the most challenging regions of the human genome to sequence are the centromeres. Each chromosome resembles an X-shaped tangle, and centromeres are located close to the pinched, knot-like center of each criss-cross. In these regions, DNA is difficult to sequence because it is so densely packed and contains nearly endless repeating codes, the Atlantic reports.
But on five of the 23 total human chromosomes, the centromere is not precisely in the middle, instead favoring one end over the other, per the Atlantic. The asymmetrical point creates one long arm and one short arm on the chromosome. The previously unsequenced, repeating letters are located in these "short arms." Now, the team behind T2T-CHM13 has deciphered them.
The sequencing was made possible using new technologies developed by two private companies: Pacific Biosciences (PacBio) of Menlo Park, California, and Oxford Nanopore of Oxford Science Park in the United Kingdom.
Previous methods for deciphering genomes required cutting DNA into tiny pieces and then reassembling those stretches later in a long, tedious process. Two new methods take different approaches. The Oxford Nanopore technology pulls the DNA through a tiny hole, where longer sequences can be read. The PacBio technology uses lasers to repeatedly read stretches of DNA 20,000 base pairs long, creating a highly accurate readout, reports STAT.
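To see why read length matters for repeating regions, consider a minimal sketch in Python. It is only an illustration under stated assumptions: the toy "genome" below is a made-up string, and the helper `ambiguous_fraction` is a hypothetical name, not part of any real sequencing pipeline or of the consortium's methods. It measures how many reads of a given length could have come from more than one place in the sequence.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Return the fraction of reads of length read_len that occur
    at more than one position in the genome string."""
    # Take every possible read of the given length (a sliding window).
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    # A read is ambiguous if the same letters appear at another position.
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)


# Made-up toy sequence: two unique flanks around a short motif repeated
# ten times, standing in for a repeat-rich region such as a centromere.
toy_genome = "ACGTTGCA" + "GATTACA" * 10 + "TTGACCGT"

short_reads = ambiguous_fraction(toy_genome, 5)   # shorter than the repeat block
long_reads = ambiguous_fraction(toy_genome, 80)   # spans the whole repeat block

print(f"ambiguous short reads: {short_reads:.0%}")
print(f"ambiguous long reads:  {long_reads:.0%}")
```

In this toy case most short reads fall entirely inside the repeat and match several positions, so there is no way to tell where they belong. Reads long enough to span the repeat and reach the unique letters on either side each match only one position, which is the intuition behind long-read sequencing.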
Using the Oxford Nanopore technology, the T2T Consortium found that it could map where proteins attach to the centromere during cell division, per the Atlantic.
The sequenced DNA was derived from a cell line taken from tissue that forms when sperm fertilizes a non-viable egg that lacks a nucleus, known as a complete hydatidiform mole, reports Nature. (In other words, the sample was not taken from a person.) Because an egg's DNA is stored in its nucleus, an egg without a nucleus does not contain genetic material from a mother. Instead, the "mole" contains chromosomes only from the father. Using a mole makes sequencing easier because researchers do not have to distinguish between two sets of chromosomes from two parents.
But T2T-CHM13 represents only a single genome, so the researchers plan to team up with the Human Pangenome Reference Consortium to sequence over 300 genomes from humans worldwide in the next three years, using T2T-CHM13 as a reference. They also plan to sequence a Y chromosome next, since the sperm that created the hydatidiform mole carried only an X chromosome.