
  (Source: AP)
DNA comes from sequencing of Neanderthal toe-bone found in Siberia

Three years after an international team of experts led by researchers at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, published a "draft" of the genome of Homo neanderthalensis (commonly known as the "Neanderthal"), the team has come back with a higher-quality "finished" map of the genome.

I. Getting to Know Our Neighbors, Distant Ancestors

Neanderthals are one of mankind's closest relatives. Neanderthals and modern man (Homo sapiens) are thought to have diverged from a common ancestor around 350,000 years ago. That ancestor is thought to have evolved into humans in Africa, but into Neanderthals in Europe. Another close relative, the Denisovans, is thought to have diverged slightly earlier in Asia.

Nonetheless, humans and these relatives would eventually reunite and even have intimate sexual relationships, which led to some modern humans bearing pieces of Neanderthal/Denisovan DNA.

These "donations" from our close relatives are thought to have endowed people of European or Asian descent with hardier immune systems.

The DNA for the sequenced Neanderthal genome comes from a toe-bone found in the Denisova cave in southern Siberia.

Denisova cave
[Image Source: Current Biology/Science]

That cave is also home to preserved human and Denisovan remains; in fact the Denisovan remains are being used to carry out a similar sequencing project on that genome.

II. Contamination, Region Variance Leave Picture Only Mostly Complete

Svante Paabo, a geneticist who led the research, wrote in an email to the Associated Press, "The genome of a Neanderthal is now there in a form as accurate as that of any person walking the streets today."

Or it is for the Altai Neanderthals, at least. Much like modern humans, whose populations in different areas developed distinct genetic makeups, Neanderthals are hypothesized to have had subtle regional differences in their genomes.

As Ars Technica points out, it is misleading to suggest that the genome is a "complete" genome for the entirety of the Neanderthal population that once inhabited Europe, Asia, and the Middle East.  Rather, the Neanderthal genome represents a finished/complete picture of the Neanderthals in one region.

Family tree
Scientists think that the Altai branch of the "family tree" is complete with the finished sequencing. [Image Source: MPI]

Professor Paabo is preparing a paper on the work.  He enthuses, "We will gain insights into many aspects of the history of both Neanderthals and Denisovans, and refine our knowledge about the genetic changes that occurred in the genomes of modern humans after they parted ways with the ancestors of Neanderthals and Denisovan."

The finished genome is available in "BAM" format, with per-chromosome file sizes ranging from 1.9 to 13 GB apiece depending on chromosome size, except for the small Y chromosome, whose file is only 331 MB.

Contamination is another way in which the genome is somewhat incomplete. Analysis showed that approximately 1 percent of the DNA in the sample was contamination from the cave's later human residents. Those gaps, along with the variations between Neanderthals in different regions, will have to be filled in by future genetic studies.

Source: MPI

Comments

What kind of idiotic file format...
By Shadowmaster625 on 3/21/2013 10:01:33 AM , Rating: -1
The entire human genome is 3.2 billion base pairs. That is 3.2 billion bits with a 0 representing a guanine-cytosine pair, and a 1 representing an adenine-thymine pair. 3.2 billion bits is only 400MB. So there is really only 400MB of raw DNA data in the human genome. So why are these files so much larger? I'm sure you need a few extra bits if you want to have indexes for individual genes, but that's it. Just a couple extra megabytes would take care of all the indexing overhead. Wow these guys are wasting hundreds of gigabytes on nothing! Must be a govt funded operation....
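
A quick check of the arithmetic in this comment, as a minimal sketch. It assumes a plain fixed-width encoding with no compression, and the function name `raw_genome_bytes` is made up for illustration. One bit per position is enough only if each position holds one of two symbols. If the order of the pair on the strand matters, there are four possible bases (A, C, G, T) at each position, which takes two bits and doubles the figure.

```python
import math

def raw_genome_bytes(num_bases: int, alphabet_size: int) -> int:
    """Bytes needed to store num_bases symbols with a fixed-width code."""
    bits_per_symbol = math.ceil(math.log2(alphabet_size))
    return num_bases * bits_per_symbol // 8

GENOME_BASES = 3_200_000_000  # figure quoted in the comment

one_bit = raw_genome_bytes(GENOME_BASES, 2)  # two symbols: G-C pair vs. A-T pair
two_bit = raw_genome_bytes(GENOME_BASES, 4)  # four symbols: A, C, G, T

print(f"1 bit per base:  {one_bit / 1e6:.0f} MB")
print(f"2 bits per base: {two_bit / 1e6:.0f} MB")
```

Either way the raw sequence is well under a gigabyte, so the encoding of individual bases alone does not explain multi-gigabyte files.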

By retrospooty on 3/21/2013 10:09:24 AM , Rating: 2
RE: What kind of idiotic file format...
By freeagle on 3/21/2013 11:06:43 AM , Rating: 2
An expert on DNA indexing has spoken...

A binary search tree, where each leaf would represent one base pair, would contain (3.2e9 - 1) vertices (roughly, the - 1 is there just to draw the relationship between vertices and leaves in general). Each vertex needs to somehow point to both of its children. Each such pointer requires at least 4 bytes, so that's 8 bytes per vertex, times 3.2e9 is 25.6e9 bytes, so 23.84 GB. Granted, you could compress such a data structure when storing into a file, but I assume you need more metadata for each vertex than just pointers to its children...

I'm not saying this is what they use to index the bases, I'm no DNA data indexing expert either, but it certainly is not "just a couple extra megabytes"
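
The tree estimate in this comment can be reproduced with a few lines, as a rough sketch that uses only the commenter's own assumptions (one leaf per base pair, two child pointers per internal vertex, 4 bytes per pointer). The constant names are made up for illustration, and nothing here describes how genome files are actually indexed.

```python
LEAVES = 3_200_000_000          # one leaf per base pair, per the comment
INTERNAL_VERTICES = LEAVES - 1  # a full binary tree with n leaves has n - 1 internal vertices
POINTER_BYTES = 4               # the comment's assumed pointer width
CHILDREN_PER_VERTEX = 2

pointer_bytes_total = INTERNAL_VERTICES * CHILDREN_PER_VERTEX * POINTER_BYTES

print(f"{pointer_bytes_total / 1e9:.1f} GB (decimal units)")
print(f"{pointer_bytes_total / 2**30:.2f} GiB (binary units)")
```

The 25.6 and 23.84 figures in the comment are the same quantity expressed in decimal and binary units respectively.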

RE: What kind of idiotic file format...
By TheEinstein on 3/21/2013 1:06:28 PM , Rating: 2
3.2 billion base pairs is actually 3.2 billion bits.

Since the data is effectively random, compression is highly unlikely to save many bits.

3.2 billion / 8 = 400 million bytes. He was correct in his 400 MB file size.

Unless there are 'empty pairs' of sorts. There is no need for vertices if you read the data sequentially.

Now, if the count varies per species (I am no geneticist), then the solution is simply to count until finished.

You only need to run a tree if there is a variable placement based upon previous results and then only if the system is not self indexing.

By cashkennedy on 3/21/2013 6:40:18 PM , Rating: 2
I would imagine there is a bunch of extra information to allow the data to be accessed / searched faster, such as what proteins various sections lead to, and possibly an index column that says what numerical position each base pair is in. I'm no expert on databases though, but there are plenty of retarded ways to make databases, and although text files of pure data are nice and small they take forever to search.

By MantisLion on 3/22/2013 2:10:38 AM , Rating: 1
Just so you know, .BAM files are alignment files, that is, they are stacks of individual DNA sequences from a sequencing machine aligned together to form a final sequence. This is done to ensure that each base is sequenced multiple times for maximum accuracy, as next-generation sequencing techniques sacrifice some quality of base call for quantity of sequencing. It's not just the raw end sequence they have there, otherwise you would be correct in your 400MB figure.

I've worked with .BAM files that have been >30GB, but only covering approximately 50 megabases worth of actual sequence data, simply because the coverage (sequencing depth) of each base was on average 250x.
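
The coverage point above can be put into numbers, as a back-of-the-envelope sketch. The 50 megabase and 250x figures come from the comment. The bytes-per-sequenced-base value is purely an illustrative assumption (roughly one byte for the base call plus one for its quality score, ignoring read names, alignment metadata and compression), not a statement about the BAM specification.

```python
TARGET_BASES = 50_000_000   # about 50 megabases of actual sequence, per the comment
MEAN_COVERAGE = 250         # each base sequenced about 250 times on average

# Illustrative assumption only: base call + quality score, before compression.
ASSUMED_BYTES_PER_SEQUENCED_BASE = 2

total_sequenced_bases = TARGET_BASES * MEAN_COVERAGE
estimated_bytes = total_sequenced_bases * ASSUMED_BYTES_PER_SEQUENCED_BASE

print(f"Sequenced bases stored: {total_sequenced_bases / 1e9:.1f} billion")
print(f"Rough size estimate:    {estimated_bytes / 1e9:.1f} GB")
```

Under these assumptions the alignment data holds 250 times as many bases as the final sequence, which puts the estimate in the tens of gigabytes, the same order of magnitude as the file sizes described.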
