I’m a post-doctoral bioinformatician in Mark Jobling’s group in Genetics at the University of Leicester. I work on the “Sex, Genomes and History” project, funded by the Wellcome Trust. I’ve recently developed an interest in communicating with the public, and have written a few articles for various websites and publications.
Here’s a little about what I do when I’m not writing:
Today’s human populations have their roots in an amazingly complicated series of historical and prehistoric events. Most of these went unrecorded, or were even so slow as to be unobserved at the time. Others are mentioned in written histories, and inform people’s cultural identities, but rarely with any mention of how many people were involved. For example, we’re told that the Anglo-Saxons came to Britain, but did they provide the majority of the ancestry of the modern English, or just a small injection into the ancient British population?
However, there is another record of these events, and it is found in DNA. Variations in the DNA sequences of modern individuals can be used to draw conclusions about the historical relationships between the populations they come from. Populations that arose from a common ancestor relatively recently carry similar sets of variations. A variant typical of one population that turns up in another may suggest some interbreeding in the past.
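As a rough illustration of that idea, here is a minimal Python sketch that compares how common a few variants are in different populations. Everything in it is made up for illustration: the site names, the alleles, the three toy populations and the simple "mean frequency difference" measure are assumptions, not the data or the statistics used in our project.

```python
def variant_frequency(alleles, variant):
    """Fraction of sampled chromosomes that carry the given variant."""
    return alleles.count(variant) / len(alleles)


def frequency_distance(pop_a, pop_b, variants):
    """Mean absolute difference in variant frequency across all sites.

    pop_a and pop_b map a site name to the list of alleles observed there;
    variants maps each site name to the variant allele of interest.
    """
    diffs = [
        abs(variant_frequency(pop_a[site], v) - variant_frequency(pop_b[site], v))
        for site, v in variants.items()
    ]
    return sum(diffs) / len(diffs)


# Hypothetical variants of interest at two sites.
variants = {"site1": "T", "site2": "G"}

# Hypothetical samples: four chromosomes per population per site.
pop_a = {"site1": ["T", "T", "T", "C"], "site2": ["G", "A", "A", "A"]}
pop_b = {"site1": ["T", "T", "C", "C"], "site2": ["G", "G", "A", "A"]}
pop_c = {"site1": ["C", "C", "C", "C"], "site2": ["G", "G", "G", "G"]}

dist_ab = frequency_distance(pop_a, pop_b, variants)
dist_ac = frequency_distance(pop_a, pop_c, variants)
print(dist_ab, dist_ac)
```

In this toy data, populations A and B have more similar variant frequencies than A and C, which is the kind of pattern you would expect if A and B shared an ancestor more recently. Real analyses use many more sites and more careful statistical models.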
We are studying human and ape DNA sequences to explore several questions in our evolution and history: how sequences on the X and Y (sex) chromosomes evolve differently from other regions of the genome; how Y chromosomes from around the world are related; and how the expansion and movement of ancient Near Eastern populations after their adoption of farming contributed to the ancestry of modern Europeans.
To do this we have collected samples from many human populations in Europe and around the world, as well as some great apes. DNA has been extracted from these, and several regions of interest within their genomes are being sequenced using “Next Generation Sequencing” (NGS) technologies. The first DNA sequencing methods were developed decades ago, but with the emergence of NGS over the past few years, the price has dropped dramatically. Suddenly, projects such as ours, in which large chunks of the genomes of hundreds of individuals are sequenced, have become affordable, and population histories are starting to open up like never before.
A pilot study on eight human, one chimp and one gorilla samples has shown that the sequencing works well. It has also shown that the way in which we have targeted the sequencing to our regions of interest has been successful. This is true even in the apes, although we designed it to recognise the human sequences. This goes to show how similar human and ape genomes are.
However, not everything is plain sailing, and NGS data comes with its own set of problems. It is prone to errors, which may look like real variation, and it is split into many short segments that need to be re-assembled. One of my jobs, as the project’s bioinformatician, is to write a program that uses a string of pre-existing and bespoke tools to assemble these sequences and separate true genetic variation from error. The X and Y chromosomes share a common ancestral chromosome, and therefore have many similar regions. This makes it particularly difficult to be sure that each NGS sequence is being mapped onto the correct chromosome. A mis-mapped sequence could make a difference between the X and Y look like variation on one chromosome. We are therefore looking into how well the software is able to differentiate between such regions.
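To give a flavour of how true variation can be separated from error, here is a minimal Python sketch, not our actual pipeline. It relies on the fact that a man carries only one copy of the Y chromosome, so at a real variant nearly all the sequences covering that position should show the new base. A position where only a few sequences disagree looks like sequencing error, and a roughly even split is suspicious because it could come from X sequences mapped onto the Y. The function name and all the thresholds are hypothetical, chosen only for illustration.

```python
def classify_site(ref_reads, alt_reads, min_depth=10,
                  min_alt_fraction=0.9, max_error_fraction=0.1):
    """Classify one position on a single-copy (haploid) chromosome.

    ref_reads: number of sequences matching the reference base
    alt_reads: number of sequences showing a different base
    The thresholds are illustrative assumptions, not tuned values.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "low_coverage"   # too few sequences to decide either way
    alt_fraction = alt_reads / depth
    if alt_fraction >= min_alt_fraction:
        return "variant"        # nearly every sequence shows the new base
    if alt_fraction <= max_error_fraction:
        return "reference"      # the few mismatches are probably errors
    return "ambiguous"          # mixed signal, possibly mis-mapped sequences


# Hypothetical (reference, alternative) counts at four positions.
for counts in [(0, 30), (29, 1), (15, 15), (2, 3)]:
    print(counts, classify_site(*counts))
```

Positions flagged as ambiguous are the ones worth a closer look, since they may point to regions where the X and Y are too similar to tell apart reliably.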
When we are happy that we are able to accurately detect variation in our pilot data, the same methods can be used on hundreds of samples, and we can start to explore how they came to be the way they are.
Read about how our results suggest that a handful of Bronze-Age men could have fathered two thirds of Europeans.
From this project, we have published the following scientific papers so far:
Batini, C., Hallast, P., Zadik, D., Maisano Delser, P., Benazzo, A., Ghirotto, S., Arroyo-Pardo, E., Cavalleri, G.L., de Knijff, P., Myhre Dupuy, B., Eriksen, H.A., King, T.E., López de Munain, A., López-Parra, A.M., Loutradis, A., Milasin, J., Novelletto, A., Pamjav, H., Sajantila, A., Tolun, A., Winney, B., and JOBLING, M.A. (2015) Large-scale recent expansion of European patrilineages shown by population resequencing. Nature Comm., 6, 7152. doi:10.1038/ncomms8152. (PubMed)
Hallast, P., Batini, C., Zadik, D., Maisano Delser, P., Wetton, J.H., Arroyo-Pardo, E., Cavalleri, G.L., de Knijff, P., Destro Bisol, G., Myhre Dupuy, B., Eriksen, H.A., Jorde, L.B., King, T.E., Larmuseau, M.H., López de Munain, A., López-Parra, A.M., Loutradis, A., Milasin, J., Novelletto, A., Pamjav, H., Sajantila, A., Schempp, W., Sears, M., Tolun, A., Tyler-Smith, C., Van Geystelen, A., Watkins, S., Winney, B., and JOBLING, M.A. (2015) The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades. Mol. Biol. Evol., 32, 661–673. doi:10.1093/molbev/msu327. (PubMed) You can download the VCF for this dataset here.