Improved draft of the Mojave Desert tortoise genome, Gopherus agassizii, version 1.1

School of Life Sciences, Arizona State University, Tempe, Arizona, United States
Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, United States
DOI
10.7287/peerj.preprints.3266v4
Subject Areas
Bioinformatics, Computational Biology, Evolutionary Studies, Genomics, Computational Science
Keywords
tortoise, scaffold, genome, assembly, contamination
Copyright
© 2018 Webster et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Webster TH, Dolby GA, Wilson Sayres MA, Kusumi K. 2018. Improved draft of the Mojave Desert tortoise genome, Gopherus agassizii, version 1.1. PeerJ Preprints 6:e3266v4

Abstract

Exogenous sequence contamination presents a challenge in first-draft genomes because it can lead to non-contiguous, chimeric assembled sequences. This can mislead downstream analyses that rely on synteny, such as linkage-based analyses. Recently, the Mojave Desert tortoise (Gopherus agassizii) draft genome was published as a resource to advance conservation efforts for this threatened species and to learn more about chelonian biology and evolution. Here, we illustrate the steps taken to improve the desert tortoise draft genome by removing contaminating sequences. This kind of processing is typically carried out after the initial release of a draft genome assembly. We used information from NCBI’s VecScreen output to remove intra-scaffold contamination and to trim leading and trailing Ns. We then reordered and renamed the scaffolds and transferred the gene annotation onto the updated assembly. Finally, we describe the tools developed for this pipeline, freely available on GitHub (https://github.com/thw17/G_agassizii_reference_update), which facilitate post-assembly processing of other draft genomes. The new gopAga1.1 assembly has a scaffold N50 of 251 kb and an L50 of 2,592 scaffolds. Its annotation retains the 17,201 of the original 20,172 genes that were unaffected by scaffold processing.
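The scaffold processing summarized above can be sketched in Python, the language of the supplemental FASTA-processing scripts. This is a minimal illustration under stated assumptions, not the authors' pipeline. Contaminant intervals are assumed to be already parsed from the NCBI contamination (VecScreen) report into 1-based, inclusive coordinates, and parsing is not shown. Each scaffold is assumed to be split at every flagged interval rather than masked with Ns. Scaffolds are assumed to be reordered by decreasing length. The `scaffold_` naming prefix and all function names are hypothetical. The sketch ends by computing N50 and L50, the summary statistics reported for gopAga1.1.

```python
"""Minimal, hypothetical sketch of post-assembly scaffold clean-up.

Not the authors' scripts. Assumes contaminant intervals were already parsed
from the NCBI contamination (VecScreen) report; parsing is not shown.
"""

from typing import Dict, List, Tuple

# Assumption: (start, end) pairs are 1-based and inclusive, as in NCBI reports.
Interval = Tuple[int, int]


def remove_intervals(seq: str, intervals: List[Interval]) -> List[str]:
    """Cut out each flagged interval and split the scaffold at the cut,
    so the flanking sequences are not joined into a false junction."""
    pieces = []
    prev = 0  # 0-based index of the first base not yet emitted
    for start, end in sorted(intervals):
        pieces.append(seq[prev:start - 1])  # sequence before this interval
        prev = max(prev, end)  # 1-based inclusive end == 0-based exclusive end
    pieces.append(seq[prev:])  # sequence after the last interval
    return pieces


def trim_ns(seq: str) -> str:
    """Strip leading and trailing gap characters (N or n)."""
    return seq.strip("Nn")


def clean_assembly(scaffolds: Dict[str, str],
                   contamination: Dict[str, List[Interval]],
                   prefix: str = "scaffold_") -> Dict[str, str]:
    """Split out contamination, trim terminal Ns, then reorder by decreasing
    length and rename. The naming prefix is hypothetical."""
    pieces = []
    for name, seq in scaffolds.items():
        for piece in remove_intervals(seq, contamination.get(name, [])):
            piece = trim_ns(piece)
            if piece:  # drop empty or all-N pieces
                pieces.append(piece)
    pieces.sort(key=len, reverse=True)  # longest first; ties keep input order
    return {f"{prefix}{i}": s for i, s in enumerate(pieces, start=1)}


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """N50: length of the shortest sequence among the longest sequences that
    together cover at least half the total length. L50: how many there are."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no sequences given")


if __name__ == "__main__":
    scaffolds = {
        # 2 Ns + 8 bp + 6 bp flagged as contaminant + 6 bp + 2 Ns
        "scf1": "NN" + "ACGTACGT" + "CCCCCC" + "GGATCC" + "NN",
        "scf2": "NNNTTGCANN",
    }
    contamination = {"scf1": [(11, 16)]}  # toy interval covering the Cs
    cleaned = clean_assembly(scaffolds, contamination)
    for name, seq in cleaned.items():
        print(name, seq)
    # scaffold_1 ACGTACGT
    # scaffold_2 GGATCC
    # scaffold_3 TTGCA
    print(n50_l50([len(s) for s in cleaned.values()]))  # (6, 2)
```

Splitting at a flagged region avoids implying that the sequences on either side are adjacent in the genome. Masking the region with Ns would keep that implied adjacency. Whether this choice matches the published pipeline should be checked against the scripts in the GitHub repository.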

Author Comment

In this version, we add NCBI accession numbers.

Supplemental Information

Commands used to convert the annotation

DOI: 10.7287/peerj.preprints.3266v4/supp-1

Python scripts used to process FASTA sequences

DOI: 10.7287/peerj.preprints.3266v4/supp-2