Aim: Comparative phylogeography across a large number of species allows investigating community‐level processes at regional and continental scales. An effective approach to such studies would involve automatic retrieval of georeferenced sequence data from nucleotide databases (a first step towards an ‘automated phylogeography’). It remains unclear if, despite repeated calls, georeferencing of nucleotide databases has increased in frequency, and if accumulated data allow for broad applications based on automated retrieval of sequence data and associated geographical information. Here, we investigated geographical information available in NCBI GenBank accessions for tetrapods, exploring temporal and geographical patterns in georeferencing, and quantifying data available for automated phylogeography.

Location: Global.

Methods: We developed Python and R scripts to (1) download metadata from GenBank (1,125,514 accessions, > 20,000 species); (2) geocode accessions from associated metadata; (3) map originally georeferenced and geocoded accessions and plot their frequency against time; (4) assess the size of intraspecific sets of homologous sequences and compare their geographical extent with species ranges, thus evaluating their potential for phylogeographical analyses.

Results: Only 6.2% of surveyed tetrapod GenBank submissions reported geographical coordinates, without increase in recent years. Our geocoding raised georeferenced accessions to 15.1%. The geographical distribution of georeferenced accessions is patchy, and especially sparse in economically underdeveloped areas. Automatically retrievable informative data sets covering most of the range are available for very few species of wide‐ranging tetrapods.

Main conclusions: Although geocoding offers a partial solution to the scarcity of direct georeferencing, the amount of data potentially useful for automated phylogeography is still limited. Strong underrepresentation of hard‐to‐access areas suggests that sampling logistics represent a main hindrance to global data availability. We propose that, besides enhancing georeferencing of genetic data, future research agendas should focus on collaborative efforts to sample genetic diversity in biodiversity‐rich tropical areas.
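
As an illustration of the kind of check the abstract describes (counting which accessions report geographical coordinates), the following is a minimal Python sketch and not the authors' scripts. It assumes coordinates arrive as the text of a GenBank `lat_lon` source qualifier in the usual "decimal degrees plus hemisphere letter" form (for example `47.94 N 28.12 W`). The record dictionaries, the accession labels, and the helper names `parse_lat_lon` and `georeferenced_fraction` are hypothetical and exist only for this example.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed input format: "<degrees> N|S <degrees> E|W", e.g. "47.94 N 28.12 W".
LAT_LON_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(value: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in signed decimal degrees, or None."""
    match = LAT_LON_RE.match(value)
    if match is None:
        return None
    # Southern latitudes and western longitudes become negative.
    lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
    lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    # Reject values outside the valid coordinate ranges.
    if abs(lat) > 90 or abs(lon) > 180:
        return None
    return lat, lon


def georeferenced_fraction(records: List[Dict[str, str]]) -> float:
    """Share of records whose 'lat_lon' field parses to valid coordinates."""
    if not records:
        return 0.0
    hits = sum(
        1 for rec in records
        if parse_lat_lon(rec.get("lat_lon", "")) is not None
    )
    return hits / len(records)


# Hypothetical metadata records; the accession labels are placeholders.
records = [
    {"accession": "EX000001", "lat_lon": "47.94 N 28.12 W"},
    {"accession": "EX000002", "country": "Gabon"},
    {"accession": "EX000003", "lat_lon": "1.5 S 36.8 E"},
    {"accession": "EX000004", "lat_lon": "unknown"},
]

if __name__ == "__main__":
    print(f"georeferenced: {georeferenced_fraction(records):.1%}")
```

Records that carry only a locality description (such as the `country` field in the second placeholder record) count as not georeferenced in this sketch; those are the cases the study's separate geocoding step targets.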

Gratton, P., Marta, S., Bocksberger, G., Winter, M., Trucchi, E., Kühl, H. (2017). A world of sequences: can we use georeferenced nucleotide databases for a robust automated phylogeography? Journal of Biogeography, 44(2), 475-486 [10.1111/jbi.12786].

A world of sequences: can we use georeferenced nucleotide databases for a robust automated phylogeography?

Gratton, Paolo; Trucchi, Emiliano
2017-01-01

Published
International relevance
Article
Anonymous reviewers
Sector BIO/05 - Zoology
English
https://onlinelibrary.wiley.com/doi/abs/10.1111/jbi.12786
Journal article
Files in this item:
Gratton_et_al-2016-Journal_of_Biogeography.pdf (authorized users only)
License: Publisher's copyright
Size: 799.43 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/242040
Warning: the data displayed have not been validated by the university.

Citations
  • PMC N/A
  • Scopus 42
  • ISI 40