There is a newer version of this record available.

Dataset Open Access

DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea

Ali Alishum


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:contributor>Ali Alishum</dc:contributor>
  <dc:contributor>Seersholm Frederik</dc:contributor>
  <dc:contributor>Greenfield Paul</dc:contributor>
  <dc:contributor>Christophersen Claus</dc:contributor>
  <dc:creator>Ali Alishum</dc:creator>
  <dc:date>2020-07-19</dc:date>
  <dc:description>These two combined bacterial and archaeal 16S rRNA gene sequence databases were collated from several sources and formatted for use with the "assignTaxonomy" function of the DADA2 pipeline.


	RefSeq+RDP: This database contains 14676 bacterial and 660 archaeal full-length 16S rRNA gene sequences. It was compiled on 14/05/2018, predominantly from the NCBI RefSeq 16S rRNA database (//www.ncbi.nlm.nih.gov/refseq/targetedloci/16S_process/), and supplemented with additional sequences from the RDP database (//rdp.cme.msu.edu/misc/resources.jsp).
	Genome Taxonomy Database (GTDB): The new version of our DADA2-formatted GTDB reference sequences contains 21965 bacterial and 1126 archaeal full-length 16S rRNA gene sequences. Fewer species are represented than in the full GTDB release because some metagenome-assembled genomes (MAGs) lack the 16S rRNA gene, so no 16S sequence can be extracted from them. The database was downloaded from //data.ace.uq.edu.au/public/gtdb/data/releases/release95/ on 19/07/2020. Please read the GTDB release notes and file descriptions.


The databases were converted to DADA2 format using simple awk/bash scripts. The script takes as input a FASTA file as provided by the creators of the core database and outputs a FASTA file with all 7 taxonomic ranks separated by ";", as required for DADA2 compatibility. Additionally, we appended the unique sequence ID (NCBI/RDP or GTDB ID) to the species entry. We see this as an important QC step that highlights the issues with, and the confidence of, short-read taxonomy assignment at finer taxonomic ranks.</dc:description>
  <dc:description>The RefSeq+RDP database was updated to fix a bug in which a quotation mark was wrongly placed in front of some species names. A file listing all affected species names has been uploaded for review. This should not affect any assignments, but it may have caused problems when reading the file into R.

The GTDB database was updated because a new release with taxonomy changes has been made available. The core GTDB team advises everyone using GTDB to move to release 95. I have also formatted all 16S rRNA sequences in GTDB r95 that passed QC; to avoid confusion they are not uploaded here, but I can share them separately with anyone who needs them.

The awk script can be provided on request.</dc:description>
  <dc:identifier>//zenodo.org/record/3951383</dc:identifier>
  <dc:identifier>10.5281/zenodo.3951383</dc:identifier>
  <dc:identifier>oai:zenodo.org:3951383</dc:identifier>
  <dc:language>aig</dc:language>
  <dc:relation>doi:10.5281/zenodo.2541238</dc:relation>
  <dc:relation>url://zenodo.org/communities/zenodo</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>//creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>DADA2 format</dc:subject>
  <dc:subject>16S rRNA</dc:subject>
  <dc:subject>Bacterial</dc:subject>
  <dc:subject>Archaeal</dc:subject>
  <dc:title>DADA2 formatted 16S rRNA gene sequences for both bacteria &amp; archaea</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
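The conversion described in the record (FASTA in, FASTA out with seven ";"-separated ranks and the unique sequence ID appended to the species entry) can be sketched in Python. This is not the author's awk script, which is only available on request. The GTDB-style input header, the `d__`/`p__`/... rank prefixes, the `Species(ID)` tagging, the trailing ";" and the example accession are all assumptions made for illustration, and `to_dada2_header` and `count_by_domain` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed GTDB-style rank prefixes such as "d__", "p__", ..., "s__".
RANK_PREFIX = re.compile(r"^[dpcofgs]__")


def to_dada2_header(header: str) -> str:
    """Rewrite one GTDB-style FASTA header as a 7-rank DADA2 header.

    Assumptions (not taken from the original awk script):
      * input looks like ">SEQ_ID d__...;p__...;...;s__Species [annotations]";
      * the unique sequence ID is appended to the species name in parentheses.
    """
    body = header.lstrip(">").strip()
    seq_id, _, rest = body.partition(" ")
    # Keep only the taxonomy string; drop any trailing "[key=value]" annotations.
    taxonomy = rest.split(" [", 1)[0]
    # Remove stray quotation marks, then the rank prefix, from every rank.
    ranks = [RANK_PREFIX.sub("", r.replace('"', "").strip()) for r in taxonomy.split(";")]
    if len(ranks) != 7:
        raise ValueError(f"expected 7 ranks, got {len(ranks)}: {header!r}")
    ranks[6] = f"{ranks[6]}({seq_id})"  # tag the species entry with its ID
    return ">" + ";".join(ranks) + ";"


def count_by_domain(lines):
    """Count sequences per domain (first rank) in DADA2-formatted FASTA lines."""
    return Counter(line[1:].split(";", 1)[0] for line in lines if line.startswith(">"))


if __name__ == "__main__":
    # Made-up header, for illustration only.
    example = (
        ">RS_GCF_000566285.1~NZ_KK583188.1 d__Bacteria;p__Proteobacteria;"
        "c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;"
        "g__Escherichia;s__Escherichia coli [ssu_len=1542]"
    )
    converted = to_dada2_header(example)
    print(converted)
    print(count_by_domain([converted, "ACGT"]))
```

Stripping quotation marks inside each rank is one way to avoid the stray-quote problem mentioned in the RefSeq+RDP update note. `count_by_domain` can be run over a converted file to cross-check per-domain totals such as the bacterial and archaeal counts given in the record description.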

Statistics         All versions   This version
Views              10,342         707
Downloads          84,902         182
Data volume        345.6 GB       948.5 MB
Unique views       8,039          610
Unique downloads   22,215         119
