A newer version of this record is available.

Dataset Open Access

DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea

Ali Alishum


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.2541239</identifier>
  <creators>
    <creator>
      <creatorName>Ali Alishum</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-4498-2870</nameIdentifier>
      <affiliation>TrEnD Lab, Curtin University of Technology</affiliation>
    </creator>
  </creators>
  <titles>
    <title>DADA2 formatted 16S rRNA gene sequences for both bacteria &amp; archaea</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2019</publicationYear>
  <subjects>
    <subject>DADA2 format</subject>
    <subject>16S rRNA</subject>
    <subject>Bacterial</subject>
    <subject>Archaeal</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2019-01-16</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">//americinnmankato.com/record/2541239</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.2541238</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">//americinnmankato.com/communities/zenodo</relatedIdentifier>
  </relatedIdentifiers>
  <version>Version 1</version>
  <rightsList>
    <rights rightsURI="//creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;These two combined bacterial and archaeal 16S rRNA gene sequence databases were collated from various sources and formatted for the purpose of using the &amp;quot;assignTaxonomy&amp;quot; command within the DADA2&amp;nbsp;pipeline.&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;RefSeq+RDP: This database contains 14676 bacterial &amp;amp; 660 archaeal full 16S rRNA gene sequences.&amp;nbsp; It was compiled on 14/05/2018, predominantly from the NCBI RefSeq 16S rRNA database (//www.ncbi.nlm.nih.gov/refseq/targetedloci/16S_process/),&amp;nbsp;and was supplemented with extra&amp;nbsp;sequences from the&amp;nbsp;RDP database (//rdp.cme.msu.edu/misc/resources.jsp).&lt;/li&gt;
	&lt;li&gt;Genome Taxonomy Database (GTDB): Our DADA2-formatted GTDB reference sequence set contains 20486 bacterial and 1073 archaeal full 16S rRNA gene sequences. The database was downloaded from (&lt;a href="//t.co/bIjprJsYUh"&gt;http://gtdb.ecogenomic.org/downloads&lt;/a&gt;)&amp;nbsp;on 20/11/2018.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The databases were converted to DADA2 format using a locally written Python 2.7 script. The script&amp;nbsp;takes&amp;nbsp;as input a taxonomy .txt file and a fasta&amp;nbsp;file, as provided by the core database creators, and matches the two files on a unique sequence identifier present in both. It then&amp;nbsp;outputs a fasta file with all 7 taxonomy ranks separated by &amp;quot;;&amp;quot;, as required for DADA2 compatibility. Additionally,&amp;nbsp;we have concatenated&amp;nbsp;the unique&amp;nbsp;sequence ID, be it an NCBI/RDP or GTDB&amp;nbsp;ID, to the species entry. We see this as an important QC step to highlight the issues/confidence associated with short-read taxonomy assignment at the finer rank levels.&lt;/p&gt;</description>
    <description descriptionType="Other">Python script can be provided on request.</description>
    <description descriptionType="Other">{"references": ["Parks, D. H., et al. (2018). \"A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life.\" Nature Biotechnology.", "Cole, J. R., Q. Wang, J. A. Fish, B. Chai, D. M. McGarrell, Y. Sun, C. T. Brown, A. Porras-Alfaro, C. R. Kuske, and J. M. Tiedje. 2014. Ribosomal Database Project: data and tools for high throughput rRNA analysis Nucl. Acids Res. 42(Database issue):D633-D642; doi: 10.1093/nar/gkt1244 [PMID: 24288368]", "NCBI 16S RefSeq Nucleotide sequence records: //www.ncbi.nlm.nih.gov/nuccore?term=33175%5BBioProject%5D+OR+33317%5BBioProject%5D"]}</description>
  </descriptions>
</resource>
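
The abstract above describes the formatting script only in outline, and the original Python 2.7 script is not part of this record. The following is a minimal Python 3 sketch of the matching step it describes, not the author's implementation. The tab-separated taxonomy layout (`ID<TAB>rank1;...;rank7`), the `Species(ID)` style of appending the sequence ID, and the function names `read_taxonomy`, `read_fasta`, and `format_for_dada2` are all assumptions made for illustration.

```python
import io

N_RANKS = 7  # the record states that all 7 taxonomy ranks are written


def read_taxonomy(handle):
    """Map sequence ID -> list of 7 rank names.

    Assumed input layout (hypothetical): one 'ID<TAB>r1;r2;...;r7' per line.
    """
    taxonomy = {}
    for line in handle:
        line = line.strip()
        if not line or "\t" not in line:
            continue
        seq_id, lineage = line.split("\t", 1)
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        if len(ranks) == N_RANKS:  # skip incomplete lineages
            taxonomy[seq_id] = ranks
    return taxonomy


def read_fasta(handle):
    """Yield (sequence ID, sequence); the ID is the first word of the header."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id, chunks = (words[0] if words else ""), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def format_for_dada2(taxonomy_handle, fasta_handle, out_handle):
    """Write sequences whose ID appears in both inputs; return the count written."""
    taxonomy = read_taxonomy(taxonomy_handle)
    written = 0
    for seq_id, seq in read_fasta(fasta_handle):
        ranks = taxonomy.get(seq_id)
        if ranks is None:
            continue  # no taxonomy entry for this sequence
        # Assumed convention: append the sequence ID to the species entry.
        ranks = ranks[:-1] + ["{}({})".format(ranks[-1], seq_id)]
        out_handle.write(">" + ";".join(ranks) + "\n" + seq + "\n")
        written += 1
    return written


if __name__ == "__main__":
    tax = io.StringIO(
        "seq1\tBacteria;Proteobacteria;Gammaproteobacteria;"
        "Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli\n"
    )
    fasta = io.StringIO(">seq1 example record\nACGT\nTTGA\n>seq2\nGGGG\n")
    out = io.StringIO()
    format_for_dada2(tax, fasta, out)
    print(out.getvalue(), end="")
```

Sequences without a matching taxonomy entry are skipped here; whether the original script drops or reports them is not stated in the record.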
                    All versions    This version
Views               10,342          6,640
Downloads           84,902          13,064
Data volume         345.6 GB        77.5 GB
Unique views        8,039           5,482
Unique downloads    22,215          6,836
