Software Open Access
Rob Lanfear; Richard Mansfield
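Citation and reuse
Please cite this release as:
Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo. DOI: 10.5281/zenodo.3958883
If you publish a paper that uses this tree, you must still follow the GISAID data sharing and attribution rules.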
Details
The tree in this release was generated with the following command line:
bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250
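[gisaid.fasta] is the FASTA file of complete, high-coverage sequences from GISAID up to and including the date in the release title, as determined by the "Submission date" filter on the GISAID data feed.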
[previous_iteration] is the filepath of the previous release; it is used to provide the excluded_sequences.tsv and ft_SH.tree files as the starting points of the current iteration.
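Before launching a run, it can help to confirm that the previous iteration actually provides both starting files. The following is a minimal sketch, not part of the sarscov2phylo scripts. It assumes that [previous_iteration] points to a directory containing the two files, and the function name `check_previous_iteration` is hypothetical.

```shell
# Hypothetical helper (not part of the original scripts): confirm that a
# previous-iteration directory holds the two files named above.
# Assumption: [previous_iteration] is a directory containing both files.
check_previous_iteration() {
    prev_dir="$1"
    status=0
    for f in excluded_sequences.tsv ft_SH.tree; do
        if [ ! -f "$prev_dir/$f" ]; then
            # Report every missing file, then fail at the end.
            echo "missing: $prev_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

The function returns 0 only when both files are present, so it can guard the command line above, for example `check_previous_iteration [previous_iteration] && bash global_tree_gisaid_start_tree.sh ...`.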
sequences downloaded from GISAID
136871
//
alignment stats of global alignment
Alignment number: 1
Format: aligned FASTA
Number of sequences: 134414
Alignment length: 29903
Total # residues: 4005663633
Smallest: 29105
Largest: 29903
Average length: 29800.9
Average identity: 100%
//
alignment stats of global alignment after masking sites
Alignment number: 1
Format: aligned FASTA
Number of sequences: 134414
Alignment length: 29903
Total # residues: 3987748459
Smallest: 29036
Largest: 29675
Average length: 29667.7
Average identity: 100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number: 1
Format: aligned FASTA
Number of sequences: 134370
Alignment length: 29903
Total # residues: 3986445916
Smallest: 29036
Largest: 29675
Average length: 29667.7
Average identity: 100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number: 1
Format: aligned FASTA
Number of sequences: 134370
Alignment length: 29646
Total # residues: 3976491642
Smallest: 28498
Largest: 29646
Average length: 29593.6
Average identity: 100%
//
After filtering sequences with TreeShrink
Type: Phylogram
#nodes: 233128
#leaves: 134252
#dichotomies: 93693
#leaf labels: 134252
#inner labels: 87918
Number of new sequences added this iteration
5656 alignment_names_new.txt
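The counts in the log can be compared step by step to see how many sequences and alignment columns drop out at each stage. The sketch below only does arithmetic on numbers copied from the log; the variable names are illustrative, and the log itself does not state why sequences are absent from the global alignment.

```shell
# Counts copied from the log above.
downloaded=136871   # sequences downloaded from GISAID
aligned=134414      # sequences in the global alignment
filtered=134370     # after filtering out short/ambiguous sequences
in_tree=134252      # leaves after filtering with TreeShrink

not_aligned=$((downloaded - aligned))      # absent from the global alignment
short_or_ambiguous=$((aligned - filtered)) # removed by the length/ambiguity filter
treeshrink=$((filtered - in_tree))         # removed by TreeShrink
total_removed=$((downloaded - in_tree))    # downloaded but not in the final tree

# Alignment columns removed by trimming sites that are >50% gaps.
trimmed_columns=$((29903 - 29646))

echo "not in global alignment:      $not_aligned"
echo "short/ambiguous removed:      $short_or_ambiguous"
echo "removed by TreeShrink:        $treeshrink"
echo "total not in final tree:      $total_removed"
echo "alignment columns trimmed:    $trimmed_columns"
```

With the numbers above, this gives 2457, 44, and 118 sequences for the three steps (2619 in total) and 257 trimmed alignment columns.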
Major changes to the scripts in this release
Name | Size
---|---
roblanf/sarscov2phylo-5-11-20.zip (md5:b00c585992c8af308fb968665a603718) | 10.0 MB
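The MD5 digest listed with the archive can be used to check that a download is intact. The following is a minimal sketch: the function name `verify_md5` is hypothetical, and it assumes that either `md5sum` (GNU coreutils) or `md5` (macOS/BSD) is installed.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Uses md5sum (GNU coreutils) when available, otherwise falls back to
# `md5 -q` (macOS/BSD).
verify_md5() {
    file="$1"
    expected="$2"
    if command -v md5sum >/dev/null 2>&1; then
        # md5sum prints "<digest>  <filename>"; keep only the digest.
        actual=$(md5sum "$file" | awk '{print $1}')
    else
        actual=$(md5 -q "$file")
    fi
    # Exit status 0 if the digests match, non-zero otherwise.
    [ "$actual" = "$expected" ]
}
```

Assuming the archive was saved locally as sarscov2phylo-5-11-20.zip, it could be checked with `verify_md5 sarscov2phylo-5-11-20.zip b00c585992c8af308fb968665a603718 && echo "checksum OK"`.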