Software | Open Access
Rob Lanfear; Richard Mansfield
Citation and reuse
Please cite this version as:
Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo. doi:10.5281/zenodo.3958883
If you publish a paper that uses this tree, you must still follow the GISAID data-sharing and attribution rules.
Details
The tree in this release was generated with the following command:
bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250
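[gisaid.fasta] is a FASTA file of the complete, high-coverage raw sequences from GISAID up to and including the date in the release title, as determined by the "submission date" filter on the GISAID data feed.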
[previous_iteration] is the file path of the previous release; it provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.
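The submission-date cut-off used to build [gisaid.fasta] can be illustrated with a small Python sketch. This is not the pipeline's code (there the filter is applied on the GISAID data feed itself): it assumes a hypothetical list of (sequence name, submission date) pairs with ISO-formatted dates, and the function name `submitted_by` is our own. The bound is inclusive, matching "up to and including" the release date.

```python
from datetime import date


def submitted_by(records, cutoff):
    """Keep (name, submission_date) pairs submitted on or before `cutoff`.

    records: iterable of (name, "YYYY-MM-DD") tuples (hypothetical format).
    cutoff:  datetime.date; the comparison is inclusive.
    """
    return [(name, d) for name, d in records if date.fromisoformat(d) <= cutoff]
```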
sequences downloaded from GISAID
146552
//
alignment stats of global alignment
Alignment number: 1
Format: aligned FASTA
Number of sequences: 143902
Alignment length: 29903
Total # residues: 4288314661
Smallest: 29105
Largest: 29903
Average length: 29800.2
Average identity: 100%
//
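Each stats block reports the number of sequences, the alignment width, the total residue count and the smallest, largest and mean per-sequence residue counts. The page does not name the tool that produced them, so the Python sketch below is only an approximation under stated assumptions: "residues" are taken to be non-gap characters (so `N` counts as a residue), and `-` and `.` are assumed to be the only gap symbols. The function names are hypothetical.

```python
from statistics import mean

# Assumption: '-' and '.' are the only gap symbols; every other character,
# including N, is counted as a residue.
GAP_CHARS = frozenset("-.")


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def alignment_stats(records):
    """Summarise an aligned FASTA: counts, width and per-sequence residue counts."""
    seqs = [seq for _, seq in records]
    if not seqs:
        raise ValueError("alignment is empty")
    widths = {len(s) for s in seqs}
    if len(widths) != 1:
        raise ValueError("sequences differ in length; input is not aligned")
    # Per-sequence count of non-gap characters (True sums as 1).
    residues = [sum(ch not in GAP_CHARS for ch in s) for s in seqs]
    return {
        "Number of sequences": len(seqs),
        "Alignment length": widths.pop(),
        "Total # residues": sum(residues),
        "Smallest": min(residues),
        "Largest": max(residues),
        "Average length": round(mean(residues), 1),
    }
```

For example, three aligned sequences `ACGT-`, `AC---` and `ACGTN` give 11 residues in total, with a smallest length of 2, a largest of 5 and an average of 3.7.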
alignment stats of global alignment after masking sites
Alignment number: 1
Format: aligned FASTA
Number of sequences: 143902
Alignment length: 29903
Total # residues: 4269256702
Smallest: 29036
Largest: 29675
Average length: 29667.8
Average identity: 100%
//
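Masking leaves the alignment length at 29903 but lowers both the total residue count and the largest sequence length. That is consistent with masked columns being overwritten with gap characters rather than deleted. The sketch below assumes exactly that. The list of positions is a hypothetical input, because the page does not say which sites were masked, and `mask_sites` is our own name.

```python
def mask_sites(records, positions, mask_char="-"):
    """Overwrite the given 1-based alignment columns with `mask_char`.

    records:   list of (name, aligned_sequence) pairs.
    positions: hypothetical 1-based column numbers to mask.
    The alignment keeps its width; only the masked characters change.
    """
    positions = list(positions)
    if any(p < 1 for p in positions):
        raise ValueError("positions are 1-based and must be >= 1")
    cols = {p - 1 for p in positions}  # convert to 0-based indices
    masked = []
    for name, seq in records:
        chars = list(seq)
        for i in cols:
            if i >= len(chars):
                raise IndexError(f"position {i + 1} is beyond alignment length {len(chars)}")
            chars[i] = mask_char
        masked.append((name, "".join(chars)))
    return masked
```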
alignment stats after filtering out short/ambiguous sequences
Alignment number: 1
Format: aligned FASTA
Number of sequences: 143858
Alignment length: 29903
Total # residues: 4267954159
Smallest: 29036
Largest: 29675
Average length: 29667.8
Average identity: 100%
//
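This step removed 44 sequences (143902 to 143858). The page gives neither the length cut-off nor the ambiguity cut-off, so both are required, hypothetical parameters in the sketch below. The sketch reads "ambiguous" as the fraction of non-gap characters that are IUPAC ambiguity codes. That is one plausible interpretation of the step, not the pipeline's own rule.

```python
GAP_CHARS = frozenset("-.")
# IUPAC nucleotide ambiguity codes plus '?' (assumption about what counts as ambiguous).
AMBIGUOUS = frozenset("NRYKMSWBDHV?")


def filter_sequences(records, min_length, max_ambiguous):
    """Drop sequences that are too short or too ambiguous.

    min_length:    minimum number of non-gap characters (hypothetical threshold).
    max_ambiguous: maximum allowed fraction of ambiguity codes among the
                   non-gap characters (hypothetical threshold).
    """
    kept = []
    for name, seq in records:
        residues = [ch for ch in seq.upper() if ch not in GAP_CHARS]
        if not residues or len(residues) < min_length:
            continue
        n_ambiguous = sum(ch in AMBIGUOUS for ch in residues)
        if n_ambiguous / len(residues) > max_ambiguous:
            continue
        kept.append((name, seq))
    return kept
```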
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number: 1
Format: aligned FASTA
Number of sequences: 143858
Alignment length: 29646
Total # residues: 4257455014
Smallest: 28337
Largest: 29646
Average length: 29594.8
Average identity: 100%
//
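Removing every column in which more than half of the sequences have a gap shortened the alignment from 29903 to 29646 columns. The Python sketch below applies that rule literally, so a column that is exactly 50% gaps is kept. Treating only `-` as a gap is our assumption, and `trim_gappy_columns` is a hypothetical name. The sketch favours clarity over speed. A pure-Python double loop over roughly 144,000 sequences by 29,903 columns would be far too slow at this scale, so a real run would need vectorised or compiled code.

```python
def trim_gappy_columns(records, max_gap_fraction=0.5, gap_chars="-"):
    """Remove columns in which more than `max_gap_fraction` of sequences are gaps."""
    if not records:
        return []
    names = [name for name, _ in records]
    seqs = [seq for _, seq in records]
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sequences differ in length; input is not aligned")
    n = len(seqs)
    # Indices of columns whose gap fraction is at or below the threshold.
    keep = [
        i for i in range(width)
        if sum(s[i] in gap_chars for s in seqs) / n <= max_gap_fraction
    ]
    return [(name, "".join(s[i] for i in keep)) for name, s in zip(names, seqs)]
```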
After filtering sequences with TreeShrink
Type: Phylogram
#nodes: 249182
#leaves: 143782
#dichotomies: 99803
#leaf labels: 143782
#inner labels: 93540
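The tree statistics above (nodes, leaves, dichotomies, leaf labels, inner labels) can be recomputed from a Newick string. The following is a minimal iterative parser written for illustration only. It assumes unquoted labels, no bracketed comments, and a name on every leaf, which holds for this tree because #leaf labels equals #leaves. It also treats a "dichotomy" as an inner node with exactly two children, which is our reading of that statistic rather than a documented definition. Because it uses a stack instead of recursion, it will not hit Python's recursion limit on a deep tree with about 144,000 leaves.

```python
def newick_stats(newick):
    """Count nodes, leaves, dichotomies and labels in a single Newick tree.

    Assumptions: unquoted labels, no [comments], every leaf named.
    A 'dichotomy' is taken to be an inner node with exactly two children.
    """
    text = newick.strip().rstrip(";")
    n = len(text)
    stats = {"nodes": 0, "leaves": 0, "dichotomies": 0,
             "leaf labels": 0, "inner labels": 0}
    open_children = []  # child counts of inner nodes whose ')' is not yet read

    def read_label(i):
        # Read "label:length" up to the next structural character and
        # return only the label part plus the new position.
        start = i
        while i < n and text[i] not in ",()":
            i += 1
        return text[start:i].split(":", 1)[0].strip(), i

    i = 0
    while i < n:
        ch = text[i]
        if ch.isspace() or ch == ",":
            i += 1
        elif ch == "(":
            open_children.append(0)
            i += 1
        elif ch == ")":
            if not open_children:
                raise ValueError("unbalanced ')'")
            children = open_children.pop()
            label, i = read_label(i + 1)
            stats["nodes"] += 1
            stats["dichotomies"] += children == 2
            stats["inner labels"] += bool(label)
            if open_children:
                open_children[-1] += 1
        else:
            label, i = read_label(i)
            stats["nodes"] += 1
            stats["leaves"] += 1
            stats["leaf labels"] += bool(label)
            if open_children:
                open_children[-1] += 1
    if open_children:
        raise ValueError("unbalanced '('")
    return stats
```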
Number of new sequences added this iteration
3299 alignment_names_new.txt
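The line "3299 alignment_names_new.txt" looks like `wc -l` output, that is, a file listing one name per new sequence. The page does not define "new". Assuming it means present in this iteration's alignment but absent from the previous release, the list could be built as in this sketch. The function name is hypothetical.

```python
def new_sequence_names(current_names, previous_names):
    """Names in this iteration that were not in the previous one, in input order."""
    previous = set(previous_names)
    seen = set()
    new = []
    for name in current_names:
        if name not in previous and name not in seen:
            seen.add(name)  # guard against duplicate headers
            new.append(name)
    return new
```

Writing the result one name per line would give a file whose line count matches the reported number.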
Major changes to the scripts in this release
| Name | MD5 | Size |
|---|---|---|
| roblanf/sarscov2phylo-11-11-20.zip | 17a276b142a2c13b8b482392ca06ba4f | 10.1 MB |