Dataset Open Access

Characteristics of human and viral RNA binding sites and site clusters recognized by SRSF1 and RNPS1

Rogan, PK; Mucaki, EJ; Shirley, BC


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Rogan, PK</dc:creator>
  <dc:creator>Mucaki, EJ</dc:creator>
  <dc:creator>Shirley, BC</dc:creator>
  <dc:date>2020-12-10</dc:date>
  <dc:description>This dataset was developed for the following article:

 Rogan PK, Mucaki EJ and Shirley BC. A proposed molecular mechanism for pathogenesis of severe RNA-viral pulmonary infections [version 1; peer review: awaiting peer review]. F1000Research 2020, 9:943 (//doi.org/10.12688/f1000research.25390.1)

Section 1. Extended Data Tables

This archive contains the extended data tables for the research article "A proposed molecular mechanism for pathogenesis of severe RNA-viral pulmonary infections". These tables provide SRSF1, RNPS1 and hnRNP A1 binding site and information-dense cluster counts across various RNA viral genomes [including multiple SARS-CoV-2 and influenza strains] and the human transcriptome, the estimated SARS-CoV-2 doubling time necessary for viral genome SRSF1 binding site availability to exceed sites within the host transcriptome, and an analysis of influenza, dengue, and aplastic anemia patients misdiagnosed as irradiated by established radiation gene signatures. These tables are:

Section 1 - Table 1. RNPS1 and hnRNPA1 binding sites and Information-Dense Clusters for RNPS1 and
hnRNPA1 in RNA Virus Genomes
Section 1 - Table 2A. Detailed Analysis of Information-Dense Clusters for SRSF1 (Replicate 1) in RNA Virus
Genomes
Section 1 - Table 2B. Detailed Analysis of Information-Dense Clusters for SRSF1 (Replicate 2) in RNA Virus
Genomes
Section 1 - Table 2C. Detailed Analysis of Information-Dense Clusters for RNPS1 in RNA Virus Genomes
Section 1 - Table 2D. Detailed Analysis of Information-Dense Clusters for hnRNP A1 in RNA Virus
Genomes
Section 1 - Table 3. Binding Site Analysis of Multiple SARS-CoV-2 Strains (Both Strands)
Section 1 - Table 4A. Binding Site Analysis of Multiple Influenza A (H3N2) Strains (Negative Strand Only)
Section 1 - Table 4B. Binding Site Analysis of Multiple Influenza A (H3N2) Strains (Both Strands)
Section 1 - Table 5. SRSF1, RNPS1 and hnRNPA1 Binding Sites and Information-Dense Clusters by Gene
Section 1 - Table 6A. Transcriptome-Wide Information Dense Clusters Intersecting DRIP- and DRIPc-seq
Intervals
Section 1 - Table 6B. Exome-Wide Information Dense Clusters within DRIP- and DRIPc-seq Intervals
Section 1 - Table 6C. Transcriptome-Wide Scan of Strong Binding Sites Intersecting DRIP- and DRIPc-seq
Intervals
Section 1 - Table 6D. Exome-Wide Scan of Strong Binding Sites within DRIP- and DRIPc-seq Intervals
Section 1 - Table 7. Rate of False Positives for Influenza, Dengue Virus and Aplastic Anemia Using
Radiation Signatures
Section 1 - Table 8. Radiation Model Genes Contributing to False Positives for Patients with Influenza A,
Dengue Virus, and Aplastic Anemia
Section 1 - Table 9A. Doubling Time of SARS-CoV-2 Needed to Exceed Host Transcriptome SRSF1 Binding
Sites (Positive-Strand Sites Only)
Section 1 - Table 9B. Doubling Time of SARS-CoV-2 Needed to Exceed Host Transcriptome SRSF1 Binding
Sites (Both Strands Considered)

Section 2.  All SRSF1, hnRNPA1 and RNPS1 binding site tracks for human and viral genomes

We provide bedgraph tracks that give the location and strength of binding sites (and binding site clusters) for SRSF1, RNPS1 and hnRNPA1 across the human transcriptome (GRCh37), the human exome (including +/-300nt surrounding each exon; non-intergenic only), and all viral genomes investigated in this study (Coronavirus, Dengue, HIV-1 [two strains] and Influenza [two strains]). Note that if no clusters were found for a particular viral genome, no file for that genome is present in the Zenodo archive.

The folder “Cluster-to-DRIPseq-Intersection-Tracks” contains tracks that indicate where binding site clusters have been identified, intersected with DRIP-seq and DRIPc-seq intervals, which indicate where there is evidence of R-loop formation in the human genome. The DRIP-seq dataset (GSE68845) is not strand specific. The DRIPc-seq dataset (GSE70189) is strand specific, and this has been taken into account in the intersection (e.g. tracks only list positive-strand clusters found in positive-strand DRIPc-seq intervals).

Because of their size, the human transcriptome and exome tracks which indicate the location of individual binding sites are split into two separate files (one per strand). While the custom tracks containing human binding site information are designed to be uploaded to the UCSC Genome Browser, files containing transcriptome-wide binding site information may be too large to upload and may require further filtering (e.g. by chromosome).
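
The chromosome-level filtering mentioned above could be sketched in Python as follows. This is only an illustration, assuming the standard four-column bedGraph layout (chromosome, start, end, value) with optional "track", "browser" or "#" header lines; the function name filter_bedgraph_lines is hypothetical and not part of this archive.

```python
def filter_bedgraph_lines(lines, chrom):
    """Keep header lines plus data lines whose first column equals chrom.

    Assumes whitespace-delimited bedGraph lines: chrom, start, end, value.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        # Header and comment lines are preserved so the result can still
        # be loaded as a custom track.
        if stripped.startswith(("track", "browser", "#")):
            kept.append(line)
            continue
        fields = stripped.split()
        if fields[0] == chrom:
            kept.append(line)
    return kept
```

Applying the function to the lines of one of the strand-specific track files with a chromosome name such as "chr1" would yield a smaller track containing only that chromosome, provided the name matches the naming used in the first column of the file.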

To be classified as a cluster, binding sites on the same strand must have Ri values which sum to &gt;50 bits, each binding site must have a neighboring site within 25nt, and all binding sites in the cluster must have Ri greater than a minimum bit threshold. For human transcriptomes and exomes, this bit minimum was set to Rsequence. The bit minimum for viral binding sites was set to 0.1 * Rsequence. The information density-based clustering algorithm utilized in this work is described in Lu and Rogan 2018 (//f1000research.com/articles/7-1933/v2) and archived source code is available through Zenodo (//dx.doi.org/10.5281/zenodo.1892051).
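
The three criteria above can be illustrated with a short Python sketch. It is a simplified grouping of neighboring sites and is not the information density-based clustering algorithm of Lu and Rogan 2018; the function name, the (position, Ri) tuple input, and the choice to measure the 25nt distance between site coordinates are assumptions made for illustration.

```python
def find_clusters(sites, min_ri, max_gap=25, min_total_bits=50.0):
    """Group same-strand binding sites into candidate clusters.

    sites: iterable of (position, ri) tuples, all from one strand.
    min_ri: per-site minimum; sites must have Ri greater than this value.
    """
    # Criterion 3: discard sites at or below the per-site minimum.
    strong = sorted(s for s in sites if s[1] > min_ri)

    # Criterion 2: chain sites whose nearest neighbor is within max_gap nt.
    groups, current = [], []
    for pos, ri in strong:
        if current and pos - current[-1][0] > max_gap:
            groups.append(current)
            current = []
        current.append((pos, ri))
    if current:
        groups.append(current)

    # Criterion 1: total Ri must exceed min_total_bits. A lone site has no
    # neighbor, so at least two sites are required.
    return [
        g for g in groups
        if len(g) >= 2 and sum(ri for _, ri in g) > min_total_bits
    ]
```

For human tracks min_ri would be set to Rsequence of the relevant model, and for viral genomes to 0.1 * Rsequence, as stated above.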

Section 3. Binding site clusters - lollipop plots

Lollipop plots present the genomic coordinates and information densities of clusters across the human transcriptome, human exome, and viral genomes (Coronavirus, Dengue, HIV-1 [two strains] and Influenza [one strain]). The height of the "lollipop" corresponds to the information density of a cluster. Labels above "lollipops" present the start and end genomic coordinate (GRCh37) of the cluster followed by the number of sites in the cluster enclosed in brackets. Lollipop plots associated with human transcriptomes/exomes each contain a single gene. Influenza has 8 segments and each segment requires its own plot; the other viral genomes examined are each presented in a single plot.

File naming convention for human plots:


	RBP_Gene.png
	e.g. RNPS1_ADK.png


File naming convention for viral plots (elements in square brackets do not always appear):


	Virus[.InfluenzaSegment].RiThreshold.Strand.RBP.png
	e.g. Wuhan-Hu-1.complete-genome.4.2-bits.PosStrand.hnRNPA1.png


The specified Ri threshold indicates that all binding sites comprising a cluster have Ri greater than or equal to the threshold.

Section 4. Ri(b,l) matrices for all binding sites scanned

This section contains the information theory-based position weight matrices for the RNA binding proteins (RBPs) used in this study: SRSF1, hnRNPA1 and RNPS1. We investigated binding using two different RNPS1 binding models. While similar, these two models contained binding site information on opposing sides of the binding site motif, which is why we found it prudent to scan with both models.

Structure of each file:

Line #1: Start position, End position and Rsequence [average strength of sequences used to generate the model]

Subsequent lines describe the information on each position of the binding site:


	First four columns: Ri contribution of nucleotide at this position of the matrix [A, C, G, T]
	Column #5: Position of the matrix
	Last four columns: Number of binding sites used to generate model with a particular nucleotide at this position of the matrix [A, C, G, T]


Example:

-2.965775           1.282153            0.034225            -4.906891           0            1              19          8            0

At position zero of the matrix (first nucleotide), a ‘C’ would have a positive contribution to binding site strength, a ‘G’ would be relatively neutral, and an ‘A’ or ‘T’ would negatively contribute to binding site strength.

Generation of Ri(b,l) matrices and computation of Ri values can be accomplished by utilizing the Delila package (//alum.mit.edu/www/toms/delila/delilaprograms.html).
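
As an informal illustration of the file structure described in this section, the Python sketch below reads a matrix file and sums the per-position contributions to obtain the Ri of a candidate site. It assumes whitespace-delimited fields and integer start and end positions on the first line; the function names are hypothetical, and the Delila package remains the reference implementation.

```python
def parse_ribl(text):
    """Parse matrix text into (start, end, rsequence, weights).

    weights maps matrix position to a dict of Ri contributions per base.
    Assumed layout: line 1 holds start, end and Rsequence; every later
    line holds four weights [A, C, G, T], the position, then four counts.
    """
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    start, end = int(rows[0][0]), int(rows[0][1])
    rsequence = float(rows[0][2])
    weights = {}
    for fields in rows[1:]:
        a, c, g, t = (float(x) for x in fields[:4])
        weights[int(fields[4])] = {"A": a, "C": c, "G": g, "T": t}
    return start, end, rsequence, weights


def score_site(weights, sequence):
    """Return the Ri of a sequence as the sum of its per-position weights."""
    positions = sorted(weights)
    bases = sequence.upper().replace("U", "T")  # matrices are indexed by T
    if len(bases) != len(positions):
        raise ValueError("sequence length must match the matrix length")
    return sum(weights[p][b] for p, b in zip(positions, bases))
```

With the example row above, a 'C' at position 0 would contribute 1.282153 bits to the total.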

Section 5. Ri and intersite distance - histograms

Two sets of histograms present Ri distribution and intersite distance distribution across the human transcriptome, human exome, and viral genomes (Coronavirus, Dengue, HIV-1 [two strains] and Influenza [one strain]). 

File naming convention for human plots (elements in square brackets do not always appear):


	[IntersiteDistancesThreshold-]Human-[DRIPc]-AllChrs-RBP[-RiThreshold].png
	e.g. IntersiteDistances500-Human-AllChrs-hnRNPA1-4.6-bits.png


File naming convention for viral plots (elements in square brackets do not always appear):


	[IntersiteDistancesThreshold-]Strand-RBP-Virus[.InfluenzaSegment][-RiThreshold].png
	e.g. IntersideDistances1000-PosStrandOnly-SRSF1-top50000sitesReplicate1-HIV-1-Strain-B.png


Intersite distance thresholds of 500 or 1000 were assigned for all intersite distance histograms. Any distances above the corresponding threshold were excluded from the plot. Plots presenting Ri distributions contain a dashed line indicating Rsequence if it is visible within the scope of the plot.
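
The thresholding and summary steps can be sketched in Python as below. This is an illustration rather than a port of the Perl scripts described in Section 6: it assumes distances are taken between consecutive sites after sorting by coordinate, that the sample standard deviation is wanted, and that all retained distances are positive (required by the geometric mean). All function names are hypothetical.

```python
import statistics


def intersite_distances(positions):
    """Distances between consecutive binding sites of one gene."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def drop_above(values, threshold):
    """Exclude values above the threshold (e.g. 500 or 1000)."""
    return [v for v in values if threshold >= v]


def summarize(values):
    """Count, geometric mean, median, arithmetic mean and sample stdev.

    Needs at least two values, all of them positive.
    """
    return {
        "count": len(values),
        "geometric_mean": statistics.geometric_mean(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }
```

A typical call chain would be summarize(drop_above(intersite_distances(coords), 500)), where coords is a list of site coordinates for a single gene.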

Section 6. Perl Scripts and Descriptions

This archive contains all Perl scripts discussed in this archive's associated manuscript and a document file which describes them ("Perl-Script-Descriptions-Page.docx"). The programs and their general functions are as follows:

“ClusterToDRIPseqAnalysisProgram.pl” – reports which information-dense clusters are located within DRIPc- and/or DRIP-seq intervals (individually and by gene)

“ClusterToDRIPseqAnalysisProgram.GeneDensityFinder.pl” – uses the output from script “ClusterToDRIPseqAnalysisProgram.pl” to determine the number and the density of information-dense clusters within a gene (total clusters within the gene and those within DRIPc-seq intervals)

“calculateIntersiteDistance.pl” – determines the distance between all binding sites in the same gene from a list of genomic coordinates

“removeOutliersHigherThanN.pl” – discards intersite distances computed by script “calculateIntersiteDistance.pl” that are greater than a specified threshold

“getStatisticsOnCol.pl” – calculates the count, geometric mean, median, arithmetic mean, and standard deviation of values from the output of script “removeOutliersHigherThanN.pl”

“ScanDataSummaryProgram.pl” – determines the number of binding sites (above a specified Ri threshold) found within known genes (the program also reports the total expression of those genes using external A549 and pneumocyte expression datasets) from binding site coordinate data

“TotalBindingSitePerCellCalculator.pl” – estimates the number of binding sites expressed in a single A549 or pneumocyte cell at any given time.</dc:description>
  <dc:description>Also see Infographic:
Rogan, Peter; Klesc, Ryan; Mucaki, Eliseos; C. Shirley, Ben (2020): A proposed molecular mechanism for pathogenesis of severe RNA-viral pulmonary infections. figshare. Figure. //doi.org/10.6084/m9.figshare.12718799.v1</dc:description>
  <dc:identifier>//americinnmankato.com/record/4315165</dc:identifier>
  <dc:identifier>10.5281/zenodo.4315165</dc:identifier>
  <dc:identifier>oai:zenodo.org:4315165</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>doi:10.5281/zenodo.3737089</dc:relation>
  <dc:relation>url://americinnmankato.com/communities/covid-19</dc:relation>
  <dc:relation>url://americinnmankato.com/communities/zenodo</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>//creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>SARS-CoV-2</dc:subject>
  <dc:subject>COVID19</dc:subject>
  <dc:subject>RNA binding proteins</dc:subject>
  <dc:subject>Coronavirus</dc:subject>
  <dc:subject>molecular mechanisms</dc:subject>
  <dc:subject>SRSF1</dc:subject>
  <dc:subject>RNPS1</dc:subject>
  <dc:title>Characteristics of human and viral RNA binding sites and site clusters recognized by SRSF1 and RNPS1</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>