SEASTAR, a new method for alternative start sites: systematic evaluation of alternative transcription start sites in RNA

Tsinghua Bioinformatics

MOE Key Laboratory of Bioinformatics, Bioinformatics Division, TNLIST / Department of Automation, Tsinghua University

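This paper from the lab was published in Nucleic Acids Research in April 2017. It presents an alternative splicing analysis method built mainly on shell scripts, and the comparative analysis gave fairly good results.

It gives a clear account of how to quantify AFEs (alternative first exons) and interpret their function, and it also optimizes and reinterprets existing statistical models.

First, the paper gives a comprehensive overview and enumeration of all the ways first exons can be distributed.

Five methods are then compared on how well they identify AFE events, using strand-specific and non-strand-specific data from the nuclear and cytoplasmic fractions.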

Performance assessment of the five methods for FE identification using the reference CAGE data. (A and B) The receiver operating characteristic (ROC) curves of the five methods on non-strand-specific RNA-seq data of the nuclear (A) and cytoplasmic fractions (B) of the KhES cell line. (C and D) The ROC curves of the five methods on strand-specific RNA-seq data of the nuclear (C) and cytoplasmic fractions (D) of the H1-hES cell line. The logistic regression model has the best performance in all cases.
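To make the ROC comparison more concrete, here is a minimal sketch of how a ROC curve and its area (AUC) could be computed for one method's FE scores against CAGE-supported labels. The scores and labels are made-up illustration values, and the function names `roc_curve` and `auc` are hypothetical helpers, not part of SEASTAR.

```python
def roc_curve(scores, labels):
    """Sweep a threshold from high to low score; return (fpr, tpr) lists.

    labels: 1 if the candidate FE is supported by the reference (e.g. CAGE), else 0.
    Assumes at least one positive and one negative label.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Candidates with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Illustration only: six candidate first exons with invented scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]  # 1 = supported by the reference data
fpr, tpr = roc_curve(scores, labels)
print(round(auc(fpr, tpr), 4))
```

A method whose curve sits closer to the top-left corner has a larger AUC, which is the sense in which one method "has the best performance" in such a comparison.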

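Identification and features of FEs across multiple cell types: FEs are identified in different cell lines, using CAGE data as the reference data.

In the correlation plot in panel A, the correlation scores are all fairly high. In the comparison in panel B, the CAGE curve around the TSS region serves as the gold standard: FEs detectable by CAGE are shown in red, those not detectable in sky blue, known FEs found by SEASTAR in dark blue, and newly discovered ones in teal. The SEASTAR results follow essentially the same trend as CAGE, with only a small difference in average coverage.

Specific examples:

Examples of differentially used AFEs and tandem TSSs between the GM12878 and K562 cell lines. (A) Differentially used AFEs in gene RPS6KA1 (sense strand). (B) Differentially used AFEs in gene BIN1 (antisense strand). (C) Differentially used tandem TSSs in gene ATP6V1E2 (antisense strand). (D) Differentially used tandem TSSs in gene SLC35D1 (antisense strand).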

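After the routine processing steps, PSI quantification of the AFE events is complete. The AFE PSI values are plotted over the course of pluripotent cell differentiation. A heatmap of 126 specifically expressed transcription factors is drawn as well, likewise showing how they change along the iPSC reprogramming process. A further heatmap shows the conventional Pearson correlation coefficients between the PSI of the AFEs and the PSI of the TFs. All of these show fairly clear clusters and specific trends.

The genes labeled on the right are the top 10 most variable candidates.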
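As a rough illustration of the two quantities discussed here, the sketch below computes a simple read-ratio PSI for each first exon of a gene and a Pearson correlation between an AFE's PSI and a TF's expression across stages. This is an assumption-laden simplification: SEASTAR estimates PSI with its own statistical model, and the read counts, stage data, and function names (`afe_psi`, `pearson`) are invented for this example.

```python
import math


def afe_psi(first_exon_reads):
    """Simple PSI: reads for one first exon / reads over all first exons of the gene."""
    total = sum(first_exon_reads.values())
    if total == 0:
        # No coverage at this gene: PSI is undefined.
        return {name: float("nan") for name in first_exon_reads}
    return {name: count / total for name, count in first_exon_reads.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented read counts for two first exons of one gene at three stages.
stages = [
    {"FE1": 80, "FE2": 20},
    {"FE1": 50, "FE2": 50},
    {"FE1": 20, "FE2": 80},
]
fe2_psi = [afe_psi(reads)["FE2"] for reads in stages]

# Invented TF expression values at the same three stages.
tf_expression = [1.0, 2.0, 3.0]
print(fe2_psi, pearson(fe2_psi, tf_expression))
```

Computing this correlation for every AFE and TF pair gives the matrix that a correlation heatmap of this kind displays.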

 

Finally, the paper looks at one example, Mycn, which was selected with a P-value of 0.00028.

We found multiple TFs known to be key regulators of reprogramming, including the top-ranked N-Myc (Mycn) gene (with P-value of 0.00028). AFEs containing the Mycn motif were significantly enriched towards the top of the AFEs positively correlated with Mycn expression in our enrichment analysis. We further investigated the expression level of Mycn, as well as the average PSI values of AFEs that contain the Mycn motif and have strong positive correlation with Mycn expression (PCC > 0.5) (Figure 6C). The significant increase of Mycn expression during iPSC reprogramming (P-value = 8.9e-16, ANOVA test) was accompanied by an increase in the relative usage of these AFEs. The coordinated change in expression levels between Mycn and the differentially used AFEs containing the Mycn motif suggests that Mycn binds to and promotes the usage of these AFEs. Mycn is known to play an essential role in the maintenance of pluripotency. Mycn can cooperate with other TFs to reprogram adult cells into other differentiated cells or into iPS cells. Msx2, another transcription factor identified in our enrichment analysis, is a major driver of de-differentiation in mammalian muscle cells.

Collectively, these data imply that TFs with high scores from the enrichment analysis of differential AFEs play important roles in iPSC reprogramming and the regulation of the pluripotent state. (This is the summary statement.)

A demo of how Mycn gene expression and the PSI values of the FE events change across developmental stages.
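The idea of motif-containing AFEs being "enriched towards the top" of a correlation ranking can be sketched with a simple permutation test on mean rank. The post does not say which enrichment statistic the paper used, so this is a stand-in technique chosen for illustration; the correlations, motif flags, and the function name `motif_rank_enrichment` are all hypothetical.

```python
import random


def motif_rank_enrichment(correlations, has_motif, n_perm=2000, seed=0):
    """Test whether motif-containing AFEs rank near the top by correlation.

    Returns (observed mean rank of motif AFEs, permutation P-value).
    Rank 1 is the AFE most positively correlated with the TF.
    """
    order = sorted(range(len(correlations)), key=lambda i: -correlations[i])
    rank = {idx: r for r, idx in enumerate(order, start=1)}
    motif_idx = [i for i, flag in enumerate(has_motif) if flag]
    k = len(motif_idx)
    observed = sum(rank[i] for i in motif_idx) / k

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    all_ranks = list(range(1, len(correlations) + 1))
    hits = 0
    for _ in range(n_perm):
        # Null model: the k motif AFEs fall at random positions in the ranking.
        sample = rng.sample(all_ranks, k)
        if sum(sample) / k <= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Invented data: 20 AFEs with decreasing correlation to the TF's expression;
# the five most correlated AFEs carry the TF motif.
correlations = [0.95 - 0.1 * i for i in range(20)]
has_motif = [i < 5 for i in range(20)]
observed, p_value = motif_rank_enrichment(correlations, has_motif)
print(observed, p_value)
```

A small P-value here means the motif-containing AFEs sit closer to the top of the ranking than random placement would explain, which is the pattern the excerpt reports for Mycn.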

Original post: https://www.cnblogs.com/beckygogogo/p/10511562.html