DESeq2中estimateSizeFactorsForMatrix源码学习

https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/estimateSizeFactorsForMatrix

1.描述

Given a matrix or data frame of count data, this function estimates the size factors as follows: Each column is divided by the geometric means of the rows.

The median of these ratios (skipping the genes with a geometric mean of zero) is used as the size factor for this column. 

每列被每行的均值除?

一般不会直接调用它,而是通过调用estimateSizeFactors来间接调用它。

2.论文介绍

https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-10-r106

  上式中,i是gene,j是cell,k矩阵是n*m的,分母相当于gene_i的所在行的均值。结合函数的描述,计算过程:每列/行的均值,在每列中选择除了0之外的数的中位数,作为每个cell的size因子。这就是比例中位数方法。

3.源码

function (counts, locfunc = stats::median, geoMeans, controlGenes, 
    type = c("ratio", "poscounts")) 
{
    type <- match.arg(type, c("ratio", "poscounts"))#"ratio",返回的是ratio
    if (missing(geoMeans)) {#默认进入这条判断
        incomingGeoMeans <- FALSE
        if (type == "ratio") {
            loggeomeans <- rowMeans(log(counts))#计算log后行的均值
        }
        else if (type == "poscounts") {
            lc <- log(counts)
            lc[!is.finite(lc)] <- 0
            loggeomeans <- rowMeans(lc)
            allZero <- rowSums(counts) == 0
            loggeomeans[allZero] <- -Inf
        }
    }
    else {
        incomingGeoMeans <- TRUE
        if (length(geoMeans) != nrow(counts)) {
            stop("geoMeans should be as long as the number of rows of counts")
        }
        loggeomeans <- log(geoMeans)
    }
    if (all(is.infinite(loggeomeans))) {#有0值很正常,到了这里就退出???
        stop("every gene contains at least one zero, cannot compute log geometric means")
    }
    sf <- if (missing(controlGenes)) {
        apply(counts, 2, function(cnts) {
            exp(locfunc((log(cnts) - loggeomeans)[is.finite(loggeomeans) & 
                cnts > 0]))
        })
    }
    else {
        if (!(is.numeric(controlGenes) | is.logical(controlGenes))) {
            stop("controlGenes should be either a numeric or logical vector")
        }
        loggeomeansSub <- loggeomeans[controlGenes]
        apply(counts[controlGenes, , drop = FALSE], 2, function(cnts) {
            exp(locfunc((log(cnts) - loggeomeansSub)[is.finite(loggeomeansSub) & 
                cnts > 0]))
        })
    }
    if (incomingGeoMeans) {
        sf <- sf/exp(mean(log(sf)))
    }
    sf
}

但是源码没有跑通。

原文地址:https://www.cnblogs.com/BlueBlueSea/p/13926249.html