Haskell Notes, Part 2: The Efficiency of foldl and foldr (Lazy Evaluation and Tail Recursion)

This post is mostly based on: http://www.haskell.org/haskellwiki/Stack_overflow

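Haskell relies heavily on recursion, and recursion usually allocates memory on the stack (or the heap) for every level, which hurts efficiency. The usual remedy is to write the function tail-recursively: the temporary values that ordinary recursion would keep on the stack (the result or state of the current level) are instead passed as arguments to the next level of the recursion.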

// Ordinary recursion.
int fib(int n) {
    if (n < 2)
        return n;
    return fib(n - 1) + fib(n - 2);
}

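// Tail recursion: a and b carry the two most recent Fibonacci numbers,
// and it is the index of the value a + b.
int tail(int n, int a, int b, int it) {
    if (n == it)
        return a + b;
    return tail(n, b, a + b, it + 1);
}

int fib(int n) {
    if (n < 2)
        return n;   // fib(0) = 0, fib(1) = 1, matching the version above
    if (n < 3)
        return 1;   // fib(2) = 1
    return tail(n, 1, 1, 3);
}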

Because Haskell evaluates lazily, however, some code that looks tail-recursive is not necessarily efficient.

The implementation of foldl:

foldl step zero (x:xs) = foldl step (step zero x) xs
foldl _    zero []     = zero

Expanding foldl (+) 0 [1..10]:

foldl (+) 0 (1:[2..10]) ->
foldl (+) (0+1) (2:[3..10]) ->
foldl (+) ((0+1)+2) (3:[4..10]) ->
foldl (+) (((0+1)+2)+3) (4:[5..10]) -> ...

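In the end we get an expression of the form (((0+1)+2)+3)+... Because of lazy evaluation, though, this expression is only evaluated when the result of the foldl is actually demanded. Until then Haskell has to keep it around as a chain of thunks (unevaluated expressions) that grows linearly with the length of the list, and evaluating that deeply nested chain at the end is what can overflow the stack. The fix is to force evaluation at every level of the recursion. Haskell's optimized version of foldl is foldl', which uses the seq function to force the accumulator (to its outermost constructor) before recursing.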
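To make the role of seq concrete, here is a minimal sketch of a strict left fold. The names foldlStrict and sumStrict are hypothetical and chosen only for illustration; in real code the library function Data.List.foldl' should be used, and its actual definition may differ in detail.

```haskell
-- A sketch of a strict left fold (hypothetical name; the library
-- version is Data.List.foldl').
foldlStrict :: (b -> a -> b) -> b -> [a] -> b
foldlStrict _    zero []     = zero
foldlStrict step zero (x:xs) =
  let new = step zero x
  in  new `seq` foldlStrict step new xs  -- force the accumulator before recursing

-- Summing with the strict fold: the accumulator is always an evaluated
-- number, never a growing chain of (+) thunks.
sumStrict :: [Integer] -> Integer
sumStrict = foldlStrict (+) 0
```

Note that seq only evaluates to the outermost constructor, so an accumulator such as a tuple or a list would still hold unevaluated components.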

The implementation of foldr:

foldr step zero (x:xs) = step x (foldr step zero xs)
foldr _    zero []     = zero

Expanding foldr (+) 0 [1..10], written here as mysum = foldr (+) 0:

mysum [1..10] ->
foldr (+) 0 (1:[2..10]) ->
1 + foldr (+) 0 (2:[3..10]) ->
1 + (2 + foldr (+) 0 (3:[4..10])) -> ...

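This is inefficient too. The situation is much like ordinary recursion in an imperative language: every pending (+) has to be kept around, because the result at the current level is not known until the next foldr call has been evaluated.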

In a different setting, however, foldr is very efficient.

Expanding foldr (++) [] [[1],[2],...,[10]]:

foldr (++) [] ([1]:[[2],[3],...]) ->
(1:[]) ++ foldr (++) [] [[2],[3],...] ->
1 : ([] ++ foldr (++) [] [[2],[3],...])

Why is this code efficient when the previous example is not?

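In the former, (+) is a strict function (it evaluates its arguments immediately), whereas (++) is lazy (it defers evaluation) in its second argument.

(+) is strict in its second argument: it needs the value of the inner foldr, so the recursion has to keep going until that value is available. (++) is non-strict in its second argument: it does not need the value of the inner foldr right away. Look at the final result of its expansion: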

1:2:3:...:10:[]

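This is just the constructor form of a list. That is why foldr can be applied to infinite lists (as long as the step function is lazy in its second argument): it only processes as many elements as the consumer of the result actually demands.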
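The claim about infinite lists can be checked with a small sketch. The names concatR, singletons and firstFive are hypothetical and exist only for this example; concatR is simply foldr (++) [] under another name.

```haskell
-- concat written as a right fold (hypothetical name for foldr (++) []).
concatR :: [[a]] -> [a]
concatR = foldr (++) []

-- An infinite list of singleton lists: [[1],[2],[3],...]
singletons :: [[Integer]]
singletons = map (\n -> [n]) [1..]

-- Only the first five elements are demanded, so only the first five
-- inner lists are ever visited and the evaluation terminates.
firstFive :: [Integer]
firstFive = take 5 (concatR singletons)
```

Replacing concatR with a left fold here would never terminate, since a left fold has to reach the end of the list before it can return anything.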

So concat (here, foldr (++) []) runs in a constant amount of stack and can also handle infinite lists. (As a note, it is immediately obvious that foldl and foldl' can never work on infinite lists, because we will always be in the (:) case, and that case always recurses immediately.) The difference between mysum and concat is that (++) is not strict in its second argument: we don't have to evaluate the rest of the foldr to know the beginning of the result. In mysum, since (+) is strict in its second argument, we need the result of the whole foldr before we can compute the final result.

So we arrive at the one-line summary: a function that is strict in its second argument will always require linear stack space with foldr, so foldl' should be used instead in that case. If the function is lazy (non-strict) in its second argument, we should use foldr, both to support infinite lists and to allow streaming use of the input list, where only part of it needs to be in memory at a time.
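As a rough illustration of this rule of thumb, the sketch below pairs each kind of step function with the fold that suits it. The names sumL and anyR are hypothetical; (||) is used as a second example of a function that is lazy in its second argument, which is an addition not taken from the text above.

```haskell
import Data.List (foldl')

-- (+) is strict in both arguments, so a strict left fold is the better fit.
sumL :: [Integer] -> Integer
sumL = foldl' (+) 0

-- (||) does not look at its second argument once the first is True,
-- so a right fold can stop early, even on an infinite list.
anyR :: (a -> Bool) -> [a] -> Bool
anyR p = foldr (\x rest -> p x || rest) False
```

For example, anyR (> 100) [1..] returns True after inspecting the first 101 elements, while the same predicate over a finite list with no match simply walks the list to the end.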