Spark RDD top-K sorting (getting the top K elements from an RDD)

You can use either top or takeOrdered with a key argument:

newRDD.top(2, key=lambda x: x[2])

or

newRDD.takeOrdered(2, key=lambda x: -x[2])

Note that top returns elements in descending order while takeOrdered returns them in ascending order, so the key function differs between the two calls: takeOrdered needs the negated key (-x[2]) to produce the same largest-first result.
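The equivalence can be sanity-checked locally without Spark: on a plain Python list, heapq.nlargest mirrors top and heapq.nsmallest mirrors takeOrdered. This is a sketch of the ordering semantics only, not Spark's distributed implementation, and the sample rows data is made up:

```python
import heapq

# Hypothetical sample data shaped like newRDD above: tuples whose
# third field (index 2) is the sort key.
rows = [("a", 0, 5), ("b", 0, 9), ("c", 0, 1), ("d", 0, 7)]

# top(2, key=lambda x: x[2]) -> two largest rows by x[2], descending
top2 = heapq.nlargest(2, rows, key=lambda x: x[2])

# takeOrdered(2, key=lambda x: -x[2]) -> ascending on the negated key,
# which again yields the two largest rows by x[2], largest first
take2 = heapq.nsmallest(2, rows, key=lambda x: -x[2])

print(top2)   # [('b', 0, 9), ('d', 0, 7)]
print(take2)  # same rows in the same order
```

Both calls agree because sorting ascending on -x[2] is the same ordering as sorting descending on x[2].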

If you have a pair RDD and want to sort by key or by value:

Sort by key and value in ascending and descending order

val textfile = sc.textFile("file:///home/hdfs/input.txt")
val words = textfile.flatMap(line => line.split(" "))
//Sort by value in descending order. For ascending order remove 'false' argument from sortBy
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2,false)
//for ascending order by value
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2)

//Sort by key in ascending order
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey()
//Sort by key in descending order
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey(false)
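What the Scala pipeline computes can be sketched in plain Python with collections.Counter, which plays the role of the map/reduceByKey word count; the various sorted calls stand in for sortBy and sortByKey. This is a local illustration only, not Spark code, and the sample text is made up:

```python
from collections import Counter

# Hypothetical stand-in for the contents of input.txt
text = "spark sort spark rdd sort spark"

# flatMap(split) + map(word -> (word, 1)) + reduceByKey(_ + _)
counts = Counter(text.split(" "))

# sortBy(_._2, false): by value (count), descending
by_value_desc = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# sortBy(_._2): by value, ascending
by_value_asc = sorted(counts.items(), key=lambda kv: kv[1])

# sortByKey() / sortByKey(false): by key (word), ascending / descending
by_key_asc = sorted(counts.items())
by_key_desc = sorted(counts.items(), reverse=True)

print(by_value_desc)  # [('spark', 3), ('sort', 2), ('rdd', 1)]
```

As in the Scala version, descending order is just the ascending sort with the direction flag flipped (reverse=True here, the false argument to sortBy/sortByKey there).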
Original post: https://www.cnblogs.com/bonelee/p/14522640.html