1. Create the DataFrame

scala> val df = sc.parallelize(Seq(
     |   (0, "cat26", 30.9),
     |   (1, "cat67", 28.5),
     |   (2, "cat56", 39.6),
     |   (3, "cat8", 35.6))).toDF("Hour", "Category", "Value")
df: org.apache.spark.sql.DataFrame = [Hour: int, Category: string ... 1 more field]

scala> df.show()
+----+--------+-----+
|Hour|Category|Value|
+----+--------+-----+
|   0|   cat26| 30.9|
|   1|   cat67| 28.5|
|   2|   cat56| 39.6|
|   3|    cat8| 35.6|
+----+--------+-----+

2. Method 1: filter the column names (the ! operator negates the condition)

Keep every column whose name does not contain "Val", map each remaining name to a Column with df(_), and pass the result to select as varargs.

scala> val df1 = df.select(df.columns.filter(x => !x.contains("Val")).map(df(_)) : _*)
df1: org.apache.spark.sql.DataFrame = [Hour: int, Category: string]

scala> df1.show()
+----+--------+
|Hour|Category|
+----+--------+
|   0|   cat26|
|   1|   cat67|
|   2|   cat56|
|   3|    cat8|
+----+--------+

3. Method 2: match the column names against a regular expression

The negative lookahead (?!Va) is checked at every position, so the pattern matches only names that do not contain "Va" anywhere.

scala> val regex = """^((?!Va).)*$""".r
regex: scala.util.matching.Regex = ^((?!Va).)*$

scala> val selection = df.columns.filter(s => regex.findFirstIn(s).isDefined)
selection: Array[String] = Array(Hour, Category)

scala> val newdf = df.select(selection.head, selection.tail : _*)
newdf: org.apache.spark.sql.DataFrame = [Hour: int, Category: string]

scala> newdf.show()
+----+--------+
|Hour|Category|
+----+--------+
|   0|   cat26|
|   1|   cat67|
|   2|   cat56|
|   3|    cat8|
+----+--------+

I have not studied regular expressions in much depth; for more detail, see:
https://www.runoob.com/scala/scala-regular-expressions.html
https://stackoverflow.com/questions/59065137/select-columns-in-spark-dataframe-based-on-column-name-pattern
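
Both methods above come down to filtering an array of column names before calling select. The following is a minimal sketch of just that filtering step in plain Scala, so it can be tried without a Spark session. The object name SelectColumnsSketch and the values byContains and byRegex are hypothetical names chosen for illustration; the column names are the ones from the example DataFrame.

```scala
object SelectColumnsSketch {
  // Column names of the example DataFrame (what df.columns would return)
  val columns: Array[String] = Array("Hour", "Category", "Value")

  // Method 1 logic: keep names that do NOT contain "Val"
  val byContains: Array[String] = columns.filter(c => !c.contains("Val"))

  // Method 2 logic: the negative lookahead rejects any name containing "Va"
  private val noVa = """^((?!Va).)*$""".r
  val byRegex: Array[String] = columns.filter(c => noVa.findFirstIn(c).isDefined)

  def main(args: Array[String]): Unit = {
    println(byContains.mkString(", ")) // prints "Hour, Category"
    println(byRegex.mkString(", "))    // prints "Hour, Category"
  }
}
```

In a Spark shell, either array could then be passed to select the same way as in Method 2, for example df.select(byRegex.head, byRegex.tail : _*). If the columns to exclude are known by exact name, df.drop("Value") should give the same result for this example without any filtering.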
Spark DataFrame in Scala: Selecting a Subset of Columns