Spark笔记

[root@master bin]# ./spark-shell  --master yarn-client 
20/03/21 02:48:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/21 02:48:40 INFO spark.SecurityManager: Changing view acls to: root
20/03/21 02:48:40 INFO spark.SecurityManager: Changing modify acls to: root
20/03/21 02:48:40 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
20/03/21 02:48:40 INFO spark.HttpServer: Starting HTTP Server
20/03/21 02:48:41 INFO server.Server: jetty-8.y.z-SNAPSHOT
20/03/21 02:48:41 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50659
20/03/21 02:48:41 INFO util.Utils: Successfully started service 'HTTP class server' on port 50659.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _ / _ / _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
20/03/21 02:48:52 INFO spark.SparkContext: Running Spark version 1.6.0
20/03/21 02:48:52 INFO spark.SecurityManager: Changing view acls to: root
20/03/21 02:48:52 INFO spark.SecurityManager: Changing modify acls to: root
20/03/21 02:48:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
20/03/21 02:48:53 INFO util.Utils: Successfully started service 'sparkDriver' on port 38244.
20/03/21 02:48:54 INFO slf4j.Slf4jLogger: Slf4jLogger started
20/03/21 02:48:54 INFO Remoting: Starting remoting
20/03/21 02:48:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.90:44009]
20/03/21 02:48:55 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 44009.
20/03/21 02:48:55 INFO spark.SparkEnv: Registering MapOutputTracker
20/03/21 02:48:55 INFO spark.SparkEnv: Registering BlockManagerMaster
20/03/21 02:48:55 INFO storage.DiskBlockManager: Created local directory at /usr/local/src/spark-1.6.0-bin-hadoop2.6/blockmgr-bae586ec-9c91-4d9b-bbec-682c94e5fe14
20/03/21 02:48:55 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
20/03/21 02:48:56 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/03/21 02:48:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
20/03/21 02:48:56 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
20/03/21 02:48:56 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
20/03/21 02:48:56 INFO ui.SparkUI: Started SparkUI at http://192.168.0.90:4040
20/03/21 02:48:57 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.90:8032
20/03/21 02:48:58 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
20/03/21 02:48:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
20/03/21 02:48:58 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
20/03/21 02:48:58 INFO yarn.Client: Setting up container launch context for our AM
20/03/21 02:48:58 INFO yarn.Client: Setting up the launch environment for our AM container
20/03/21 02:48:58 INFO yarn.Client: Preparing resources for our AM container
20/03/21 02:49:00 INFO yarn.Client: Uploading resource file:/usr/local/src/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar -> hdfs://master:9000/user/root/.sparkStaging/application_1584680108277_0036/spark-assembly-1.6.0-hadoop2.6.0.jar
scala> lines.flatMap(_.split(" "))
res25: List[String] = List(Preface, “The, Forsyte, Saga”, was, the, title, originally, destined, for, that, part, of, it, which, is, called, “The, Man, of, Property”;, and, to, adopt, it, for, the, collected, chronicles, of, the, Forsyte, family, has, indulged, the, Forsytean, tenacity, that, is, in, all, of, us., The, word, Saga, might, be, objected, to, on, the, ground, that, it, connotes, the, heroic, and, that, there, is, little, heroism, in, these, pages., But, it, is, used, with, a, suitable, irony;, and,, after, all,, this, long, tale,, though, it, may, deal, with, folk, in, frock, coats,, furbelows,, and, a, gilt-edged, period,, is, not, devoid, of, the, essential, heat, of, conflict., Discounting, for, the, gigantic, stature, and, blood-thirstiness, of, old, days,, as, they, ha...
scala> lines.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).mapValues((_.size)).toArray.sortWith(_._2>_._2).slice(0,10) 
res26: Array[(String, Int)] = Array((the,5144), (of,3407), (to,2782), (and,2573), (a,2543), (he,2139), (his,1912), (was,1702), (in,1694), (had,1526))
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import spark.sql
import spark.sql

scala> val orders = sql("select * from badou.orders")
20/03/22 10:23:16 ERROR metastore.ObjectStore: Version information found in metastore differs 0.13.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
20/03/22 10:23:20 ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at com.sun.proxy.$Proxy17.create_database(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
        at com.sun.proxy.$Proxy18.createDatabase(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:309)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:309)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:309)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:280)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:269)
        at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:308)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:99)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
        at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:98)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
        at org.apache.spark.sql.hive.HiveSessionCatalog.<init>(HiveSessionCatalog.scala:51)
        at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:49)
        at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
        at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
        at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
        at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
        at $line18.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
        at $line18.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
        at $line18.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
        at $line18.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:33)
        at $line18.$read$$iw$$iw$$iw$$iw.<init>(<console>:35)
        at $line18.$read$$iw$$iw$$iw.<init>(<console>:37)
        at $line18.$read$$iw$$iw.<init>(<console>:39)
        at $line18.$read$$iw.<init>(<console>:41)
        at $line18.$read.<init>(<console>:43)
        at $line18.$read$.<init>(<console>:47)
        at $line18.$read$.<clinit>(<console>)
        at $line18.$eval$.$print$lzycompute(<console>:7)
        at $line18.$eval$.$print(<console>:6)
        at $line18.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
        at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
        at org.apache.spark.repl.Main$.doMain(Main.scala:68)
        at org.apache.spark.repl.Main$.main(Main.scala:51)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

orders: org.apache.spark.sql.DataFrame = [order_id: string, user_id: string ... 5 more fields]

scala> orders.show(10)
+--------+-------+--------+------------+---------+-----------------+----------------------+
|order_id|user_id|eval_set|order_number|order_dow|order_hour_of_day|days_since_prior_order|
+--------+-------+--------+------------+---------+-----------------+----------------------+
|order_id|user_id|eval_set|order_number|order_dow|order_hour_of_day|  days_since_prior_...|
| 2539329|      1|   prior|           1|        2|               08|                      |
| 2398795|      1|   prior|           2|        3|               07|                  15.0|
|  473747|      1|   prior|           3|        3|               12|                  21.0|
| 2254736|      1|   prior|           4|        4|               07|                  29.0|
|  431534|      1|   prior|           5|        4|               15|                  28.0|
| 3367565|      1|   prior|           6|        2|               07|                  19.0|
|  550135|      1|   prior|           7|        1|               09|                  20.0|
| 3108588|      1|   prior|           8|        1|               14|                  14.0|
| 2295261|      1|   prior|           9|        1|               16|                   0.0|
+--------+-------+--------+------------+---------+-----------------+----------------------+
only showing top 10 rows
原文地址:https://www.cnblogs.com/hackerer/p/12541479.html