java.io.IOException: No FileSystem for scheme: hdfs

References:
https://www.cnblogs.com/justinzhang/p/4983673.html
https://blog.csdn.net/qq_31806205/article/details/80450742

How to fix this error when calling HDFS from Scala (the exception typically means the hdfs:// scheme cannot be mapped to a FileSystem implementation at runtime):
Step 1. Add the hadoop-common and hadoop-hdfs dependencies (version 2.6.0 here) to Maven's pom.xml:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.6.0</version>
</dependency>

Step 2. When building the Hadoop Configuration, explicitly set fs.hdfs.impl to "org.apache.hadoop.hdfs.DistributedFileSystem":
`
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._

val conf = new Configuration()
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
// path is an hdfs:// URI string pointing at the target file or directory
val hdfs = FileSystem.get(URI.create(path), conf)
`
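
For completeness, here is a minimal sketch of using the FileSystem handle obtained above (the NameNode address and the /tmp/test path are placeholders, not from the original post):
`
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")

// Placeholder URI; replace with your own NameNode address and path
val path = "hdfs://namenode:8020/tmp/test"
val hdfs = FileSystem.get(URI.create(path), conf)

// Simple sanity check: list the directory if it exists
if (hdfs.exists(new Path(path))) {
  hdfs.listStatus(new Path(path)).foreach(s => println(s.getPath))
}
`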

How to fix this error when calling HDFS from Spark:
`
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("UserLikeRec")
.config("spark.default.parallelism",1000)
.config("spark.hadoop.validateOutputSpecs", "false")
.master("local").getOrCreate()
// The two lines below fix the problem: explicitly register the FileSystem class "org.apache.hadoop.hdfs.DistributedFileSystem" for the hdfs scheme
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)

val metaPath = "hdfs://...:8020/user/igallery5/prd/meta/"
val p2lPath = metaPath + "Pic_Label/part*"
// Read an object file of (String, List[String]) pairs from HDFS
val pic_label = spark.sparkContext.objectFile[(String, List[String])](p2lPath)
`
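
Alternatively (a sketch, not from the original post), the same mappings can be supplied directly when building the SparkSession, because Spark forwards any spark.hadoop.* option into sparkContext.hadoopConfiguration:
`
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("UserLikeRec")
  // spark.hadoop.* options are copied into the Hadoop Configuration
  .config("spark.hadoop.fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
  .config("spark.hadoop.fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")
  .master("local").getOrCreate()
`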

Original post (Chinese): https://www.cnblogs.com/xl717/p/13211854.html