Perhaps your file is not UTF-8. I cannot reproduce it.
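
One way to settle the encoding question is to look at the raw bytes before any decoding happens. A minimal sketch for the Spark shell, assuming the same placeholder path used below; UTF-8 encodes 百 as the byte sequence e7 99 be, so a different lead byte there (GBK, for instance, uses b0 d9) would mean the file is not UTF-8:

    // Dump the first 64 raw bytes of the file as hex,
    // bypassing Text's UTF-8 decoding entirely.
    val bytes = sc.binaryFiles("hdfs://xxxx/test.txt")
      .map { case (_, stream) => stream.toArray.take(64) }
      .first()
    println(bytes.map(b => f"${b & 0xff}%02x").mkString(" "))

If the dump shows valid UTF-8 bytes after all, the ????? in the original report may come from the shell JVM's default charset (file.encoding) when printing, rather than from the file itself.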
### HADOOP
~/Downloads ❯❯❯ hdfs dfs -cat hdfs://xxxx/test.txt
17/04/06 13:43:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1.0 862910025238798 100733314 18_百度输入法:100733314 8919173c6d49abfab02853458247e584 1:129:18_百度输入法:1.0

### SPARK
scala> val t = sc.textFile("hdfs://xxxx/test.txt")
t: org.apache.spark.rdd.RDD[String] = hdfs://xxxx/test.txt MapPartitionsRDD[3] at textFile at <console>:24

scala> t.first
res2: String = 1.0 862910025238798 100733314 18_百度输入法:100733314

On Thu, Apr 6, 2017 at 12:47 PM, Jone Zhang <joyoungzh...@gmail.com> wrote:
> var textFile = sc.textFile("xxx");
> textFile.first();
> res1: String = 1.0 100733314 18_?????:100733314
> 8919173c6d49abfab02853458247e584 1:129:18_?????:1.0
>
> hadoop fs -cat xxx
> 1.0 100733314 18_百度输入法:100733314 8919173c6d49abfab02853458247e584
> 1:129:18_百度输入法:1.0
>
> Why do Chinese characters appear garbled when I use Spark textFile?
> The encoding of the HDFS file is UTF-8.
>
> Thanks
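
If the byte check shows the file really is not UTF-8, note that sc.textFile cannot be given a charset; it decodes Hadoop's Text as UTF-8 unconditionally. A minimal sketch of the usual workaround, reading the records through hadoopFile and decoding the raw bytes explicitly; the charset name "GBK" here is an assumption for illustration, not something established in this thread:

    import java.nio.charset.Charset
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // Read (offset, line) pairs without converting Text to String,
    // then decode each line's bytes with an explicit charset.
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs://xxxx/test.txt")
    val decoded = raw.map { case (_, line) =>
      // Text reuses its backing buffer, so only the first
      // getLength bytes are valid for this record.
      new String(line.getBytes, 0, line.getLength, Charset.forName("GBK"))
    }
    decoded.first()

This avoids the lossy Text-to-String step entirely, so the bytes reach the JVM string intact regardless of which single-byte or multi-byte encoding the file actually uses.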