Hi, I need to parse a file whose fields are delimited by several different separators. I read it with SparkContext.textFile and ran into two problems:
1) One of the separators is '\004', which Python, R, and Hive all recognize. Spark does not seem to recognize it and shows a symbol that looks like '?'. The symbol is not an actual question mark, and I don't know how to split on it.
2) Some of the separators consist of several characters, like "} =>". If I use str.split(Array('}', '=>')), the string is split, but the result has many whitespace fragments in the middle. Is there a good way to split on a String instead of on an Array of Chars?
Thanks a lot!
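
To make the second question concrete, here is a minimal Scala sketch of splitting on a whole string rather than on an array of chars. It is only an illustration under stated assumptions: `SplitSketch`, `splitOn`, `Eot`, and the sample line are hypothetical names and data, not part of Spark. It relies on Java's `String.split(regex, limit)` together with `Pattern.quote`, so the separator is matched literally. The '\004' character is written as the Unicode escape for code point 4, on the assumption that the '?' shown on screen is only how a non-printable character gets displayed.

```scala
import java.util.regex.Pattern

object SplitSketch {
  // Assumption: the '\004' separator is the control character with code point 4.
  val Eot: String = "\u0004"

  // Split on a literal, possibly multi-character separator.
  // Pattern.quote turns the separator into a literal regex, and the
  // limit of -1 keeps trailing empty fields instead of dropping them.
  def splitOn(line: String, sep: String): Array[String] =
    line.split(Pattern.quote(sep), -1)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample line that uses both separators from the question.
    val line = "k1} =>v1" + Eot + "v2"
    val outer = splitOn(line, "} =>")      // split on the multi-character separator
    val inner = splitOn(outer.last, Eot)   // then split the tail on the control character
    println(outer.length + " outer fields, " + inner.length + " inner fields")
  }
}
```

With an RDD from `sc.textFile(path)` (where `path` is a placeholder), the same helper could presumably be applied per line with `map(line => SplitSketch.splitOn(line, "} =>"))`.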