Re: multiple splits fails

Eliran Bivas Sun, 03 Apr 2016 04:37:10 -0700

Hi Mich,

A few comments:


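When doing .filter(_ > "") you're actually doing a lexicographic comparison rather than explicitly
filtering for empty lines (which _.nonEmpty or _.length > 0 would express). It does drop empty
lines, since every non-empty string sorts after "", but it hides the intent.
I think that filtering with _.contains("ASE15") is sufficient and the first filter can be
omitted: a line that contains "ASE15" cannot be empty.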
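A minimal sketch of the point above, using a plain Scala List[String] as a stand-in for the RDD (RDD.filter takes the same String => Boolean predicate). The sample lines and the object name FilterDemo are made up for illustration.

```scala
object FilterDemo {
  // Hypothetical sample input standing in for sc.textFile(...)
  val lines: List[String] =
    List("", "ASE15 upgrade notes", "other text", "", "ASE15\tx,y")

  // Original two-step version: the lexicographic test keeps every
  // non-empty string, because any non-empty string sorts after ""
  val twoFilters: List[String] =
    lines.filter(_ > "").filter(_.contains("ASE15"))

  // Single filter: a line containing "ASE15" is never empty,
  // so no separate emptiness check is needed
  val oneFilter: List[String] = lines.filter(_.contains("ASE15"))

  // If an explicit emptiness test is still wanted, nonEmpty states the intent
  val explicit: List[String] =
    lines.filter(l => l.nonEmpty && l.contains("ASE15"))

  def main(args: Array[String]): Unit = {
    println(oneFilter)
  }
}
```

All three variants keep the same lines here; the single contains filter is simply the shortest.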

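As for line => line.split("\t").split(","):
You have to do the second split inside the map, or (since split() takes a regex as input)
use a single split with a character class: .split("[\t,]"). Note that .split("\t,") would
only split where a tab is immediately followed by a comma.
The problem is that your first split() call returns an Array[String], and the second call
then fails to compile because Array has no split() method.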
e.g.

val lines: Array[String] = line.split("\t")
lines.split(",") // Compilation error - no method split() exists for Array

So either go with map(_.split("\t").flatMap(_.split(","))) or map(_.split("[\t,]"))
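A small sketch of both options on a single hypothetical line, again with plain Scala rather than an RDD (inside Spark the same expressions would go into .map(...)). It also shows why "\t," is not the right pattern: split takes a regex, and "\t," only matches a tab immediately followed by a comma, whereas the character class "[\t,]" matches either delimiter. The object name SplitDemo and the sample line are made up.

```scala
object SplitDemo {
  // Hypothetical input line with tab- and comma-separated fields
  val line: String = "ASE15\tupgrade,guide\tv2"

  // Option 1: split on tabs, then split each tab field on commas and flatten
  val twoStep: Array[String] = line.split("\t").flatMap(_.split(","))

  // Option 2: one split with a regex character class matching either delimiter
  val oneStep: Array[String] = line.split("[\t,]")

  // The literal sequence tab-then-comma never occurs here, so nothing is split
  val wrongPattern: Array[String] = line.split("\t,")

  def main(args: Array[String]): Unit = {
    println(twoStep.mkString("|")) // ASE15|upgrade|guide|v2
    println(oneStep.mkString("|")) // ASE15|upgrade|guide|v2
    println(wrongPattern.length)   // 1
  }
}
```

In the Spark pipeline this would become, for example, f.filter(_.contains("ASE15")).map(_.split("[\t,]")), giving an RDD[Array[String]]. The two options can differ on edge cases such as a comma directly followed by a tab, where the single regex split keeps an empty field between them.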

Hope that helps.

Eliran Bivas
Data Team | iguaz.io


On 3 Apr 2016, at 13:31, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

I am not sure this is the correct approach

Read a text file in

val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")


Now I want to get rid of empty lines and keep only the lines that contain "ASE15"

f.filter(_ > "").filter(_ contains("ASE15"))

The above works, but I am not sure whether I need the two filter transformations above.
Can it be done in one?

Now I want to map the filtered lines and split each one on tabs ("\t")

f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t")))
res88: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[131] at map at <console>:30

Now I want to split the output by ","

scala> f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t").split(",")))
<console>:30: error: value split is not a member of Array[String]
              f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t").split(",")))
                                                                                         ^
Any advice will be appreciated

Thanks

Dr Mich Talebzadeh



LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
