If what you have is a large number of named strings, why not use a Map[String,String] to represent them? If you're approaching a class with >22 String fields anyway, it probably makes more sense. You lose a bit of compile-time checking, but gain flexibility.
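As a rough sketch of that idea (not tested against your data): the column names V1 to V7 come from your case class, while `RowAsMap`, `columns`, and `parseRow` are names I made up for illustration. It is plain Scala with no Spark dependency, so `parseRow` could be passed to the same `.map` call you already apply to `sc.textFile(data_path)`.

```scala
object RowAsMap {
  // Column names taken from the original case class (V1..V7).
  val columns: Seq[String] = (1 to 7).map(i => s"V$i")

  // Parse one tab-separated line into a Map keyed by column name.
  // split("\t", -1) keeps trailing empty fields, so the line does not
  // need to be padded with a space before splitting.
  def parseRow(line: String): Map[String, String] =
    columns.zip(line.split("\t", -1).map(_.trim)).toMap

  def main(args: Array[String]): Unit = {
    val row = parseRow("a\tb\tc\td\te\tf\t")
    println(row("V1"))               // look a field up by name
    println(row.getOrElse("V9", "")) // unknown names must be handled explicitly
  }
}
```

Note that `zip` stops at the shorter side, so a line with fewer than seven fields simply yields a smaller Map. That is the compile-time checking you give up: a misspelled or missing key only shows up at runtime.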
Also, merging two Maps to make a new one is pretty simple compared to defining many such classes. (Although, if you otherwise needed a class that represented "all of the things in class A and class B", this could be done easily with composition: a class with an A and a B inside.)

On Thu, Jul 17, 2014 at 9:15 AM, Luis Guerra <[email protected]> wrote:
> Hi all,
>
> I am a newbie Spark user with many doubts, so sorry if this is a "silly"
> question.
>
> I am dealing with tabular data formatted as text files, so when I first load
> the data, my code is like this:
>
> case class data_class(
>   V1: String,
>   V2: String,
>   V3: String,
>   V4: String,
>   V5: String,
>   V6: String,
>   V7: String)
>
> val data= sc.textFile(data_path)
>   .map(x => {
>     val fields = (x+" ").split("\t")
>     data_class(fields(0).trim(),fields(1).trim(),fields(2).trim(),fields(3).trim(),
>       fields(4).trim(), fields(5).trim(),fields(6).trim())
>   })
>
> I am doing this because I would like to access each position using the
> variable name (V1...V7). Is there any other way of doing this?
>
> Also related to this question, if I have data with more than 22 variables, I
> am restricted to using class instead of case class. However, this kind of
> solution has many restrictions mainly related to getter methods. Is there
> any other way of doing this?
>
> And finally, one of my main problems comes after operations of different
> data variables. For instance, if I have two different variables (data1 and
> data2), and I want to join them both as:
>
> val data3 = data1.keyBy(_.V1).leftOuterJoin(data2.keyBy(_.V1))
>
> Then I have to post process data3 in order to obtain a new class that
> contains those variables from data1 and also those variables from data2. As
> data3 is (key, (data1, data2)), do I have to create a new different class
> with all these attributes from data1 and data2? This is kind of annoying
> when there are many attributes.
>
> Thanks in advance,
>
> Best
