If what you have is a large number of named strings, why not use a
Map[String,String] to represent them? If you're approaching a class
with >22 String fields anyway, it probably makes more sense. You lose
compile-time checking of field names (a mistyped key only fails at
runtime), but gain flexibility.
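
As a rough sketch of what I mean, assuming the same tab-separated layout
as in your example. The column names and the parseLine helper are made
up for illustration, and the snippet is plain Scala so it runs without
Spark:

```scala
// Hypothetical column names; substitute whatever your file really contains.
val columns = Vector("V1", "V2", "V3", "V4", "V5", "V6", "V7")

// Turn one tab-separated line into a Map from column name to trimmed value.
// A limit of -1 makes split keep trailing empty fields.
def parseLine(line: String): Map[String, String] = {
  val fields = line.split("\t", -1).map(_.trim).toVector
  columns.zip(fields).toMap
}

val row = parseLine("a\tb\tc\td\te\tf\tg")
val v3 = row("V3")          // look a value up by name
val missing = row.get("V9") // unknown names give None instead of a compile error
```

In your job the same function could presumably be passed straight to
the map step, as in sc.textFile(data_path).map(parseLine).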

Also, merging two Maps to make a new one is pretty simple, compared to
defining a new class for every combination of these values.
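
For example, a minimal sketch of the merge in plain Scala. The rows and
the prefixed helper are invented for illustration. Note that ++ lets the
right-hand Map win when both sides share a key, so prefixing the keys is
one way to keep both values:

```scala
// Two hypothetical rows that share the join key V1.
val row1 = Map("V1" -> "k1", "V2" -> "x")
val row2 = Map("V1" -> "k1", "W2" -> "y")

// Plain merge: on duplicate keys the value from the right-hand Map is kept.
val merged = row1 ++ row2

// Hypothetical helper that prefixes every key so nothing is overwritten.
def prefixed(prefix: String, m: Map[String, String]): Map[String, String] =
  m.map { case (k, v) => (prefix + k) -> v }

val mergedDistinct = prefixed("left.", row1) ++ prefixed("right.", row2)

// After a left outer join the right side is an Option; treat None as "no fields".
val maybeRow2: Option[Map[String, String]] = None
val mergedOuter = row1 ++ maybeRow2.getOrElse(Map.empty[String, String])
```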

(Although, if you otherwise needed a class that represented "all of
the things in class A and class B", this could be done easily with
composition, a class with an A and a B inside.)
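
A sketch of that composition idea, with made-up classes A, B and AB and
a plain Seq standing in for the joined RDD. The shape (key, (A,
Option[B])) is what I would expect from a left outer join, where the
right side may be missing:

```scala
// Hypothetical stand-ins for the two record types being joined.
case class A(v1: String, v2: String)
case class B(v1: String, w2: String)

// The combined record simply holds one of each; B is optional because
// a left outer join may find no match on the right side.
case class AB(a: A, b: Option[B])

// Stand-in for the result of the join: (key, (left, optional right)).
val joined: Seq[(String, (A, Option[B]))] = Seq(
  ("k1", (A("k1", "x"), Some(B("k1", "y")))),
  ("k2", (A("k2", "z"), None))
)

// Drop the key and wrap each pair; no attribute has to be copied by hand.
val combined: Seq[AB] = joined.map { case (_, (a, b)) => AB(a, b) }

// Fields stay reachable through the nested records.
val w2s = combined.map(_.b.map(_.w2))
```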

On Thu, Jul 17, 2014 at 9:15 AM, Luis Guerra <[email protected]> wrote:
> Hi all,
>
> I am a newbie Spark user with many doubts, so sorry if this is a "silly"
> question.
>
> I am dealing with tabular data formatted as text files, so when I first load
> the data, my code is like this:
>
> case class data_class(
>   V1: String,
>   V2: String,
>   V3: String,
>   V4: String,
>   V5: String,
>   V6: String,
>   V7: String)
>
> val data = sc.textFile(data_path)
>   .map { x =>
>     val fields = (x + " ").split("\t")
>     data_class(
>       fields(0).trim, fields(1).trim, fields(2).trim, fields(3).trim,
>       fields(4).trim, fields(5).trim, fields(6).trim)
>   }
>
> I am doing this because I would like to access each position using the
> variable name (V1...V7). Is there any other way of doing this?
>
> Also related to this question, if I have data with more than 22 variables, I
> am restricted to using a class instead of a case class. However, this kind of
> solution has many restrictions mainly related to getter methods. Is there
> any other way of doing this?
>
> And finally, one of my main problems comes after operations of different
> data variables. For instance, if I have two different variables (data1 and
> data2), and I want to join them both as:
>
> val data3 = data1.keyBy(_.V1).leftOuterJoin(data2.keyBy(_.V1))
>
> Then I have to post process data3 in order to obtain a new class that
> contains those variables from data1 and also those variables from data2. As
> data3 is (key, (data1, data2)), do I have to create a new different class
> with all these attributes from data1 and data2? This is kind of annoying
> when there are many attributes.
>
> Thanks in advance,
>
> Best
