Not sure what you mean by "not getting information how to join". If you mean that you can't see the result, I believe you need to collect the result of the join on the driver, as in:
val joinedRdd = enKeyValuePair1.join(enKeyValuePair)
joinedRdd.collect().map(println)

On Wed, Mar 19, 2014 at 4:57 AM, Chhaya Vishwakarma <chhaya.vishwaka...@lntinfotech.com> wrote:

> Hi
>
> I want to join two files from HDFS using spark shell.
> Both the files are tab separated and I want to join on second column
>
> Tried code
> But not giving any output
>
> val ny_daily = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_daily"))
> val ny_daily_split = ny_daily.map(line => line.split('\t'))
> val enKeyValuePair = ny_daily_split.map(line => (line(0).substring(0, 5), line(3).toInt))
>
> val ny_dividend = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends"))
> val ny_dividend_split = ny_dividend.map(line => line.split('\t'))
> val enKeyValuePair1 = ny_dividend_split.map(line => (line(0).substring(0, 4), line(3).toInt))
>
> enKeyValuePair1.join(enKeyValuePair)
>
> But I am not getting any information for how to join files on particular column
> Please suggest
>
> Regards,
> Chhaya Vishwakarma
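Two other things in the quoted code are worth flagging. First, `sc.parallelize(List(path))` builds an RDD containing the path *string* itself, not the file contents; to read the file from HDFS you want `sc.textFile(path)`. Second, the two RDDs are keyed differently (`substring(0, 5)` vs `substring(0, 4)`), so the join keys can never match; both sides need the same key expression, and since the stated goal is to join on the second column, keying on `cols(1)` is simpler. A sketch of the whole pipeline along those lines, to run in the spark-shell (the paths and the choice of `cols(3)` as the value column are taken from the quoted code; whether that column is really an Int in both files is an assumption):

```scala
// Read the files from HDFS (not sc.parallelize, which would only wrap the path string).
val nyDaily    = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_daily")
val nyDividend = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends")

// Split on tabs and key both RDDs on the SAME column (the second one, index 1),
// keeping column 3 as the value. Both sides must use an identical key expression
// or the join will always be empty.
val dailyPairs    = nyDaily.map(_.split('\t')).map(cols => (cols(1), cols(3)))
val dividendPairs = nyDividend.map(_.split('\t')).map(cols => (cols(1), cols(3)))

// join yields (key, (dividendValue, dailyValue)) pairs; collect() brings the
// result back to the driver so it can actually be printed in the shell.
dividendPairs.join(dailyPairs).collect().foreach(println)
```

This is only a sketch: if either file has header lines or rows with fewer than four tab-separated fields, the `cols(3)` access will throw, and you would filter those rows out first.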