Not sure what you mean by "not getting information how to join". If you mean that you can't see the result, I believe you need to collect the result of the join on the driver, as in:
val joinedRdd = enKeyValuePair1.join(enKeyValuePair)
joinedRdd.collect().map(println)

On Wed, Mar 19, 2014 at 4:57 AM, Chhaya Vishwakarma <chhaya.vishwaka...@lntinfotech.com> wrote:

> Hi
>
> I want to join two files from HDFS using spark shell.
> Both the files are tab separated and I want to join on second column
>
> Tried code
> But not giving any output
>
> val ny_daily = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_daily"))
> val ny_daily_split = ny_daily.map(line => line.split('\t'))
> val enKeyValuePair = ny_daily_split.map(line => (line(0).substring(0, 5), line(3).toInt))
>
> val ny_dividend = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends"))
> val ny_dividend_split = ny_dividend.map(line => line.split('\t'))
> val enKeyValuePair1 = ny_dividend_split.map(line => (line(0).substring(0, 4), line(3).toInt))
>
> enKeyValuePair1.join(enKeyValuePair)
>
> But I am not getting any information for how to join files on particular column
> Please suggest
>
> Regards,
> Chhaya Vishwakarma
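Two other things in the quoted code are worth flagging. First, `sc.parallelize(List(path))` builds an RDD containing the path *string* itself, not the file contents; to read the file from HDFS you want `sc.textFile(path)`. Second, the two RDDs are keyed differently (`substring(0, 5)` vs `substring(0, 4)`), so the join keys can never match; both sides need the same key expression, and since the stated goal is to join on the second column, keying on `cols(1)` is simpler. A sketch of the whole pipeline along those lines, to run in the spark-shell (the paths and the choice of `cols(3)` as the value column are taken from the quoted code; whether that column is really an Int in both files is an assumption):

```scala
// Read the files from HDFS (not sc.parallelize, which would only wrap the path string).
val nyDaily    = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_daily")
val nyDividend = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends")

// Split on tabs and key both RDDs on the SAME column (the second one, index 1),
// keeping column 3 as the value. Both sides must use an identical key expression
// or the join will always be empty.
val dailyPairs    = nyDaily.map(_.split('\t')).map(cols => (cols(1), cols(3)))
val dividendPairs = nyDividend.map(_.split('\t')).map(cols => (cols(1), cols(3)))

// join yields (key, (dividendValue, dailyValue)) pairs; collect() brings the
// result back to the driver so it can actually be printed in the shell.
dividendPairs.join(dailyPairs).collect().foreach(println)
```

This is only a sketch: if either file has header lines or rows with fewer than four tab-separated fields, the `cols(3)` access will throw, and you would filter those rows out first.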