Hi I want to join two files from HDFS using spark shell. Both the files are tab separated and I want to join on second column
Tried code But not giving any output val ny_daily= sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_daily")) val ny_daily_split = ny_daily.map(line =>line.split('\t')) val enKeyValuePair = ny_daily_split.map(line => (line(0).substring(0, 5), line(3).toInt)) val ny_dividend= sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends")) val ny_dividend_split = ny_dividend.map(line =>line.split('\t')) val enKeyValuePair1 = ny_dividend_split.map(line => (line(0).substring(0, 4), line(3).toInt)) enKeyValuePair1.join(enKeyValuePair) But I am not getting any information for how to join files on particular column Please suggest Regards, Chhaya Vishwakarma ________________________________ The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. L&T Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in this e-mail"