Re: Spark Left outer Join issue using programmatic sql joins

Radha krishna Wed, 06 Jul 2016 10:19:17 -0700

Hi Mich,
Here I given just a sample data,
I have some GB's of files in HDFS and performing left outer joins on those
files, and the final result I am going to store in Vertica data base table.
There is no duplicate columns in the target table but for the non matching
rows columns I want to insert "" empty value instead of null word.
On 06-Jul-2016 10:31 pm, "Mich Talebzadeh" <mich.talebza...@gmail.com>
wrote:


> what do you mean database table here?
>
> you have repeating column names for the table namely deptid
>
>
> 0: jdbc:hive2://rhes564:10010/default>  SELECT *  FROM emp e LEFT OUTER
> JOIN dept d ON e.deptid = d.deptid;
>
> INFO  : OK
> +-----------+---------+-----------+-----------+--------------+--+
> | e.emp_id  | e.name  | e.deptid  | d.deptid  | d.dept_name  |
> +-----------+---------+-----------+-----------+--------------+--+
> | 1001      | aba     | 10        | 10        | DEV          |
> | 1002      | abs     | 20        | 20        | TEST         |
> | 1003      | abd     | 10        | 10        | DEV          |
> | 1001      | aba     | 10        | 10        | DEV          |
> | 1002      | abs     | 20        | 20        | TEST         |
> | 1003      | abd     | 10        | 10        | DEV          |
> | 1004      | abf     | 30        | 30        | IT           |
> | 1005      | abg     | 10        | 10        | DEV          |
> | 1004      | abf     | 30        | 30        | IT           |
> | 1005      | abg     | 10        | 10        | DEV          |
> | 1006      | abh     | 20        | 20        | TEST         |
> | 1007      | abj     | 10        | 10        | DEV          |
> | 1006      | abh     | 20        | 20        | TEST         |
> | 1007      | abj     | 10        | 10        | DEV          |
> | 1008      | abk     | 30        | 30        | IT           |
> | 1009      | abl     | 20        | 20        | TEST         |
> | 1010      | abq     | 10        | 10        | DEV          |
> | 1008      | abk     | 30        | 30        | IT           |
> | 1009      | abl     | 20        | 20        | TEST         |
> | 1010      | abq     | 10        | 10        | DEV          |
> +-----------+---------+-----------+-----------+--------------+--+
> 20 rows selected (44.351 seconds)
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 6 July 2016 at 17:48, radha <grkmc...@gmail.com> wrote:
>
>> Hi ,
>> Thanks all, its working fine the issue is with some space for the dept id,
>>
>> I have one more doubt for the non matching records its showing null word,
>> even if i write into HDFS also its showing null word how can we avoid
>> writing null for the non matching columns, i want just empty value ("")
>>
>> same input i used in the dept table i removed the last row and the below
>> code i used to write into hdfs.
>>
>> DataFrame joinResult = sqlContext.sql("SELECT * FROM EMP e LEFT OUTER JOIN
>> DEPT d ON e.deptid = d.deptid");
>>                    joinResult.javaRDD().repartition(1).map(new
>> Function<Row, String>() {
>>                 private static final long serialVersionUID =
>> 9185646063977504742L;
>>                         @Override
>>                         public String call(Row arg0) throws Exception {
>>                                 String s;
>>
>>
>> s=arg0.getString(0)+"\u001c"+arg0.getString(1)+"\u001c"+arg0.getString(2)+"\u001c"+arg0.getString(3)+"\u001c"+arg0.getString(4)+"\u001e";
>>                                 return s;
>>                         }
>>                 }).saveAsTextFile(args[2]);
>>
>>
>> Output in HDFS File
>>
>> 10 1001 aba 10 dev
>> 10 1003 abd 10 dev
>> 10 1005 abg 10 dev
>> 10 1007 abj 10 dev
>> 10 1010 abq 10 dev
>> 20 1002 abs 20 Test
>> 20 1006 abh 20 Test
>> 20 1009 abl 20 Test
>> 30 1004 abf null null
>> 30 1008 abk null null
>>
>> in my case i want to store the join result back to data base table and its
>> storing "null" word for those non matching records, i want to store as
>> ""(empty value) for the non matching rows.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Left-outer-Join-issue-using-programmatic-sql-joins-tp27295p27299.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>

Re: Spark Left outer Join issue using programmatic sql joins

Reply via email to