Re: println not appearing in libraries when running job using spark-submit --master local

2016-03-28 Thread Kevin Peng
Ted, What triggerAndWait does is perform a REST call to a specified URL and then wait until a status field in the JSON returned by that URL says complete. The issue is that I put a println at the very top of the method and that doesn't get printed out, and I know that println isn
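
For reference, a hypothetical sketch of the shape of a method like triggerAndWait as described above; the URL, JSON field name, and polling interval are invented for illustration and are not taken from the actual library:

    import scala.io.Source

    def triggerAndWait(statusUrl: String): Boolean = {
      println(s"triggerAndWait called for $statusUrl")  // the println that never shows up
      var body = ""
      // Naively poll the REST endpoint until the JSON body reports completion;
      // a real implementation would parse the JSON and bound the number of retries.
      while (!body.contains("\"status\": \"complete\"")) {
        body = Source.fromURL(statusUrl).mkString
        Thread.sleep(5000)
      }
      true
    }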

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
Gourav, Apologies. I edited my post with this information: Spark version: 1.6. Results are from the spark shell. OS: Linux version 2.6.32-431.20.3.el6.x86_64 (mockbu...@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)) #1 SMP Thu Jun 19 21:14:45 UTC 2014. Thanks, KP On Mon,

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
Gourav, I wish that were the case, but I have done a select count on each of the two tables individually and they return different numbers of rows: dps.registerTempTable("dps_pin_promo_lt") swig.registerTempTable("swig_pin_promo_lt") dps.count() RESULT: 42632 swig.count() RESULT: 42034 On

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
Yong, Sorry, let me explain my deduction; it is going to be difficult to get sample data out since the dataset I am using is proprietary. From the above set of queries (the ones mentioned in the comments above), both the inner and outer joins are producing the same counts. They are basically pulling out selected co

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
at 11:16 PM, Davies Liu wrote: > as @Gourav said, all the joins with different join types show the same results, which means that all the rows from the left could match at least one row from the right, and all the rows from the right could match at least one row from the left, even the numb
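
A minimal sketch (Spark 1.6 style, from the spark shell) of the comparison being discussed, using the temp table names and join columns from this thread; if every row on each side finds a match, the inner, left outer, right outer, and full outer joins will all return the same count:

    val joinOn = "ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)"

    Seq("JOIN", "LEFT OUTER JOIN", "RIGHT OUTER JOIN", "FULL OUTER JOIN").foreach { jt =>
      val n = sqlContext.sql(
        s"SELECT * FROM swig_pin_promo_lt s $jt dps_pin_promo_lt d $joinOn").count()
      println(s"$jt -> $n")
    }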

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
romo_lt s RIGHT OUTER JOIN dps_pin_promo_lt d ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad) WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count() res12: Long = 23809
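
As an aside (a sketch, not necessarily the resolution reached in this thread): in a RIGHT OUTER JOIN the columns of the left table (s) are NULL for unmatched right-side rows, so a WHERE predicate on s.date discards exactly those rows and the query behaves like an inner join. Moving the left-side predicate into the ON clause keeps the unmatched rows:

    val preserved = sqlContext.sql(
      """SELECT *
        |FROM swig_pin_promo_lt s
        |RIGHT OUTER JOIN dps_pin_promo_lt d
        |  ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad
        |      AND s.date >= '2016-01-03')
        |WHERE d.date >= '2016-01-03'""".stripMargin)
    preserved.count()  // now includes right-side rows with no match on the left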

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
>>> Hi Kevin, having given it a first look I do think that you have hit something here and this does not look quite fine. I have to work on the multiple AND conditions in ON and see whether that is causing any issues.

Re: Setting Optimal Number of Spark Executor Instances

2017-03-15 Thread Kevin Peng
Mohini, We set that parameter before we went and played with the number of executors and that didn't seem to help at all. Thanks, KP On Tue, Mar 14, 2017 at 3:37 PM, mohini kalamkar wrote: > Hi, > > try using this parameter --conf spark.sql.shuffle.partitions=1000 > > Thanks, > Mohini > > On
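
For context, a sketch (with placeholder values, not recommendations) of how both knobs discussed here could be set programmatically; the same settings can also be passed to spark-submit with --conf:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf()
      .setAppName("tuning-example")
      .set("spark.executor.instances", "20")        // number of executors (e.g. on YARN)
      .set("spark.sql.shuffle.partitions", "1000")  // partitions used for SQL shuffles

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)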

spark jdbc postgres query results don't match those of postgres query

2018-03-29 Thread Kevin Peng
I am running into a weird issue in Spark 1.6, which I was wondering if anyone has encountered before. I am running a simple select query from Spark using a JDBC connection to Postgres: val POSTGRES_DRIVER: String = "org.postgresql.Driver" val srcSql = """select total_action_value, last_updated from
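
A minimal sketch of the Spark 1.6 JDBC read path for a query like the truncated one above; the host, database, credentials, and table name are placeholders, not the values from the original post:

    val POSTGRES_DRIVER = "org.postgresql.Driver"
    // A subquery passed as "dbtable" must be wrapped in parentheses and aliased.
    val srcSql = "(select total_action_value, last_updated from some_table) as src"

    val df = sqlContext.read
      .format("jdbc")
      .option("driver", POSTGRES_DRIVER)
      .option("url", "jdbc:postgresql://dbhost:5432/dbname")
      .option("user", "user")
      .option("password", "password")
      .option("dbtable", srcSql)
      .load()

    df.show(5)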

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Kevin Peng
Marcelo, Yes that is correct, I am going through a mirror, but 1.1.0 works properly, while 1.2.0 does not. I suspect there is a CRC problem with the 1.2.0 pom file. On Wed, Mar 4, 2015 at 4:10 PM, Marcelo Vanzin wrote: > Seems like someone set up "m2.mines.com" as a mirror in your pom file > or ~/.m2/sett

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Kevin Peng
Ted, I have tried wiping out the ~/.m2/org.../spark directory multiple times. It doesn't seem to work. On Wed, Mar 4, 2015 at 4:20 PM, Ted Yu wrote: > kpeng1: > Try wiping out ~/.m2 and build again. > > Cheers > > On Wed, Mar 4, 2015 at 4:10 PM, Marcelo Vanzin > wrote: > >> Seems like someone s

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Kevin Peng
thread: http://search-hadoop.com/m/JW1q5Vfe6X1 > > Cheers > > On Wed, Mar 4, 2015 at 4:18 PM, Kevin Peng wrote: > >> Marcelo, >> >> Yes that is correct, I am going through a mirror, but 1.1.0 works >> properly, while 1.2.0 does not. I suspect there is crc in

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Kevin Peng
loudera.com/content/cloudera/en/documentation/core/v5-2-x/topics/cdh_vd_cdh5_maven_repo.html > > On Wed, Mar 4, 2015 at 4:34 PM, Kevin Peng wrote: > > Ted, > > > > I am currently using CDH 5.3 distro, which has Spark 1.2.0, so I am not > too > > sure about the compatibilit

Re: spark sql writing in avro

2015-03-12 Thread Kevin Peng
Dale, I basically have the same maven dependency above, but my code will not compile because it cannot reference AvroSaver, though the saveAsAvro reference compiles fine, which is weird. Even though saveAsAvro compiles for me, it errors out when running the Spark job due to it not being

Re: spark sql writing in avro

2015-03-13 Thread Kevin Peng
n will pick up the latest version of > spark-avro (for this machine). > > Now you should be able to compile and run. > > HTH, > Markus > > > On 03/12/2015 11:55 PM, Kevin Peng wrote: > > Dale, > > I basically have the same maven dependency above, but my
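
Not the AvroSaver/saveAsAvro API discussed above, but for reference, a sketch of the DataFrame writer path that later spark-avro releases provide on Spark 1.3+; the input and output paths are placeholders:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.json("/path/to/input.json")   // placeholder input

    df.write
      .format("com.databricks.spark.avro")
      .save("/path/to/output_avro")                        // placeholder output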

Re: Loading in json with spark sql

2015-03-13 Thread Kevin Peng
Yin, Yup thanks. I fixed that shortly after I posted and it worked. Thanks, Kevin On Fri, Mar 13, 2015 at 8:28 PM, Yin Huai wrote: > Seems you want to use array for the field of "providers", like > "providers":[{"id": > ...}, {"id":...}] instead of "providers":{{"id": ...}, {"id":...}} > >
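
A small sketch of the corrected JSON shape Yin describes, with "providers" as an array of objects; the surrounding field and the values are invented for illustration (Spark 1.3+ API):

    val jsonLines = Seq(
      """{"name": "a", "providers": [{"id": 1}, {"id": 2}]}""",
      """{"name": "b", "providers": [{"id": 3}]}"""
    )
    val df = sqlContext.read.json(sc.parallelize(jsonLines))
    df.printSchema()  // providers is inferred as array<struct<id:bigint>>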

Re: Spark Streaming into HBase

2014-09-03 Thread Kevin Peng
hbase 0.98 > > Cheers > > > On Wed, Sep 3, 2014 at 2:33 PM, Kevin Peng wrote: > >> Ted, >> >> The hbase-site.xml is in the classpath (had worse issues before... until >> I figured that it wasn't in the path). >> >> I get the following erro
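
For context, a hypothetical sketch of writing a DStream into HBase 0.98 from Spark Streaming; the table name, column family, and input source are made up, and it assumes hbase-site.xml is on the executor classpath (the point being debugged above):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{HTable, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))
    val stream = ssc.socketTextStream("localhost", 9999).map(line => (line, line))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conf = HBaseConfiguration.create()        // reads hbase-site.xml from the classpath
        val table = new HTable(conf, "stream_table")  // hypothetical table name
        records.foreach { case (rowKey, value) =>
          val put = new Put(Bytes.toBytes(rowKey))
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
          table.put(put)
        }
        table.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()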

Re: Invalid signature file digest for Manifest main attributes with spark job built using maven

2014-09-15 Thread Kevin Peng
Sean, Thanks. That worked. Kevin On Mon, Sep 15, 2014 at 3:37 PM, Sean Owen wrote: > This is more of a Java / Maven issue than Spark per se. I would use the shade plugin to remove signature files in your final META-INF/ dir. As Spark does, in its : *:* org/d
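
The quoted pom fragment is mangled in the archive; for reference, a sketch of the commonly used shade-plugin filter for stripping JAR signature files (the fix Sean describes), which may differ in detail from the excludes in Spark's own build:

    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <!-- Remove signature files so the shaded JAR's manifest digests stay valid -->
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>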