Phrase Search using Apache Spark in a huge amount of text in files

2019-05-28 Thread Sandeep Giri
on this. This is in its very early stages and hacky, and would probably require more testing. Regards, Sandeep Giri, www.CloudxLab.com

Does Spark show the logical or physical plan when executing a job on the YARN cluster

2018-05-20 Thread giri ar
Hi, good day. Could you please let me know whether we can see the Spark logical or physical plan while running a Spark job on the YARN cluster (e.g., the number of stages)? Thanks in advance. Thanks, Giri
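
For reference, the plans can be printed from the driver regardless of the cluster manager; a minimal sketch, assuming Spark 2.x with a SparkSession available and a hypothetical input path:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("plan-sketch").getOrCreate()
val df = spark.read.json("hdfs:///data/events.json") // hypothetical path; assumes a userId column
// explain(true) prints the parsed, analyzed, and optimized logical plans plus the physical plan.
df.groupBy("userId").count().explain(true)
// The same plans, along with the stage and task breakdown, appear in the Spark UI,
// reachable on YARN through the ResourceManager's ApplicationMaster proxy link.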

Re: Not able to pass 3rd-party jars to Mesos executors

2016-05-11 Thread Giri P
On Wed, May 11, 2016 at 10:05 PM, Giri P <gpatc...@gmail.com> wrote: I'm not using docker. On Wed, May 11, 2016 at 8:47 AM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote: By any chance, are you using docker

Re: Not able to pass 3rd-party jars to Mesos executors

2016-05-11 Thread Giri P
I'm not using docker. On Wed, May 11, 2016 at 8:47 AM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote: By any chance, are you using docker to execute? On 11 May 2016 21:16, "Raghavendra Pandey" wrote: On 11 May 2016 02:13, "gpatcham"
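
For context, the usual way to ship third-party jars to executors is spark-submit's --jars flag or the equivalent spark.jars property; a minimal sketch, with a hypothetical Mesos master URL and jar path (the jar must be reachable from the executor hosts):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("third-party-jars-sketch")
  .setMaster("mesos://zk://zkhost:2181/mesos")        // hypothetical master URL
  .set("spark.jars", "/shared/libs/third-party.jar")  // hypothetical path; comma-separate multiple jars
val sc = new SparkContext(conf)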

Re: using spark context in map function Task not serializable error

2016-01-20 Thread Giri P
method1 looks like this: reRDD.map(row => method1(row, sc)).saveAsTextFile(outputDir). reRDD has userIds. def method1(sc: SparkContext, userId: String) { sc.cassandraTable("Keyspace", "Table2").where("userid = ?", userId) ...do something... return "Test" } On Wed, Jan 20, 2016 at 11:00 AM,
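
As an aside, passing the SparkContext into a closure that runs on executors is what triggers "Task not serializable": sc exists only on the driver. A hedged sketch of one common fix, assuming the DataStax spark-cassandra-connector and a hypothetical key type matching Table2's partition key:

import com.datastax.spark.connector._

case class UserKey(userid: String) // hypothetical; must match Table2's partition key column
val keys = reRDD.map(row => UserKey(row)) // assumes reRDD holds plain userId strings
// joinWithCassandraTable performs the per-key lookup on the executors,
// so the driver-only SparkContext never enters the closure.
val joined = keys.joinWithCassandraTable("Keyspace", "Table2")
joined.saveAsTextFile(outputDir)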

Re: using spark context in map function Task not serializable error

2016-01-18 Thread Giri P
I'm using the spark-cassandra-connector to do this, and the way we access the Cassandra table is sc.cassandraTable("keySpace", "tableName"). Thanks, Giri. On Mon, Jan 18, 2016 at 12:37 PM, Ted Yu <yuzhih...@gmail.com> wrote: Can you pass the properties which are needed for

Re: using spark context in map function Task not serializable error

2016-01-18 Thread Giri P
Can we use @transient? On Mon, Jan 18, 2016 at 12:44 PM, Giri P <gpatc...@gmail.com> wrote: I'm using the spark-cassandra-connector to do this and the way we access the Cassandra table is sc.cassandraTable("keySpace", "tableName"). Thanks, Gi

Re: using spark context in map function Task not serializable error

2016-01-18 Thread Giri P
that would work. Doesn't seem to be good practice. On Mon, Jan 18, 2016 at 1:27 PM, Giri P <gpatc...@gmail.com> wrote: Can we use @transient? On Mon, Jan 18, 2016 at 12:44 PM, Giri P <gpatc...@gmail.com> wrote:
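
For reference, the @transient idea under discussion looks roughly like the sketch below (the wrapper class is hypothetical). The annotation keeps the SparkContext out of the serialized closure, so serialization succeeds, but the field deserializes as null on the executors and must never be touched there, which is why it is flagged above as poor practice:

import org.apache.spark.SparkContext

// Hypothetical wrapper: serializable because @transient excludes sc from the serialized state.
class DriverSideHelper(@transient val sc: SparkContext) extends Serializable {
  // Safe only when invoked on the driver; on an executor, sc would be null.
  def lineCount(path: String): Long = sc.textFile(path).count()
}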

Re: Maintaining overall cumulative data in Spark Streaming

2015-10-30 Thread Sandeep Giri
How do we reset the aggregated statistics to null? Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com. Phone: +1-253-397-1945 (Office)

Maintaining overall cumulative data in Spark Streaming

2015-10-29 Thread Sandeep Giri
a StreamRDD with an aggregated count and kept doing a fullOuterJoin, but it didn't work. It seems like the StreamRDD gets reset. Kindly help. Regards, Sandeep Giri

RE: Maintaining overall cumulative data in Spark Streaming

2015-10-29 Thread Sandeep Giri
Yes, updateStateByKey worked, though there are some more complications. On Oct 30, 2015 8:27 AM, "skaarthik oss" <skaarthik@gmail.com> wrote: Did you consider the UpdateStateByKey operation? From: Sandeep Giri [mailto:sand...@knowbigdata.com] S
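
For reference, a minimal sketch of updateStateByKey maintaining a running count, assuming a DStream[(String, Int)] named counts and a checkpoint directory already configured (updateStateByKey requires checkpointing). Returning None from the update function drops the key from the state, which is also how the aggregate for a key can be "reset":

val cumulative = counts.updateStateByKey[Int] { (newValues: Seq[Int], state: Option[Int]) =>
  // Running total: this batch's values plus whatever was accumulated so far.
  Some(newValues.sum + state.getOrElse(0))
}
cumulative.print()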

Re: SPARK SQL Error

2015-10-15 Thread Giri
parkSubmit.scala:166) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Thanks & Regards, Giri.

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
Use map-reduce. On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek wrote: Hello, is there any way to query multiple collections from MongoDB using Spark and Java? And I want to create only one Configuration object. Please help if anyone has something

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
I think it should be possible by loading the collections as RDDs and then doing a union on them. Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com. Phone: +1-253-397-1945 (Office)
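
A hedged sketch of that union approach, assuming the mongo-hadoop connector (com.mongodb.hadoop.MongoInputFormat) and hypothetical collection URIs. Note the connector reads the collection name from the Hadoop Configuration, so each collection gets its own Configuration here rather than the single shared one asked about:

import org.apache.hadoop.conf.Configuration
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

def collectionRDD(uri: String) = {
  val conf = new Configuration()
  conf.set("mongo.input.uri", uri) // e.g. mongodb://host:27017/db.users
  sc.newAPIHadoopRDD(conf, classOf[MongoInputFormat], classOf[Object], classOf[BSONObject])
}

// Hypothetical collections; union exposes both as a single RDD for querying.
val all = collectionRDD("mongodb://host:27017/db.users")
  .union(collectionRDD("mongodb://host:27017/db.orders"))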

Re: query avro hive table in spark sql

2015-08-28 Thread Giri P
Any idea what's causing this error? 15/08/28 21:03:03 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 9.0 (TID 20, dtord01hdw0228p.dc.dotomi.net): java.lang.RuntimeException: cannot find field message_campaign_id from [0:error_error_error_error_error_error_error, 1:cannot_determine_schema,

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
Can we run Hive queries using spark-avro? In our case it's not just reading the Avro file; we have a view in Hive which is based on multiple tables. On Thu, Aug 27, 2015 at 9:41 AM, Giri P gpatc...@gmail.com wrote: we are using Hive 1.1. I was able to fix the below error when I used the right version
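
For context, a minimal sketch of what spark-avro itself covers, assuming Spark 1.x with the com.databricks:spark-avro package and a hypothetical file path; a Hive view spanning multiple tables would instead need a HiveContext and plain SQL against the view:

import com.databricks.spark.avro._

val df = sqlContext.read.avro("/data/events.avro") // hypothetical path
df.registerTempTable("events")
sqlContext.sql("SELECT count(*) FROM events").show()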

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
queries in our application. Any idea if this issue might be because of querying across different schema versions of the data? Thanks, Giri. On Thu, Aug 27, 2015 at 5:39 AM, java8964 java8...@hotmail.com wrote: What version of Hive are you using? And did you compile against the right version of Hive when you

Re: Spark Interview Questions

2015-08-19 Thread Sandeep Giri
Thank you all. I have updated it to a slightly better version. Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com. Phone: +1-253-397-1945 (Office)

Re: Spark Interview Questions

2015-08-17 Thread Sandeep Giri
This statement is from Spark's website itself. Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com. Phone: +1-253-397-1945 (Office)

Re: Spark Interview Questions

2015-07-30 Thread Sandeep Giri
I have prepared some interview questions: http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-1 and http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2. Please provide your feedback. On Wed, Jul 29, 2015, 23:43 Pedro Rodriguez ski.rodrig...@gmail.com wrote: You

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Sandeep Giri
Even for 2 lakh (200,000) records, MySQL will be better. Regards, Sandeep Giri, +1-253-397-1945 (US), +91-953-899-8962 (IN), www.KnowBigData.com.

Re: resource allocation spark on yarn

2014-12-12 Thread Giri P
But on Spark 0.9 we don't have these options: --num-executors (controls how many executors will be allocated), --executor-memory (RAM for each executor), --executor-cores (CPU cores for each executor). On Fri, Dec 12, 2014 at 12:27 PM, Sameer Farooqui same...@databricks.com wrote: Hi, FYI - There
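
For reference, on newer Spark versions the same knobs also exist as configuration properties; a minimal sketch (an assumption worth noting: Spark 0.9's YARN client exposed differently named, worker-based flags instead, so this applies to later releases):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("resource-allocation-sketch")
  .set("spark.executor.instances", "10") // equivalent of --num-executors
  .set("spark.executor.memory", "4g")    // equivalent of --executor-memory
  .set("spark.executor.cores", "2")      // equivalent of --executor-cores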