Continuous errors trying to start spark-shell

2015-07-03 Thread Mohamed Lrhazi
Hello, I am trying to just start spark-shell... it starts, the prompt appears, and then a never-ending (literally) stream of these log lines proceeds. What is it trying to do? Why is it failing? To start it I do: $ docker run -it ncssm/spark-base /spark/bin/spark-shell --master spark:// devzero.c
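For reference, a standalone master URL takes the form spark://host:port (7077 by default); the host in the command above is cut off in the archive. A minimal sketch of the intended invocation, with a hypothetical master host and port filled in:

    $ docker run -it ncssm/spark-base \
        /spark/bin/spark-shell --master spark://devzero.example.org:7077  # hypothetical host:port

When the shell cannot actually reach a master at that URL, the driver typically keeps retrying registration, which produces exactly this kind of endless warning stream.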

Re: Problem getting pyspark-cassandra and pyspark working

2015-02-16 Thread Mohamed Lrhazi
Will do. Thanks a lot. On Mon, Feb 16, 2015 at 7:20 PM, Davies Liu wrote: > Can you try the example in pyspark-cassandra? > > If not, you could create an issue there. > > On Mon, Feb 16, 2015 at 4:07 PM, Mohamed Lrhazi > wrote: > > So I tried building the connector from

Re: Problem getting pyspark-cassandra and pyspark working

2015-02-16 Thread Mohamed Lrhazi
ocol.Py4JError: Trying to call a package. Am I building the wrong connector jar, or using the wrong jar? Thanks a lot, Mohamed. On Mon, Feb 16, 2015 at 5:46 PM, Mohamed Lrhazi < mohamed.lrh...@georgetown.edu> wrote: > Oh, I don't know. Thanks a lot Davies, gonna figure that out n

Re: Problem getting pyspark-cassandra and pyspark working

2015-02-16 Thread Mohamed Lrhazi
> > > On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi > wrote: > > Yes, I am sure the system can't find the jar... but how do I fix that? My > > submit command includes the jar: > > > > /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars > >
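The command is truncated in the archive, but a classic cause of "Py4JError: Trying to call a package" is that --jars ships the jar to the executors while the driver JVM never sees it. A hedged sketch of a full invocation (the jar and script names are hypothetical):

    /spark/bin/spark-submit \
        --py-files /spark/pyspark_cassandra.py \
        --jars /spark/pyspark-cassandra-assembly.jar \
        --driver-class-path /spark/pyspark-cassandra-assembly.jar \
        my_cassandra_job.py

Passing the same jar to both --jars and --driver-class-path puts it on the executor and driver classpaths respectively.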

Re: Problem getting pyspark-cassandra and pyspark working

2015-02-16 Thread Mohamed Lrhazi
d try any suggestions highly appreciated. Thanks, Mohamed. On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu wrote: > It seems that the jar for cassandra is not loaded; you should have > it in the classpath. > > On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi > wrote: > > Hel

Problem getting pyspark-cassandra and pyspark working

2015-02-16 Thread Mohamed Lrhazi
Hello all, Trying the example code from this package (https://github.com/Parsely/pyspark-cassandra), I always get this error... Can you see what I am doing wrong? From googling around, it seems the jar is not found somehow... The Spark log shows the JAR was processed, at least. Thank

Can I set max execution time for any task in a job?

2014-12-15 Thread Mohamed Lrhazi
Is that possible? If not, how would one do it from PySpark? This probably does not make sense in most cases, but I am writing a script where my job involves downloading and pushing data into Cassandra... sometimes a task hangs forever, and I don't really mind killing it... The job is not actually comp
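Spark offers no built-in per-task wall-clock limit, so one workaround is to enforce the limit inside the task function itself. A minimal sketch, assuming a Unix worker that runs task code in its main thread (signal handlers only work there) and a hypothetical per-record worker push_to_cassandra:

    import signal

    class TaskTimeout(Exception):
        pass

    def process_with_timeout(record, seconds=300):
        # Abort this record if the work exceeds the time limit.
        def handler(signum, frame):
            raise TaskTimeout("gave up after %d seconds" % seconds)
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(seconds)
        try:
            return push_to_cassandra(record)  # hypothetical worker function
        except TaskTimeout:
            return None                       # drop the hung record and move on
        finally:
            signal.alarm(0)                   # always clear the pending alarm

    results = rdd.map(process_with_timeout)

Speculative execution (spark.speculation=true) is the closest built-in lever, but it re-launches slow tasks rather than killing hung ones.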

Re: PySpark and UnsupportedOperationException

2014-12-10 Thread Mohamed Lrhazi
Thanks Davies. It turns out it was indeed their bug, and they fixed it in last night's nightly build! https://github.com/elasticsearch/elasticsearch-hadoop/issues/338 On Wed, Dec 10, 2014 at 2:52 AM, Davies Liu wrote: > On Tue, Dec 9, 2014 at 11:32 AM, Mohamed Lrhazi > wrote: > > Wh

Re: PySpark and UnsupportedOperationException

2014-12-09 Thread Mohamed Lrhazi
PythonRDD.scala:87) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) On Tue, Dec 9, 2014 at 2:32 PM, Mohamed Lrhazi < mohamed.lrh...@georgetown.edu> wrote: > While trying simple examples of PySpark code, I systematically g

PySpark and UnsupportedOperationException

2014-12-09 Thread Mohamed Lrhazi
While trying simple examples of PySpark code, I systematically get these failures when I try this... I don't see any prior exceptions in the output... How can I debug further to find the root cause? es_rdd = sc.newAPIHadoopRDD( inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat", keyCl
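The call is cut off in the archive; a complete sketch of the usual elasticsearch-hadoop read pattern it appears to follow (the index/type in es.resource is hypothetical) would be:

    es_rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={"es.resource": "my_index/my_type"})  # hypothetical index/type

Each record should come back as a (document id, dict of fields) pair; the UnsupportedOperationException here turned out to be the elasticsearch-hadoop bug linked in the follow-up above.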

Re: PySpark elasticsearch question

2014-12-09 Thread Mohamed Lrhazi
Found a format that worked, kind of accidentally: "es.query" : """{"query":{"match_all":{}},"fields":["title","_source"]}""" Thanks, Mohamed. On Tue, Dec 9, 2014 at 11:27 AM, Mohamed Lrhazi
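In context, that string goes into the conf dict passed to sc.newAPIHadoopRDD, roughly as follows (the index/type name is hypothetical):

    conf = {
        "es.resource": "my_index/my_type",  # hypothetical index/type
        "es.query": '{"query":{"match_all":{}},"fields":["title","_source"]}',
    }

Listing "title" under fields is what makes the title come back alongside the document id rather than the id alone.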

Re: PySpark elasticsearch question

2014-12-09 Thread Mohamed Lrhazi
})] In [20]: On Tue, Dec 9, 2014 at 10:18 AM, Nick wrote: > try "es.query" something like "?q=*&fields=title,_source" for a match-all > query. You need the "q=*", which is actually the query part of the query > > On Tue, Dec 9, 2014 at 3:15

PySpark elasticsearch question

2014-12-09 Thread Mohamed Lrhazi
Hello, Following a couple of tutorials, I can't seem to get pyspark to return any "fields" from ES other than the document id. I tried like so: es_rdd = sc.newAPIHadoopRDD(inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",keyClass="org.apache.hadoop.io.NullWritable",valueClass="org.elasti