Re: Keep local variable

2015-04-10 Thread Tassilo Klein
Hi Gerard, thanks for the hint about the Singleton object. It seems very interesting. However, when my singleton object (e.g. a handle to my DB) is supposed to have a member variable that is non-serializable, I will again have a problem, won't I? At least I always run into issues where Python tries to pickle…
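
One common way around this, sketched below under assumed names, is to keep the non-serializable handle out of anything that gets pickled and create it lazily inside each executor process; get_db_handle, lookup, and the sqlite3 stand-in are hypothetical, not from the thread:

  # A minimal sketch; the DB client (sqlite3) and all names here are
  # hypothetical stand-ins for whatever handle the thread refers to.
  _db_handle = None  # module-level, so each executor process gets its own

  def get_db_handle():
      # Lazily open the handle on first use inside the worker. Only the
      # function is shipped to executors, never the open connection, so
      # Python has nothing non-serializable to pickle.
      global _db_handle
      if _db_handle is None:
          import sqlite3
          _db_handle = sqlite3.connect("/tmp/example.db")
      return _db_handle

  def lookup(key):
      cur = get_db_handle().cursor()
      cur.execute("SELECT value FROM kv WHERE key = ?", (key,))
      row = cur.fetchone()
      return row[0] if row else None

  # e.g. rdd.map(lookup): each executor opens its own connection the
  # first time lookup() runs; nothing stateful crosses the wire.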

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Tassilo Klein
…/1.2.0/configuration.html#compression-and-serialization Thanks Best Regards On Sun, Feb 22, 2015 at 11:47 PM, Tassilo Klein wrote: > Hi Akhil, thanks for your reply. I am using the latest version of Spark 1.2.1 (also tried…
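
For later readers, the settings that section of the configuration page covers can be applied from PySpark roughly as follows; a minimal sketch for Spark 1.2.x, with illustrative values not taken from the thread:

  from pyspark import SparkConf, SparkContext

  # Minimal sketch of the compression/serialization knobs from the
  # linked configuration page (Spark 1.2.x); values are illustrative.
  conf = (SparkConf()
          .set("spark.serializer",
               "org.apache.spark.serializer.KryoSerializer")
          .set("spark.broadcast.compress", "true")   # compress broadcasts
          .set("spark.io.compression.codec", "snappy"))
  sc = SparkContext(conf=conf)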

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Tassilo Klein
Hi Akhil, thanks for your reply. I am using the latest version of Spark 1.2.1 (also tried the 1.3 developer branch). If I am not mistaken, TorrentBroadcast is the default there, isn't it? Thanks, Tassilo On Sun, Feb 22, 2015 at 10:59 AM, Akhil Das wrote: > Did you try with torrent broadcast factory…
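
For reference, TorrentBroadcast is indeed the default in Spark 1.2.x. If one wants to pin it explicitly, a Spark 1.x-era sketch (this setting was dropped in later releases, so treat it as version-specific):

  from pyspark import SparkConf, SparkContext

  # Pin the torrent-based broadcast implementation explicitly. This is
  # already the default in Spark 1.2.x, and the setting disappeared in
  # later releases, so this is purely to make the choice visible.
  conf = SparkConf().set(
      "spark.broadcast.factory",
      "org.apache.spark.broadcast.TorrentBroadcastFactory")
  sc = SparkContext(conf=conf)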

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
…wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf (linked from that article) to get a better idea of what your options are. If it's possible to avoid writing to [any] disk I'd recommend that route, since that's the performance advantage Spark has over vanilla Hadoop.

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
…(linked from that article) to get a better idea of what your options are. If it's possible to avoid writing to [any] disk I'd recommend that route, since that's the performance advantage Spark has over vanilla Hadoop. On Wed Feb 11 2015 at 2:10:36 PM Tassilo Klein wrote: …

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
Thanks for the info. The file system in use is a Lustre file system. Best, Tassilo On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke wrote: > A central location, such as NFS? If they are temporary for the purpose of further job processing, you'll want to keep them local to the node in the…
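
On a shared filesystem such as Lustre, pointing Spark's scratch space at node-local disk is the usual advice. A minimal sketch, assuming a hypothetical node-local path; note that a SPARK_LOCAL_DIRS environment variable set on the workers overrides this property:

  from pyspark import SparkConf, SparkContext

  # Keep shuffle/spill scratch space off the shared Lustre mount by
  # pointing it at node-local disk; the path is a placeholder. If
  # SPARK_LOCAL_DIRS is set in the workers' environment, it takes
  # precedence over spark.local.dir.
  conf = SparkConf().set("spark.local.dir", "/local/scratch/spark")
  sc = SparkContext(conf=conf)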

Re: Spark 1.1 (slow, working), Spark 1.2 (fast, freezing)

2015-01-21 Thread Tassilo Klein
> …, if the worker is not reused. We would really appreciate it if you could provide a short script to reproduce the freeze; then we can investigate the root cause and fix it. Also, please file a JIRA for it, thanks! On Wed, Jan 21, 2015 at 4:56 PM, Tassilo…

Re: Spark 1.1 (slow, working), Spark 1.2 (fast, freezing)

2015-01-21 Thread Tassilo Klein
> …by: spark.python.worker.reuse = false On Tue, Jan 20, 2015 at 11:12 PM, Tassilo Klein wrote: > Hi, It's a bit of a longer script that runs some deep learning training. Therefore it is a bit hard to wrap up easily. Essentially I am having…
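
The suggested workaround amounts to a one-line configuration change; a minimal sketch:

  from pyspark import SparkConf, SparkContext

  # Disable Python worker reuse (on by default since Spark 1.2) to test
  # whether the freeze is tied to long-lived worker processes.
  conf = SparkConf().set("spark.python.worker.reuse", "false")
  sc = SparkContext(conf=conf)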

Re: Spark 1.1 (slow, working), Spark 1.2 (fast, freezing)

2015-01-20 Thread Tassilo Klein
Hi, It's a bit of a longer script that runs some deep learning training; therefore it is a bit hard to wrap up easily. Essentially I have a loop in which a gradient is computed on each node and collected (this is where it freezes at some point): grads = zipped_trainData.map(distributed_gr…
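
The loop described above presumably follows a pattern roughly like the sketch below; distributed_gradient, the least-squares loss, the step size, and the size variables are hypothetical stand-ins, not the original script:

  import numpy as np

  # Hypothetical reconstruction of the loop described above: broadcast
  # the current weights, compute one gradient per training record, and
  # collect/average on the driver.
  def distributed_gradient(weights_bc):
      def compute(record):
          x, y = record                 # one (features, label) pair
          w = weights_bc.value
          err = np.dot(w, x) - y        # stand-in least-squares loss
          return err * x                # its gradient w.r.t. w
      return compute

  weights = np.zeros(num_features)      # num_features assumed defined
  for step in range(num_iterations):    # num_iterations assumed defined
      weights_bc = sc.broadcast(weights)
      grads = zipped_trainData.map(distributed_gradient(weights_bc)).collect()
      weights = weights - 0.01 * np.mean(grads, axis=0)  # fixed step size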

Re: Using Hadoop InputFormat in Python

2014-08-13 Thread Tassilo Klein
Thanks. This was already helping a bit. But the examples don't use custom InputFormats; rather, they use fully qualified org.apache InputFormats. If I want to use my own custom InputFormat in the form of a .class (or jar) file, how can I use it? I tried providing it to pyspark with --jars and then using sc.newAPIHadoop…
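
For later readers: once the jar is shipped (e.g. pyspark --jars my-input-format.jar, possibly plus --driver-class-path for the local driver), the call looks roughly like the sketch below; the class and path names are hypothetical placeholders:

  # Submit with the custom InputFormat jar on the classpath, e.g.:
  #   pyspark --jars my-input-format.jar
  # All class names and the path below are hypothetical placeholders.
  rdd = sc.newAPIHadoopFile(
      "hdfs:///data/records.bin",
      inputFormatClass="com.example.MyBinaryInputFormat",
      keyClass="org.apache.hadoop.io.LongWritable",
      valueClass="org.apache.hadoop.io.BytesWritable")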

Re: Using Hadoop InputFormat in Python

2014-08-13 Thread Tassilo Klein
Yes, that somehow seems logical. But where / how do I pass the InputFormat definition (.jar/.java/.class) to Spark? I mean, when using Hadoop I need to call something like 'hadoop jar … -inFormat …' to register the file format definition.

Using Hadoop InputFormat in Python

2014-08-13 Thread Tassilo Klein
Hi, I'd like to read in a (binary) file from Python for which I have defined a Java InputFormat (.java) definition. However, now I am stuck on how to use that in Python and didn't find anything in the newsgroups either. As far as I know, I have to use the newAPIHadoopRDD function. However, I am not sure…