Hi Gerard,
Thanks for the hint about the singleton object. It seems very interesting.
However, when my singleton object (e.g. a handle to my DB) is supposed to
have a member variable that is non-serializable, I will again have a
problem, won't I? At least I always run into issues where Python tries to
pickle [...]
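[A common way around exactly this is to create the non-picklable handle lazily inside each worker process instead of carrying it in the task closure. Below is a minimal sketch, not from the thread: `make_db_handle`, `_FakeDB`, and `lookup` are hypothetical stand-ins for the real connection code.]

```python
# Sketch: keep the non-picklable handle out of the task closure by creating
# it lazily at module level, once per worker process.

class _FakeDB:
    """Stand-in for a real, non-picklable client (e.g. a DB connection)."""
    def lookup(self, key):
        return key

def make_db_handle():
    return _FakeDB()   # hypothetical: replace with your real connection code

_db_handle = None

def get_db_handle():
    """Create the handle on first use in this process; it is never pickled."""
    global _db_handle
    if _db_handle is None:
        _db_handle = make_db_handle()
    return _db_handle

def process_partition(rows):
    db = get_db_handle()           # one handle per worker process
    for row in rows:
        yield db.lookup(row)

# In the Spark job, only the functions above are pickled, never the handle:
# results = rdd.mapPartitions(process_partition)
```

Because `get_db_handle()` runs on the worker, Spark only ever serializes plain function code; the DB handle itself is created after deserialization.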
> /1.2.0/configuration.html#compression-and-serialization
>
> Thanks
> Best Regards
>
> On Sun, Feb 22, 2015 at 11:47 PM, Tassilo Klein wrote:
>
>> Hi Akhil,
>>
>> thanks for your reply. I am using the latest version of Spark 1.2.1 (also
>> tried 1.3 developer branch). [...]
Hi Akhil,
thanks for your reply. I am using the latest version of Spark 1.2.1 (also
tried 1.3 developer branch). If I am not mistaken the TorrentBroadcast is
the default there, isn't it?
Thanks,
Tassilo
On Sun, Feb 22, 2015 at 10:59 AM, Akhil Das
wrote:
> Did you try with torrent broadcast fa[...]
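[For reference, a config sketch of how to pin the broadcast implementation explicitly from PySpark. In recent 1.x releases TorrentBroadcastFactory is already the default, so setting it is mainly useful to rule the factory out as a variable; the app name is a made-up placeholder.]

```python
# Config sketch: select the broadcast implementation explicitly at launch.
# Needs a Spark installation to actually run.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("broadcast-check")              # hypothetical name
        .set("spark.broadcast.factory",
             "org.apache.spark.broadcast.TorrentBroadcastFactory"))
sc = SparkContext(conf=conf)
```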
> ki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
> (linked from that article)
>
> to get a better idea of what your options are.
>
> If it's possible to avoid writing to [any] disk I'd recommend that route,
> since that's the performance advantage Spark has over vanilla Hadoop.
>
> On Wed Feb 11 2015 at 2:10:36 PM Tassilo Klein wrote:
Thanks for the info. The file system in use is a Lustre file system.
Best,
Tassilo
On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke
wrote:
> A central location, such as NFS?
>
> If they are temporary for the purpose of further job processing you'll
> want to keep them local to the node in the [...]
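[If the goal is to steer Spark's own scratch and shuffle files onto node-local disks rather than a shared mount like Lustre or NFS, the relevant knob is `spark.local.dir`. A config sketch; the path is a hypothetical example and must exist on every node.]

```python
# Config sketch: point Spark's shuffle/spill scratch space at node-local
# storage instead of a shared file system. Needs a Spark installation.
from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.local.dir", "/local/scratch/spark")  # hypothetical path
sc = SparkContext(conf=conf)
```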
> [...]ks, if the worker is not reused.
>
> We will really appreciate that if you could provide a short script to
> reproduce the freeze, then we can investigate the root cause and fix
> it. Also, fire a JIRA for it, thanks!
>
> On Wed, Jan 21, 2015 at 4:56 PM, Tassilo [...]
[...] by:
> spark.python.worker.reuse = false
>
> On Tue, Jan 20, 2015 at 11:12 PM, Tassilo Klein
> wrote:
> > Hi,
> >
> > It's a bit of a longer script that runs some deep learning training.
> > Therefore it is a bit hard to wrap up easily.
> >
> > Essentially I am having [...]
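[The setting quoted above can be applied programmatically as well as on the command line. A config sketch of the PySpark form:]

```python
# Config sketch: disable Python worker reuse so every task gets a fresh
# worker process (slower startup, but isolates any per-worker state or
# leaks between tasks). Needs a Spark installation to run.
from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.python.worker.reuse", "false")
sc = SparkContext(conf=conf)
```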
Hi,
It's a bit of a longer script that runs some deep learning training.
Therefore it is a bit hard to wrap up easily.
Essentially I am having a loop, in which a gradient is computed on each
node and collected (this is where it freezes at some point).
grads = zipped_trainData.map(distributed_gr[...]
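[The loop described here has roughly the following shape. This is a pure-Python stand-in, not the author's script: the gradient function, toy data, and step size are all invented for illustration, and in the real job `data` would be an RDD with `.map(...).collect()` running on the cluster.]

```python
# Pure-Python stand-in for the driver-side loop described above (no Spark
# needed). In the real job, the list comprehension marked below is
# rdd.map(...).collect() and runs distributed.

def distributed_gradient(example, w):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    x, y = example
    return 2.0 * (w * x - y) * x

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy set with y = 2*x
w = 0.0
for epoch in range(50):
    # In Spark: grads = zipped_trainData.map(lambda ex: ...).collect()
    grads = [distributed_gradient(ex, w) for ex in data]
    w -= 0.05 * sum(grads) / len(grads)        # driver-side parameter update
# w converges toward 2.0
```

The structural point is that only `collect()` brings data back to the driver each iteration, which is exactly where a hang would surface.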
Thanks. This was already helping a bit. But the examples don't use custom
InputFormats; rather, they use fully qualified org.apache InputFormats. If I
want to use my own custom InputFormat in the form of a .class (or jar) file,
how can I use it? I tried providing it to pyspark with --jars
and then using sc.newAPIHadoo[...]
Yes, that somehow seems logical. But where / how do I pass the InputFormat
definition (.jar/.java/.class) to Spark?
I mean, when using Hadoop I need to call something like 'hadoop jar
-inFormat other stuff' to register the file
format definition file.
Hi,
I'd like to read in a (binary) file from Python for which I have defined a
Java InputFormat (.java) definition. However, now I am stuck in how to use
that in Python and didn't find anything in newsgroups either.
As far as I know, I have to use this newAPIHadoopRDD function. However, I am
not s[...]
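[One way this is commonly wired up is sketched below, under assumptions: the jar name, the `com.example.MyBinaryInputFormat` class, and the input path are hypothetical placeholders, `sc` is an existing SparkContext, and the job must be launched with the jar visible to both driver and executors.]

```python
# Launch with the custom InputFormat jar on the classpath, e.g.:
#   pyspark --jars myformats.jar --driver-class-path myformats.jar
# (jar name is hypothetical). Then read the binary file through it.
rdd = sc.newAPIHadoopFile(
    "hdfs:///data/my_binary_input",                    # hypothetical path
    inputFormatClass="com.example.MyBinaryInputFormat",  # your custom class
    keyClass="org.apache.hadoop.io.LongWritable",
    valueClass="org.apache.hadoop.io.BytesWritable",
)
first_records = rdd.take(5)
```

The key/value classes must match whatever your InputFormat's RecordReader emits, since PySpark converts them to Python objects through its Writable converters.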