Yeah, this is because the broadcast variables are kept in
memory, and I am guessing they are referenced in a way that prevents them
from being garbage collected...
A solution could be to enable spark.cleaner.ttl, but I don't like it much,
as it sounds more like a hacky solution.
There
the cleaning of old broadcasted vars.
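If you do want to try it, something like this (a rough sketch, assuming the system-property style of configuration from that era; the TTL is in seconds and 3600 is just an example value):

import org.apache.spark.SparkContext

// Must be set before the SparkContext is created; old broadcast vars and
// other metadata older than the TTL are then cleaned periodically.
System.setProperty("spark.cleaner.ttl", "3600")
val sc = new SparkContext("local[2]", "cleaner-ttl-example")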
2014-02-19 12:25 GMT+01:00 Eugen Cepoi cepoi.eu...@gmail.com:
Hi,
What is the size of RDD two?
You want to map a line from RDD one to multiple values from RDD two and get
the sum of all of them?
So as a result you would have an RDD of the same size as RDD one, containing
one number per line?
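Something like this is what I have in mind (a hypothetical sketch: extractKeys stands in for however you split a line of RDD one into keys, and it assumes RDD two is a key/value RDD small enough to collect and broadcast):

// lookup: the content of RDD two, shipped once to every worker
val lookup = sc.broadcast(rddTwo.collect().toMap)
val sums = rddOne.map { line =>
  // one number per line: the sum of the values matched in RDD two
  extractKeys(line).map(k => lookup.value.getOrElse(k, 0.0)).sum
}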
2014-02-18 8:06 GMT+01:00 hanbo hanbo...@gmail.com:
Sincerely thank you for
Do you have the stack trace?
I had something similar, where the Kryo deserializer was throwing EOF, but in fact
the EOF means nothing: Spark catches Kryo exceptions and then throws EOF (and
loses the real reason...); in my case Kryo couldn't find the class it had to
deserialize to.
2014/1/8 Aureliano Buendia
hadoop in your fat jar:
<include>org.apache.hadoop:*</include>
This would take a big chunk of the fat jar. Isn't this jar already
included in spark?
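One common way to keep them out is to mark the dependency as provided in the build, so the assembly plugin skips it; a build.sbt sketch (the version number is just illustrative):

// Spark (and the Hadoop client it pulls in) is already on the workers'
// classpath, so don't bundle it into the fat jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-incubating" % "provided"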
On Thu, Jan 2, 2014 at 11:38 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
It depends on how you deploy; I don't find it so complicated...
1) To build
You can set the log level to INFO; it looks like Spark is logging
applicative errors at INFO. When I have errors that I can reproduce only on
live data, I run a spark shell with my job in its classpath, then I
debug and tweak things to find out what happens.
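For example, from the driver or the shell (a quick sketch using the log4j 1.x API that Spark ships with; editing conf/log4j.properties works too):

import org.apache.log4j.{Level, Logger}

// Set the root logger to INFO so those errors become visible.
Logger.getRootLogger.setLevel(Level.INFO)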
2014/1/5 Nan Zhu
Hi,
This is the list of the jars you use in your job; the driver will send all
those jars to each worker (otherwise the workers won't have the classes you
need in your job). The easy way to go is to build a fat jar with your code
and all the libs you depend on, and then use this utility to get the
sbt assembly also create that jar?
3. Is calling sc.jarOfClass() the most common way of doing this? I cannot
find any example by googling. What's the most common way that people use?
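Roughly like this (a sketch assuming the 0.9-era API, where SparkContext.jarOfClass returns the sequence of jars containing the given class; paths and names are placeholders):

import org.apache.spark.SparkContext

// The fat jar built by sbt assembly contains your job classes, so
// jarOfClass on one of them finds it; Spark then ships it to the workers.
val jars = SparkContext.jarOfClass(this.getClass)
val sc = new SparkContext("spark://localhost:7077", "my-job", "/path/to/spark", jars)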
On Thu, Jan 2, 2014 at 10:58 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Hi,
This is the list of the jars
Using spark://localhost:7077 is a good way to simulate the production
driver, and it provides the web UI. When using spark://localhost:7077, is
it required to create the fat jar? Wouldn't that significantly slow down
the development cycle?
On Thu, Jan 2, 2014 at 11:38 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
It depends on how you deploy; I don't find it so complicated
Did you try setting the spark.executor.memory property to the amount of
memory you want per worker?
For example, spark.executor.memory=2g
http://spark.incubator.apache.org/docs/latest/configuration.html
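In the 0.8/0.9 style you can set it as a system property in the driver before creating the SparkContext (the value is just an example), or pass it via SPARK_JAVA_OPTS:

// Each executor will be started with 2g of memory.
System.setProperty("spark.executor.memory", "2g")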
2014/1/2 Archit Thakur archit279tha...@gmail.com
Need not mention Workers could be seen
2014/1/2 Aureliano Buendia buendia...@gmail.com
On Thu, Jan 2, 2014 at 1:19 PM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
When developing I use local[2], which launches a local cluster with 2
workers. In most cases it is fine; I just encountered some strange
behaviours for broadcasted
In Scala, case classes are serializable by default; your TileIdWritable
should be a case class. I usually enable Kryo serialization for objects and keep the
default serialization for closures; this works pretty well.
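Roughly what I do (a sketch; class names are placeholders, and in real code spark.kryo.registrator needs the fully qualified class name):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// A case class is Serializable by default, so it also works with the
// default Java serialization used for closures.
case class TileId(col: Int, row: Int)

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[TileId])
  }
}

// Kryo is used for the data; closures keep the default Java serialization.
System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
System.setProperty("spark.kryo.registrator", "MyRegistrator")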
Eugen
2013/12/24 Ameet Kini ameetk...@gmail.com
If Java serialization is the only one that properly
Ramachandrasekaran sri.ram...@gmail.com
Try local[m], where m is the number of workers. For tests, local[2]
should be ideal. This is generally the way to write tests for Spark code.
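A minimal sketch of such a test (test framework omitted; remember to stop the context so tests can run one after another):

import org.apache.spark.SparkContext

val sc = new SparkContext("local[2]", "unit-test")
try {
  // 2 + 4 + ... + 20 = 110
  assert(sc.parallelize(1 to 10).map(_ * 2).reduce(_ + _) == 110)
} finally {
  sc.stop()
}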
On Tue, Nov 19, 2013 at 10:03 PM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Maybe a bug with HttpBroadcast
for other
inputs.
On Tue, Nov 19, 2013 at 10:40 PM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Yes, sure, for usual tests it is fine, but the broadcast is only done if we
are not in local mode (at least it seems so).
In SparkContext we have def broadcast[T](value: T) =
env.broadcastManager.newBroadcast
for output
formats that go to a filesystem (e.g. HDFS), but HBase isn't a filesystem.
Matei
On Oct 11, 2013, at 8:53 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Hi there,
I have got a few questions on how best to write to HBase from a Spark job.
- If we want to write using TableOutputFormat, are we supposed to use
saveAsNewAPIHadoopFile?
- Or should we do it by hand (without TableOutputFormat) in a foreach loop
for example?
- Or should we use
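For the "by hand" option, a rough sketch using foreachPartition so that one HBase connection is opened per partition (table and column family names are placeholders, and the RDD is assumed to hold (String, String) pairs):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

rdd.foreachPartition { rows =>
  // Created on the worker, so nothing non-serializable is shipped in the closure.
  val table = new HTable(HBaseConfiguration.create(), "my_table")
  rows.foreach { case (key, value) =>
    val put = new Put(Bytes.toBytes(key))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
    table.put(put)
  }
  table.close()
}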