> …has less free memory, spilling may become more expensive.
>
> If the walk is your bottleneck and not GC, then I would recommend JOL
> (Java Object Layout) and guessing to better predict memory.
>
> On Mon, Feb 26, 2018, 4:47 PM Xin Liu <xin.e@gmail.com> wrote:
>
Thanks!
Our protobuf object is fairly complex. Even O(N) takes a lot of time.
On Mon, Feb 26, 2018 at 6:33 PM, 叶先进 <advance...@gmail.com> wrote:
> Hi Xin Liu,
>
> Could you provide a concrete use case if possible (code to reproduce the
> protobuf object and comparisons between p…
Hi folks,
We have a situation where the shuffled data is protobuf-based, and
SizeEstimator is taking a lot of time.
We have tried overriding SizeEstimator to return a constant value, which
speeds things up a lot.
My question is: what is the side effect of disabling SizeEstimator? Is it
just Spark…
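A middle ground between the full object walk and a hard-coded constant is to measure only a sample of records once and reuse the average. The sketch below is plain Python and purely illustrative (the function names are not Spark APIs; for protobuf objects, `len(msg.SerializeToString())` is one cheap per-record cost you could plug in as `measure`):

```python
import random

def sample_average_size(records, measure, sample_size=100, seed=42):
    """Estimate the average per-record size by measuring only a sample.

    `measure` is any callable returning a size in bytes for one record.
    """
    rng = random.Random(seed)
    if len(records) <= sample_size:
        sample = records
    else:
        sample = rng.sample(records, sample_size)
    total = sum(measure(r) for r in sample)
    return total / len(sample)

def estimate_collection_size(records, measure, sample_size=100):
    """Approximate the total size as average sampled size times count,
    avoiding a walk over every object."""
    return sample_average_size(records, measure, sample_size) * len(records)
```

This trades accuracy for speed in the same spirit as the constant override, but stays sensitive to the actual data.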
Hi,
I have a scenario where I'd like to store an RDD using the Parquet format in
many files, which correspond to days, such as 2015/01/01, 2015/02/02, etc.
So far I used this method
http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
to store text files
On …:42 PM, Xin Liu <liuxin...@gmail.com> wrote:
Hi,
I have tried a few models in MLlib to train a LogisticRegression model.
However, I consistently get much better results using other libraries such
as statsmodels (which gives similar results to R) in terms of AUC. For
illustration purposes, I used a small dataset (I have tried much bigger data)
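When comparing libraries on AUC, it helps to rule out differences in the metric itself by computing AUC the same way for both sets of scores. A minimal self-contained sketch using the rank-based Mann-Whitney formulation (this is an illustration, not MLlib's BinaryClassificationMetrics; differences in defaults, such as statsmodels fitting an unregularized MLE while older MLlib solvers apply regularization or SGD, are a more likely cause of the gap):

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive outranks a randomly chosen negative.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Feeding both models' predicted probabilities through the same function makes the comparison apples-to-apples.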