I think rdd.toLocalIterator is what you want. But note it will still keep one
partition's data in memory at a time.
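A sketch of the pattern (the helper itself is plain Python; `toLocalIterator` is the RDD method named above, and the Spark call in the docstring is a sketch, not run against a live cluster):

```python
from itertools import islice

def take_window(iterator, start, n):
    """Return n items starting at zero-based index `start` from any iterator.

    With Spark, pass rdd.toLocalIterator() here: the iterator streams the
    RDD one partition at a time, so the driver never holds more than one
    partition's data in memory. Sketch of the Spark usage:
        rows = take_window(rdd.toLocalIterator(), 1000, 1000)
    """
    return list(islice(iterator, start, start + n))
```

For example, `take_window(iter(range(10**6)), 1000, 1000)` yields elements 1000 through 1999 without ever materializing the rest of the sequence.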
On Wed, Sep 2, 2015 at 10:05 AM, Niranda Perera wrote:
> Hi all,
>
> I have a large set of data which would not fit into memory. So, I want
> to take n rows from the RDD given a particular
Hi All,
I am using Spark SQL 1.3.1 with Hadoop 2.4.0. I am running SQL
queries against Parquet files and wanted to save the results to S3, but it
looks like the problem in https://issues.apache.org/jira/browse/SPARK-2984
still occurs while saving data to S3.
Hence, for now I am saving the results on HDFS and with t
Hi all,
I have a large set of data which would not fit into memory. So, I want
to take n rows from the RDD given a particular index. For
example, take 1000 rows starting from index 1001.
I see that there is a take(num: Int): Array[T] method on the RDD, but it
only returns the
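One way to get a window by index without collecting the whole RDD is to pair each element with its index and filter. A minimal sketch of the filtering logic — `zipWithIndex` is a real RDD method; the rest of the wiring in the docstring is an assumption about how you would combine it:

```python
def slice_by_index(pairs, start, n):
    """Keep values whose zero-based index falls in [start, start + n).

    `pairs` are (value, index) tuples, the shape rdd.zipWithIndex()
    produces. On a real RDD the same predicate runs in parallel (sketch):
        rdd.zipWithIndex() \
           .filter(lambda vi: start <= vi[1] < start + n) \
           .map(lambda vi: vi[0]) \
           .collect()
    """
    return [v for v, i in pairs if start <= i < start + n]
```

Note that `zipWithIndex` itself triggers a Spark job to compute per-partition offsets when the RDD has more than one partition.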
Hi, I have an idea; can someone give me some advice?
I want to compress the data in the in-memory columnar storage used by cached
tables in Spark. This would make cached tables use less memory.
I will gate this feature behind a conf, so anyone who wants to use it
can set this conf to t
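For reference, Spark SQL already gates in-memory columnar compression behind a conf, `spark.sql.inMemoryColumnarStorage.compressed`; a new codec option could follow the same pattern. A sketch, with the live SQLContext calls left as comments (the table name is hypothetical):

```python
# Existing Spark SQL key that gates columnar compression for cached tables;
# a new codec conf could be added alongside it in the same style.
conf = {
    "spark.sql.inMemoryColumnarStorage.compressed": "true",
}

# With a live SQLContext (not run here):
# for key, value in conf.items():
#     sqlContext.setConf(key, value)
# sqlContext.cacheTable("events")   # "events" is a hypothetical table name
```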
https://issues.apache.org/jira/browse/SPARK-10399
is the JIRA to track this.
On Sep 1, 2015 5:32 PM, "Paul Wais" wrote:
> Paul: I've worked on running C++ code on Spark at scale before (via JNA,
> ~200
> cores) and am working on something more contribution-oriented now (via
> JNI).
> A few comments:
Dear Spark developers,
Could you suggest what the intended use of UnsafeRow is (apart from Tungsten
groupBy and sort) and give an example of how to use it?
1) Is it intended to be instantiated as a copy of a Row in order to perform
in-place modifications of it?
2) Can I create a new UnsafeRow giv
Paul: I've worked on running C++ code on Spark at scale before (via JNA, ~200
cores) and am working on something more contribution-oriented now (via JNI).
A few comments:
* If you need something *today*, try JNA. It can be slow (e.g. a short
native function in a tight loop) but works if you have
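JNA's declare-and-call model (no hand-written glue code, but marshalling overhead on every call) is the same idea as Python's ctypes. A ctypes sketch of that pattern, standing in for a JNA example rather than showing JNA itself:

```python
import ctypes
import ctypes.util
import os

# Load libc and call a native function with no compiled glue code -- the
# same binding model JNA offers on the JVM. The per-call marshalling cost
# is exactly why a short native function in a tight loop runs slowly.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.getpid.restype = ctypes.c_int

# The native call agrees with what the Python runtime reports.
assert libc.getpid() == os.getpid()
```

The JNA equivalent declares a Java interface whose method names match the native symbols and loads it with the library name; as with ctypes, convenience comes at the price of per-call overhead.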
Please vote on releasing the following candidate as Apache Spark version
1.5.0. The vote is open until Friday, Sep 4, 2015 at 21:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.5.0
[ ] -1 Do not release this package because ...
To
I am not very clear about resource allocation (CPU/core/thread-level
allocation) relative to the parallelism implied by the number of cores set
in Spark standalone mode.
Any guidelines for that?
--
Thanks & Regards,
Anshu Shukla
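The knobs involved here are confs; a sketch naming the main standalone-mode settings (the three keys exist in Spark 1.x; the values are arbitrary examples, not recommendations):

```python
# Standalone-mode resource knobs (example values, tune for your cluster):
conf = {
    "spark.cores.max": "8",             # total cores this app may claim cluster-wide
    "spark.task.cpus": "1",             # cores reserved per task
    "spark.default.parallelism": "16",  # default partition/task count for shuffles
}

# With a live SparkConf (not run here):
# from pyspark import SparkConf
# sc_conf = SparkConf()
# for key, value in conf.items():
#     sc_conf.set(key, value)
```

Roughly, the number of concurrently running tasks is bounded by total cores granted divided by `spark.task.cpus`.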
Thanks Sean, that makes it clear.
On Tue, Sep 1, 2015 at 7:17 AM, Sean Owen wrote:
> Any 1.5 RC comes from the latest state of the 1.5 branch at some point
> in time. The next RC will be cut from whatever the latest commit is.
> You can see the tags in git for the specific commits for each RC.
>
Any 1.5 RC comes from the latest state of the 1.5 branch at some point
in time. The next RC will be cut from whatever the latest commit is.
You can see the tags in git for the specific commits for each RC.
There's no such thing as "1.5.1 SNAPSHOT" commits, just commits to
branch 1.5. I would ignore
Thanks for the explanation. Since 1.5.0 RC3 is not yet released, I assume it
would be cut from the 1.5 branch; doesn't that bring in 1.5.1-SNAPSHOT code?
The reason I am asking these questions is that I would like to know: if I
want to build 1.5.0 myself, which commit should I use?
Sent from my iPad
>
Hi all,
Shivaram and I added a lint script for SparkR, `dev/lint-r`, and it is
already running on Jenkins. If there are any validation problems in your
patch, Jenkins will fail.
Could you please make sure that your patch doesn't have any validation
problems on your local machine before
The head of branch 1.5 will always be a "1.5.x-SNAPSHOT" version. Yeah
technically you would expect it to be 1.5.0-SNAPSHOT until 1.5.0 is
released. In practice I think it's simpler to follow the defaults of
the Maven release plugin, which will set this to 1.5.1-SNAPSHOT after
any 1.5.0-rc is relea
Sorry, I still don't follow. I assume the release would be built from 1.5.0
before moving to 1.5.1. Are you saying 1.5.0 RC3 could be built from the
1.5.1 snapshot during the release? Or would 1.5.0 RC3 be built from the last
commit of 1.5.0 (before the change to 1.5.1-SNAPSHOT)?
Sent from my iPad
> On
Please do. Thanks.
On Mon, Aug 31, 2015 at 5:00 AM, Paul Weiss wrote:
> Sounds good, want me to create a jira and link it to SPARK-9697? Will put
> down some ideas to start.
> On Aug 31, 2015 4:14 AM, "Reynold Xin" wrote:
>
>> BTW if you are interested in this, we could definitely get some help
That's correct for the 1.5 branch, right? This doesn't mean that the
next RC would have this value; you choose the release version during
the release process.
On Tue, Sep 1, 2015 at 2:40 AM, Chester Chen wrote:
> Seems that the GitHub branch-1.5 has already changed the version to 1.5.1-SNAPSHOT,
>
> I a