and what do I expect to see in switching from one partition to
another as the code runs?
On Sat, Aug 30, 2014 at 10:30 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
In 1.1, you'll be able to get all of these properties using sortByKey, and then
mapPartitions on top to iterate through the key-value
BTW you can also use rdd.partitions() to get a list of Partition objects and
see how many there are.
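As a rough illustration of where key-value pairs land after a shuffle: Spark's default HashPartitioner computes nonNegativeMod(key.hashCode, numPartitions). The plain-Python sketch below simulates that idea; `hash_partition` and `partition_rdd` are illustrative helpers, not Spark APIs.

```python
def hash_partition(key, num_partitions):
    # Spark's HashPartitioner in miniature: nonNegativeMod(hash(key), n).
    # Python's % already yields a non-negative result for a positive modulus.
    return hash(key) % num_partitions

def partition_rdd(pairs, num_partitions):
    """Group key-value pairs by their target partition, as a shuffle would."""
    parts = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        parts[hash_partition(k, num_partitions)].append((k, v))
    return parts

parts = partition_rdd([(i, i * i) for i in range(10)], 3)
# len(parts) plays the role of rdd.partitions().size in the Scala API.
```

With integer keys 0..9 and 3 partitions, keys 0, 3, 6, 9 fall into partition 0, and so on; string keys would scatter by their hash instead.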
On September 4, 2014 at 5:18:30 PM, Matei Zaharia (matei.zaha...@gmail.com)
wrote:
Partitioners also work in local mode, the only question is how to see which
data fell into each partition
[
https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120657#comment-14120657
]
Matei Zaharia commented on SPARK-3215:
--
Thanks Marcelo! Just a few notes on the API
[
https://issues.apache.org/jira/browse/SPARK-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3052:
-
Assignee: Sandy Ryza
Misleading and spurious FileSystem closed errors whenever a job fails while
[
https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118913#comment-14118913
]
Matei Zaharia commented on SPARK-3098:
--
Yup, let's maybe document this for now. I'll
[
https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118918#comment-14118918
]
Matei Zaharia commented on SPARK-3098:
--
Created SPARK-3356 to track this.
In some
Matei Zaharia created SPARK-3356:
Summary: Document when RDD elements' ordering within partitions is
nondeterministic
Key: SPARK-3356
URL: https://issues.apache.org/jira/browse/SPARK-3356
Project
[
https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3098.
--
Resolution: Won't Fix
In some cases, the zipWithIndex operation gets wrong results
[
https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117622#comment-14117622
]
Matei Zaharia commented on SPARK-3098:
--
It's true that the ordering of values after
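A minimal sketch of why zipWithIndex is order-sensitive: each element's index is the number of elements in all earlier partitions plus its position within its own partition, so if the order within a partition changes between runs (as it can after a shuffle), the assigned indices change too. Plain Python, illustrative only; `zip_with_index` is a stand-in for the RDD method.

```python
from itertools import accumulate

def zip_with_index(partitions):
    """Mimic RDD.zipWithIndex: index = (number of elements in earlier
    partitions) + position within the element's own partition."""
    sizes = [len(p) for p in partitions]
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [[(x, off + i) for i, x in enumerate(p)]
            for off, p in zip(offsets, partitions)]

# Same multiset of elements, but the order *within* partition 0 differs,
# as it can after a shuffle -- so the assigned indices differ too.
run1 = zip_with_index([["a", "b"], ["c"]])
run2 = zip_with_index([["b", "a"], ["c"]])
```

Here "a" gets index 0 in one run and index 1 in the other, which is exactly the nondeterminism SPARK-3356 proposes to document.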
Hi Nicholas,
At Databricks we already run https://github.com/databricks/spark-perf for each
release, which is a more comprehensive performance test suite.
Matei
On September 1, 2014 at 8:22:05 PM, Nicholas Chammas
(nicholas.cham...@gmail.com) wrote:
What do people think of running the Big
[
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3010:
-
Assignee: wangfei
fix redundant conditional
-
Key
[
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3010.
--
Resolution: Fixed
Fix Version/s: 1.2.0 (was: 1.1.0)
[
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3010:
-
Priority: Trivial (was: Major)
fix redundant conditional
[
https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116923#comment-14116923
]
Matei Zaharia commented on SPARK-:
--
The slowdown might be partly due to adding
[
https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116942#comment-14116942
]
Matei Zaharia commented on SPARK-:
--
I see, that makes sense.
Large number
. does it apply to both sides of the join, or only one
(while the other side is streaming)?
On Sat, Aug 30, 2014 at 1:30 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
In 1.1, you'll be able to get all of these properties using sortByKey, and then
mapPartitions on top to iterate through the key
, Steve Lewis (lordjoe2...@gmail.com) wrote:
Is there a sample of how to do this -
I see 1.1 is out but cannot find samples of mapPartitions
A Java sample would be very useful
On Sat, Aug 30, 2014 at 10:30 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
In 1.1, you'll be able to get all
[
https://issues.apache.org/jira/browse/SPARK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2889.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Spark creates Hadoop Configuration objects
[
https://issues.apache.org/jira/browse/SPARK-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3318:
-
Assignee: Holden Karau
The documentation for addFiles is wrong
[
https://issues.apache.org/jira/browse/SPARK-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3318.
--
Resolution: Fixed
Fix Version/s: 1.2.0
The documentation for addFiles is wrong
In 1.1, you'll be able to get all of these properties using sortByKey, and then
mapPartitions on top to iterate through the key-value pairs. Unfortunately
sortByKey does not let you control the Partitioner, but it's fairly easy to
write your own version that does if this is important.
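The sortByKey-then-mapPartitions pattern described above can be sketched in plain Python (no Spark required). `range_partition` and `map_partitions` are illustrative stand-ins, not Spark APIs, and real sortByKey samples the data to pick partition bounds rather than sorting everything up front:

```python
import bisect

def range_partition(pairs, num_partitions):
    """Sort key-value pairs and split them into contiguous key ranges,
    as sortByKey's range partitioner does."""
    data = sorted(pairs, key=lambda kv: kv[0])
    keys = [k for k, _ in data]
    # Pick cut points so partitions are roughly equal-sized.
    bounds = [keys[len(keys) * i // num_partitions]
              for i in range(1, num_partitions)]
    parts = [[] for _ in range(num_partitions)]
    for k, v in data:
        parts[bisect.bisect_right(bounds, k)].append((k, v))
    return parts

def map_partitions(parts, f):
    """Apply f to the iterator of each partition, like RDD.mapPartitions."""
    return [list(f(iter(p))) for p in parts]

pairs = [(3, "c"), (1, "a"), (4, "d"), (2, "b"), (5, "e"), (0, "z")]
parts = range_partition(pairs, 3)
# Each partition is internally sorted and covers a disjoint key range,
# so mapPartitions sees one sorted slice of the key space at a time.
summaries = map_partitions(parts, lambda it: [[k for k, _ in it]])
```

Switching from one partition to the next as the job runs, you would see the keys jump from one contiguous sorted range to the next.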
In
[
https://issues.apache.org/jira/browse/SPARK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3257:
-
Assignee: Heather Miller
Enable :cp to add JARs in spark-shell (Scala 2.11)
Personally I'd actually consider putting CDH4 back if there are still users on
it. It's always better to be inclusive, and the convenience of a one-click
download is high. Do we have a sense on what % of CDH users still use CDH4?
Matei
On August 28, 2014 at 11:31:13 PM, Sean Owen
Yes, executors run one task per core of your machine by default. You can also
manually launch them with more worker threads than you have cores. What cluster
manager are you on?
Matei
On August 29, 2014 at 11:24:33 AM, Victor Tso-Guillen (v...@paxata.com) wrote:
I'm thinking of local mode
[
https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114247#comment-14114247
]
Matei Zaharia commented on SPARK-3277:
--
Thanks Mridul -- I think Andrew and Patrick
[
https://issues.apache.org/jira/browse/SPARK-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3239.
--
Resolution: Fixed
Fix Version/s: 1.1.0
Choose disks for spilling randomly
[
https://issues.apache.org/jira/browse/SPARK-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3239:
-
Summary: Choose disks for spilling randomly in PySpark (was: Choose disks
for spilling randomly)
[
https://issues.apache.org/jira/browse/SPARK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3256:
-
Summary: Enable :cp to add JARs in spark-shell (Scala 2.10) (was: Enable
:cp to add JARs
Matei Zaharia created SPARK-3257:
Summary: Enable :cp to add JARs in spark-shell (Scala 2.11)
Key: SPARK-3257
URL: https://issues.apache.org/jira/browse/SPARK-3257
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3256:
-
Fix Version/s: (was: 1.2.0)
Enable :cp to add JARs in spark-shell (Scala 2.10)
[
https://issues.apache.org/jira/browse/SPARK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3256:
-
Assignee: Chip Senkbeil
Enable :cp to add JARs in spark-shell (Scala 2.10)
[
https://issues.apache.org/jira/browse/SPARK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3256.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Enable :cp to add JARs in spark-shell (Scala
[
https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112791#comment-14112791
]
Matei Zaharia commented on SPARK-3215:
--
Hey Marcelo, while this could be useful
[
https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112943#comment-14112943
]
Matei Zaharia commented on SPARK-3215:
--
I think we should try this externally first
[
https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113055#comment-14113055
]
Matei Zaharia commented on SPARK-3215:
--
As I mentioned above, there's more to it than
[
https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113121#comment-14113121
]
Matei Zaharia commented on SPARK-3215:
--
The problem is just how different future
Matei Zaharia created SPARK-3271:
Summary: Delete unused methods in Utils
Key: SPARK-3271
URL: https://issues.apache.org/jira/browse/SPARK-3271
Project: Spark
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/SPARK-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3271.
--
Resolution: Fixed
Delete unused methods in Utils
[
https://issues.apache.org/jira/browse/SPARK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3265.
--
Resolution: Fixed
Fix Version/s: 1.2.0 (was: 1.0.2)
[
https://issues.apache.org/jira/browse/SPARK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3265:
-
Affects Version/s: 1.1.0
Allow using custom ipython executable with pyspark
[
https://issues.apache.org/jira/browse/SPARK-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3271:
-
Assignee: wangfei
Delete unused methods in Utils
[
https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113294#comment-14113294
]
Matei Zaharia commented on SPARK-3215:
--
Okay, so my suggestion is do it separately
Awesome to hear this, Mayur! Thanks for putting this together.
Matei
On August 27, 2014 at 10:04:12 PM, Mayur Rustagi (mayur.rust...@gmail.com)
wrote:
Hi,
We have migrated Pig functionality on top of Spark, passing 100% of e2e success
cases in the Pig test suite. That means UDFs, joins, other
I think this will increasingly be its role, though it doesn't make sense to tie
it to core because it is clearly just a client of the core APIs. What usage do
you have in mind in particular? It would be nice to know how the non-SQL APIs
for this could be better.
Matei
On August 27, 2014 at
You can use spark-shell -i file.scala to run that. However, that keeps the
interpreter open at the end, so you need to make your file end with
System.exit(0) (or even more robustly, do stuff in a try {} and add that in
finally {}).
In general it would be better to compile apps and run them
[
https://issues.apache.org/jira/browse/SPARK-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3073.
--
Resolution: Fixed
Fix Version/s: 1.2.0
improve large sort (external sort) for PySpark
[
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111631#comment-14111631
]
Matei Zaharia commented on SPARK-2926:
--
I see, thanks for posting the benchmarks
[
https://issues.apache.org/jira/browse/SPARK-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3225:
-
Priority: Trivial (was: Minor)
Typo in script
--
Key: SPARK
[
https://issues.apache.org/jira/browse/SPARK-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3225:
-
Assignee: WangTaoTheTonic
Typo in script
--
Key: SPARK-3225
[
https://issues.apache.org/jira/browse/SPARK-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3225.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Typo in script
[
https://issues.apache.org/jira/browse/SPARK-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3240:
-
Assignee: Martin Weindel
Document workaround for MESOS-1688
Matei Zaharia created SPARK-3240:
Summary: Document workaround for MESOS-1688
Key: SPARK-3240
URL: https://issues.apache.org/jira/browse/SPARK-3240
Project: Spark
Issue Type: Documentation
[
https://issues.apache.org/jira/browse/SPARK-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3240.
--
Resolution: Fixed
Document workaround for MESOS-1688
This shouldn't be a chicken-and-egg problem, since the script fetches the AMI
from a known URL. Seems like an issue in publishing this release.
On August 26, 2014 at 1:24:45 PM, Shivaram Venkataraman
(shiva...@eecs.berkeley.edu) wrote:
This is a chicken and egg problem in some sense. We can't
It should be fixed now. Maybe you have a cached version of the page in your
browser. Open DevTools (cmd-shift-I), press the gear icon, check "Disable cache
(while DevTools is open)", then refresh the page to reload it without the cache.
Matei
On August 26, 2014 at 7:31:18 AM, Nicholas Chammas
Is this a standalone mode cluster? We don't currently make this guarantee,
though it will likely work in 1.0.0 to 1.0.2. The problem though is that the
standalone mode grabs the executors' version of Spark code from what's
installed on the cluster, while your driver might be built against
You can use sc.wholeTextFiles to read each file as a complete String, though it
requires each file to be small enough for one task to process.
On August 26, 2014 at 4:01:45 PM, Chris Fregly (ch...@fregly.com) wrote:
i've seen this done using mapPartitions() where each partition represents a
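A rough local analogue of what sc.wholeTextFiles returns, one (path, full contents) pair per file, can be written in plain Python. `whole_text_files` below is a hypothetical helper, not the Spark API, and it shows why each file must fit in memory for a single task:

```python
import os
import tempfile

def whole_text_files(directory):
    """Local stand-in for sc.wholeTextFiles: one (path, full contents)
    pair per file, with the whole file read into memory at once."""
    out = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path) as f:
            out.append((path, f.read()))
    return out

# Demonstrate on a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as d:
    for name, text in [("a.txt", "line1\nline2\n"), ("b.txt", "whole file\n")]:
        with open(os.path.join(d, name), "w") as f:
            f.write(text)
    files = whole_text_files(d)
    contents = [text for _, text in files]
```

Contrast with sc.textFile, which would yield individual lines with no record of which file they came from.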
You should try to find a Java-based library, then you can call it from Scala.
Matei
On August 26, 2014 at 6:58:11 PM, Wei Tan (w...@us.ibm.com) wrote:
Hi I am trying to find a CUDA library in Scala, to see if some matrix
manipulation in MLlib can be sped up.
I googled a few but found no
connect to an existing 1.0.0 cluster and see
what happens...
Thanks, Matei :)
On Tue, Aug 26, 2014 at 6:37 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Is this a standalone mode cluster? We don't currently make this guarantee,
though it will likely work in 1.0.0 to 1.0.2. The problem
[
https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110183#comment-14110183
]
Matei Zaharia commented on SPARK-3098:
--
Sorry, I don't understand -- what exactly
[
https://issues.apache.org/jira/browse/SPARK-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2976:
-
Summary: Replace tabs with spaces (was: Too many ugly tabs instead of
white spaces)
Replace
[
https://issues.apache.org/jira/browse/SPARK-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2976:
-
Assignee: Kousuke Saruta
Replace tabs with spaces
[
https://issues.apache.org/jira/browse/SPARK-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2976.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Replace tabs with spaces
This is on nodes with ~15G of memory, on which we have successfully run 8G jobs.
On Mon, Aug 25, 2014 at 2:02 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
BTW it seems to me that even without that patch, you should be getting tasks
launched as long as you leave at least 32 MB of memory
for this one.
Matei
On August 25, 2014 at 1:07:15 PM, Matei Zaharia (matei.zaha...@gmail.com) wrote:
This is kind of weird then, seems perhaps unrelated to this issue (or at least
to the way I understood it). Is the problem maybe that Mesos saw 0 MB being
freed and didn't re-offer the machine *even
Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
release?
One possibility is that your S3 bucket is in a remote Amazon region, which
would make it very slow. In my experience though saveAsTextFile has worked even
for pretty large datasets in that situation, so maybe
Chen (tnac...@gmail.com) wrote:
Hi Matei,
I'm going to investigate from both the Mesos and Spark sides and will hopefully
have a good long-term solution. In the meantime, having a workaround
to start with is going to unblock folks.
Tim
On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia matei.zaha
the synthetic operation and see if I get the same results or not.
Amnon
On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark
Developers List] ml-node+s1001551n8000...@n3.nabble.com wrote:
Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
release?
One
Hey Nicholas,
In general we've been looking at these periodically (at least I have) and
asking people to close out of date ones, but it's true that the list has gotten
fairly large. We should probably have an expiry time of a few months and close
them automatically. I agree that it's daunting
Have you tried the pipe() operator? It should work if you can launch your
script from the command line. Just watch out for any environment variables
needed (you can pass them to pipe() as an optional argument if there are some).
On August 25, 2014 at 12:41:29 AM, Jaonary Rabarisoa
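The behavior of pipe() can be approximated in plain Python with subprocess: each partition's elements go to the external command's stdin one per line, and the command's stdout lines come back as the new elements. `pipe_partition` is a hypothetical stand-in for the Spark operator, not its actual implementation:

```python
import subprocess
import sys

def pipe_partition(elements, command, env=None):
    """Rough analogue of RDD.pipe on one partition: write one element per
    line to the command's stdin, return its stdout split back into lines."""
    proc = subprocess.run(command, input="\n".join(elements) + "\n",
                          capture_output=True, text=True, env=env)
    return proc.stdout.splitlines()

# Tiny stand-in for an external script: upper-case every input line.
script = [sys.executable, "-c",
          "import sys\nfor line in sys.stdin: print(line.strip().upper())"]
result = pipe_partition(["spark", "pipe"], script)
```

The optional `env` argument mirrors the way pipe() lets you pass environment variables to the child process; leaving it as None inherits the parent's environment.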
It seems to be because you went there with https:// instead of http://. That
said, we'll fix it so that it works on both protocols.
Matei
On August 25, 2014 at 1:56:16 PM, Nick Chammas (nicholas.cham...@gmail.com)
wrote:
https://spark.apache.org/screencasts/1-first-steps-with-spark.html
The
Chen tnac...@gmail.com wrote:
+1 to have the work around in.
I'll be investigating from the Mesos side too.
Tim
On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too
bad
, coarse-grained mode would be a challenge as we have to
constantly remind people to kill their shells as soon as their queries finish.
Am I correct in viewing Mesos in coarse-grained mode as being similar to Spark
Standalone's cpu allocation behavior?
On Sat, Aug 23, 2014 at 7:16 PM, Matei
Hey Gary, just as a workaround, note that you can use Mesos in coarse-grained
mode by setting spark.mesos.coarse=true. Then it will hold onto CPUs for the
duration of the job.
Matei
On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) wrote:
I just wanted to bring up a
TypeTags are unfortunately not thread-safe in Scala 2.10. They were still
somewhat experimental at the time so we decided not to use them. If you want
though, you can probably design other APIs that pass a TypeTag around (e.g.
make a method that takes an RDD[T] but also requires an implicit
You should be able to just download / unzip a Spark release and run it on a
Windows machine with the provided .cmd scripts, such as bin\spark-shell.cmd.
The scripts to launch a standalone cluster (e.g. start-all.sh) won't work on
Windows, but you can launch a standalone cluster manually using
Matei Zaharia created SPARK-3091:
Summary: Add support for caching metadata on Parquet files
Key: SPARK-3091
URL: https://issues.apache.org/jira/browse/SPARK-3091
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3085:
-
Target Version/s: 1.1.0
Use compact data structures in SQL joins
Matei Zaharia created SPARK-3084:
Summary: Collect broadcasted tables in parallel in joins
Key: SPARK-3084
URL: https://issues.apache.org/jira/browse/SPARK-3084
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3085:
-
Description: We can reuse the CompactBuffer from Spark Core. (was: We can
reuse
Thanks for sharing this, Brandon! Looks like a great architecture for people to
build on.
Matei
On August 15, 2014 at 2:07:06 PM, Brandon Amos (a...@adobe.com) wrote:
Hi Spark community,
At Adobe Research, we're happy to open source a prototype
technology called Spindle we've been
[
https://issues.apache.org/jira/browse/SPARK-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2736:
-
Priority: Major (was: Minor)
Create PySpark RDD from Apache Avro File
[
https://issues.apache.org/jira/browse/SPARK-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097701#comment-14097701
]
Matei Zaharia commented on SPARK-2736:
--
I bumped this up to Major because the PR also
[
https://issues.apache.org/jira/browse/SPARK-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2736.
--
Resolution: Fixed
Fix Version/s: 1.1.0
Create PySpark RDD from Apache Avro File
[
https://issues.apache.org/jira/browse/SPARK-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2983.
--
Resolution: Fixed
Fix Version/s: 1.1.0
improve performance of sortByKey
What is your Spark executor memory set to? (You can see it in Spark's web UI at
http://driver:4040 under the executors tab). One thing to be aware of is that
the JVM never really releases memory back to the OS, so it will keep filling up
to the maximum heap size you set. Maybe 4 executors with
[
https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093845#comment-14093845
]
Matei Zaharia commented on SPARK-2967:
--
Good catch, this is a difference in behavior
Good question; I don't know of one but I believe people at Cloudera had some
thoughts of porting Sqoop to Spark in the future, and maybe they'd consider
DistCP as part of this effort. I agree it's missing right now.
Matei
On August 12, 2014 at 11:04:28 AM, Gary Malouf (malouf.g...@gmail.com)
[
https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092418#comment-14092418
]
Matei Zaharia commented on SPARK-2962:
--
I thought this was fixed in https
[
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091685#comment-14091685
]
Matei Zaharia commented on SPARK-2926:
--
Hey Saisai, a couple of questions about
Your map-only job should not be shuffling, but if you want to see what's
running, look at the web UI at http://driver:4040. In fact the job should not
even write stuff to disk except inasmuch as the Hadoop S3 library might build
up blocks locally before sending them on.
My guess is that it's
Hi everyone,
The PMC recently voted to add two new committers and PMC members: Joey Gonzalez
and Andrew Or. Both have been huge contributors in the past year -- Joey on
much of GraphX as well as quite a bit of the initial work in MLlib, and Andrew
on Spark Core. Join me in welcoming them as
Just as a note, when you're developing stuff, you can use test-only in sbt,
or the equivalent feature in Maven, to run just some of the tests. This is what
I do, I don't wait for Jenkins to run things. 90% of the time if it passes the
tests that I know could break stuff, it will pass all of
[
https://issues.apache.org/jira/browse/SPARK-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2887:
-
Assignee: Davies Liu
RDD.countApproxDistinct() is wrong when RDD has more than one partition
Emails sent from Nabble have it, while others don't. Unfortunately I haven't
received a reply from ASF infra on this yet.
Matei
On August 5, 2014 at 2:04:10 PM, Nicholas Chammas (nicholas.cham...@gmail.com)
wrote:
Looks like this feature has been turned off. Are these changes intentional? Or