I think last time I tried I had some trouble releasing it because the
release scripts no longer work with branch-1.4. You can build from the
branch yourself, but it might be better to upgrade to the later versions.
On Wed, Jul 6, 2016 at 11:02 PM, Niranda Perera
wrote:
> Hi guys,
>
> May I know
See https://issues.apache.org/jira/browse/SPARK-16390
On Sat, Jul 2, 2016 at 6:35 PM, Reynold Xin wrote:
> Thanks, Koert, for the great email. They are all great points.
>
> We should probably create an umbrella JIRA for easier tracking.
>
>
> On Saturday, July 2, 2016, Ko
Please vote on releasing the following candidate as Apache Spark version
2.0.0. The vote is open until Friday, July 8, 2016 at 23:00 PDT and passes
if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...
> people will be asking about the reasons Spark does this. Where are
> > such issues reported usually?
> >
> > Pozdrawiam,
> > Jacek Laskowski
> >
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
Jacek,
This is definitely not necessary, but I wouldn't waste cycles "fixing"
things like this when they have virtually zero impact. Perhaps next time we
update this code we can "fix" it.
Also can you comment on the pull request directly?
On Tue, Jul 5, 2016 at 1:07 PM, Jacek Laskowski wrote:
Please consider this vote canceled and I will work on another RC soon.
On Tue, Jun 21, 2016 at 6:26 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
> if a ma
This seems like a Scala compiler bug.
On Tuesday, July 5, 2016, Jacek Laskowski wrote:
> Well, there is foreach for Java and another foreach for Scala. That's
> what I can understand. But while supporting two language-specific APIs
> -- Scala and Java -- Dataset API lost support for such simple
Thanks, Koert, for the great email. They are all great points.
We should probably create an umbrella JIRA for easier tracking.
On Saturday, July 2, 2016, Koert Kuipers wrote:
> after working with the Dataset and Aggregator apis for a few weeks porting
> some fairly complex RDD algos (an overall
Because in that case you cannot merge anything meant for 2.1 until 2.0 is
released.
On Saturday, July 2, 2016, Jacek Laskowski wrote:
> Hi,
>
> Always release from master. What could be the gotchas?
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/
> Mastering Apache
There isn't one pre-made, but the default works out OK. The main things
you'd need to update are spacing for function argument indentation and
import ordering.
On Fri, Jul 1, 2016 at 4:11 AM, Anton Okolnychyi wrote:
> Hi, all.
>
> I've read the Spark code style guide.
> I am wondering if
Multiple instances of test runs are usually running in parallel, so they
would need to bind to different ports.
On Friday, July 1, 2016, Cody Koeninger wrote:
> Thanks for the response. I'm talking about test code that starts up
> embedded network services for integration testing.
>
> KafkaTest
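The approach described above, giving each parallel test instance its own port, is commonly done by binding to port 0 and letting the OS assign a free ephemeral port. A minimal sketch in plain Python (not Spark's actual test harness; the helper name is made up):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# Each parallel test instance can ask for its own port, so embedded
# network services started for integration testing don't collide.
port_a = find_free_port()
port_b = find_free_port()
```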
> lots of time.
>
> Not sure what could be done here.
>
> Thanks
>
> On Thu, Jun 30, 2016 at 10:10 PM, Reynold Xin wrote:
>
>> Which version are you using here? If the underlying files change,
>> technically we should go through optimization again.
>>
Which version are you using here? If the underlying files change,
technically we should go through optimization again.
Perhaps the real "fix" is to figure out why logical plan creation is so
slow for 700 columns.
On Thu, Jun 30, 2016 at 1:58 PM, Darshan Singh
wrote:
> Is there a way I can use
Yes, scheduling is centralized in the driver.
For debugging, I think you'd want to set the executor JVM flags, not the
worker JVM flags.
On Thu, Jun 30, 2016 at 11:36 AM, cbruegg wrote:
> Hello everyone,
>
> I'm a student assistant in research at the University of Paderborn, working
> on integrating
If people want this to happen, please go comment on the INFRA ticket:
https://issues.apache.org/jira/browse/INFRA-12185
Otherwise it will probably be dropped.
On Mon, Jun 27, 2016 at 7:04 PM, Reynold Xin wrote:
> Filed infra ticket: https://issues.apache.org/jira/browse/INFRA-12
Filed infra ticket: https://issues.apache.org/jira/browse/INFRA-12185
On Mon, Jun 27, 2016 at 10:02 AM, Reynold Xin wrote:
> Let me look into this...
>
>
> On Monday, June 27, 2016, Nicholas Chammas
> wrote:
>
>> Howdy,
>>
>> It seems like every week
We are happy to announce the availability of Spark 1.6.2! This maintenance
release includes fixes across several areas of Spark. You can find the list
of changes here: https://s.apache.org/spark-1.6.2
And download the release here: http://spark.apache.org/downloads.html
Yup this is bad. Can you create a JIRA ticket too?
On Mon, Jun 27, 2016 at 12:22 PM, Koert Kuipers wrote:
> hey,
>
> since SPARK-15982 was fixed (https://github.com/apache/spark/pull/13727)
> i believe all external DataSources that rely on using .load(path) without
> being a FileFormat themselv
Let me look into this...
On Monday, June 27, 2016, Nicholas Chammas
wrote:
> Howdy,
>
> It seems like every week we have at least a couple of people emailing the
> user list in vain with "Unsubscribe" in the subject, the body, or both.
>
> I remember a while back that every email on the user lis
Vote passed. Please see below. I will work on packaging the release.
+1 (9 votes, 4 binding)
Reynold Xin*
Sean Owen*
Tim Hunter
Michael Armbrust*
Sean McNamara*
Kousuke Saruta
Sameer Agarwal
Krishna Sankar
Vaquar Khan
0
none
-1
Maciej Bryński
* binding votes
On Sun, Jun 19, 2016 at 9:24 PM
5.0. Packages
>> 5.1. com.databricks.spark.csv - read/write OK (--packages
>> com.databricks:spark-csv_2.10:1.4.0)
>> 6.0. DataFrames
>> 6.1. cast,dtypes OK
>> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
>> 6.3. All joins,sql,set operations,udf OK
>>
are people that develop for Spark on Windows. The
> referenced issue is indeed Minor and has nothing to do with unit tests.
>
>
>
> *From:* Mark Hamstra [mailto:m...@clearstorydata.com]
> *Sent:* Wednesday, June 22, 2016 4:09 PM
> *To:* Marcelo Vanzin
> *Cc:* Ul
+1 myself
On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara wrote:
> +1
>
> On Jun 22, 2016, at 1:14 PM, Michael Armbrust
> wrote:
>
> +1
>
> On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly
> wrote:
>
>> +1
>>
>> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
>> wrote:
>>
>>> +1 This release pas
SPARK-12818 is about building a bloom filter on existing data. It has
nothing to do with the ORC bloom filter, which can be used to do predicate
pushdown.
On Tue, Jun 21, 2016 at 7:45 PM, BaiRan wrote:
> Hi all,
>
> I have a question about bloom filter implementation in Spark-12818 issue.
> If
Please vote on releasing the following candidate as Apache Spark version
2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...
Please vote on releasing the following candidate as Apache Spark version
1.6.2. The vote is open until Wednesday, June 22, 2016 at 22:00 PDT and
passes if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.6.2
[ ] -1 Do not release this package because ...
.Assert.fail(Assert.java:86)
>
> at org.junit.Assert.assertTrue(Assert.java:41)
>
> at org.junit.Assert.assertNotNull(Assert.java:621)
>
> at org.junit.Assert.assertNotNull(Assert.java:631)
>
> at
> org.apache.spark.launcher.LauncherServerSuite.testCommunication(LauncherSe
Thanks for the kind words, Krishna! Please keep the feedback coming.
On Saturday, June 18, 2016, Krishna Sankar wrote:
> Hi all,
>Just wanted to thank all for the dataset API - most of the times we see
> only bugs in these lists ;o).
>
>- Putting some context, this weekend I was updating
Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.6.2!
> >
> > The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
> > majority of at least 3 +1 PMC votes are cast.
Please go for it!
On Friday, June 17, 2016, Pedro Rodriguez wrote:
> I would be open to working on Dataset documentation if no one else is
> already working on it. Thoughts?
>
> On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian > wrote:
>
>> As mentioned in the PR description, this is just an ini
Cody has graciously worked on a new connector for dstream for Kafka 0.10.
Can people that use Kafka test this connector out? The patch is at
https://github.com/apache/spark/pull/11863
Although we have stopped merging new features into branch-2.0, this
connector is very decoupled from the rest of Spark
Please vote on releasing the following candidate as Apache Spark version
1.6.2!
The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.6.2
[ ] -1 Do not release this package because ...
You should be fine in 1.6 onward. Count distinct doesn't require data to
fit in memory there.
On Thu, Jun 16, 2016 at 1:57 AM, Avshalom wrote:
> Hi all,
>
> We would like to perform a count distinct query based on a certain filter.
> e.g. our data is of the form:
>
> userId, Name, Restaurant na
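The query being asked about, a distinct count restricted by a filter, can be sketched in plain Python. The records and field values below are made up for illustration:

```python
# Hypothetical records of the form (userId, name, restaurant)
rows = [
    (1, "Ann", "Sushi Place"),
    (1, "Ann", "Sushi Place"),
    (2, "Bob", "Sushi Place"),
    (3, "Cal", "Taco Spot"),
]

# Count distinct userIds among rows matching a filter, analogous to:
#   SELECT COUNT(DISTINCT userId) FROM t WHERE restaurant = 'Sushi Place'
distinct_users = len({user for (user, _, rest) in rows if rest == "Sushi Place"})
# distinct_users == 2
```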
Are you running Spark on YARN, Mesos, Standalone? For all of them you can
make the Hive dependency just part of your application, and then you can
manage this pretty easily.
On Wed, Jun 15, 2016 at 2:35 AM, Rostyslav Sotnychenko <
r.sotnyche...@gmail.com> wrote:
> Hello!
>
> I have a question re
It's been a while and we have accumulated quite a few bug fixes in
branch-1.6. I'm thinking about cutting a 1.6.2 RC this week. Any patches
somebody wants to get in last minute?
On a related note, I'm thinking about cutting a 2.0.0 RC this week too. I
looked at the 60 unresolved tickets and almost all
You just need to run normal packaging and all the scripts are now setup to
run without the assembly jars.
On Tuesday, June 14, 2016, Franklyn D'souza
wrote:
> Just wondering where the spark-assembly jar has gone in 2.0. i've been
> reading that its been removed but i'm not sure what the new work
Thanks for the email. Things like this (and bugs) are exactly the reason
the preview releases exist. It seems like enough people have run into
problems with this one that maybe we should just bring it back for backward
compatibility.
On Monday, June 13, 2016, Egor Pahomov wrote:
> In May due to t
Did you try this on master?
On Mon, Jun 13, 2016 at 11:26 AM, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:
> Hi,
>
> Running the first query of tpcds on a standalone setup (4 nodes, tpcds2
> generated for scale 10 and transformed in parquet under hdfs) it results
> in one exce
Is this just to set some string? That makes sense. One thing you would need
to make sure of is that Spark still works outside of Hadoop, and also on
older versions of Hadoop.
On Thu, Jun 9, 2016 at 4:37 PM, Weiqing Yang
wrote:
> Hi,
>
> Hadoop has implemented a feature of log tracing – caller
Yes you can :)
On Wed, Jun 8, 2016 at 6:00 PM, Alexander Pivovarov
wrote:
> Can I just enable spark.kryo.registrationRequired and look at error
> messages to get unregistered classes?
>
> On Wed, Jun 8, 2016 at 5:52 PM, Reynold Xin wrote:
>
>> Due to type erasure th
Due to type erasure there is no difference between them, although watch out for Scala
tuple serialization.
On Wednesday, June 8, 2016, Ted Yu wrote:
> I think the second group (3 classOf's) should be used.
>
> Cheers
>
> On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov > wrote:
>
>> if my RDD is RDD[(St
Take a look at the implementation of typed sum/avg:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scalalang/typed.scala
You can implement a typed max/min.
On Tue, Jun 7, 2016 at 4:31 PM, Alexander Pivovarov
wrote:
> Ted, It does not work l
Thanks for the email. How do you deal with in-memory state that references
the classes? This can happen in streaming, in RDD caching, and in temporary
view creation in SQL.
On Mon, Jun 6, 2016 at 3:40 PM, S. Kai Chen wrote:
> Hi,
>
> We use spark-shell heavily for ad-hoc data analysis as well
Thanks for fixing it!
On Mon, Jun 6, 2016 at 1:49 PM, Imran Rashid wrote:
> Hi all,
>
> just a heads up, I introduced a flaky test, BlacklistIntegrationSuite, a
> week ago or so. I *thought* I had solved the problems, but turns out there
> was more flakiness remaining. for now I've just turne
The Bahir one was a good argument actually. I just clicked the button to
push it into Maven central.
On Mon, Jun 6, 2016 at 12:00 PM, Mark Hamstra
wrote:
> Fine. I don't feel strongly enough about it to continue to argue against
> putting the artifacts on Maven Central.
>
> On Mon, Jun 6, 2016
Congratulations, Yanbo!
On Friday, June 3, 2016, Matei Zaharia wrote:
> Hi all,
>
> The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a
> super active contributor in many areas of MLlib. Please join me in
> welcoming Yanbo!
>
> Matei
> -
Also what happens if we want to do a second preview release? The naming
> doesn't seem to allow that unless we call it preview 2.
>
> Tom
>
>
> On Wednesday, June 1, 2016 6:27 PM, Sean Owen wrote:
>
>
> On Wed, Jun 1, 2016 at 5:58 PM, Reynold Xin wrote:
> >
Hi Sean,
(writing this email with my Apache hat on only and not Databricks hat)
The preview release is available here:
http://spark.apache.org/downloads.html (there is an entire section
dedicated to it and also there is a news link to it on the right).
Again, I think this is a good opportunity t
To play devil's advocate, previews are technically not RCs. They are
actually voted releases.
On Wed, Jun 1, 2016 at 1:46 PM, Michael Armbrust
wrote:
> Yeah, we don't usually publish RCs to central, right?
>
> On Wed, Jun 1, 2016 at 1:06 PM, Reynold Xin wrote:
>
>&
They are here ain't they?
https://repository.apache.org/content/repositories/orgapachespark-1182/
Did you mean publishing them to Maven Central? My understanding is that
publishing to Maven Central isn't a required step of doing these. This
might be a good opportunity to discuss that. My thought
I think your understanding is correct. There will be external libraries
that allow you to use the twitter streaming dstream API even in 2.0 though.
On Sat, May 28, 2016 at 8:37 AM, Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:
> As far as I could understand...
> 1. Using Python (PySpark
They should get printed if you turn on debug level logging.
On Fri, May 27, 2016 at 1:00 PM, Koert Kuipers wrote:
> hello all,
> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
> trying to run some algorithms at scale on our cluster.
> unfortunately this means that when i
Here's a ticket: https://issues.apache.org/jira/browse/SPARK-15598
On Fri, May 20, 2016 at 12:35 AM, Reynold Xin wrote:
> Andres - this is great feedback. Let me think about it a little bit more
> and reply later.
>
>
> On Thu, May 19, 2016 at 11:12 AM, Andres Perez
Yup - but the reason we did the null handling that way was for Python,
which also affects Scala.
On Thu, May 26, 2016 at 4:17 PM, Koert Kuipers wrote:
> ok, thanks for creating ticket.
>
> just to be clear: my example was in scala
>
> On Thu, May 26, 2016 at 7:07 PM, Rey
This is unfortunately due to the way we handle default values in
Python. I agree it doesn't follow the principle of least astonishment.
Maybe the best thing to do here is to put the actual default values in the
Python API for csv (and json, parquet, etc), rather than using None in
Python. This
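The two styles under discussion can be sketched in plain Python. These are hypothetical functions for illustration, not the actual PySpark reader API:

```python
# Sentinel style: the real default is invisible in the signature, and the
# None must be translated into the actual default somewhere downstream.
def read_csv_sentinel(path, sep=None, header=None):
    sep = "," if sep is None else sep
    header = False if header is None else header
    return (path, sep, header)

# Explicit-default style: the signature documents the actual behavior,
# which follows the principle of least astonishment.
def read_csv_explicit(path, sep=",", header=False):
    return (path, sep, header)

# Both behave the same, but only the second is self-documenting.
assert read_csv_sentinel("f.csv") == read_csv_explicit("f.csv")
```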
It's probably a good idea to have the Vertica dialect too, since it doesn't
seem like it'd be too difficult to maintain. It is not going to be as
performant as the native Vertica data source, but is going to be much
lighter weight.
On Thu, May 26, 2016 at 3:09 PM, Mohammed Guller
wrote:
> Verti
I think the risk is everybody starts following this, then this will be
unmanageable, given the size of the number of organizations involved.
The two main labels that we actually use are starter + releasenotes.
On Wed, May 25, 2016 at 2:58 PM, Luciano Resende
wrote:
>
>
> On Wed, May 25, 2016 at
ily replaced by .flatMap (to do explosion) and
> .select (to rename output columns)
>
> Cheng
>
>
> On 5/25/16 12:30 PM, Reynold Xin wrote:
>
> Based on this discussion I'm thinking we should deprecate the two explode
> functions.
>
> On Wednesday, May 25, 2016, Ko
, 2016 at 8:30 AM, Reynold Xin wrote:
> Yup I have published it to maven. Will post the link in a bit.
>
> One thing is that for developers, it might be better to use the nightly
> snapshot because that one probably has fewer bugs than the preview one.
>
>
> On Wednesday, May 25,
helpful for preparing for the migration. Do you
> plan to push 2.0.0-preview to Maven too? (I for one would appreciate the
> convenience.)
>
> On Wed, May 25, 2016 at 8:44 AM, Reynold Xin > wrote:
>
>> In the past the Spark community have created preview packages (not
>
In the past the Spark community have created preview packages (not official
releases) and used those as opportunities to ask community members to test
the upcoming versions of Apache Spark. Several people in the Apache
community have suggested we conduct votes for these preview packages and
turn th
Thanks, Koert. This is great. Please keep them coming.
On Tue, May 24, 2016 at 9:27 AM, Koert Kuipers wrote:
> https://issues.apache.org/jira/browse/SPARK-15507
>
> On Tue, May 24, 2016 at 12:21 PM, Ted Yu wrote:
>
>> Please log a JIRA.
>>
>> Thanks
>>
>> On Tue, May 24, 2016 at 8:33 AM, Koert
Kubernetes itself already has facilities for HTTP proxying, doesn't it?
On Sat, May 21, 2016 at 9:30 AM, Gurvinder Singh wrote:
> Hi,
>
> I am currently working on deploying Spark on kuberentes (K8s) and it is
> working fine. I am running Spark with standalone mode and checkpointing
> the state to
This vote passes with 14 +1s (5 binding*) and no 0 or -1! Thanks to
everyone who voted. I'll start work on publishing the release.
+1:
Reynold Xin*
Sean Owen*
Ovidiu-Cristian MARCU
Krishna Sankar
Michael Armbrust*
Yin Huai
Joseph Bradley*
Xiangrui Meng*
Herman van Hövell tot Westerflier
V
It's probably due to GC.
On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote:
> Hi All,
> I am here to get some expert advice on a use case I am working on.
>
> Cluster & job details below -
>
> Data - 6 Tb
> Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps)
>
> Parameters-
> --execut
Andres - this is great feedback. Let me think about it a little bit more
and reply later.
On Thu, May 19, 2016 at 11:12 AM, Andres Perez wrote:
> Hi all,
>
> We were in the process of porting an RDD program to one which uses
> Datasets. Most things were easy to transition, but one hole in
> fun
I filed https://issues.apache.org/jira/browse/SPARK-15441
On Thu, May 19, 2016 at 8:48 AM, Andres Perez wrote:
> Hi all, I'm getting some odd behavior when using the joinWith
> functionality for Datasets. Here is a small test case:
>
> val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).
The old one is deprecated but should still work.
On Thu, May 19, 2016 at 3:51 PM, Arun Allamsetty
wrote:
> Hi Doug,
>
> If you look at the API docs here:
> http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.hive.HiveContext,
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Del
19, 2016, 2:43 AM Reynold Xin > wrote:
>
>> Users would be able to run this already with the 3 lines of code you
>> supplied right? In general there are a lot of methods already on
>> SparkContext and we lean towards the more conservative side in introducing
>> new API
Users would be able to run this already with the 3 lines of code you
supplied right? In general there are a lot of methods already on
SparkContext and we lean towards the more conservative side in introducing
new API variants.
Note that this is something we are doing automatically in Spark SQL for
rowse/SPARK-15370?jql=project%20=%20SPARK%20AND%20resolution%20=%20Unresolved%20AND%20affectedVersion%20=%202.0.0>
>
> To rephrase: for 2.0 do you have specific issues that are not a priority
> and will released maybe with 2.1 for example?
>
> Keep up the good work!
>
> On 1
or people using Spark with Mesos.
>
> Thanks!
> Mike
>
> From: on behalf of Reynold Xin
> Date: Wednesday, May 18, 2016 at 6:40 AM
> To: "dev@spark.apache.org"
> Subject: [vote] Apache Spark 2.0.0-preview release (rc1)
>
> Hi,
>
> In the past the Apac
On 18 May 2016, at 16:28, Sean Owen wrote:
>
> I think it's a good idea. Although releases have been preceded before
> by release candidates for developers, it would be good to get a formal
> preview/beta release ratified for public consumption ahead of a new
> major release. Bett
Hi,
In the past the Apache Spark community have created preview packages (not
official releases) and used those as opportunities to ask community members
to test the upcoming versions of Apache Spark. Several people in the Apache
community have suggested we conduct votes for these preview packages
It seems like the problem here is that we are not using unique names
for mapelements_isNull?
On Tue, May 17, 2016 at 3:29 PM, Koert Kuipers wrote:
> hello all, we are slowly expanding our test coverage for spark
> 2.0.0-SNAPSHOT to more in-house projects. today i ran into this issue...
>
> thi
It might be best to fix this with fallback first, and then figure out how
we can do it more intelligently.
On Sat, May 14, 2016 at 2:29 AM, Jonathan Gray wrote:
> Hi,
>
> I've raised JIRA SPARK-15258 (with code attached to re-produce problem)
> and would like to have a go at fixing it but don'
Sure go for it. Thanks.
On Thu, May 12, 2016 at 11:41 PM, 段石石 wrote:
> Hi, all:
>
>
> I have add takeSample in the DataFrame, which sampling with specify num of
> the rows. It has a similar version in RDD, but is not supported in DataFrame
> now. And now local test is done, Is it ok to make a pr
2, 2016 at 3:35 PM, Reynold Xin wrote:
> > That's true. I think I want to differentiate end-user vs developer.
> Public
> > isn't the best word. Maybe EndUser?
> >
> > On Thu, May 12, 2016 at 3:34 PM, Shivaram Venkataraman
> > wrote:
> >>
>
That's true. I think I want to differentiate end-user vs developer. Public
isn't the best word. Maybe EndUser?
On Thu, May 12, 2016 at 3:34 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
> On Thu, May 12, 2016 at 2:29 PM, Reynold Xin wrote:
> > We curre
We currently have three levels of interface annotation:
- unannotated: stable public API
- DeveloperApi: A lower-level, unstable API intended for developers.
- Experimental: An experimental user-facing API.
After using this annotation for ~ 2 years, I would like to propose the
following changes:
Adding Kay
On Wed, May 11, 2016 at 12:01 PM, Brian Cho wrote:
> Hi,
>
> I'm interested in adding read-time (from HDFS) to Task Metrics. The
> motivation is to help debug performance issues. After some digging, its
> briefly mentioned in SPARK-1683 that this feature didn't make it due to
> metri
Probably not. Want to submit a pull request?
On Tuesday, May 3, 2016, Koert Kuipers wrote:
> yes it works fine if i switch to using the implicits on the SparkSession
> (which is a val)
>
> but do we want to break the old way of doing the import?
>
> On Tue, May 3, 2016 at 12:56 PM, Ted Yu > wro
Thanks, Shane!
On Monday, May 2, 2016, shane knapp wrote:
> workers -01 and -04 are back up, as is -06 (as i hit the wrong power
> button by accident). :)
>
> -01 and -04 got hung on shutdown, so i'll investigate them and see
> what exactly happened. regardless, we should be building happily!
Definitely looks like a bug.
Ted - are you looking at this?
On Mon, May 2, 2016 at 7:15 AM, Koert Kuipers wrote:
> Created issue:
> https://issues.apache.org/jira/browse/SPARK-15062
>
> On Mon, May 2, 2016 at 6:48 AM, Ted Yu wrote:
>
>> I tried the same statement using Spark 1.6.1
>> There wa
Hi devs,
Three weeks ago I mentioned on the dev list creating branch-2.0
(effectively "feature freeze") in 2 - 3 weeks. I've just created Spark's
branch-2.0 to form the basis of the 2.0 release. We have closed ~ 1700
issues. That's huge progress, and we should celebrate that.
Compared with past r
This is a nice feature in broadcast join. It is just a little bit
complicated to do and as a result hasn't been prioritized as highly yet.
On Thu, Apr 28, 2016 at 5:51 AM, wrote:
> I was aiming to show the operations with pseudo-code, but I apparently
> failed, so Java it is :)
>
> Assume the fo
Hm while this is an attractive idea in theory, in practice I think you are
substantially overestimating HDFS' ability to handle a lot of small,
ephemeral files. It has never really been optimized for that use case.
On Thu, Apr 28, 2016 at 11:15 AM, Michael Gummelt
wrote:
> > if after a work-load
Usually no - but sortByKey does because it needs the range boundary to be
built in order to have the RDD. It is a long standing problem that's
unfortunately very difficult to solve without breaking the RDD API.
In DataFrame/Dataset we don't have this issue though.
On Sun, Apr 24, 2016 at 10:54 P
I pushed a commit to close all but the last one.
On Sat, Apr 23, 2016 at 2:08 AM, Sean Owen wrote:
> Except for the last one I think they're closeable. We can't close any
> PR directly. It's possible to push an empty commit with comments like
> "Closes #" to make the ASF processes close the
>>
>> Tom
>>
>>
>> On Monday, April 18, 2016 5:23 PM, Marcelo Vanzin
>> wrote:
>>
>>
>> On Mon, Apr 18, 2016 at 3:09 PM, Reynold Xin wrote:
>> > IIUC, the reason for that PR is that they found the string comparison to
>> > increase t
Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian)
syntax? They are not being rendered in email.
On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu wrote:
> bq. it's actually in use right now in spite of not being in any upstream
> HBase release
>
> If it is not in upstream, then
Yea, in general I feel examples that bring in a large number of dependencies
should be outside Spark.
On Tue, Apr 19, 2016 at 10:15 AM, Marcelo Vanzin
wrote:
> Hey all,
>
> Two reasons why I think we should remove that from the examples:
>
> - HBase now has Spark integration in its own repo, so
35 PM, Reynold Xin wrote:
> But doExecute is not called?
>
> On Mon, Apr 18, 2016 at 10:32 PM, Zhan Zhang
> wrote:
>
>> Hi Reynold,
>>
>> I just check the code for CollectLimit, there is a shuffle happening to
>> collect them in one partition.
>>
>
think that is the reason. Do I understand correctly?
>
> Thanks.
>
> Zhan Zhang
> On Apr 18, 2016, at 10:08 PM, Reynold Xin wrote:
>
> Unless I'm really missing something I don't think so. As I said, it goes
> through an iterator and after processing each stre
the wholeStageCodeGen.
>
> Correct me if I am wrong.
>
> Thanks.
>
> Zhan Zhang
>
> On Apr 18, 2016, at 11:09 AM, Reynold Xin wrote:
>
> I could be wrong but I think we currently do that through whole stage
> codegen. After processing every row on the stream side,
>>>>> bandwidth to check on things, will be very discouraging to new folks.
>>>>> Doubly so for those inexperienced with opensource. Even if the message
>>>>> says "feel free to reopen for so-and-so reason", new folks who lack
>>>>> confidence are going to
zin wrote:
> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin wrote:
> > The bigger problem is that it is much easier to maintain backward
> > compatibility rather than dictating forward compatibility. For example,
> as
> > Marcin said, if we come up with a slightly different shu
mance, we wouldn't be able to do that if we want to
allow Spark 1.6 shuffle service to read something generated by Spark 2.1.
On Mon, Apr 18, 2016 at 1:59 PM, Marcelo Vanzin wrote:
> On Mon, Apr 18, 2016 at 1:53 PM, Reynold Xin wrote:
> > That's not the only one. For example,
That's not the only one. For example, the hash shuffle manager has been off
by default since Spark 1.2, and we'd like to remove it in 2.0:
https://github.com/apache/spark/pull/12423
How difficult is it to just change the package name to, say, v2?
On Mon, Apr 18, 2016 at 1:51 PM, Mark Grover wrot