Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-15 Thread Reynold Xin
I'm going to start this with a +1! On Thu, Dec 15, 2016 at 9:42 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote: > In addition to the usual binary artifacts, this is the first release where > we have installable packages for Python [1] and R [2] that are part of > the release. I'm

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-15 Thread Shivaram Venkataraman
In addition to the usual binary artifacts, this is the first release where we have installable packages for Python [1] and R [2] that are part of the release. I'm including instructions to test the R package below. Holden / other Python developers can chime in if there are special instructions to

[VOTE] Apache Spark 2.1.0 (RC5)

2016-12-15 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.1.0 [ ] -1 Do not release this package because ...

Re: spark-core "compile"-scope transitive-dependency on scalatest

2016-12-15 Thread Marcelo Vanzin
I posted a PR; the solution I suggested seems to work (and is simpler than breaking spark-tags into multiple artifacts). On Thu, Dec 15, 2016 at 4:46 PM, Ryan Williams wrote: > ah I see this thread now, thanks; interestingly I don't think the solution > I've

Re: spark-core "compile"-scope transitive-dependency on scalatest

2016-12-15 Thread Ryan Williams
Ah, I see this thread now, thanks; interestingly, I don't think the solution I've proposed here (splitting spark-tags' test-bits into a "-tests" JAR and having spark-core

Re: spark-core "compile"-scope transitive-dependency on scalatest

2016-12-15 Thread Marcelo Vanzin
You're right; we had a discussion here recently about this. I'll re-open that bug, if you want to send a PR. (I think it's just a matter of making the scalatest dependency "provided" in spark-tags, if I remember the discussion.) On Thu, Dec 15, 2016 at 4:15 PM, Ryan Williams
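A minimal sketch of the fix Marcelo describes, written here in sbt syntax for illustration (Spark's primary build is Maven, where the equivalent change is setting <scope>provided</scope> on spark-tags' scalatest dependency; the version string below is a placeholder):

    // In spark-tags' build definition: keep scalatest off the compile classpath
    // that downstream users of spark-core would otherwise inherit.
    libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "provided"  // version is a placeholder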

spark-core "compile"-scope transitive-dependency on scalatest

2016-12-15 Thread Ryan Williams
spark-core depends on spark-tags (compile scope) which depends on scalatest (compile scope), so spark-core leaks test-deps into downstream libraries' "compile"-scope classpath. The cause is that spark-core has logical "test->test" and "compile->compile" dependencies on spark-tags, but spark-tags

Spark 2.1.0-rc3 cut

2016-12-15 Thread Reynold Xin
Committers please use 2.1.1 as the fix version for patches merged into the branch. I will post a voting email once the packaging is done.

Re: Expand the Spark SQL programming guide?

2016-12-15 Thread Anton Okolnychyi
I think it would make sense to show a sample implementation of UserDefinedAggregateFunction for DataFrames, and an example of the Aggregator API for typed Datasets. Jim, what if I submit a PR and you join the review process? I also do not mind splitting this if you want, but it seems to be an
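As a rough sketch of the kind of typed-Dataset Aggregator example being proposed (the Record case class and AverageAgg object are made-up names for illustration):

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    case class Record(value: Double)

    // Typed aggregation: a running (sum, count) buffer, finished as an average.
    object AverageAgg extends Aggregator[Record, (Double, Long), Double] {
      def zero: (Double, Long) = (0.0, 0L)
      def reduce(b: (Double, Long), r: Record): (Double, Long) = (b._1 + r.value, b._2 + 1)
      def merge(b1: (Double, Long), b2: (Double, Long)): (Double, Long) = (b1._1 + b2._1, b1._2 + b2._2)
      def finish(b: (Double, Long)): Double = if (b._2 == 0) 0.0 else b._1 / b._2
      def bufferEncoder: Encoder[(Double, Long)] = Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

    // Usage on a Dataset[Record]:
    //   ds.select(AverageAgg.toColumn.name("avg_value"))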

Re: Is restarting of SparkContext allowed?

2016-12-15 Thread Alexey Klimov
I also wanted to ask: if this is not the intended way to use SparkContext, how much work would it take to get it working completely correctly? (e.g., are there any other singletons that can preserve state between running different SparkContexts?)

Is restarting of SparkContext allowed?

2016-12-15 Thread Alexey Klimov
Hello, my question is the continuation of a problem I described here. I've done some investigation and found out that nameNode.getDelegationToken is called during
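For reference, the stop-and-recreate pattern in question looks roughly like this (app name and master are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("restart-sketch").setMaster("local[2]")
    var sc = new SparkContext(conf)
    // ... run some jobs ...
    sc.stop()                      // shut down the first context
    sc = new SparkContext(conf)    // then create a fresh one in the same JVM
    // ... run more jobs ...
    sc.stop()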

Re: Output Side Effects for different chain of operations

2016-12-15 Thread Chawla,Sumit
I am already creating these files on the slaves. How can I create an RDD from these slaves? Regards Sumit Chawla On Thu, Dec 15, 2016 at 11:42 AM, Reynold Xin wrote: > You can just write some files out directly (and idempotently) in your > map/mapPartitions functions. It is

Re: Expand the Spark SQL programming guide?

2016-12-15 Thread Jim Hughes
Hi Anton, I'd like to see this as well. I've been working on implementing geospatial user-defined types and functions. Having examples of aggregations and window functions would be awesome! I did test out implementing a distributed convex hull as a UserDefinedAggregateFunction, and that
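For illustration, a minimal UserDefinedAggregateFunction skeleton showing the shape such an aggregation takes (a plain sum is used here as a stand-in; a convex hull aggregate would carry geometry in the buffer instead):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class SumUDAF extends UserDefinedAggregateFunction {
      def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
      def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
      def dataType: DataType = DoubleType
      def deterministic: Boolean = true
      def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0.0
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0)) buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
        buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
      def evaluate(buffer: Row): Any = buffer.getDouble(0)
    }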

Re: Output Side Effects for different chain of operations

2016-12-15 Thread Reynold Xin
You can just write some files out directly (and idempotently) in your map/mapPartitions functions. It is just a function in which you can run arbitrary code, after all. On Thu, Dec 15, 2016 at 11:33 AM, Chawla,Sumit wrote: > Any suggestions on this one? > > Regards > Sumit
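A rough, self-contained sketch of that suggestion; the /tmp path and toy data are placeholders, and overwriting the per-partition file keeps a task retry idempotent:

    import java.nio.file.{Files, Paths}
    import org.apache.spark.{SparkConf, SparkContext, TaskContext}

    object SideEffectSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("side-effect-sketch").setMaster("local[4]"))
        val data = sc.parallelize(1 to 100, numSlices = 4)
        val withFiles = data.mapPartitions { iter =>
          val rows = iter.toVector
          // One temp file per partition; Files.write overwrites by default, so a retry is idempotent.
          val out = Paths.get(s"/tmp/step-b-part-${TaskContext.getPartitionId()}.txt")
          Files.write(out, rows.mkString("\n").getBytes("UTF-8"))
          rows.iterator
        }
        withFiles.count()  // force evaluation so the files are actually written
        sc.stop()
      }
    }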

Re: Output Side Effects for different chain of operations

2016-12-15 Thread Chawla,Sumit
Any suggestions on this one? Regards Sumit Chawla On Tue, Dec 13, 2016 at 8:31 AM, Chawla,Sumit wrote: > Hi All > > I have a workflow with different steps in my program. Let's say these are > steps A, B, C, D. Step B produces some temp files on each executor node. >

Mistake in Apache Spark Java.

2016-12-15 Thread Mario Fernandez Villa
Hello, my name is Mario Fernández and I'm a Big Data developer. I usually program in Apache Spark in Java, and we have a big problem reading a CSV file properly. The issue is that when I want to read a CSV file, for instance with a semicolon delimiter, the DataFrame takes the semicolon as
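One possible workaround, sketched here in Scala (the path and reader options are placeholders; the Java DataFrameReader takes the same options):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-sketch").getOrCreate()
    val df = spark.read
      .option("sep", ";")           // tell the CSV reader the delimiter is ';'
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/path/to/file.csv")
    df.show()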

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-15 Thread Reynold Xin
In general this falls directly into the domain of external cluster managers (YARN, Mesos, Kub). The standalone thing was meant as a simple way to deploy Spark, and we gotta be careful with introducing a lot more features to it, because then it becomes just a full-fledged cluster manager and is

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-15 Thread Hegner, Travis
Thanks for the response, Jörn. This patch is intended only for Spark standalone. My understanding of the YARN cgroup support is that it only limits CPU, rather than allocating it based on a priority or shares system. This could be old documentation that I'm remembering, however. Another issue

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-15 Thread Jörn Franke
Hi, what about YARN or Mesos used in combination with Spark? They also have cgroups. Or a Kubernetes etc. deployment. > On 15 Dec 2016, at 17:37, Hegner, Travis wrote: > > Hello Spark Devs, > > > I have finally completed a mostly working proof of concept. I do not want

Forking or upgrading Apache Parquet in Spark

2016-12-15 Thread Dongjoon Hyun
Hi, All. I made a PR to upgrade Parquet to 1.9.0 for Apache Spark 2.2 in late March. - https://github.com/apache/spark/pull/16281 Currently, there are some important options around that. Here is the summary. 1. Fork Parquet 1.8.X and maintain it, like the Spark Hive fork. 2. Wait and see for

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-15 Thread Hegner, Travis
Hello Spark Devs, I have finally completed a mostly working proof of concept. I do not want to create a pull request for this code, as I don't believe it's production worthy at the moment. My intent is to better communicate what I'd like to accomplish. Please review the following patch:

Re: Document Similarity -Spark Mllib

2016-12-15 Thread Liang-Chi Hsieh
OK. I went to check the DIMSUM implementation in Spark MLlib. The probability that a column is sampled is decided by math.sqrt(10 * math.log(nCol) / threshold) / colMagnitude. The most influential parameter is colMagnitude. If, in your dataset, the colMagnitude for most columns is very low, then it looks
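For context, a small sketch of how the DIMSUM threshold is passed in MLlib (sc is assumed to be an existing SparkContext and the vectors are toy data); lowering the threshold samples more columns at higher cost:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 3.0),
      Vectors.dense(0.0, 2.0, 1.0)))
    val mat = new RowMatrix(rows)
    val exact  = mat.columnSimilarities()       // brute force, no sampling
    val approx = mat.columnSimilarities(0.1)    // DIMSUM with threshold 0.1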

Re: Belief propagation algorithm is open sourced

2016-12-15 Thread Bertrand Dechoux
Nice! I am especially interested in Bayesian Networks, which are only one of the many models that can be expressed by a factor graph representation. Do you do Bayesian network learning at scale (parameters and structure) with latent variables? Are you using publicly available tools for that?

Expand the Spark SQL programming guide?

2016-12-15 Thread Anton Okolnychyi
Hi, I am wondering whether it makes sense to expand the Spark SQL programming guide with examples of aggregations (including user-defined ones via the Aggregator API) and window functions. For instance, there might be a separate subsection under "Getting Started" for each functionality. SPARK-16046