Alright, I have merged the patch ( https://github.com/apache/spark/pull/4173
) since I don't see any strong opinions against it (as a matter of fact
most were for it). We can still change it if somebody lays out a strong
argument.
On Tue, Jan 27, 2015 at 12:25 PM, Matei Zaharia
wrote:
> The type
+1
Tested on Mac OS X
On Tue, Jan 27, 2015 at 12:35 PM, Krishna Sankar
wrote:
> +1
> 1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:55 min
> mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
> -Dhadoop.version=2.6.0 -Phive -DskipTests
> 2. Tested pyspark, MLlib - running as well
Hi Patrick:
I would love to help with reviewing in any way I can. I'm fairly new here.
Can you help with a pointer to get me started?
Thanks
From: Patrick Wendell
To: "dev@spark.apache.org"
Sent: Tuesday, January 27, 2015 3:56 PM
Subject: Friendly reminder/request to help with reviews!
Hey All,
Just a reminder, as always around release time we have a very large
volume of patches show up near the deadline.
One thing that can help us maximize the number of patches we get in is
to have community involvement in performing code reviews. And in
particular, doing a thorough review and
+1
1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:55 min
mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests
2. Tested pyspark, MLlib - running as well as comparing results with 1.1.x &
1.2.0
2.1. statistics OK
2.2. Linear/Ridge/Lasso Regression
The type alias means your methods can specify either type and they will work.
It's just another name for the same type. But Scaladocs and such will show
DataFrame as the type.
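For illustration, a minimal, self-contained sketch of how the alias behaves
(DataFrame here is a stand-in class, not Spark's actual declaration):

object TypeAliasDemo {
  class DataFrame                  // stand-in for org.apache.spark.sql.DataFrame
  type SchemaRDD = DataFrame       // the alias: another name for the same type

  // Either name can appear in a signature; both accept the same values.
  def newApi(df: DataFrame): DataFrame = df
  def oldApi(rdd: SchemaRDD): SchemaRDD = rdd

  def main(args: Array[String]): Unit = {
    val df = new DataFrame
    newApi(df)   // compiles
    oldApi(df)   // also compiles: SchemaRDD is DataFrame
  }
}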
Matei
> On Jan 27, 2015, at 12:10 PM, Dirceu Semighini Filho
> wrote:
>
> Reynold,
> But with type alias we will hav
Reynold,
But with type alias we will have the same problem, right?
If the methods don't receive SchemaRDD anymore, we will have to change
our code to migrate from SchemaRDD to DataFrame, unless we have an implicit
conversion between DataFrame and SchemaRDD.
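For reference, the kind of implicit conversion being asked about would look
roughly like this (SchemaRDD and DataFrame below are stand-in classes, not
Spark's own):

import scala.language.implicitConversions

object ImplicitConversionSketch {
  class SchemaRDD
  class DataFrame

  // Let a SchemaRDD be used wherever a DataFrame is expected,
  // without touching caller code.
  implicit def schemaRddToDataFrame(rdd: SchemaRDD): DataFrame = new DataFrame

  def process(df: DataFrame): Unit = ()

  def main(args: Array[String]): Unit = {
    process(new SchemaRDD)  // compiles via the implicit conversion
  }
}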
2015-01-27 17:18 GMT-02:00 Reynold Xin :
It has been pretty evident for some time that's what it is, hasn't it?
Yes that's a better name IMO.
On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin wrote:
> Hi,
>
> We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to
> get the community's opinion.
>
> The context is that Sche
that's great. guess i was looking at a somewhat stale master branch...
On Tue, Jan 27, 2015 at 2:19 PM, Reynold Xin wrote:
> Koert,
>
> As Mark said, I have already refactored the API so that nothing in
> catalyst is exposed (and users won't need it anyway). Data types and Row
> interfaces are bot
Koert,
As Mark said, I have already refactored the API so that nothing in catalyst
is exposed (and users won't need it anyway). Data types and Row interfaces
are both outside the catalyst package, in org.apache.spark.sql.
On Tue, Jan 27, 2015 at 9:08 AM, Koert Kuipers wrote:
> hey matei,
> i thin
Dirceu,
That is not possible because one cannot overload return types.
SQLContext.parquetFile (and many other methods) needs to return some type,
and that type cannot be both SchemaRDD and DataFrame.
In 1.3, we will create a type alias for DataFrame called SchemaRDD to not
break source compatibility.
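A quick illustration of the overloading constraint (class names are
stand-ins):

object ReturnTypeOverload {
  class SchemaRDD
  class DataFrame

  def parquetFile(path: String): DataFrame = new DataFrame
  // Won't compile if uncommented: a second parquetFile(String) differing
  // only in return type is a duplicate definition.
  // def parquetFile(path: String): SchemaRDD = new SchemaRDD
}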
Okay - we've resolved all issues with the signatures and keys.
However, I'll leave the current vote open for a bit to solicit
additional feedback.
On Tue, Jan 27, 2015 at 10:43 AM, Sean McNamara
wrote:
> Sounds good, that makes sense.
>
> Cheers,
>
> Sean
>
>> On Jan 27, 2015, at 11:35 AM, Patric
Sounds good, that makes sense.
Cheers,
Sean
> On Jan 27, 2015, at 11:35 AM, Patrick Wendell wrote:
>
> Hey Sean,
>
> Right now we don't publish every 2.11 binary to avoid combinatorial
> explosion of the number of build artifacts we publish (there are other
> parameters such as whether hive i
Hey Sean,
Right now we don't publish every 2.11 binary to avoid combinatorial
explosion of the number of build artifacts we publish (there are other
parameters such as whether hive is included, etc). We can revisit this
in future feature releases, but .1 releases like this are reserved for
bug fixes.
We’re using Spark on Scala 2.11 w/ Hadoop 2.4. Would it be practical / make
sense to build a bin version of Spark against Scala 2.11 for versions other
than just hadoop1 at this time?
Cheers,
Sean
> On Jan 27, 2015, at 12:04 AM, Patrick Wendell wrote:
>
> Please vote on releasing the follow
Yes - the key issue is just due to me creating new keys this time
around. Anyways let's take another stab at this. In the mean time,
please don't hesitate to test the release itself.
- Patrick
On Tue, Jan 27, 2015 at 10:00 AM, Sean Owen wrote:
> Got it. Ignore the SHA512 issue since these aren't
Got it. Ignore the SHA512 issue since these aren't somehow expected by
a policy or Maven to be in a certain format. Just wondered if the
difference was intended.
The Maven way of generating the SHA1 hashes is to set this on the
install plugin, AFAIK, although I'm not sure if the intent was to hash
I personally have no preference DataFrame vs. DataTable, but only wish to lay
out the history and etymology simply because I'm into that sort of thing.
"Frame" comes from Marvin Minsky's 1970's AI construct: "slots" and the data
that go in them. The S programming language (precursor to R) adopte
Hey Sean,
The release script generates hashes in two places (take a look a bit
further down in the script), one for the published artifacts and the
other for the binaries. In the case of the binaries we use SHA512
because, AFAIK, the ASF does not require you to use SHA1 and SHA512 is
better. In th
I am running into this issue as well, when storing large Arrays as the
value in a kv pair and then doing a reduceByKey.
Can one of the experts please comment on whether it would make sense to add
an operation to add values in place, like accumulators do - this would
essentially merge the vectors for a given
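In the meantime, one way to approximate accumulator-style in-place merging
is aggregateByKey with a mutable zero buffer - a rough sketch, assuming the
values are Array[Double] of a known, equal length:

import org.apache.spark.SparkContext._   // PairRDDFunctions in Spark 1.x
import org.apache.spark.rdd.RDD

// Each key gets its own zero buffer per partition, and subsequent values
// are added into it in place, instead of allocating a fresh array for
// every element the way (_ + _) on immutable vectors would.
def sumArraysByKey(rdd: RDD[(String, Array[Double])],
                   size: Int): RDD[(String, Array[Double])] = {
  def addInPlace(acc: Array[Double], v: Array[Double]): Array[Double] = {
    var i = 0
    while (i < v.length) { acc(i) += v(i); i += 1 }
    acc
  }
  rdd.aggregateByKey(new Array[Double](size))(addInPlace, addInPlace)
}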
In master, Reynold has already taken care of moving Row
into org.apache.spark.sql; so, even though the implementation of Row (and
GenericRow et al.) is in Catalyst (which is more optimizer than parser),
that needn't be of concern to users of the API in its most recent state.
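Concretely, user code only needs the public package - for example:

import org.apache.spark.sql.Row   // no org.apache.spark.sql.catalyst.* needed

val row = Row(1, "a")
row.getInt(0)  // 1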
On Tue, Jan 27, 2015 a
Hi,
I’m trying to run the Spark test suite on an EC2 instance, but I can’t get
Yarn tests to pass. The hostname I get on that machine is not resolvable,
but adding a line in /etc/hosts makes the other tests pass, except for Yarn
tests.
Any help is greatly appreciated!
thanks,
iulian
ubuntu@ip-1
hey matei,
i think that stuff such as SchemaRDD, columnar storage and perhaps also
query planning can be re-used by many systems that do analysis on
structured data. i can imagine panda-like systems, but also datalog or
scalding-like (which we use at tresata and i might rebase on SchemaRDD at
some p
I'm +1 on this, although a little worried about unknowingly introducing
SparkSQL dependencies every time someone wants to use this. It would be
great if the interface can be abstract and the implementation (in this
case, SparkSQL backend) could be swapped out.
One alternative suggestion on the nam
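The kind of decoupling being suggested might look roughly like this (all
names here are hypothetical):

// Callers depend on a small trait; the SparkSQL-backed implementation
// is just one possible backend and could be swapped out.
trait TableLike {
  def select(cols: String*): TableLike
  def count(): Long
}

class SqlBackedTable extends TableLike {        // would wrap a SparkSQL DataFrame
  def select(cols: String*): TableLike = this   // delegate to SparkSQL
  def count(): Long = 0L                        // delegate to SparkSQL
}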
Hmm... Scaler and Scalar are very close together both in terms of
pronunciation and spelling - and I wouldn't want to create confusion
between the two. Further - this operation (elementwise multiplication by a
static vector) is general enough that maybe it should have a more general
name?
On Tue,
A 60M-element vector costs 480MB of memory. You have 12 of them to be
reduced to the driver, so you need ~6GB of memory, not counting the temp
vectors generated from '_ + _'. You need to increase driver memory to make
it work. That being said, ~10^7 hits the limit for the current impl of
GLM. -Xiangrui
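Spelling out that arithmetic (8 bytes per Double):

val elements = 60L * 1000 * 1000        // 60M-element vector
val bytesPerVector = elements * 8L      // 480,000,000 B ≈ 480 MB
val totalBytes = bytesPerVector * 12L   // ≈ 5.76 GB at the driver, before temps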
On Jan 23, 201
I would call it Scaler. You might want to add it to the spark.ml pipeline
API. Please check the spark.ml.HashingTF implementation. Note that this
should handle sparse vectors efficiently.
Hadamard and FFTs are quite useful. If you are interested, make sure that
we call an FFT library that is licen
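A rough sketch of the core operation, handling dense and sparse inputs
efficiently (just the function, not the spark.ml transformer plumbing):

import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Elementwise multiplication by a fixed scaling vector; for sparse
// input only the stored entries are touched.
def elementwiseProduct(scalingVec: Vector)(v: Vector): Vector = v match {
  case dv: DenseVector =>
    val out = dv.values.clone()
    var i = 0
    while (i < out.length) { out(i) *= scalingVec(i); i += 1 }
    Vectors.dense(out)
  case sv: SparseVector =>
    val out = sv.values.clone()
    var i = 0
    while (i < out.length) { out(i) *= scalingVec(sv.indices(i)); i += 1 }
    Vectors.sparse(sv.size, sv.indices, out)
}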
Can't SchemaRDD remain the same, but deprecated, and be removed in
release 1.5 (+/- 1), for example, with the new code added to DataFrame?
With this, we don't impact existing code for the next few releases.
2015-01-27 0:02 GMT-02:00 Kushal Datta :
> I want to address the issue tha
I think there are several signing / hash issues that should be fixed
before this release.
Hashes:
http://issues.apache.org/jira/browse/SPARK-5308
https://github.com/apache/spark/pull/4161
The hashes here are correct, but have two issues:
As noted in the JIRA, the format of the hash file is "non
You certainly do not need to build Spark as root. It might clumsily
overcome a permissions problem in your local env but probably causes other
problems.
On Jan 27, 2015 11:18 AM, "angel__" wrote:
> I had that problem when I tried to build Spark 1.2. I don't exactly know
> what
> is causing it, bu
I had that problem when I tried to build Spark 1.2. I don't exactly know what
is causing it, but I guess it might have something to do with user
permissions.
I could finally fix this by building Spark as "root" user (now I'm dealing
with another problem, but ...that's another story...)
Thanks, Andrew. That's great material.
On Mon, Jan 26, 2015 at 10:23 PM, Andrew Ash wrote:
> In addition to the references you have at the end of the presentation,
> there's a great set of practical examples based on the learnings from Qt
> posted here: http://www21.in.tum.de/~blanchet/api-desi