Right, perhaps we also need to preserve some DAG information? I am wondering
if there is any work around this.
Sandy Ryza
Hey Michael,
Thanks for the clarification. I was actually assuming the query would fail.
Ok, so this means I will have to do the validation in an RDD transformation
feeding into the SchemaRDD.
On Wed, Dec 10, 2014 at 6:27 PM, Michael Armbrust
wrote:
> As the scala doc for applySchema says, "It
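The pre-validation step Sandy describes could be sketched like this in plain Python (not Spark's API; the schema representation and the `conforms` helper are hypothetical stand-ins for a check run in a transformation before applySchema):

```python
# Minimal sketch: filter out rows that don't conform to a simple schema
# before handing the data to applySchema. Field names/types are illustrative.
schema = [("name", str), ("age", int)]

def conforms(row, schema):
    """Return True if the row has the right arity and field types."""
    if len(row) != len(schema):
        return False
    return all(isinstance(v, t) for v, (_, t) in zip(row, schema))

rows = [("alice", 34), ("bob", "not-an-int"), ("carol",)]
valid = [r for r in rows if conforms(r, schema)]
# valid == [("alice", 34)]
```

In Spark this check would live in a `filter` (or a `map` that repairs/flags bad rows) on the upstream RDD, so only conforming rows ever reach the SchemaRDD.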
As the scala doc for applySchema says, "It is important to make sure that
the structure of every [[Row]] of the provided RDD matches the provided
schema. Otherwise, there will be runtime exceptions." We don't check as
doing runtime reflection on all of the data would be very expensive. You
will o
Hello,
I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some
of the Rows in the RDD are malformed--that is, they do not conform to the
schema defined by the StructType. When running a select statement on this
SchemaRDD I would expect SparkSQL to either reject the malformed ro
Here we go: https://issues.apache.org/jira/browse/SPARK-4823
On Wed, Dec 10, 2014 at 9:01 PM, Debasish Das
wrote:
> I added code to compute topK products for each user and topK user for each
> product in SPARK-3066..
>
> That is different than row similarity calculation as we need both user and
I added code to compute topK products for each user and topK user for each
product in SPARK-3066..
That is different than row similarity calculation as we need both user and
product factors to calculate the topK recommendations..
For (1) and (2) we are trying to answer similarUsers to given a use
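The core idea behind computing topK products per user from the two factor matrices can be sketched in plain Python (a toy, in-memory stand-in, not the distributed SPARK-3066 implementation; all names and data here are hypothetical):

```python
# Toy sketch: score every product against each user's factor vector
# (a dot product) and keep the k highest-scoring products per user.
import heapq

user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
product_factors = {"p1": [0.9, 0.1], "p2": [0.2, 0.8], "p3": [0.5, 0.5]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(user_vec, products, k):
    """Rank all products for one user and keep the k best."""
    return heapq.nlargest(k, products, key=lambda p: dot(user_vec, products[p]))

recs = {u: top_k(v, product_factors, 2) for u, v in user_factors.items()}
# recs["u1"] == ["p1", "p3"]
```

This is why it differs from row similarity: it needs both the user and the product factor matrices, whereas row similarity only looks at one matrix.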
I believe many Apache services are/were down due to an outage.
On Wed, Dec 10, 2014 at 5:24 PM, Nicholas Chammas
wrote:
> Nevermind, seems to be back up now.
>
> On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> For example: https://issues.apache.org/ji
For example: https://issues.apache.org/jira/browse/SPARK-3431
Where do we report/track issues with JIRA itself being down?
Nick
Nevermind, seems to be back up now.
On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
> For example: https://issues.apache.org/jira/browse/SPARK-3431
>
> Where do we report/track issues with JIRA itself being down?
>
> Nick
>
+1
Tested on Mac OS X.
Matei
> On Dec 10, 2014, at 1:08 PM, Patrick Wendell wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 1.2.0!
>
> The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commi
It's not so cheap to compute row similarities when there are many rows, as
it amounts to computing the outer product of a matrix A (i.e. computing
AA^T, which is expensive).
There is a JIRA to track handling (1) and (2) more efficiently than
computing all pairs: https://issues.apache.org/jira/brow
Please vote on releasing the following candidate as Apache Spark version 1.2.0!
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e
The release files, including signatures, digests, etc.
This vote is closed in favor of RC2.
On Fri, Dec 5, 2014 at 2:02 PM, Patrick Wendell wrote:
> Hey All,
>
> Thanks all for the continued testing!
>
> The issue I mentioned earlier SPARK-4498 was fixed earlier this week
> (hat tip to Mark Hamstra who contributed to fix).
>
> In the interim a few sm
Hi Andrew,
It looks like somehow you are including jars from the upstream Apache
Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support,
we had to modify Hive to use a different version of Kryo that was
compatible with Spark's Kryo version.
https://github.com/pwendell/hive/commit/5b
Hi,
It seems there are multiple places where we would like to compute row
similarity (accurate or approximate similarities).
Basically, through RowMatrix columnSimilarities we can compute column
similarities of a tall skinny matrix.
Similarly we should have an API in RowMatrix called rowSimilaritie
reminder -- this is happening friday morning @ 730am!
On Mon, Dec 1, 2014 at 5:10 PM, shane knapp wrote:
> i'll send out a reminder next week, but i wanted to give a heads up: i'll
> be bringing down the entire jenkins infrastructure for reboots and system
> updates.
>
> please let me know if t
Apologies for the format; somehow it got messed up and linefeeds were removed.
Here's a reformatted version.
Hi All,
I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh to
include auxiliary JARs and the datanucleus*.jars from Hive; however, when I run
HiveContext, it gives me
Hi All,
I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh to
include auxiliary JARs and the datanucleus*.jars from Hive; however, when I run
HiveContext, it gives me the following error:
Caused by: java.lang.ClassNotFoundException:
com.esotericsoftware.shaded.org.objenesis.
I think that if we were able to maintain the full set of created RDDs as
well as some scheduler and block manager state, it would be enough for most
apps to recover.
On Wed, Dec 10, 2014 at 5:30 AM, Jun Feng Liu wrote:
> Well, it should not be mission impossible, considering there are so many HA
> s
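A toy model (plain Python, not Spark's internals) of why keeping the created RDDs and their lineage would be enough: each dataset records its parent and the transformation that produced it, so a lost result can be replayed rather than stored:

```python
# Toy lineage model: a Dataset records its parent and producing function,
# so a lost (uncached) result can be recomputed on demand from the base data.
class Dataset:
    def __init__(self, source=None, parent=None, fn=None):
        self.source, self.parent, self.fn = source, parent, fn

    def map(self, fn):
        # Record lineage instead of eagerly computing.
        return Dataset(parent=self, fn=fn)

    def compute(self):
        if self.parent is None:
            return list(self.source)                        # base data
        return [self.fn(x) for x in self.parent.compute()]  # replay lineage

base = Dataset(source=[1, 2, 3])
derived = base.map(lambda x: x * 10)
# Even if derived's cached result is lost, lineage alone recovers it:
assert derived.compute() == [10, 20, 30]
```

The hard part the thread is circling is everything this toy omits: scheduler state, block manager state, and arbitrary driver-side program state.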
Does Spark today really leverage Tachyon lineage to process data? It seems
like the application should call the createDependency function in TachyonFS to
create a new lineage node. But I did not find any place that calls it in Spark
code. Did I miss anything?
Best Regards
Jun Feng Liu
IBM China System
Well, it should not be mission impossible, considering there are so many HA
solutions existing today. I would be interested to know if there are any
specific difficulties.
Best Regards
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing
Phone: 86-10-82452683
E-mail: liuj...@cn.ibm.com
B
Hi
Issue created https://issues.apache.org/jira/browse/SPARK-4816
Probably a maven-related question for profiles in child modules
I couldn't find a clean solution, just a workaround: modify pom.xml in the
mllib module to force activation of the netlib-lgpl module.
Hope a maven expert will help.
Gu
This would be plausible for specific purposes such as Spark Streaming or
Spark SQL, but I don't think it is doable for a general Spark driver, since it
is just a normal JVM process with arbitrary program state.
On Wed, Dec 10, 2014 at 12:25 AM, Jun Feng Liu wrote:
> Do we have any high availability
Do we have any high availability support at the Spark driver level? For
example, we may want the Spark driver to be able to move to another node and
continue execution when a failure happens. I can see that RDD checkpointing
can help to serialize the state of an RDD. I can imagine loading the
checkpoint from another node wh