You might be interested in "Maximum Flow implementation on Spark GraphX" done
by a Colorado School of Mines grad student a couple of years ago.
http://datascienceassn.org/2016-01-27-maximum-flow-implementation-spark-graphx
From: Swapnil Shinde
To: u...@spark.ap
Chapter 6 of my book implements Dijkstra's Algorithm. The source code is
available to download for free.
https://www.manning.com/books/spark-graphx-in-action
From: Brian Wilson
To: user@spark.apache.org
Sent: Monday, October 24, 2016 7:11 AM
Subject: Shortest path with directed and
In chapter 10 of Spark GraphX In Action, we describe how to use Zeppelin with
d3.js to render graphs using d3's force-directed rendering algorithm. The
source code can be downloaded for free from
https://www.manning.com/books/spark-graphx-in-action
From: agc studio
To: user@spark.apache.
It's been reduced to a single line of code.
http://technicaltidbit.blogspot.com/2016/03/dataframedataset-swap-places-in-spark-20.html
From: Gerhard Fiedler
To: "dev@spark.apache.org"
Sent: Friday, June 3, 2016 9:01 AM
Subject: Where is DataFrame.scala in 2.0?
When I look at the
Yes, it is possible to use GraphX from Java but it requires 10x the amount of
code and involves using obscure typing and pre-defined lambda prototype
facilities. I give an example of it in my book, the source code for which can
be downloaded for free from
https://www.manning.com/books/spark-gra
At first glance, it looks like the only streaming data sources available out of
the box from the github master branch are
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
and
https://github.com/apache/spark/blob/
http://go.databricks.com/apache-spark-2.0-presented-by-databricks-co-founder-reynold-xin
From: Sourav Mazumder
To: user
Sent: Wednesday, April 20, 2016 11:07 AM
Subject: Spark 2.0 forthcoming features
Hi All,
Is there somewhere we can get idea of the upcoming features in Spark 2
As with all history, "what if"s are not scientifically testable hypotheses, but
my speculation is that it comes down to the energy (VCs, startups, big Internet
companies, universities) within Silicon Valley, as contrasted with Germany.
From: Mich Talebzadeh
To: Michael Malak ; "user @spark"
There have been commercial CEP solutions for decades, including from my
employer.
From: Mich Talebzadeh
To: Mark Hamstra
Cc: Corey Nolet ; "user @spark"
Sent: Sunday, April 17, 2016 3:48 PM
Subject: Re: Apache Flink
The problem is that the strength and wider acceptance of a typic
In terms of publication date, a paper on Nephele was published in 2009, prior
to the 2010 USENIX paper on Spark. Nephele is the execution engine of
Stratosphere, which became Flink.
From: Mark Hamstra
To: Mich Talebzadeh
Cc: Corey Nolet ; "user @spark"
Sent: Sunday, April 17, 2016 3:
I see you've been burning the midnight oil.
From: Reynold Xin
To: "dev@spark.apache.org"
Sent: Friday, April 1, 2016 1:15 AM
Subject: [discuss] using deep learning to improve Spark
Hi all,
Hope you all enjoyed the Tesla 3 unveiling earlier tonight.
I'd like to bring your attention
Will Spark 2.0 Structured Streaming obviate some of the Druid/Spark use cases?
From: Raymond Honderdors
To: "yuzhih...@gmail.com"
Cc: "user@spark.apache.org"
Sent: Wednesday, March 23, 2016 8:43 AM
Subject: Re: Spark with Druid
I saw these but i fail to understand how to direct th
Would it make sense (in terms of feasibility, code organization, and politics)
to have a JavaDataFrame, as a way to isolate the 1000+ extra lines to a Java
compatibility layer/class?
From: Reynold Xin
To: "dev@spark.apache.org"
Sent: Thursday, February 25, 2016 4:23 PM
Subject: [d
[
https://issues.apache.org/jira/browse/SPARK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990391#comment-14990391
]
Michael Malak commented on SPARK-3789:
--
My publisher tells me the MEAP for S
[
https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Malak updated SPARK-11278:
--
Component/s: GraphX
> PageRank fails with unified memory mana
[
https://issues.apache.org/jira/browse/SPARK-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950679#comment-14950679
]
Michael Malak commented on SPARK-2365:
--
It's off-topic of IndexedRDD, bu
[
https://issues.apache.org/jira/browse/SPARK-10939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948596#comment-14948596
]
Michael Malak commented on SPARK-10939:
---
Here Matei explains the explicit de
[
https://issues.apache.org/jira/browse/SPARK-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Malak updated SPARK-10972:
--
Description:
Currently expressions used to .join() in DataFrames are limited to column names
Michael Malak created SPARK-10972:
-
Summary: UDFs in SQL joins
Key: SPARK-10972
URL: https://issues.apache.org/jira/browse/SPARK-10972
Project: Spark
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/SPARK-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909883#comment-14909883
]
Michael Malak commented on SPARK-10722:
---
I have seen this in a small Hello W
[
https://issues.apache.org/jira/browse/SPARK-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739681#comment-14739681
]
Michael Malak commented on SPARK-10489:
---
Feynman Liang:
Link https://github
Yes. And a paper that describes using grids (actually varying grids) is
http://research.microsoft.com/en-us/um/people/jingdw/pubs%5CCVPR12-GraphConstruction.pdf
In the Spark GraphX In Action book that Robin East and I are writing, we
implement a drastically simplified version of this in chapter
I would also add, from a data locality theoretic standpoint, mapPartitions()
provides for node-local computation that plain old map-reduce does not.
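As a plain-Scala sketch of that point (hypothetical code, no Spark involved; the names are invented for illustration): with a mapPartitions-style function, per-partition setup cost is paid once per partition rather than once per element, which is what makes node-local computation possible.

```scala
object MapPartitionsSketch {
  var setups = 0 // counts how many times per-partition setup ran

  // mapPartitions-style function: one setup per partition (e.g. one
  // DB connection), then stream through the partition's elements.
  def perPartition(it: Iterator[Int]): Iterator[Int] = {
    setups += 1
    it.map(_ * 2)
  }

  def main(args: Array[String]): Unit = {
    // Simulate a 5-element RDD split into two partitions.
    val partitions = Seq(Iterator(1, 2, 3), Iterator(4, 5))
    val result = partitions.flatMap(perPartition).toList
    println(result) // List(2, 4, 6, 8, 10)
    println(setups) // 2 (once per partition), not 5 (once per element)
  }
}
```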
From my Android phone on T-Mobile. The first nationwide 4G network.
Original message
From: Ashic Mahtab
Date: 06/28/2015 10:5
http://www.datascienceassn.org/content/making-sense-making-sense-performance-data-analytics-frameworks
From: "bit1...@163.com"
To: user
Sent: Monday, April 27, 2015 8:33 PM
Subject: Why Spark is much faster than Hadoop MapReduce even on disk
You could have your receiver send a "magic value" when it is done. I discuss
this Spark Streaming pattern in my presentation "Spark Gotchas and
Anti-Patterns". In the PDF version, it's slides
34-36.http://www.datascienceassn.org/content/2014-11-05-spark-gotchas-and-anti-patterns-julia-language
Michael Malak created SPARK-6710:
Summary: Wrong initial bias in GraphX SVDPlusPlus
Key: SPARK-6710
URL: https://issues.apache.org/jira/browse/SPARK-6710
Project: Spark
Issue Type: Bug
I believe that in the initialization portion of GraphX SVDPlusPlus, the
initialization of biases is incorrect. Specifically, in line
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96
instead of
(vd._1, vd._2, msg.get._2 / msg.ge
Can my new book, Spark GraphX In Action, which is currently in MEAP
http://manning.com/malak/, be added to
https://spark.apache.org/documentation.html and, if appropriate, to
https://spark.apache.org/graphx/ ?
Michael Malak
[
https://issues.apache.org/jira/browse/SPARK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365758#comment-14365758
]
Michael Malak commented on SPARK-6388:
--
Isn't it Hadoop 2.7 that is su
Since RDDs are generally unordered, aren't things like textFile().first() not
guaranteed to return the first row (such as looking for a header row)? If so,
doesn't that make the example in
http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?
---
[
https://issues.apache.org/jira/browse/SPARK-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309459#comment-14309459
]
Michael Malak commented on SPARK-4279:
--
Is there another place where I might be
1. Is IndexedRDD planned for 1.3?
https://issues.apache.org/jira/browse/SPARK-2365
2. Once IndexedRDD is in, is it planned to convert Word2VecModel to it from its
current Map[String,Array[Float]]?
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature/Wo
But isn't foldLeft() overkill for the originally stated use case of max diff of
adjacent pairs? Isn't foldLeft() for recursive non-commutative non-associative
accumulation as opposed to an embarrassingly parallel operation such as this
one?
This use case reminds me of FIR filtering in DSP. It se
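To make the point concrete (a hypothetical sketch on an ordinary Scala collection; a Spark version would additionally have to handle partition boundaries): each adjacent pair is independent, so the per-pair work is embarrassingly parallel, and only the final max is a commutative, associative reduction.

```scala
object AdjacentDiff {
  // Max difference between adjacent pairs, without foldLeft():
  // sliding(2) yields each adjacent pair, the per-pair diff is
  // independent work, and max is an associative reduction.
  def maxAdjacentDiff(xs: Seq[Int]): Int =
    xs.sliding(2).map { case Seq(a, b) => b - a }.max

  def main(args: Array[String]): Unit =
    println(maxAdjacentDiff(Seq(1, 4, 6, 3))) // 3
}
```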
ose not immersed in data science
or AI and thus may have narrower appeal.
- Original Message -----
From: Evan R. Sparks
To: Matei Zaharia
Cc: Koert Kuipers ; Michael Malak ;
Patrick Wendell ; Reynold Xin ;
"dev@spark.apache.org"
Sent: Tuesday, January 27, 2015 9:55 AM
Subject: Re: renaming
And in the off chance that anyone hasn't seen it yet, the Jan. 13 Bay Area
Spark Meetup YouTube contained a wealth of background information on this idea
(mostly from Patrick and Reynold :-).
https://www.youtube.com/watch?v=YWppYPWznSQ
From: Patrick Wendell
To:
I created https://issues.apache.org/jira/browse/SPARK-5343 for this.
- Original Message -
From: Michael Malak
To: "dev@spark.apache.org"
Cc:
Sent: Monday, January 19, 2015 5:09 PM
Subject: GraphX ShortestPaths backwards?
GraphX ShortestPaths seems to be following edges
Michael Malak created SPARK-5343:
Summary: ShortestPaths traverses backwards
Key: SPARK-5343
URL: https://issues.apache.org/jira/browse/SPARK-5343
Project: Spark
Issue Type: Bug
GraphX ShortestPaths seems to be following edges backwards instead of forwards:
import org.apache.spark.graphx._
val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))),
sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,""
lib.ShortestPaths.run(g,Array(3)).vertices.collect
res1: Array[(org.apac
But wouldn't the gain be greater under something similar to EdgePartition1D
(but perhaps better load-balanced based on number of edges for each vertex) and
an algorithm that primarily follows edges in the forward direction?
From: Ankur Dave
To: Michael Malak
Cc: "dev@spark.
Does GraphX make an effort to co-locate vertices onto the same workers as the
majority (or even some) of its edges?
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache
According to:
https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting
"Note that TriangleCount requires the edges to be in canonical orientation
(srcId < dstId)"
But isn't this overstating the requirement? Isn't the requirement really that
IF there are duplicate ed
Asim Jalis writes:
>
> Thanks. Another question. I have event data with timestamps. I want to
> create a sliding window
> using timestamps. Some windows will have a lot of events in them others
> won’t. Is there a way
> to get an RDD made of this kind of a variable length window?
You should c
Thank you. I created
https://issues.apache.org/jira/browse/SPARK-5064
- Original Message -
From: xhudik
To: dev@spark.apache.org
Cc:
Sent: Saturday, January 3, 2015 2:04 PM
Subject: Re: GraphX rmatGraph hangs
Hi Michael,
yes, I can confirm the behavior.
It get stuck (loop?) and eat a
Michael Malak created SPARK-5064:
Summary: GraphX rmatGraph hangs
Key: SPARK-5064
URL: https://issues.apache.org/jira/browse/SPARK-5064
Project: Spark
Issue Type: Bug
Components
The following single line just hangs, when executed in either Spark Shell or
standalone:
org.apache.spark.graphx.util.GraphGenerators.rmatGraph(sc, 4, 8)
It just outputs "0 edges" and then locks up.
The only other information I've found via Google is:
http://mail-archives.apache.org/mod_mbox/sp
On Wednesday, October 22, 2014 9:06 AM, Sean Owen wrote:
> No, there's no such thing as an RDD of RDDs in Spark.
> Here though, why not just operate on an RDD of Lists? or a List of RDDs?
> Usually one of these two is the right approach whenever you feel
> inclined to operate on an RDD of RDDs.
Depending on the density of your keys, the alternative signature
def updateStateByKey[S](updateFunc: (Iterator[(K, Seq[V], Option[S])]) =>
Iterator[(K, S)], partitioner: Partitioner, rememberPartitioner:
Boolean)(implicit arg0: ClassTag[S]): DStream[(K, S)]
at least iterates by key rather than
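The per-key state-update semantics can be modeled in plain Scala without Spark (a hypothetical sketch, not the actual DStream machinery): each batch's new values for a key are combined with that key's previous optional state.

```scala
object UpdateStateSketch {
  // Plain-Scala model of updateStateByKey semantics: for each key,
  // combine the batch's new values with the previous optional state.
  // Returning None from updateFunc drops the key's state.
  def updateState[K, V, S](
      state: Map[K, S],
      batch: Seq[(K, V)],
      updateFunc: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
    val grouped = batch.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    val keys = state.keySet ++ grouped.keySet
    keys.flatMap { k =>
      updateFunc(grouped.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Running count per key across two batches.
    val count = (vs: Seq[Int], s: Option[Int]) => Some(s.getOrElse(0) + vs.sum)
    val s1 = updateState(Map.empty[String, Int], Seq("a" -> 1, "a" -> 2, "b" -> 3), count)
    val s2 = updateState(s1, Seq("b" -> 1), count)
    println(s2) // counts: a -> 3, b -> 4
  }
}
```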
It's really more of a Scala question than a Spark question, but the standard OO
(not Scala-specific) way is to create your own custom supertype (e.g.
MyCollectionTrait), inherited/implemented by two concrete classes (e.g. MyRDD
and MyArray), each of which manually forwards method calls to the co
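A minimal sketch of that forwarding pattern (hypothetical names, simplified to a single method; the RDD-backed sibling class is only stubbed in a comment since it would need a SparkContext):

```scala
// Common supertype whose concrete classes delegate to their backing
// collection. A real MyRDD would wrap an RDD[Int] and forward
// mapInts(f) to rdd.map(f).
trait MyCollectionTrait {
  def mapInts(f: Int => Int): MyCollectionTrait
  def toList: List[Int]
}

class MyArray(data: Array[Int]) extends MyCollectionTrait {
  def mapInts(f: Int => Int): MyCollectionTrait = new MyArray(data.map(f))
  def toList: List[Int] = data.toList
}
```

Usage: `new MyArray(Array(1, 2)).mapInts(_ + 1).toList` yields `List(2, 3)`, and code written against `MyCollectionTrait` would work unchanged against the RDD-backed class.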
At Spark Summit, Patrick Wendell indicated the number of MLlib algorithms would
"roughly double" in 1.1 from the current approx. 15.
http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf
What are the planned additional algorithms?
In Jira, I only see two when fil
How about a treeReduceByKey? :-)
On Friday, June 20, 2014 11:55 AM, DB Tsai wrote:
Currently, the reduce operation combines the result from mapper
sequentially, so it's O(n).
Xiangrui is working on treeReduce which is O(log(n)). Based on the
benchmark, it dramatically increase the performan
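The tree-shaped combining behind treeReduce can be sketched on a plain Scala collection (a hypothetical model of the idea only; the real implementation partially aggregates across executors): pairwise combining gives O(log n) depth instead of the O(n) depth of a purely sequential fold, provided the operator is associative.

```scala
object TreeReduceSketch {
  // Pairwise tree reduction: each pass halves the collection, so the
  // combining depth is O(log n). Requires an associative op.
  def treeReduce[A](xs: Seq[A])(op: (A, A) => A): A = {
    require(xs.nonEmpty, "empty collection")
    if (xs.size == 1) xs.head
    else treeReduce(xs.grouped(2).map {
      case Seq(a, b) => op(a, b)
      case Seq(a)    => a // odd element carries to the next level
    }.toSeq)(op)
  }

  def main(args: Array[String]): Unit =
    println(treeReduce(1 to 8)(_ + _)) // 36
}
```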
Shouldn't I be seeing N2 and N4 in the output below? (Spark 0.9.0 REPL) Or am I
missing something fundamental?
val nodes = sc.parallelize(Array((1L, "N1"), (2L, "N2"), (3L, "N3"), (4L,
"N4"), (5L, "N5")))
val edges = sc.parallelize(Array(Edge(1L, 2L, "E1"), Edge(1L, 3L, "E2"),
Edge(2L, 4L, "E
[
https://issues.apache.org/jira/browse/SPARK-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Malak resolved SPARK-1836.
--
Resolution: Duplicate
> REPL $outer type mismatch causes lookup() and equals() probl
[
https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011492#comment-14011492
]
Michael Malak commented on SPARK-1199:
--
See also additional test cases in
h
Mohit Jaggi:
A workaround is to use zipWithIndex (to appear in Spark 1.0, but if you're
still on 0.9x you can swipe the code from
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
), map it to (x => (x._2,x._1)) and then sortByKey.
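On a plain Scala collection, the shape of that index-then-swap-then-sort pipeline looks like this (a sketch of the transformation only, not Spark code):

```scala
object ZipSortSketch {
  // Pair each element with its index, swap so the index becomes the
  // key, then sort by that key: (elem, idx) -> (idx, elem), sorted.
  def indexKeyed[A](xs: Seq[A]): Seq[(Int, A)] =
    xs.zipWithIndex.map(_.swap).sortBy(_._1)

  def main(args: Array[String]): Unit =
    println(indexKeyed(Seq("c", "a", "b"))) // List((0,c), (1,a), (2,b))
}
```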
Sp
[
https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007565#comment-14007565
]
Michael Malak commented on SPARK-1867:
--
Thank you, sam, that fixed it for me!
[
https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007238#comment-14007238
]
Michael Malak commented on SPARK-1867:
--
I, too, have run into this issue, and I
While developers may appreciate "1.0 == API stability," I'm not sure that will
be the understanding of the VP who gives the green light to a Spark-based
development effort.
I fear a bug that silently produces erroneous results will be perceived like
the FDIV bug, but in this case without the mo
[
https://issues.apache.org/jira/browse/SPARK-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998807#comment-13998807
]
Michael Malak commented on SPARK-1836:
--
Michael Ambrust: Indeed. Do you thi
Michael Malak created SPARK-1857:
Summary: map() with lookup() causes exception
Key: SPARK-1857
URL: https://issues.apache.org/jira/browse/SPARK-1857
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Malak updated SPARK-1836:
-
Description:
Anand Avati partially traced the cause to REPL wrapping classes in $outer
classes
When using map() and lookup() in conjunction, I get an exception (each
independently works fine). I'm using Spark 0.9.0/Scala 2.10.3
val a = sc.parallelize(Array(11))
val m = sc.parallelize(Array((11,21)))
a.map(m.lookup(_)(0)).collect
14/05/14 15:03:35 ERROR Executor: Exception in task ID 23
sc
I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. In
the Spark Shell, equals() fails when I use the canonical equals() pattern of
match{}, but works when I substitute with isInstanceOf[]. I am using Spark
0.9.0/Scala 2.10.3.
Is this a bug?
Spark Shell (equals uses match
Is it permissible to use a custom class (as opposed to e.g. the built-in String
or Int) for the key in groupByKey? It doesn't seem to be working for me on
Spark 0.9.0/Scala 2.10.3:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
class C(val s:String) extends Serializ
12)))
r: org.apache.spark.rdd.RDD[(C, Int)] = ParallelCollectionRDD[3] at parallelize
at :14
scala> r.lookup(new C("a"))
:17: error: type mismatch;
found : C
required: C
r.lookup(new C("a"))
^
On Tuesday, May 13, 2014 3:05 PM, Ana
Reposting here on dev since I didn't see a response on user:
I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. In
the Spark Shell, equals() fails when I use the canonical equals() pattern of
match{}, but works when I substitute with isInstanceOf[]. I am using Spark
0.9.0
Michael Malak created SPARK-1817:
Summary: RDD zip erroneous when partitions do not divide RDD count
Key: SPARK-1817
URL: https://issues.apache.org/jira/browse/SPARK-1817
Project: Spark
s the ASF Jira system will let me
reset my password.
On Sunday, May 11, 2014 4:40 AM, Michael Malak wrote:
Is this a bug?
scala> sc.parallelize(1 to 2,4).zip(sc.parallelize(11 to 12,4)).collect
res0: Array[(Int, Int)] = Array((1,11), (2,12))
scala> sc.parallelize(1L to 2L,4).zip(sc.par
Is this a bug?
scala> sc.parallelize(1 to 2,4).zip(sc.parallelize(11 to 12,4)).collect
res0: Array[(Int, Int)] = Array((1,11), (2,12))
scala> sc.parallelize(1L to 2L,4).zip(sc.parallelize(11 to 12,4)).collect
res1: Array[(Long, Int)] = Array((2,11))
"looks like Spark outperforms Stratosphere fairly consistently in the
experiments"
There was one exception the paper noted, which was when memory resources were
constrained. In that case, Stratosphere seemed to degrade more gracefully than
Spark, but the author did not explore it further.
UTION.
From: Michael Malak
To: "dev@spark.incubator.apache.org"
Sent: Thursday, September 26, 2013 12:27 PM
Subject: Kafka not shutting down cleanly; Actor serialization?
Tathagata:
I don't believe Kafka streams are being shut down cleanly, which im
Domingo Mihovilovic writes:
> Imagine that you are processing a stream data at high speed and needs to
>build, update,
> and access some memory data structure where the "model" is stored.
Normally this is done with updateStateByKey, which maintains an RDD behind the
sce
Tathagata:
I don't believe Kafka streams are being shut down cleanly, which implies that
the most recent Kafka offsets are not being committed back to Zookeeper, which
implies starting/restarting a Spark Streaming process would result in duplicate
events.
The simple Spark Streaming code (runn
Yup, it was the directory structure com/mystuff/whateverUDF.class that was
missing. Thought I had tried that before posting my question, but...
Thanks for your help!
From: Edward Capriolo
To: "user@hive.apache.org" ; Michael Malak
Sent: Tuesda
Thus far, I've been able to create Hive UDFs, but now I need to define them
within a Java package name (as opposed to the "default" Java package as I had
been doing), but once I do that, I'm no longer able to load them into Hive.
First off, this works:
add jar /usr/lib/hive/lib/hive-contrib-0.1
Perhaps you can first create a temp table that contains only the records that
will match? See the UNION ALL trick at
http://www.mail-archive.com/hive-user@hadoop.apache.org/msg01906.html
From: Brad Ruderman
To: user@hive.apache.org
Sent: Monday, July 29, 201
Untested:
SELECT a.c100, a.c300, b.c400
FROM t1 a
JOIN t2 b
ON a.c200 = b.c200
JOIN (SELECT DISTINCT a.c100
FROM t1 a2
JOIN t2 b2
ON a2.c200 = b2.c200
WHERE b2.c400 >= SYSDATE - 1) a3
ON a.c100 = a3.c100
WHERE b.c400 >= SYSDATE - 1
AND a.c300 =
I have found that for output larger than a few GB, redirecting stdout results
in an incomplete file. For very large output, I do CREATE TABLE MYTABLE AS
SELECT ... and then copy the resulting HDFS files directly out of
/user/hive/warehouse.
From: Bertrand De
Just copy and paste the whole long expressions to their second occurrences.
From: dyuti a
To: user@hive.apache.org
Sent: Friday, June 28, 2013 10:58 AM
Subject: Fwd: Need urgent help in hive query
Hi Experts,
I'm trying with the below SQL query in Hive, whi
ang wrote:
Thanks Michael! That worked without modification!
>
>
>
>On Sat, Jun 22, 2013 at 5:05 PM, Michael Malak wrote:
>
>Or, the single-language (HiveQL) alternative might be (i.e. I haven't tested
>it):
>>
>>select f1,
>> f2,
>>
Or, the single-language (HiveQL) alternative might be (i.e. I haven't tested
it):
select f1,
f2,
if(max(if(f3='P',f4,null)) is null,0,max(if(f3='P',f4,null))) pf4,
if(max(if(f3='P',f5,null)) is null,0,max(if(f3='P',f5,null))) pf5,
if(max(if(f3='N',f4,null)) is null,0,
thing.
From: Edward Capriolo
To: "user@hive.apache.org" ; Michael Malak
Sent: Thursday, June 20, 2013 9:15 PM
Subject: Re: INSERT non-static data to array?
i think you could select into as sub query and then use lateral view.not
exactly the same but somethin
I've created
https://issues.apache.org/jira/browse/HIVE-4771
to track this issue.
- Original Message -
From: Michael Malak
To: "user@hive.apache.org"
Cc:
Sent: Wednesday, June 19, 2013 2:35 PM
Subject: Re: INSERT non-static data to array?
The example code for inlin
Michael Malak created HIVE-4771:
---
Summary: Support subqueries in INSERT for array types
Key: HIVE-4771
URL: https://issues.apache.org/jira/browse/HIVE-4771
Project: Hive
Issue Type
c int[]);
INSERT INTO table_a
SELECT a, b, ARRAY(SELECT c FROM table_c WHERE table_c.parent = table_b.id)
FROM table_b
From: Edward Capriolo
To: "user@hive.apache.org" ; Michael Malak
Sent: Wednesday, June 19, 2013 2:06 PM
Subject: Re: INSERT non
Is the only way to INSERT data into a column of type array<> to load data from
a pre-existing file, to use hard-coded values in the INSERT statement, or copy
an entire array verbatim from another table? I.e. I'm assuming that a) SQL1999
array INSERT via subquery is not (yet) implemented in Hive
--- On Mon, 5/6/13, Peter Chu wrote:
> In Hive, I cannot perform a SELECT GROUP BY on fields not in the GROUP BY
> clause.
Although MySQL allows it, it is not ANSI SQL.
http://stackoverflow.com/questions/1225144/why-does-mysql-allow-group-by-queries-without-aggregate-functions
--- On Sun, 5/5/13, Peter Chu wrote:
> I am wondering if there is any way to do this without resorting to
> using left outer join and finding nulls.
I have found this to be an acceptable substitute. Is it not working for you?
[
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582664#comment-13582664
]
Michael Malak commented on HIVE-3528:
-
As noted in the first comment from
h
[
https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582662#comment-13582662
]
Michael Malak commented on HIVE-4022:
-
Note that there is a workaround for the cas
[
https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Malak updated HIVE-4022:
Description:
Originally thought to be Avro-specific, and first noted with respect to
HIVE-3528
If no one has any objection, I'm going to update HIVE-4022, which I entered a
week ago when I thought the behavior was Avro-specific, to indicate it actually
affects even native Hive tables.
https://issues.apache.org/jira/browse/HIVE-4022
--- On Fri, 2/15/13, Michael Malak wrote:
&
It seems that all Hive columns (at least those of primitive types) are always
NULLable? What about columns of type STRUCT?
The following:
echo 1,2 >twovalues.csv
hive
CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE
Michael Malak created HIVE-4022:
---
Summary: Avro SerDe queries don't handle hard-coded nulls for
optional/nullable structs
Key: HIVE-4022
URL: https://issues.apache.org/jira/browse/HIVE-4022
Pr
[
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578783#comment-13578783
]
Michael Malak commented on HIVE-3528:
-
Sean:
OK, I've researched the proble
[
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578538#comment-13578538
]
Michael Malak commented on HIVE-3528:
-
Sean:
I mean
https://github.com/apache/
[
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578523#comment-13578523
]
Michael Malak commented on HIVE-3528:
-
I've tried the latest Avro SerDe fr
o I
would write to a different directory and then move the files over...
dean
On Wed, Feb 13, 2013 at 1:26 PM, Michael Malak wrote:
Is it possible to INSERT INTO TABLE t SELECT FROM where t has a column with a
STRUCT?
Based on
http://grokbase.com/t/hive/user/109r87hh3e/insert-data-into-a-co
[
https://issues.apache.org/jira/browse/AVRO-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573711#comment-13573711
]
Michael Malak commented on AVRO-1035:
-
ha...@cloudera.com has provided example cod
output streams to let Avro take it as
> append-able?
> I don't think its possible for Avro to carry it since Avro
> (core) does
> not reverse-depend on Hadoop. Should we document it
> somewhere though?
> Do you have any ideas on the best place to do that?
>
> On Thu,