Yes. Yes you can.
On Tue, Jul 17, 2018 at 11:42 AM, Sathi Chowdhury wrote:
> Hi,
> My question is about the ability to integrate Spark streaming with multiple
> clusters. Is it a supported use case? An example: two topics owned by
> different groups, each with its own Kafka infra.
Just found out that I need following option while reading:
.option("basePath", "hdfs://localhost:9000/ptest/")
https://stackoverflow.com/questions/43192940/why-is-partition-key-column-missing-from-dataframe
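A minimal sketch of the basePath trick, assuming a partitioned Parquet layout under hdfs://localhost:9000/ptest/ (the partition column name `company` below is made up for illustration):

```scala
// Reading one partition directory directly drops the partition column
// from the schema:
val dfNoPartCol = spark.read
  .parquet("hdfs://localhost:9000/ptest/company=acme")

// Passing basePath tells Spark where partition discovery starts, so the
// partition column (company) is kept as a regular column in the DataFrame:
val dfWithPartCol = spark.read
  .option("basePath", "hdfs://localhost:9000/ptest/")
  .parquet("hdfs://localhost:9000/ptest/company=acme")
```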
On Tue, Jul 17, 2018 at 3:48 PM, Nirav Patel wrote:
> I created a hive table with pa
I created a Hive table with Parquet storage using SparkSQL. Now, in the Hive
CLI, when I do DESCRIBE and SELECT I can see the partition columns both as
regular columns and as partition columns. However, if I try the same in
SparkSQL (DataFrame) I don't see the partition columns.
I need to do projection o
Thanks 0xF0F0F0 and Ashutosh for the pointers.
Holden,
I am trying to look into sparklingml... what am I looking for? Also, which
chapter/page of your book should I look at?
Mohit.
On Sun, Jul 15, 2018 at 3:02 AM Holden Karau wrote:
> If you want to see some examples in a library shows a way to
Perhaps this is https://issues.apache.org/jira/browse/SPARK-24578?
That was reported as a performance issue, not OOMs, but it's in the exact
same part of the code, and the change reduced the memory pressure
significantly.
On Mon, Jul 16, 2018 at 1:43 PM, Bryan Jeffrey
wrote:
> Hello.
>
> I
Hi,
My question is about the ability to integrate Spark streaming with multiple
clusters. Is it a supported use case? An example: two topics owned by
different groups, each with its own Kafka infra.
Can I have two DataFrames as a result of spark.readStream listening to
different Kafka clusters?
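Nothing stops each readStream from pointing at its own broker list, since kafka.bootstrap.servers is set per source. A hedged sketch (broker addresses and topic names are placeholders, not from the thread):

```scala
// Source 1: first team's Kafka cluster
val df1 = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "cluster-a:9092")
  .option("subscribe", "topicA")
  .load()

// Source 2: second team's Kafka cluster, different brokers
val df2 = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "cluster-b:9092")
  .option("subscribe", "topicB")
  .load()
```

Each source carries its own connection options, so the two clusters never need to know about each other.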
this may work
val df_post = listCustomCols
  .foldLeft(df_pre) { (tempDF, listValue) =>
    tempDF.withColumn(
      listValue.name,
      funcUDF(col(listValue.name)) // apply the UDF to the existing column
    )
  }
and outsource the renaming to a UDF, or you can rename the c
Hi,
This is a very general question. It's hard to answer your question without
fully understanding your business and technological needs.
You might want to watch this video:
https://www.youtube.com/watch?v=2UKSLHDH5vc&t=8s
Shmuel
On Tue, Jul 17, 2018 at 12:11 AM Gautam Singaraju <
gautam.singa
30G of user data: how to get a distinct user count after creating a composite
key based on company and userid?
On 2018-07-13 18:24:52, Jean Georges Perrin wrote:
Just thinking out loud… repartition by key? Create a composite key based on
company and userid?
How big is your dataset?
On Jul 13, 2018, a
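The composite-key idea above can be sketched like this (the column names company and userid come from the question; the rest is assumption):

```scala
import org.apache.spark.sql.functions.{col, concat_ws, countDistinct, approx_count_distinct}

// Build a composite key from company and userid, then count distinct keys
val exact = df.agg(
  countDistinct(concat_ws("|", col("company"), col("userid"))).as("users"))

// For ~30G of data, an approximate count is often much cheaper
val approx = df.agg(
  approx_count_distinct(concat_ws("|", col("company"), col("userid"))).as("users"))
```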
Hi guys,
I'm trying to profile my Spark code with cProfile to check where the most time
is spent. I found that the most time is taken by some socket object, which I'm
quite clueless about, as to where it is used.
Can anyone shed some light on this?
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)