Hi All,
I am scratching my head over this weird behavior, where a df (read from
.csv) of size ~3.4GB gets cross joined with itself and creates 50K tasks!
How do I correlate input size with the number of tasks in this case?
--
Regards,
Rishi Shah
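(For context, a cross join is typically planned as a CartesianProduct, so its task
count is roughly the product of the partition counts of the two sides rather than a
function of input size alone. A minimal sketch for checking this, with a placeholder
path:)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// How many partitions does the CSV scan produce? The cross join then runs
// roughly (left partitions) x (right partitions) tasks.
val df = spark.read.option("header", "true").csv("/path/to/input.csv")
println(df.rdd.getNumPartitions)

val joined = df.crossJoin(df)
println(joined.rdd.getNumPartitions)

// Coalescing before the join shrinks the task count, e.g. 50 x 50 = 2500 tasks.
val smallerJoin = df.coalesce(50).crossJoin(df.coalesce(50))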
Responses inline.
On Wed, Aug 28, 2019 at 8:42 AM Nick Dawes wrote:
> Thank you, TD. A couple of follow-up questions, please.
>
> 1) "It only keeps around the minimal intermediate state data"
>
> How do you define "minimal" here? Is there a configuration property to
> control the time or size of
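(The usual knob for bounding this intermediate state in Structured Streaming is an
event-time watermark: state older than the watermark is dropped. A minimal sketch,
assuming a Kafka source with placeholder server and topic names:)

import org.apache.spark.sql.functions._

val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")   // placeholder
  .option("subscribe", "events")                     // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING) AS word", "timestamp")

val counts = events
  .withWatermark("timestamp", "10 minutes")   // state older than 10 min is dropped
  .groupBy(window(col("timestamp"), "5 minutes"), col("word"))
  .count()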
1) This is not a use case but a technical solution, hence nobody can tell you
whether it makes sense or not.
2) Do an upsert in Cassandra. However, keep in mind that the application
submitting to the Kafka topic and the one consuming from the Kafka topic need
to ensure that they process messages in
What exactly is your requirement?
Is the read before write mandatory?
Are you maintaining state in Cassandra?
Regards
Prathmesh Ranaut
https://linkedin.com/in/prathmeshranaut
> On Aug 29, 2019, at 3:35 PM, Shyam P wrote:
>
>
> Thanks, Aayush. For every record I need to get the data
Sorry,
I call the Sqoop job from the above function. Can you help me resolve this?
Thanks
On Fri, Aug 30, 2019 at 1:31 AM Chetan Khatri
wrote:
> Hi Users,
> I am launching a Sqoop job from Spark job and would like to FAIL Spark job
> if Sqoop job fails.
>
> def executeSqoopOriginal(serverName:
Hi Users,
I am launching a Sqoop job from Spark job and would like to FAIL Spark job
if Sqoop job fails.
def executeSqoopOriginal(serverName: String, schemaName: String,
username: String, password: String,
query: String, splitBy: String, fetchSize: Int,
numMappers: Int,
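One way to do this (a sketch, not the poster's actual code; the sqoop arguments and
helper name are placeholders) is to launch Sqoop as a child process and throw when
its exit code is non-zero, so the Spark driver, and therefore the job, fails:

import scala.sys.process._

// Launch Sqoop as an external process and propagate its failure to the Spark job.
def runSqoopOrFail(sqoopArgs: Seq[String]): Unit = {
  val exitCode = (Seq("sqoop", "import") ++ sqoopArgs).!   // blocks until Sqoop exits
  if (exitCode != 0) {
    // Throwing from the driver makes the Spark application fail as well.
    throw new RuntimeException(s"Sqoop job failed with exit code $exitCode")
  }
}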
Thanks, Aayush.
For every record, do I need to get the data from the Cassandra table and
update it? Otherwise it may not update the existing record.
What is this datastax-spark-connector? Is that not a "Cassandra
connector library written for Spark"?
If not, how do we write one ourselves?
Where and how to
Cassandra writes are upserts; you should be able to do what you need with a single
statement unless you're looking to maintain counters.
I'm not sure if there is a Cassandra connector library written for Spark
Streaming, because we wrote one ourselves when we wanted to do the same.
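For batch DataFrames, the DataStax spark-cassandra-connector does expose a sink; a
minimal sketch with placeholder keyspace/table names, relying on the fact that a
write to an existing primary key is an upsert:

// Each written row upserts the Cassandra row with the same primary key.
df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "my_table", "keyspace" -> "my_keyspace"))   // placeholders
  .mode("append")
  .save()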
Regards
Prathmesh
Hi,
I need to do a PoC for a business use case.
*Use case:* Need to update a record in a Cassandra table if it exists.
Will Spark Streaming support comparing each record and updating the existing
Cassandra record?
For each record received from the Kafka topic, if I want to check and compare
each record
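A sketch of the usual pattern (Spark 2.4+ foreachBatch; the topic, keyspace, table,
and column names below are placeholders, and the connector write is again an upsert
keyed on the primary key):

import org.apache.spark.sql.DataFrame

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")   // placeholder
  .option("subscribe", "events")                    // placeholder topic
  .load()
  .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

val query = stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Rows whose primary key already exists in Cassandra are overwritten (upsert),
    // so no explicit read-before-write is needed unless old and new values must be
    // compared.
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "records", "keyspace" -> "poc"))
      .mode("append")
      .save()
  }
  .start()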
What is the number of part files in that big table? And what is the
distribution of the request ID? Is the variance of that column low or high?
The partitionBy clause will move data with the same request ID to one
executor; if the data is huge, it might put load on that executor.
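A quick way to check both (a sketch; the path and column name are placeholders):

import org.apache.spark.sql.functions._

val df = spark.read.parquet("/path/to/big/table")   // placeholder path

// Number of input partitions the scan produces (roughly the part-file count):
println(df.rdd.getNumPartitions)

// Distribution of request IDs; a few very large counts means the data is skewed
// and one executor ends up with most of the work after partitioning by it.
df.groupBy("request_id").count().orderBy(desc("count")).show(20)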
On Sun, 25 Aug 2019 at
Hi,
I am exploring the use of Catalyst expression functions to avoid the
performance issues associated with UDFs.
One thing I noticed is that there is a trait called CodegenFallback, and
some Catalyst expressions in Spark inherit from it [0].
My question is, is there a
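For reference, a minimal sketch of what such an expression can look like, assuming
Spark 2.4-era internals (these are internal APIs whose signatures change between
releases, so treat the shape rather than the details as the point): mixing in
CodegenFallback means only the interpreted eval path has to be implemented.

import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical expression: upper-cases a string column without codegen support.
case class UpperCaseFallback(child: Expression)
    extends UnaryExpression with CodegenFallback {

  override def dataType: DataType = StringType

  // CodegenFallback supplies doGenCode by falling back to eval, so only the
  // interpreted evaluation needs to be written here.
  override protected def nullSafeEval(input: Any): Any =
    UTF8String.fromString(input.asInstanceOf[UTF8String].toString.toUpperCase)
}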
Hey all,
I want to load a parquet file containing my edges into a Graph. My code so far
looks like this:
val edgesDF = spark.read.parquet("/path/to/edges/parquet/")
val edgesRDD = edgesDF.rdd
val graph = Graph.fromEdgeTuples(edgesRDD, 1)
But doing this produces an error:
[error] found :
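The mismatch is most likely that Graph.fromEdgeTuples expects an
RDD[(VertexId, VertexId)], i.e. RDD[(Long, Long)], while edgesDF.rdd is an RDD[Row].
A sketch of the usual fix, assuming the parquet has two long-typed columns (the
names "src" and "dst" are placeholders):

import org.apache.spark.graphx.Graph

val edgesDF = spark.read.parquet("/path/to/edges/parquet/")

// Map each Row to a (srcId, dstId) tuple before handing it to GraphX.
val edgeTuples = edgesDF
  .select("src", "dst")                       // placeholder column names
  .rdd
  .map(row => (row.getLong(0), row.getLong(1)))

val graph = Graph.fromEdgeTuples(edgeTuples, defaultValue = 1)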