Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
Hi, I registered an accumulator in the driver via sparkContext.register(myCustomAccumulator, "accumulator-name"), but this accumulator is not available in the task.metrics.accumulators() list. The accumulator is not visible in the Spark UI either. Does Spark need a different configuration to make the accumulator visible…
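
A minimal sketch of registering a named accumulator so it appears in the Spark UI, assuming Spark 2.x's AccumulatorV2 API; a built-in LongAccumulator stands in for the custom accumulator here, and the app name and counting logic are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.util.LongAccumulator

    object AccumulatorDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("accumulator-demo").getOrCreate()
        val sc = spark.sparkContext

        // An accumulator registered with a name shows up in the
        // "Accumulators" table on the stage page of the Spark UI;
        // unnamed accumulators do not.
        val evenCount = new LongAccumulator
        sc.register(evenCount, "accumulator-name")

        // Update the accumulator inside an action: updates made in
        // transformations can be re-applied if a stage is retried.
        sc.parallelize(1 to 100).foreach { n =>
          if (n % 2 == 0) evenCount.add(1)
        }

        println(s"even numbers seen: ${evenCount.value}")
        spark.stop()
      }
    }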

How to merge fragmented IDs into one cluster if one/more IDs are shared

2017-10-05 Thread Tushar Sudake
Hello Sparkans, I want to merge the following clusters / sets of IDs into one if they share any IDs. For example: uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4 uuid_3_2,uuid_3_5,uuid_3_6 uuid_3_5,uuid_3_7,uuid_3_8,uuid_3_9 into a single: uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4,uuid_3_5,uuid_3_6,uuid_3_7,uuid_3_8,…
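
One way to frame this is as connected components over the IDs. Below is a minimal local sketch using union-find; it assumes the rows fit in driver memory (see the GraphX reply in this thread for a distributed angle):

    // Merge rows of IDs that transitively share any element.
    object MergeClusters {
      def merge(rows: Seq[Seq[String]]): Iterable[Set[String]] = {
        val parent = scala.collection.mutable.Map[String, String]()

        // Find the root of x, compressing paths along the way.
        def find(x: String): String = {
          val p = parent.getOrElseUpdate(x, x)
          if (p == x) x else { val root = find(p); parent(x) = root; root }
        }

        def union(a: String, b: String): Unit = parent(find(a)) = find(b)

        // Union every ID in a row with the row's first ID.
        rows.foreach(row => row.foreach(union(row.head, _)))

        // Group all IDs by their root to obtain the merged clusters.
        parent.keys.toSet.groupBy(find).values
      }

      def main(args: Array[String]): Unit = {
        val rows = Seq(
          Seq("uuid_3_1", "uuid_3_2", "uuid_3_3", "uuid_3_4"),
          Seq("uuid_3_2", "uuid_3_5", "uuid_3_6"),
          Seq("uuid_3_5", "uuid_3_7", "uuid_3_8", "uuid_3_9"))
        merge(rows).foreach(c => println(c.toSeq.sorted.mkString(",")))
      }
    }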

Any rabbit mq connect for spark structured streaming ?

2017-10-05 Thread Darshan Pandya
-- Sincerely, Darshan

Spark ML - LogisticRegression extract words with highest weights

2017-10-05 Thread pun
I am using Spark ML's Pipeline to classify text documents with the following steps: Tokenizer -> CountVectorizer -> LogisticRegression. I want to be able to print the words with the highest weights. Can this be done? So far I have been able to extract the LR coefficients, but can those be tied up to…
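
A sketch of tying the coefficients back to the words, assuming a fitted PipelineModel whose stages are Tokenizer -> CountVectorizer -> LogisticRegression with a binary label (a multiclass model exposes coefficientMatrix instead); the stage indices are assumptions:

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.LogisticRegressionModel
    import org.apache.spark.ml.feature.CountVectorizerModel

    // Pair each LR coefficient with the word at the same index in the
    // CountVectorizer vocabulary, then rank by absolute weight.
    def topWords(model: PipelineModel, n: Int): Seq[(String, Double)] = {
      val cv = model.stages(1).asInstanceOf[CountVectorizerModel]
      val lr = model.stages(2).asInstanceOf[LogisticRegressionModel]

      // For binary LR, coefficients(i) is the weight of cv.vocabulary(i).
      cv.vocabulary
        .zip(lr.coefficients.toArray)
        .sortBy { case (_, weight) => -math.abs(weight) }
        .take(n)
        .toSeq
    }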

Re: How to merge fragmented IDs into one cluster if one/more IDs are shared

2017-10-05 Thread 孫澤恩
Hi there. Regarding GraphX, I think the approach is to parse your data into (VertexA) - [Edge1] - (VertexB). As we can see, the Graph class of GraphX contains edges and vertices, so the first line of your data would be parsed into uuid_3_1, uuid_3_2, uuid_3_3, uuid_3_4 as vertices (uuid_3_…
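
Along those lines, a minimal GraphX sketch: each row becomes edges from its first ID to the rest, and connectedComponents merges everything transitively linked. Hashing the string IDs to the Long vertex IDs GraphX requires is an assumption (a production job would want a collision-safe mapping):

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.SparkContext

    def mergeClusters(sc: SparkContext, rows: RDD[Seq[String]]): Unit = {
      // GraphX vertex IDs must be Longs; hash the UUID strings.
      def vid(s: String): Long = s.hashCode.toLong & 0xFFFFFFFFL

      val vertices = rows.flatMap(r => r.map(id => (vid(id), id))).distinct()
      val edges = rows.flatMap(r => r.tail.map(id => Edge(vid(r.head), vid(id), 1)))

      // Every vertex ends up labelled with the smallest vertex ID in
      // its connected component.
      val cc = Graph(vertices, edges).connectedComponents()

      // Join the component labels back to the string IDs and group.
      cc.vertices.join(vertices)
        .map { case (_, (component, id)) => (component, id) }
        .groupByKey()
        .values
        .collect()
        .foreach(c => println(c.toSeq.sorted.mkString(",")))
    }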

Re: Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
Any response on this one? Thanks in advance! On Thu, Oct 5, 2017 at 1:44 PM, Tarun Kumar wrote: > Hi, I registered an accumulator in the driver via > sparkContext.register(myCustomAccumulator, "accumulator-name"), but this accumulator is not available in the > task.metrics.accumulators() list. Acc…

Spark structured streaming - join static dataset with streaming dataset

2017-10-05 Thread jithinpt
I'm using Spark Structured Streaming to process records read from Kafka. Here's what I'm trying to achieve: (a) Each record is a Tuple2 of type (Timestamp, DeviceId). (b) I've created a static Dataset[DeviceId] which contains the set of all valid device IDs (of type DeviceId) that are expected to…
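
A minimal sketch of the stream-static join itself; the broker, topic, column names, and the way the device ID is parsed from the Kafka value are all placeholders (this also assumes the spark-sql-kafka-0-10 package is on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("stream-static-join").getOrCreate()
    import spark.implicits._

    // Static side: the set of valid device IDs (placeholder values).
    val validIds = Seq("d1", "d2").toDF("deviceId")

    // Streaming side: (timestamp, deviceId) records from Kafka; real
    // code would parse the value according to its actual format.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("timestamp", "CAST(value AS STRING) AS deviceId")

    // Joining a streaming Dataset with a static one is supported
    // directly; an inner join keeps only events whose device ID
    // appears in the static set.
    val valid = events.join(validIds, "deviceId")

    valid.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()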