Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
…t be counted, right? > However in my test, this one is still counted. On Tue, Feb 6, 2018 at 2:05 PM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote: >> Yes, that is correct. >> On Tue, Feb 6, 2018 at 4:56 PM, Jiewen Shao…
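
A minimal sketch of the behavior under discussion, assuming Spark 2.x Structured Streaming and a streaming DataFrame named events with an event-time column called timestamp (names are illustrative):

    import org.apache.spark.sql.functions.{col, window}

    // Records arriving more than 20 seconds behind the max event time seen
    // so far may be dropped from the aggregation; the watermark tracks
    // event time, not the system clock.
    val counts = events
      .withWatermark("timestamp", "20 seconds")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()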

Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
Yes, that is correct. On Tue, Feb 6, 2018 at 4:56 PM, Jiewen Shao wrote: > Vishnu, thanks for the reply. So "event time" and "window end time" have nothing to do with the current system timestamp; the watermark moves with the higher value of "timestamp" f…

Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
…tml#watermark Thanks, Vishnu On Tue, Feb 6, 2018 at 12:11 PM Jiewen Shao wrote: > Sample code: Let's say Xyz is a POJO with a field called timestamp. > Regarding the code withWatermark("timestamp", "20 seconds"), > I expect the msg with timestamp 20…

Re: Apache Spark - Spark Structured Streaming - Watermark usage

2018-01-31 Thread Vishnu Viswanath
…applicable for aggregation only. If you have only a map function and don't want to process it, you could do a filter based on its EventTime field, but I guess you will have to compare it with the processing time, since there is no API for the user to access the Watermark. -Vishnu On Fri, J…
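
A sketch of the filter-based workaround described above, assuming a streaming DataFrame events with an event-time column eventTime and a 20-second tolerance (both assumptions):

    import org.apache.spark.sql.functions.expr

    // Approximate a watermark for a map-only stream by comparing the
    // event-time field against the current processing time.
    val filtered = events.filter(
      expr("eventTime >= current_timestamp() - INTERVAL 20 SECONDS"))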

Handling skewed data

2017-04-17 Thread Vishnu Viswanath
Hello All, Does anyone know if the skew-handling code mentioned in this talk https://www.youtube.com/watch?v=bhYV0JOPd9Y was added to Spark? If so, can I know where to look for more info: a JIRA? A pull request? Thanks in advance. Regards, Vishnu Viswanath.

Printing MLpipeline model in Python.

2016-03-14 Thread VISHNU SUBRAMANIAN
…(feature 3 > 1741.0) If (feature 47 <= 0.0) Predict: 1.0 Else (feature 47 > 0.0) How can I achieve the same thing using the MLpipelines model? Thanks in Advance. Vishnu
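
For reference, a Scala sketch of one way to get equivalent output from an ML pipeline model; it assumes the tree is the last stage of a fitted PipelineModel named pipelineModel (the Python API exposes the same stages attribute):

    import org.apache.spark.ml.classification.DecisionTreeClassificationModel

    // Pull the fitted tree out of the pipeline and print its structure,
    // analogous to the mllib toDebugString output quoted above.
    val treeModel = pipelineModel.stages.last
      .asInstanceOf[DecisionTreeClassificationModel]
    println(treeModel.toDebugString)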

Re: Installing Spark on Mac

2016-03-04 Thread Vishnu Viswanath
Installing Spark on a Mac is similar to how you install it on Linux. I use a Mac and have written a blog post on how to install Spark; here is the link: http://vishnuviswanath.com/spark_start.html Hope this helps. On Fri, Mar 4, 2016 at 2:29 PM, Simon Hafner wrote: > I'd try `brew install spark` or `ap…

Re: Question about MEMORY_AND_DISK persistence

2016-02-28 Thread Vishnu Viswanath
Thank you Ashwin. On Sun, Feb 28, 2016 at 7:19 PM, Ashwin Giridharan wrote: > Hi Vishnu, A partition will either be in memory or on disk. -Ashwin > On Feb 28, 2016 15:09, "Vishnu Viswanath" wrote: >> Hi All, >> I ha…

Question about MEMORY_AND_DISK persistence

2016-02-28 Thread Vishnu Viswanath
…or will the whole 2nd partition be kept on disk? Regards, Vishnu
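
A minimal sketch of the storage level in question; per the reply above, the memory-or-disk decision is made per partition, so a partition is never split between the two:

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.textFile("data.txt")   // path is illustrative
    // Partitions that fit stay in memory; a partition that does not fit
    // is written to disk as a whole.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)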

Question on RDD caching

2016-02-04 Thread Vishnu Viswanath
…will store the partition in memory. 4. Therefore, each node can have partitions of different RDDs in its cache. Can someone please tell me if I am correct? Thanks and Regards, Vishnu Viswanath

Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread VISHNU SUBRAMANIAN
Hi, if you need a DataFrame-specific solution, you can try the below: df.select(from_unixtime(col("max(utcTimestamp)")/1000)) On Tue, 2 Feb 2016 at 09:44 Ted Yu wrote: > See related thread on using Joda DateTime: > http://search-hadoop.com/m/q3RTtSfi342nveex1&subj=RE+NPE+ > when+using+Joda+D…
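
A slightly fuller sketch of the DataFrame-specific solution, with the imports it needs (column name taken from the thread):

    import org.apache.spark.sql.functions.{col, from_unixtime}

    // The column holds epoch milliseconds; from_unixtime expects seconds,
    // hence the division by 1000.
    val withTs = df.select(
      from_unixtime(col("max(utcTimestamp)") / 1000).as("sqlTimestamp"))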

Re: How to accelerate reading json file?

2016-01-05 Thread VISHNU SUBRAMANIAN
Hi, you can try this: sqlContext.read.format("json").option("samplingRatio","0.1").load("path") If it still takes time, feel free to experiment with the samplingRatio. Thanks, Vishnu On Wed, Jan 6, 2016 at 12:43 PM, Gavin Yue wrote: > I am tryi…
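
The suggestion above, plus the common follow-on of supplying an explicit schema so the JSON scan skips inference entirely (field names are illustrative):

    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

    // Sample 10% of the input for schema inference.
    val sampled = sqlContext.read.format("json")
      .option("samplingRatio", "0.1")
      .load("path")

    // Or skip inference altogether with an explicit schema.
    val schema = StructType(Array(
      StructField("id", LongType, true),
      StructField("name", StringType, true)))
    val fast = sqlContext.read.schema(schema).json("path")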

Re: custom schema in spark throwing error

2015-12-21 Thread VISHNU SUBRAMANIAN
Try this: val customSchema = StructType(Array( StructField("year", IntegerType, true), StructField("make", StringType, true), StructField("model", StringType, true) )) On Mon, Dec 21, 2015 at 8:26 AM, Divya Gehlot wrote: > 1. scala> import org.apache.spark.sql.hive.HiveContext 2. impor…
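
The snippet needs the type imports in scope; a sketch applying it to a load (the CSV source is an assumption based on the year/make/model fields):

    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val df = sqlContext.read
      .format("com.databricks.spark.csv")   // assumed source
      .schema(customSchema)
      .load("cars.csv")                     // path is illustrative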

Re: Spark ML Random Forest output.

2015-12-04 Thread Vishnu Viswanath
…(label to index) and (index to label), and use this to get back your original label. Maybe there is a better way to do this. Regards, Vishnu On Fri, Dec 4, 2015 at 4:56 PM, Eugene Morozov wrote: > Hello, > I've got an input dataset of handwritten digits and working java code…
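
A sketch of the (index to label) direction using IndexToString, assuming a StringIndexerModel named labelIndexer was fitted earlier and predictions holds the model output:

    import org.apache.spark.ml.feature.IndexToString

    // Reverse the label-to-index mapping learned by the StringIndexer.
    val labelConverter = new IndexToString()
      .setInputCol("prediction")
      .setOutputCol("predictedLabel")
      .setLabels(labelIndexer.labels)

    val withLabels = labelConverter.transform(predictions)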

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you. On Wed, Dec 2, 2015 at 8:12 PM, Yanbo Liang wrote: > You can get 1.6.0-RC1 from http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/ currently, but it's not the final release version. > 2015-12-02 23:57 GMT+08:00 Vishnu Viswanath >

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you Yanbo. It looks like this is available in the 1.6 version only. Can you tell me how/when I can download version 1.6? Thanks and Regards, Vishnu Viswanath On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang wrote: > You can set "handleInvalid" to "skip" which help yo…
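
A minimal sketch of the "handleInvalid" suggestion (column names are illustrative):

    import org.apache.spark.ml.feature.StringIndexer

    // Rows whose value was never seen during fit() are skipped at
    // transform time instead of throwing an error.
    val indexer = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")
      .setHandleInvalid("skip")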

Re: General question on using StringIndexer in SparkML

2015-12-01 Thread Vishnu Viswanath
…column on which I am doing StringIndexing, the test data has values that were not present in the train data. Since fit() is done only on the train data, the indexing is failing. Can you suggest what can be done in this situation? Thanks, On Mon, Nov 30, 2015 at 12:32 AM, Vishnu Viswanath

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
…spark ml doc: http://spark.apache.org/docs/latest/ml-guide.html#how-it-works On Mon, Nov 30, 2015 at 12:52 AM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote: >> Thanks for the reply Yanbo. >> I understand that the model will be trained usin…

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
…and do the prediction. Thanks and Regards, Vishnu Viswanath On Sun, Nov 29, 2015 at 8:14 AM, Yanbo Liang wrote: > Hi Vishnu, The string and indexer map is generated at the model training step and used at the model prediction step. It means that the string and indexer map will n…

General question on using StringIndexer in SparkML

2015-11-28 Thread Vishnu Viswanath
…and A. So the StringIndexer will assign indexes as C 0.0, B 1.0, A 2.0. These indexes are different from what we used for modeling. So won't this give me a wrong prediction if I use StringIndexer? -- Thanks and Regards, Vishnu Viswanath, *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*

Re: Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
…at 8:28 PM, Ted Yu wrote: Vishnu: > rowNumber (deprecated, replaced with row_number) is a window function: "Window function: returns a sequential number starting at 1 within a window partition. @group window_funcs @since 1.6.0"

Re: Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
…Yes, that's why I thought of adding a row number in both the DataFrames and joining them based on the row number. Is there any better way of doing this? Both DataFrames will always have the same number of rows, but are not related by any column to do a join. Thanks and Regards, Vishnu Viswanath On Wed, Nov 25, 201…
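
One way to sketch the row-number join, pairing rows by position with zipWithIndex (assumes equal row counts and a stable ordering, since the index follows partition order):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.StructType

    // Attach a positional index to each DataFrame, join on it, drop it.
    val indexed1 = df1.rdd.zipWithIndex.map(_.swap)
    val indexed2 = df2.rdd.zipWithIndex.map(_.swap)

    val joinedRows = indexed1.join(indexed2).values
      .map { case (r1, r2) => Row.fromSeq(r1.toSeq ++ r2.toSeq) }

    val joined = sqlContext.createDataFrame(joinedRows,
      StructType(df1.schema.fields ++ df2.schema.fields))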

Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
….withColumn("line2",df2("line")) org.apache.spark.sql.AnalysisException: resolved attribute(s) line#2330 missing from line#2326 in operator !Project [line#2326,line#2330 AS line2#2331]; Thanks and Regards, Vishnu Viswanath *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*

Re: how to use DataFrame.na.fill based on condition

2015-11-23 Thread Vishnu Viswanath
Thanks for the reply, Davies. I think replace replaces a value with another value, but what I want to do is fill in the null values of a column (I don't have a to_replace here). Regards, Vishnu On Mon, Nov 23, 2015 at 1:37 PM, Davies Liu wrote: > DataFrame.replace(to_replace, value…

how to use DataFrame.na.fill based on condition

2015-11-23 Thread Vishnu Viswanath
Hi, can someone tell me if there is a way I can use the fill method in DataFrameNaFunctions based on some condition? e.g., df.na.fill("value1","column1","condition1") df.na.fill("value2","column1","condition2") I want to fill nulls in column1 with either value1 or value2, based…
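
na.fill itself takes no condition, but the effect can be sketched with when/otherwise (column names are illustrative, and condition1/condition2 stand for boolean columns):

    import org.apache.spark.sql.functions.{col, when}

    // Fill nulls in column1 with value1 or value2 depending on the
    // condition; non-null values are left untouched.
    val filled = df.withColumn("column1",
      when(col("column1").isNull && col("condition1"), "value1")
        .when(col("column1").isNull && col("condition2"), "value2")
        .otherwise(col("column1")))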

How VectorIndexer works in Spark ML pipelines

2015-10-15 Thread VISHNU SUBRAMANIAN
…2.0,252.0,253.0,35.0,73.0,252.0,252.0,253.0,35.0,31.0,211.0,252.0,253.0,35.0])] I can't understand what is happening. I tried with simple data sets too, but got a similar result. Please help. Thanks, Vishnu
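
For reference, a minimal VectorIndexer sketch; it only re-encodes features it decides are categorical (at most maxCategories distinct values), which can make the output vectors look unexpectedly changed or unchanged:

    import org.apache.spark.ml.feature.VectorIndexer

    val indexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(10)   // threshold is illustrative

    val indexedData = indexer.fit(data).transform(data)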

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
Hi Vinod, yes, if you want to use a Scala or Python function you need the block of code. Only Hive UDFs are available permanently. Thanks, Vishnu On Wed, Jul 8, 2015 at 5:17 PM, vinod kumar wrote: > Thanks Vishnu, > When I restart the service, the UDF is not accessible by my q…

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
Hi, sqlContext.udf.register("udfname", functionname _) Example: def square(x: Int): Int = { x * x } Register the UDF as below: sqlContext.udf.register("square", square _) Thanks, Vishnu On Wed, Jul 8, 2015 at 2:23 PM, vinod kumar wrote: > Hi Everyone, > I am new to s…
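
Putting the pieces of the reply together, with a usage (table and column names are illustrative):

    def square(x: Int): Int = x * x

    // Register for use in SQL; as noted in this thread, the registration
    // does not survive a restart -- only Hive UDFs persist.
    sqlContext.udf.register("square", square _)
    sqlContext.sql("SELECT square(value) FROM numbers").show()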

Re: used cores are less than total no. of cores

2015-02-24 Thread VISHNU SUBRAMANIAN
Try adding --total-executor-cores 5, where 5 is the number of cores. Thanks, Vishnu On Wed, Feb 25, 2015 at 11:52 AM, Somnath Pandeya <somnath_pand...@infosys.com> wrote: > Hi All, > I am running a simple word count example of Spark (standalone cluster),…
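
The flag caps the total cores the application may use on a standalone cluster; a sketch of the equivalent configuration property set programmatically:

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.cores.max is the property behind --total-executor-cores
    // on a standalone cluster.
    val conf = new SparkConf()
      .setAppName("WordCount")
      .set("spark.cores.max", "5")
    val sc = new SparkContext(conf)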

Re: Running Example Spark Program

2015-02-22 Thread VISHNU SUBRAMANIAN
Try restarting your Spark cluster: ./sbin/stop-all.sh ./sbin/start-all.sh Thanks, Vishnu On Sun, Feb 22, 2015 at 7:30 PM, Surendran Duraisamy <2013ht12...@wilp.bits-pilani.ac.in> wrote: > Hello All, > I am new to Apache Spark. I am trying to run JavaKMeans.java from Spark…

Re: getting the cluster elements from kmeans run

2015-02-11 Thread VISHNU SUBRAMANIAN
You can use model.predict(point); that will help you identify the cluster center, and you can map it to the point: rdd.map(x => (x, model.predict(x))) Thanks, Vishnu On Wed, Feb 11, 2015 at 11:06 PM, Harini Srinivasan wrote: > Hi, > Is there a way to get the elements of each cluster a…
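
Expanding the reply into a grouping of points by predicted cluster (a sketch, assuming an RDD[Vector] named rdd and a fitted KMeansModel named model):

    // Key each point by its cluster id, then gather the members of each cluster.
    val byCluster = rdd.map(point => (model.predict(point), point))
      .groupByKey()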

Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
…thrift server. Thanks, Vishnu On Wed, Feb 11, 2015 at 10:31 PM, Ashish Mukherjee <ashish.mukher...@gmail.com> wrote: > Thanks for your reply, Vishnu. > I assume you are suggesting I build Hive tables and cache them in memory and query on top of that for fast, real-time queryi…

Re: Re: How can I read this avro file using spark & scala?

2015-02-11 Thread VISHNU SUBRAMANIAN
Check this link: https://github.com/databricks/spark-avro (the home page for the spark-avro project). Thanks, Vishnu On Wed, Feb 11, 2015 at 10:19 PM, Todd wrote: > Databricks provides a sample code on its website... but I can't find it for now. > At 20…
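
A minimal read with the spark-avro package linked above (1.x-era API; the path is illustrative):

    // Requires the com.databricks:spark-avro artifact on the classpath.
    val df = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("episodes.avro")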

Re: Hive/Hbase for low latency

2015-02-11 Thread VISHNU SUBRAMANIAN
Hi Siddharth, it depends on what exactly you are trying to solve, but the connectivity between Cassandra and Spark is good. Thanks, Vishnu On Wed, Feb 11, 2015 at 7:47 PM, Siddharth Ubale <siddharth.ub...@syncoms.com> wrote: > Hi,…

Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
…thrift server. Spark exposes the Hive query language and allows you to access its data through Spark, so you can consider using HiveQL for querying. Thanks, Vishnu On Wed, Feb 11, 2015 at 4:12 PM, Ashish Mukherjee <ashish.mukher...@gmail.com> wrote: > Hi, > I am planning to use…
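
A minimal HiveQL query through Spark, as suggested in this thread (table name and predicate are illustrative):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // Optionally cache the table in memory, then query it with HiveQL.
    hiveContext.cacheTable("events")
    val recent = hiveContext.sql("SELECT * FROM events WHERE ts > '2015-02-01'")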

Re: NaiveBayes classifier causes ShuffleDependency class cast exception

2015-02-06 Thread VISHNU SUBRAMANIAN
Can you try creating just a single SparkContext and then running your code? If you want to use it for streaming, pass the same SparkContext object instead of a conf. Note: instead of just replying to me, try to use reply-all so that the post is visible to the community. That way you can expect im…
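
A sketch of the suggestion: build the StreamingContext from the one existing SparkContext rather than from a second conf:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Reuse the existing SparkContext sc instead of creating a second one.
    val ssc = new StreamingContext(sc, Seconds(1))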

Re: Shuffle Dependency Casting error

2015-02-05 Thread VISHNU SUBRAMANIAN
Hi, could you share the code snippet? Thanks, Vishnu On Thu, Feb 5, 2015 at 11:22 PM, aanilpala wrote: > Hi, I am working on a text mining project and I want to use the NaiveBayesClassifier of MLlib to classify some stream items. So I have two Spark contexts, one of which is a…

Re: Java Kafka Word Count Issue

2015-02-02 Thread VISHNU SUBRAMANIAN
You can use updateStateByKey() to perform the above operation. On Mon, Feb 2, 2015 at 4:29 PM, Jadhav Shweta wrote: > Hi Sean, > Kafka Producer is working fine. This is related to Spark. > How can I configure Spark so that it will remember the count from the beginning? > If…
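
A sketch of updateStateByKey keeping a running count per word; it assumes a DStream[(String, Int)] named wordCounts, and note that stateful operations require a checkpoint directory:

    ssc.checkpoint("checkpoint-dir")   // path is illustrative

    // Add each batch's counts to the running total carried in the state.
    val updateFunction = (newValues: Seq[Int], runningCount: Option[Int]) => {
      Some(newValues.sum + runningCount.getOrElse(0))
    }

    val totalCounts = wordCounts.updateStateByKey[Int](updateFunction)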

Re: Failed to save RDD as text file to local file system

2015-01-08 Thread VISHNU SUBRAMANIAN
looks like it is trying to save the file in Hdfs. Check if you have set any hadoop path in your system. On Fri, Jan 9, 2015 at 12:14 PM, Raghavendra Pandey < raghavendra.pan...@gmail.com> wrote: > Can you check permissions etc as I am able to run > r.saveAsTextFile("file:///home/cloudera/tmp/out

how to do incremental model updates using spark streaming and mllib

2014-12-25 Thread vishnu
…incrementally updating, how do I do it? Thanks, Vishnu -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-do-incremental-model-updates-using-spark-streaming-and-mllib-tp20862.html
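
For reference, mllib does ship streaming variants that update a model incrementally, e.g. StreamingKMeans (a sketch; k, dimension, and decay are illustrative):

    import org.apache.spark.mllib.clustering.StreamingKMeans

    // The model is updated with each new batch of training data and can
    // score a separate prediction stream as it arrives.
    val model = new StreamingKMeans()
      .setK(3)
      .setDecayFactor(1.0)
      .setRandomCenters(10, 0.0)   // dimension 10, initial weight 0.0

    model.trainOn(trainingStream)          // DStream[Vector]
    model.predictOn(testStream).print()    // cluster ids per incoming vector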