If the df is empty, taking the first element (e.g., df.first, which is
take(1)(0)) throws a java.util.NoSuchElementException. The check can be
done as below:
df.rdd.isEmpty
On Tue, Mar 7, 2017 at 9:33 AM, wrote:
> Dataframe.take(1) is faster.
>
>
>
> *From:* ashaita...@nz.imshealth.com [mailto:ashaita...@nz.imshealth.com]
> *Sent:* Tuesday, March
Thank you for the prompt response. But why is it faster? There is an
implementation of isEmpty for RDD:

def isEmpty(): Boolean = withScope {
  partitions.length == 0 || take(1).length == 0
}

Basically, it is the same take(1). Is it because of the limit?
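For reference, a minimal sketch (in Scala, assuming df is any DataFrame) of
the two checks being compared. Dataset.take(1) runs through a limit, which
lets the planner stop after the first row, while df.rdd forces the whole
query plan to be converted to an RDD before take(1) runs:

// Both expressions are true exactly when df has no rows.
val viaRdd  = df.rdd.isEmpty     // converts the query plan to an RDD first
val viaTake = df.take(1).isEmpty // collects at most one row via a limit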
Regards,
Artem Shaitarov
From: jasbir.s...@acce
Dataframe.take(1) is faster.
From: ashaita...@nz.imshealth.com [mailto:ashaita...@nz.imshealth.com]
Sent: Tuesday, March 07, 2017 9:22 AM
To: user@spark.apache.org
Subject: Check if dataframe is empty
Hello!
I am pretty sure that I am asking something which has been already asked lots
of times. However, I cannot find the question in the mailing list archive.
The question is: I need to check whether a dataframe is empty or not. I
receive a dataframe from a 3rd-party library and this dataframe can be empty.
Hi,
How does Spark provide Hive style bucketing support in Spark 2.x?
Thanks,
Swetha
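For reference, a hedged sketch of the DataFrameWriter bucketing API available
since Spark 2.0 (the column and table names here are hypothetical). Note that
Spark's bucketing applies when saving as a table, and its file layout is not
byte-compatible with Hive's bucketed tables:

// Write a table bucketed and sorted by user_id into 8 buckets.
df.write
  .bucketBy(8, "user_id")
  .sortBy("user_id")
  .saveAsTable("events_bucketed")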
@Eli, thanks for the suggestion. If you don't mind, could you please
elaborate on the approaches?
On Mon, Mar 6, 2017 at 7:29 PM, Eli Super wrote:
> Hi
>
> Try to implement binning and/or feature engineering (smart feature
> selection for example)
>
> Good luck
>
> On Mon, Mar 6, 2017 at 6:56 AM, Raju Ba
Hi folks, trying to run Spark 2.1.0 thrift server against an hsqldb file
and it seems to...hang.
I am starting the thrift server with:
sbin/start-thriftserver.sh --driver-class-path ./conf/hsqldb-2.3.4.jar
(a completely local setup).
hive-site.xml is like this:
hive.metastore.warehouse.dir
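For context, a hedged sketch of what such a local hive-site.xml might look
like (the values below are assumptions; the original message is truncated).
javax.jdo.option.ConnectionURL and ConnectionDriverName are the standard
Hive metastore keys, and org.hsqldb.jdbc.JDBCDriver is the HSQLDB 2.x
driver class:

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/tmp/spark-warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:hsqldb:file:/tmp/metastore/metastore_db</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.hsqldb.jdbc.JDBCDriver</value>
  </property>
</configuration>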
Thank you Ankur for the quick response, really appreciate it! Making the
class serializable resolved the exception!
Best regards,
Mina
On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava wrote:
> The fix for this is to make your class Serializable. The reason is that
> the closures you have defined in the
The fix for this is to make your class Serializable. The reason is that the
closures you have defined in the class need to be serialized and copied
over to all executor nodes.
Hope this helps.
Thanks
Ankur
On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani wrote:
> Hi,
>
> I am trying to start with spark and
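As a minimal Scala sketch of the fix described above (the class and field
names are hypothetical): referencing a field inside an RDD closure captures
the enclosing instance, so that instance must be Serializable. The
SparkContext itself cannot be serialized, so it is marked @transient:

import org.apache.spark.SparkContext

class LineCounter(@transient val sc: SparkContext) extends Serializable {
  val marker = "ERROR" // field used inside the closure below

  // line.contains(marker) captures `this`, so LineCounter must be
  // Serializable for the task to be shipped to executors.
  def countMarked(path: String): Long =
    sc.textFile(path).filter(line => line.contains(marker)).count()
}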
Hi,
I am trying to get started with Spark and count the number of lines of a
text file on my Mac; however, I get an
org.apache.spark.SparkException: Task not serializable error on
JavaRDD<String> logData = javaCtx.textFile(file);
Please see below for the code sample and the stack trace.
Any idea why this error is thrown?
I am currently working to deploy two spark applications and I want to
restrict cores and executors per application. My config is as follows:
spark.executor.cores=1
spark.driver.cores=1
spark.cores.max=1
spark.executor.instances=1
Now the issue is that with this exact configuration, one streaming
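One hedged note on the settings above: spark.cores.max is honored by the
standalone and Mesos cluster managers, while spark.executor.instances is a
YARN setting, so which pair takes effect depends on the deployment. A sketch
of passing them per application at submit time (your-app.jar is a
placeholder):

spark-submit \
  --conf spark.executor.cores=1 \
  --conf spark.driver.cores=1 \
  --conf spark.cores.max=1 \
  --conf spark.executor.instances=1 \
  your-app.jar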
On 6 Mar 2017, at 12:30, Nira Amit <amitn...@gmail.com> wrote:
> And it's very difficult if it's doing unexpected things.
All serialisations do unexpected things. Nobody understands them. Sorry
And by the way - I don't want the Avro details to be hidden away from me.
The whole purpose of the work I'm doing is to benchmark different
serialization tools and strategies. If I want to use Kryo serialization for
example, then I need to understand how the API works. And it's very
difficult if it
Hi Sean,
Yes, we discussed this in Jira and you suggested I take this discussion to
the mailing list, so I did.
I don't have the option to migrate the code I'm working on to Datasets at
the moment (or to Scala, as another developer suggested in the Jira
discussion), so I have to work with the J
I think this is the same thing we already discussed extensively on your
JIRA.
The key/value class arguments to newAPIHadoopFile are not the types of your
custom class, but the Writable types describing the encoding of keys and
values in the file. I think that's the start of the problem.
I tried to load a custom type from Avro files into an RDD using
newAPIHadoopFile. I started with the following naive code:
JavaPairRDD<MyCustomClass, NullWritable> events =
    sc.newAPIHadoopFile("file:/path/to/data.avro",
        AvroKeyInputFormat.class, MyCustomClass.class,
        NullWritable.class,
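Following the point above, a hedged Scala sketch of what the corrected call
might look like (assuming MyCustomClass is the Avro-generated class from the
thread): the key type is AvroKey[MyCustomClass], not MyCustomClass itself,
and the datum is unwrapped afterwards:

import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

val pairs = sc.newAPIHadoopFile(
  "file:/path/to/data.avro",
  classOf[AvroKeyInputFormat[MyCustomClass]],
  classOf[AvroKey[MyCustomClass]],
  classOf[NullWritable])

// Unwrap each AvroKey to get the actual records.
val events = pairs.map { case (key, _) => key.datum() }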
Hi
Try to implement binning and/or feature engineering (smart feature
selection for example)
Good luck
On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti wrote:
> Hi,
> I am new to Spark MLlib. I am using the FPGrowth model for finding related
> items.
>
> The number of transactions is 63K and the t
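To make the binning suggestion concrete, a hedged sketch using spark.ml's
Bucketizer (the column names and split points are hypothetical):
discretizing a continuous column into a few buckets gives FPGrowth a small
set of distinct items instead of raw values:

import org.apache.spark.ml.feature.Bucketizer

val bucketizer = new Bucketizer()
  .setInputCol("price")
  .setOutputCol("priceBucket")
  .setSplits(Array(Double.NegativeInfinity, 10.0, 50.0, 100.0,
    Double.PositiveInfinity))

// Adds a priceBucket column with values 0.0 through 3.0.
val binned = bucketizer.transform(df)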
Thanks Sean. Our training MSE is really large. We definitely need better
predictor variables.
Training Mean Squared Error = 7.72E8
Thanks,
Manish
On Mon, Mar 6, 2017 at 4:45 PM, Sean Owen wrote:
> There's nothing unusual about negative values from a linear regression.
> If, generally, your pr
There's nothing unusual about negative values from a linear regression. If,
generally, your predicted values are far from your actual values, then your
model hasn't fit well. You may have a bug somewhere in your pipeline or you
may have data without much linear relationship. Most of this isn't a Sp
Hi All,
We are using a LinearRegressionModel in Scala, with a StandardScaler to
normalize the data before modelling. The code snippet looks like this:
*Modelling:*
val labeledPointsRDD = tableRecords.map(row =>
{
val filtered = row.toSeq.filter({ case s: String => false case _
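Since the snippet above is truncated, here is a hedged sketch (names are
hypothetical) of the overall shape being described: scale the features with
StandardScaler, train a linear model, and compute the training MSE:

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// labeledPointsRDD: RDD[LabeledPoint], as built from tableRecords above.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(labeledPointsRDD.map(_.features))
val scaled = labeledPointsRDD.map(p =>
  LabeledPoint(p.label, scaler.transform(p.features)))

val model = LinearRegressionWithSGD.train(scaled, 100)

// Training mean squared error, as reported in the thread.
val mse = scaled.map { p =>
  val err = model.predict(p.features) - p.label
  err * err
}.mean()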