If the df is empty, taking the first element (e.g., df.first, which is
take(1)(0)) throws a java.util.NoSuchElementException. The check can be
done as below:
df.rdd.isEmpty
On Tue, Mar 7, 2017 at 9:33 AM, wrote:
> Dataframe.take(1) is faster.
>
>
>
> *From:* ashaita...@nz.imshealth.com [mailto:ashaita...@nz.imshealth.com]
> *Sent:* Tuesday, March
Thank you for the prompt response. But why is it faster? There is an
implementation of isEmpty for RDD:

def isEmpty(): Boolean = withScope {
  partitions.length == 0 || take(1).length == 0
}

Basically, it is the same take(1). Is it because of the limit?
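For reference, a minimal sketch (in Scala, assuming df is any DataFrame) of
the two checks being compared. Dataset.take(1) runs through a limit, which
lets the planner stop after the first row, while df.rdd forces the whole
query plan to be converted to an RDD before take(1) runs:

// Both expressions are true exactly when df has no rows.
val viaRdd  = df.rdd.isEmpty     // converts the query plan to an RDD first
val viaTake = df.take(1).isEmpty // collects at most one row via a limit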
Regards,
Artem Shaitarov
From: jasbir.s...@acce
Dataframe.take(1) is faster.
From: ashaita...@nz.imshealth.com [mailto:ashaita...@nz.imshealth.com]
Sent: Tuesday, March 07, 2017 9:22 AM
To: user@spark.apache.org
Subject: Check if dataframe is empty
Hello!
I am pretty sure that I am asking something which has been already asked lots
of times. However, I cannot find the question in the mailing list archive.
The question is: I need to check whether a dataframe is empty or not. I
receive a dataframe from a 3rd-party library and this dataframe can be empty.
Hi,
How does Spark provide Hive style bucketing support in Spark 2.x?
Thanks,
Swetha
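For reference, a hedged sketch of the DataFrameWriter bucketing API available
since Spark 2.0 (the column and table names here are hypothetical). Note that
Spark's bucketing applies when saving as a table, and its file layout is not
byte-compatible with Hive's bucketed tables:

// Write a table bucketed and sorted by user_id into 8 buckets.
df.write
  .bucketBy(8, "user_id")
  .sortBy("user_id")
  .saveAsTable("events_bucketed")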
@Eli, thanks for the suggestion. If you don't mind, could you please
elaborate on the approaches?
On Mon, Mar 6, 2017 at 7:29 PM, Eli Super wrote:
> Hi
>
> Try to implement binning and/or feature engineering (smart feature
> selection for example)
>
> Good luck
>
> On Mon, Mar 6, 2017 at 6:56 AM, Raju Ba
Hi folks, trying to run Spark 2.1.0 thrift server against an hsqldb file
and it seems to...hang.
I am starting the thrift server with:
sbin/start-thriftserver.sh --driver-class-path ./conf/hsqldb-2.3.4.jar
(a completely local setup).
hive-site.xml is like this:
hive.metastore.warehouse.dir
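For context, a hedged sketch of what such a local hive-site.xml might look
like (the values below are assumptions; the original message is truncated).
javax.jdo.option.ConnectionURL and ConnectionDriverName are the standard
Hive metastore keys, and org.hsqldb.jdbc.JDBCDriver is the HSQLDB 2.x
driver class:

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/tmp/spark-warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:hsqldb:file:/tmp/metastore/metastore_db</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.hsqldb.jdbc.JDBCDriver</value>
  </property>
</configuration>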
Thank you Ankur for the quick response, really appreciate it! Making the
class serializable resolved the exception!
Best regards,
Mina
On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava wrote:
> The fix for this is to make your class Serializable. The reason is that
> the closures you have defined in the
The fix for this is to make your class Serializable. The reason is that the
closures you have defined in the class need to be serialized and copied
over to all executor nodes.
Hope this helps.
Thanks
Ankur
On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani wrote:
> Hi,
>
> I am trying to start with spark and
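As a minimal Scala sketch of the fix described above (the class and field
names are hypothetical): referencing a field inside an RDD closure captures
the enclosing instance, so that instance must be Serializable. The
SparkContext itself cannot be serialized, so it is marked @transient:

import org.apache.spark.SparkContext

class LineCounter(@transient val sc: SparkContext) extends Serializable {
  val marker = "ERROR" // field used inside the closure below

  // line.contains(marker) captures `this`, so LineCounter must be
  // Serializable for the task to be shipped to executors.
  def countMarked(path: String): Long =
    sc.textFile(path).filter(line => line.contains(marker)).count()
}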
Hi,
I am trying to get started with Spark and count the number of lines of a
text file on my Mac; however, I get an
org.apache.spark.SparkException: Task not serializable error on
JavaRDD<String> logData = javaCtx.textFile(file);
Please see below for the code sample and the stack trace.
Any idea why this error is thrown?
I am currently working to deploy two spark applications and I want to
restrict cores and executors per application. My config is as follows:
spark.executor.cores=1
spark.driver.cores=1
spark.cores.max=1
spark.executor.instances=1
Now the issue is that with this exact configuration, one streaming
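One hedged note on the settings above: spark.cores.max is honored by the
standalone and Mesos cluster managers, while spark.executor.instances is a
YARN setting, so which pair takes effect depends on the deployment. A sketch
of passing them per application at submit time (your-app.jar is a
placeholder):

spark-submit \
  --conf spark.executor.cores=1 \
  --conf spark.driver.cores=1 \
  --conf spark.cores.max=1 \
  --conf spark.executor.instances=1 \
  your-app.jar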
On 6 Mar 2017, at 12:30, Nira Amit <amitn...@gmail.com> wrote:
> And it's very difficult if it's doing unexpected things.
All serialisations do unexpected things. Nobody understands them. Sorry
And by the way - I don't want the Avro details to be hidden away from me.
The whole purpose of the work I'm doing is to benchmark different
serialization tools and strategies. If I want to use Kryo serialization for
example, then I need to understand how the API works. And it's very
difficult if it
Hi Sean,
Yes, we discussed this in Jira and you suggested I take this discussion to
the mailing list, so I did.
I don't have the option to migrate the code I'm working on to Datasets at
the moment (or to Scala, as another developer suggested in the Jira
discussion), so I have to work with the J
I think this is the same thing we already discussed extensively on your
JIRA.
The key/value class arguments to newAPIHadoopFile are not the types of your
custom class, but the Writable types describing the encoding of keys and
values in the file. I think that's the start of the problem.
I tried to load a custom type from Avro files into an RDD using
newAPIHadoopFile. I started with the following naive code:
JavaPairRDD<MyCustomClass, NullWritable> events =
    sc.newAPIHadoopFile("file:/path/to/data.avro",
        AvroKeyInputFormat.class, MyCustomClass.class,
        NullWritable.class,
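Following the point above, a hedged Scala sketch of what the corrected call
might look like (assuming MyCustomClass is the Avro-generated class from the
thread): the key type is AvroKey[MyCustomClass], not MyCustomClass itself,
and the datum is unwrapped afterwards:

import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

val pairs = sc.newAPIHadoopFile(
  "file:/path/to/data.avro",
  classOf[AvroKeyInputFormat[MyCustomClass]],
  classOf[AvroKey[MyCustomClass]],
  classOf[NullWritable])

// Unwrap each AvroKey to get the actual records.
val events = pairs.map { case (key, _) => key.datum() }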
Hi
Try to implement binning and/or feature engineering (smart feature
selection for example)
Good luck
On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti wrote:
> Hi,
> I am new to Spark MLlib. I am using the FPGrowth model for finding related
> items.
>
> The number of transactions is 63K and the t
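To make the binning suggestion concrete, a hedged sketch using spark.ml's
Bucketizer (the column names and split points are hypothetical):
discretizing a continuous column into a few buckets gives FPGrowth a small
set of distinct items instead of raw values:

import org.apache.spark.ml.feature.Bucketizer

val bucketizer = new Bucketizer()
  .setInputCol("price")
  .setOutputCol("priceBucket")
  .setSplits(Array(Double.NegativeInfinity, 10.0, 50.0, 100.0,
    Double.PositiveInfinity))

// Adds a priceBucket column with values 0.0 through 3.0.
val binned = bucketizer.transform(df)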
Thanks Sean. Our training MSE is really large. We definitely need better
predictor variables.
Training Mean Squared Error = 7.72E8
Thanks,
Manish
On Mon, Mar 6, 2017 at 4:45 PM, Sean Owen wrote:
> There's nothing unusual about negative values from a linear regression.
> If, generally, your pr
There's nothing unusual about negative values from a linear regression. If,
generally, your predicted values are far from your actual values, then your
model hasn't fit well. You may have a bug somewhere in your pipeline or you
may have data without much linear relationship. Most of this isn't a Sp
Hi All,
We are using a LinearRegressionModel in Scala, with a StandardScaler to
normalize the data before modelling. The code snippet looks like this:
*Modelling:*
val labeledPointsRDD = tableRecords.map(row =>
{
val filtered = row.toSeq.filter({ case s: String => false case _
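Since the snippet above is truncated, here is a hedged sketch (names are
hypothetical) of the overall shape being described: scale the features with
StandardScaler, train a linear model, and compute the training MSE:

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// labeledPointsRDD: RDD[LabeledPoint], as built from tableRecords above.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(labeledPointsRDD.map(_.features))
val scaled = labeledPointsRDD.map(p =>
  LabeledPoint(p.label, scaler.transform(p.features)))

val model = LinearRegressionWithSGD.train(scaled, 100)

// Training mean squared error, as reported in the thread.
val mse = scaled.map { p =>
  val err = model.predict(p.features) - p.label
  err * err
}.mean()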