Efficiently updating running sums only on new data

2022-10-11 Thread Greg Kopff
would appreciate it if anyone had any pointers about how to approach this sort of problem that they could share. Kind regards, — Greg

Reading parquet strips non-nullability from schema

2022-07-06 Thread Greg Kopff
e this information available, and I control my data to ensure that it meets the nullability constraints. Thanks for your time. Kindest regards, — Greg

Re: [Java 17] --add-exports required?

2022-06-23 Thread Greg Kopff
about all the --add-opens. Cheers, — Greg. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: [Java 17] --add-exports required?

2022-06-23 Thread Greg Kopff
jar  common-java5-3.0.0-M7.jar  common-junit3-3.0.0-M7.jar  common-junit4-3.0.0-M7.jar [DEBUG] Forking command line: /bin/sh -c cd '/Users/greg/devel/spark-java17' && '/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/java' '-jar' '/Users/greg/devel/spark-java17/target/suref

[Java 17] --add-exports required?

2022-06-22 Thread Greg Kopff
a. Cheers, — Greg. [1]: https://spark.apache.org/releases/spark-release-3-3-0.html [2]: https://issues.apache.org/jira/browse/SPARK-33772 [3]: https://stackoverflow.com/questions/72230174

Using netlib-java lib in fat jar issue

2016-04-24 Thread greg huang
Hi there, I have included the netlib-java lib in my fat jar, but Spark always reports: 16/04/24 06:11:16 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS 16/04/24 06:11:16 WARN BLAS: Failed to load implementation from:

How to remove empty strings from JavaRDD

2016-04-07 Thread greg huang
Hi All, Can someone give me some example code to get rid of the empty strings in a JavaRDD? I know there is a filter method in JavaRDD: https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/rdd/RDD.html#filter(scala.Function1) Regards, Greg

Does anyone install netlib-java on AWS EMR Spark?

2016-03-22 Thread greg huang
? Regards, Greg

Does anyone have experience processing large volume images on Spark cluster

2016-03-10 Thread greg huang
Hi All, Does anyone have experience processing large volumes of images on a Spark cluster? For example, using Spark to run some MapReduce tasks that detect common features, such as counting the number of cars in satellite pictures. Regards, Greg

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Greg Hill
ownloading, but it isn't. I ended up having to create a fat JAR with all of the dependencies to get around that one. Greg

spark-submit problems with --packages and --deploy-mode cluster

2015-12-11 Thread Greg Hill
pass along the downloaded JARs? Here's my stderr output: https://gist.github.com/jimbobhickville/1f10b3508ef946eccb92 Thanks in advance for any suggestions. Greg

RE: Help accessing protected S3

2015-07-23 Thread Greg Anderson
...@hortonworks.com] Sent: Thursday, July 23, 2015 11:37 AM To: Ewan Leith Cc: Greg Anderson; user@spark.apache.org Subject: Re: Help accessing protected S3 On 23 Jul 2015, at 01:50, Ewan Leith ewan.le...@realitymine.com wrote: I think the standard S3 driver used in Spark from the Hadoop project

Help accessing protected S3

2015-07-22 Thread Greg Anderson
I have a protected s3 bucket that requires a certain IAM role to access. When I start my cluster using the spark-ec2 script, everything works just fine until I try to read from that part of s3. Here is the command I am using: ./spark-ec2 -k KEY -i KEY_FILE.pem

Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Greg Temchenko
Hi, This does not seem to be fixed yet. I filed an issue in JIRA: https://issues.apache.org/jira/browse/SPARK-5505 Greg -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-Spark-streaming-consumes-from-Kafka-tp19570p21471.html Sent from the Apache Spark User

Running Spark on SPARC64 X+

2014-11-10 Thread Greg Jennings
! Thanks in advance! Greg

Re: SPARK_SUBMIT_CLASSPATH question

2014-10-15 Thread Greg Hill
a whole screen. Greg From: Greg <greg.h...@rackspace.com> Date: Tuesday, October 14, 2014 1:57 PM To: user@spark.apache.org Subject: SPARK_SUBMIT_CLASSPATH question It seems to me

SPARK_SUBMIT_CLASSPATH question

2014-10-14 Thread Greg Hill
dependency JAR. There must be a better way, so what am I missing? Greg

Re: Spark on YARN driver memory allocation bug?

2014-10-09 Thread Greg Hill
: Andrew Or <and...@databricks.com> Date: Wednesday, October 8, 2014 3:25 PM To: Greg <greg.h...@rackspace.com> Cc: user@spark.apache.org Subject: Re: Spark on YARN driver

Spark on YARN driver memory allocation bug?

2014-10-08 Thread Greg Hill
even in yarn-cluster mode. Shouldn't it only allocate that memory on the YARN node that is going to run the driver process, not the local client machine? Greg

Re: Spark with YARN

2014-09-24 Thread Greg Hill
Do you have YARN_CONF_DIR set in your environment to point Spark to where your yarn configs are? Greg From: Raghuveer Chanda <raghuveer.cha...@gmail.com> Date: Wednesday, September 24, 2014 12:25 PM To: u...@spark.incubator.apache.org

Re: clarification for some spark on yarn configuration options

2014-09-23 Thread Greg Hill
at the code you modified, I don't see any place it's picking up spark.driver.memory either. Is that a separate bug? Greg From: Andrew Or <and...@databricks.com> Date: Monday, September 22, 2014 8:11 PM To: Nishkam Ravi <nr...@cloudera.com> Cc: Greg

recommended values for spark driver memory?

2014-09-23 Thread Greg Hill
. The customer can then tweak that if they need to for their particular job. Thanks in advance. Greg

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
am I misunderstanding here? Greg From: Andrew Or <and...@databricks.com> Date: Tuesday, September 9, 2014 5:49 PM To: Greg <greg.h...@rackspace.com> Cc: user@spark.apache.org

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
Gah, ignore me again. I was reading the logic backwards. For some reason it isn't picking up my SPARK_DRIVER_MEMORY environment variable and is using the default of 512m. Probably an environmental issue. Greg From: Greg <greg.h...@rackspace.com> Date: Monday

Re: spark on yarn history server + hdfs permissions issue

2014-09-11 Thread Greg Hill
To answer my own question, in case someone else runs into this: the spark user needs to be in the same group on the namenode, and HDFS caches that information for what seems like at least an hour. It magically started working on its own. Greg From: Greg <greg.h...@rackspace.com>

spark on yarn history server + hdfs permissions issue

2014-09-09 Thread Greg Hill
or am I missing something? 2. Is there a way to tell Spark to log with more permissive permissions so the history server can read the generated logs? Greg

Re: pyspark on yarn hdp hortonworks

2014-09-05 Thread Greg Hill
/hadoop-lzo-0.6.0.jar which is in my SPARK_CLASSPATH environment variable, but that doesn't seem to be picked up by pyspark. Any ideas? I can't find much in the way of docs on getting the environment right for pyspark. Greg From: Andrew Or <and...@databricks.com> Date

spark history server trying to hit port 8021

2014-09-03 Thread Greg Hill
/monitoring.html Greg

Re: spark history server trying to hit port 8021

2014-09-03 Thread Greg Hill
Never mind, PEBKAC. I had put the wrong port in the $LOG_DIR environment variable. Greg From: Greg <greg.h...@rackspace.com> Date: Wednesday, September 3, 2014 1:56 PM To: user@spark.apache.org

Spark on YARN question

2014-09-02 Thread Greg Hill
along to YARN? Greg

Re: Spark on YARN question

2014-09-02 Thread Greg Hill
Thanks. That sounds like how I was thinking it worked. I did have to install the JARs on the slave nodes for yarn-cluster mode to work, FWIW. It's probably just whichever node ends up spawning the application master that needs it, but it wasn't passed along from spark-submit. Greg From

Bayes Net with Graphx?

2014-06-06 Thread Greg
Hi, I want to create a (very large) Bayes net using GraphX. To do so, I need to be able to associate conditional probability tables with each node of the graph. Is there any way to do this? All of the examples I've seen just have the basic nodes and vertices, with no associated information. Thanks, Greg