Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-05 Thread Felix Cheung
Congrats and thanks! From: Hyukjin Kwon Sent: Wednesday, March 3, 2021 4:09:23 PM To: Dongjoon Hyun Cc: Gabor Somogyi ; Jungtaek Lim ; angers zhu ; Wenchen Fan ; Kent Yao ; Takeshi Yamamuro ; dev ; user @spark Subject: Re: [ANNOUNCE] Announcing Apache Spark

Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung
-- Forwarded message - We are pleased to announce that ApacheCon @Home will be held online, September 29 through October 1. More event details are available at https://apachecon.com/acah2020 but there’s a few things that I want to highlight for you, the members. Yes, the CFP has

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Felix Cheung
Congrats From: Jungtaek Lim Sent: Thursday, June 18, 2020 8:18:54 PM To: Hyukjin Kwon Cc: Mridul Muralidharan ; Reynold Xin ; dev ; user Subject: Re: [ANNOUNCE] Apache Spark 3.0.0 Great, thanks all for your efforts on the huge step forward! On Fri, Jun 19, 20

Re: Fail to use SparkR of 3.0 preview 2

2019-12-26 Thread Felix Cheung
Maybe it’s the reverse - the package is built to run on the latest R but is not compatible with slightly older versions (3.5.2 was Dec 2018) From: Jeff Zhang Sent: Thursday, December 26, 2019 5:36:50 PM To: Felix Cheung Cc: user.spark Subject: Re: Fail to use SparkR of 3.0

Re: Fail to use SparkR of 3.0 preview 2

2019-12-26 Thread Felix Cheung
It looks like a change in the method signature in R base packages. Which version of R are you running on? From: Jeff Zhang Sent: Thursday, December 26, 2019 12:46:12 AM To: user.spark Subject: Fail to use SparkR of 3.0 preview 2 I tried SparkR of spark 3.0 prev

Re: SparkR integration with Hive 3 spark-r

2019-11-24 Thread Felix Cheung
I think you will get more answers if you ask without SparkR. Your question is independent of SparkR. Spark support for Hive 3.x (3.1.2) was added here https://github.com/apache/spark/commit/1b404b9b9928144e9f527ac7b1caa15f932c2649 You should be able to connect Spark to Hive metastore.

Re: JDK11 Support in Apache Spark

2019-08-24 Thread Felix Cheung
That’s great! From: ☼ R Nair Sent: Saturday, August 24, 2019 10:57:31 AM To: Dongjoon Hyun Cc: d...@spark.apache.org ; user @spark/'user @spark'/spark users/user@spark Subject: Re: JDK11 Support in Apache Spark Finally!!! Congrats On Sat, Aug 24, 2019, 11:11

Re: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?

2019-07-16 Thread Felix Cheung
Not currently in Spark. However, there are systems out there that can share DataFrame between languages on top of Spark - it’s not calling the python UDF directly but you can pass the DataFrame to python and then .map(UDF) that way. From: Fiske, Danny Sent: Mo

Re: Spark SQL in R?

2019-06-08 Thread Felix Cheung
I don’t think you should get a hive-site.xml from the internet. It should have connection information about a running hive metastore - if you don’t have a hive metastore service as you are running locally (from a laptop?) then you don’t really need it. You can get spark to work with its own.

Re: sparksql in sparkR?

2019-06-07 Thread Felix Cheung
This seems to be more a question about the spark-sql shell? I may suggest you change the email title to get more attention. From: ya Sent: Wednesday, June 5, 2019 11:48:17 PM To: user@spark.apache.org Subject: sparksql in sparkR? Dear list, I am trying to use sparksql

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Felix Cheung
From: shane knapp Sent: Friday, May 31, 2019 7:38:10 PM To: Denny Lee Cc: Holden Karau; Bryan Cutler; Erik Erlandson; Felix Cheung; Mark Hamstra; Matei Zaharia; Reynold Xin; Sean Owen; Wenchen Fan; Xiangrui Meng; dev; user Subject: Re: Should python-2 be supported in Spark 3.0? +1000 ;) On

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Felix Cheung
We don’t usually reference a future release on website
> Spark website and state that Python 2 is deprecated in Spark 3.0
I suspect people will then ask when Spark 3.0 is coming out. Might need to provide some clarity on that. From: Reynold Xin Sent: Thur

Re: Static partitioning in partitionBy()

2019-05-07 Thread Felix Cheung
You could df.filter(col(“c”) == “c1”).write.partitionBy(“c”).save(...) It could get some data skew problems but might work for you From: Burak Yavuz Sent: Tuesday, May 7, 2019 9:35:10 AM To: Shubham Chaurasia Cc: dev; user@spark.apache.org Subject: Re: Static parti
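
A minimal PySpark sketch of that suggestion (the column name "c", the value "c1", and the output path are assumptions):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("static-partition-demo").getOrCreate()
    df = spark.createDataFrame([("c1", 1), ("c2", 2)], ["c", "value"])

    # Write only the rows for partition value "c1", still laid out as c=c1/ on disk
    (df.filter(col("c") == "c1")
       .write
       .partitionBy("c")
       .mode("overwrite")
       .parquet("/tmp/static_partition_demo"))  # hypothetical output path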

Re: ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-14 Thread Felix Cheung
And a plug for the Graph Processing track - a comparison talk between the various Spark options (GraphX, GraphFrames, CAPS), or the ongoing work with SPARK-25994 Property Graphs, Cypher Queries, and Algorithms, would be great! From: Felix Cheung

ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-13 Thread Felix Cheung
Hi Spark community! As you know ApacheCon NA 2019 is coming this Sept and its CFP is now open! This is an important milestone as we celebrate 20 years of ASF. We have tracks like Big Data and Machine Learning among many others. Please submit your talks/thoughts/challenges/learnings here: https

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Felix Cheung
If anyone wants to improve docs please create a PR. lol But seriously you might want to explore other projects that manage job submission on top of spark instead of rolling your own with spark-submit. From: Pat Ferrel Sent: Tuesday, March 26, 2019 2:38 PM To:

Re: Spark - Hadoop custom filesystem service loading

2019-03-23 Thread Felix Cheung
Hmm thanks. Do you have a proposed solution? From: Jhon Anderson Cardenas Diaz Sent: Monday, March 18, 2019 1:24 PM To: user Subject: Spark - Hadoop custom filesystem service loading Hi everyone, On spark 2.2.0, if you wanted to create a custom file system impl

Re: Spark-hive integration on HDInsight

2019-02-21 Thread Felix Cheung
You should check with HDInsight support From: Jay Singh Sent: Wednesday, February 20, 2019 11:43:23 PM To: User Subject: Spark-hive integration on HDInsight I am trying to integrate spark with hive on HDInsight spark cluster . I copied hive-site.xml in spark/co

Re: SparkR + binary type + how to get value

2019-02-19 Thread Felix Cheung
From: Thijs Haarhuis Sent: Tuesday, February 19, 2019 5:28 AM To: Felix Cheung; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value Hi Felix, Thanks. I got it working now by using the unlist function. I have another question, maybe you can help me with, since I did

Re: SparkR + binary type + how to get value

2019-02-17 Thread Felix Cheung
From: Thijs Haarhuis Sent: Thursday, February 14, 2019 4:01 AM To: Felix Cheung; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value Hi Felix, Sure.. I have the following code:
    printSchema(results)
    cat("\n\n\n")
    firstRow <- first(results

Re: SparkR + binary type + how to get value

2019-02-13 Thread Felix Cheung
Please share your code From: Thijs Haarhuis Sent: Wednesday, February 13, 2019 6:09 AM To: user@spark.apache.org Subject: SparkR + binary type + how to get value Hi all, Does anybody have any experience in accessing the data from a column which has a binary ty

Re: java.lang.IllegalArgumentException: Unsupported class file major version 55

2019-02-10 Thread Felix Cheung
And it might not work completely. Spark only officially supports JDK 8. I’m not sure if JDK 9+ support is complete? From: Jungtaek Lim Sent: Thursday, February 7, 2019 5:22 AM To: Gabor Somogyi Cc: Hande, Ranjit Dilip (Ranjit); user@spark.apache.org Subject

Re: I have trained a ML model, now what?

2019-01-23 Thread Felix Cheung
Please comment in the JIRA/SPIP if you are interested! We can see the community support for a proposal like this. From: Pola Yao Sent: Wednesday, January 23, 2019 8:01 AM To: Riccardo Ferrari Cc: Felix Cheung; User Subject: Re: I have trained a ML model, now

Re: I have trained a ML model, now what?

2019-01-22 Thread Felix Cheung
About deployment/serving SPIP https://issues.apache.org/jira/browse/SPARK-26247 From: Riccardo Ferrari Sent: Tuesday, January 22, 2019 8:07 AM To: User Subject: I have trained a ML model, now what? Hi list! I am writing here to here about your experience on pu

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Felix Cheung
You can call coalesce to combine partitions.. From: Shivam Sharma <28shivamsha...@gmail.com> Sent: Saturday, January 19, 2019 7:43 AM To: user@spark.apache.org Subject: Persist Dataframe to HDFS considering HDFS Block Size. Hi All, I wanted to persist dataframe
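
A minimal PySpark sketch (the target partition count and output path are assumptions); coalesce() merges existing partitions without a full shuffle, so fewer, larger files land on HDFS:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("coalesce-demo").getOrCreate()
    df = spark.range(0, 1000000)

    # Reduce to at most 8 partitions, i.e. at most 8 output files
    df.coalesce(8).write.mode("overwrite").parquet("hdfs:///tmp/coalesce_demo")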

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Felix Cheung
From: Li Gao Sent: Saturday, January 19, 2019 8:43 AM To: Felix Cheung Cc: Serega Sheypak; user Subject: Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? on yarn it is impossible afaik. on kubernetes you can use taints to

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Felix Cheung
Not as far as I recall... From: Serega Sheypak Sent: Friday, January 18, 2019 3:21 PM To: user Subject: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? Hi, is there any possibility to tell Scheduler to blacklist specific node

Re: spark2.4 arrow enabled true,error log not returned

2019-01-12 Thread Felix Cheung
Do you mean you run the same code on yarn and standalone? Can you check if they are running the same python versions? From: Bryan Cutler Sent: Thursday, January 10, 2019 5:29 PM To: libinsong1...@gmail.com Cc: zlist Spark Subject: Re: spark2.4 arrow enabled true

Re: SparkR issue

2018-10-14 Thread Felix Cheung
1. Seems like it’s spending a lot of time in R (slicing the data I guess?) and not with Spark
2. Could you write it into a csv file locally and then read it from Spark?
From: ayan guha Sent: Monday, October 8, 2018 11:21 PM To: user Subject: SparkR issue Hi We ar

Re: can Spark 2.4 work on JDK 11?

2018-09-29 Thread Felix Cheung
Not officially. We have seen problem with JDK 10 as well. It will be great if you or someone would like to contribute to get it to work.. From: kant kodali Sent: Tuesday, September 25, 2018 2:31 PM To: user @spark Subject: can Spark 2.4 work on JDK 11? Hi All,

Re: spark.lapply

2018-09-26 Thread Felix Cheung
It looks like the native R process was terminated by a buffer overflow. Do you know how much data is involved? From: Junior Alvarez Sent: Wednesday, September 26, 2018 7:33 AM To: user@spark.apache.org Subject: spark.lapply Hi! I’m using spark.lapply() in spark

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Felix Cheung
I don’t think we should remove any API even in a major release without deprecating it first... From: Mark Hamstra Sent: Sunday, September 16, 2018 12:26 PM To: Erik Erlandson Cc: user@spark.apache.org; dev Subject: Re: Should python-2 be supported in Spark 3.0?

Re: Spark 2.3.1 not working on Java 10

2018-06-21 Thread Felix Cheung
I'm not sure we have completed support for Java 10 From: Rahul Agrawal Sent: Thursday, June 21, 2018 7:22:42 AM To: user@spark.apache.org Subject: Spark 2.3.1 not working on Java 10 Dear Team, I have installed Java 10, Scala 2.12.6 and spark 2.3.1 in my desktop

Re: all calculations finished, but "VCores Used" value remains at its max

2018-05-01 Thread Felix Cheung
Zeppelin keeps the Spark job alive. This is likely a better question for the Zeppelin project. From: Valery Khamenya Sent: Tuesday, May 1, 2018 4:30:24 AM To: user@spark.apache.org Subject: all calculations finished, but "VCores Used" value remains at its max Hi

Re: Problem running Kubernetes example v2.2.0-kubernetes-0.5.0

2018-04-22 Thread Felix Cheung
You might want to check with the spark-on-k8s Or try using kubernetes from the official spark 2.3.0 release. (Yes we don't have an official docker image though but you can build with the script) From: Rico Bergmann Sent: Wednesday, April 11, 2018 11:02:38 PM To:

Re: [Structured Streaming Query] Calculate Running Avg from Kafka feed using SQL query

2018-04-06 Thread Felix Cheung
Instead of writing to console you need to write to memory for it to be queryable:
    .format("memory")
    .queryName("tableName")
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks From: Aakash Basu Sent: Friday, April 6, 20
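
A minimal PySpark sketch of the memory sink (the socket source, host, and port are stand-ins for the Kafka feed in the question):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("memory-sink-demo").getOrCreate()

    # Any streaming source works; a socket source is used here as a stand-in
    lines = (spark.readStream
             .format("socket")
             .option("host", "localhost")
             .option("port", 9999)
             .load())

    # The memory sink keeps results in an in-memory table named by queryName()
    query = (lines.writeStream
             .format("memory")
             .queryName("tableName")
             .outputMode("append")
             .start())

    # Once some data has arrived, the table is queryable with plain SQL
    spark.sql("SELECT * FROM tableName").show()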

Re: [Spark R]: Linear Mixed-Effects Models in Spark R

2018-03-26 Thread Felix Cheung
If your data can be split into groups and you can call into your favorite R package on each group of data (in parallel): https://spark.apache.org/docs/latest/sparkr.html#run-a-given-function-on-a-large-dataset-grouping-by-input-columns-and-using-gapply-or-gapplycollect _

Re: Custom metrics sink

2018-03-16 Thread Felix Cheung
There is a proposal to expose them. See SPARK-14151 From: Christopher Piggott Sent: Friday, March 16, 2018 1:09:38 PM To: user@spark.apache.org Subject: Custom metrics sink Just for fun, i want to make a stupid program that makes different frequency chimes as ea

Re: How to start practicing Python Spark Streaming in Linux?

2018-03-14 Thread Felix Cheung
It’s best to start with Structured Streaming https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#tab_python_0 https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#tab_python_0 _ From: Aakash Basu Sent: Wednesda

Re: Question on Spark-kubernetes integration

2018-03-02 Thread Felix Cheung
For pyspark specifically, IMO it should be very high on the list to port back... As for the roadmap - we should be sharing more soon. From: lucas.g...@gmail.com Sent: Friday, March 2, 2018 9:41:46 PM To: user@spark.apache.org Cc: Felix Cheung Subject: Re: Question on Spark

Re: Question on Spark-kubernetes integration

2018-03-02 Thread Felix Cheung
That's in the plan. We should be sharing a bit more about the roadmap in future releases shortly. In the mean time this is in the official documentation on what is coming: https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work This support started as a fork of the Apache Sp

Re: Spark on K8s - using files fetched by init-container?

2018-02-27 Thread Felix Cheung
Yes you were pointing to HDFS on a loopback address... From: Jenna Hoole Sent: Monday, February 26, 2018 1:11:35 PM To: Yinan Li; user@spark.apache.org Subject: Re: Spark on K8s - using files fetched by init-container? Oh, duh. I completely forgot that file:// is

Re: [graphframes]how Graphframes Deal With BidirectionalRelationships

2018-02-20 Thread Felix Cheung
No it does not support bidirectional edges as of now. _ From: xiaobo Sent: Tuesday, February 20, 2018 4:35 AM Subject: Re: [graphframes]how Graphframes Deal With BidirectionalRelationships To: Felix Cheung , So the question comes to does graphframes support

Re: [graphframes]how Graphframes Deal With Bidirectional Relationships

2018-02-19 Thread Felix Cheung
Generally that would be the approach. But since you have effectively doubled the number of edges, this will likely affect the scale at which your job will run. From: xiaobo Sent: Monday, February 19, 2018 3:22:02 AM To: user@spark.apache.org Subject: [graphframes]how Graphf

Re: Does Pyspark Support Graphx?

2018-02-18 Thread Felix Cheung
Hi - I’m maintaining it. As of now there is an issue with 2.2 that breaks personalized page rank, and that’s largely the reason there isn’t a release for 2.2 support. There are attempts to address this issue - if you are interested we would love for your help.

Re: SparkR test script issue: unable to run run-tests.h on spark 2.2

2018-02-14 Thread Felix Cheung
Yes, it is an issue with the newer release of testthat. To workaround could you install an earlier version with devtools? Will follow up for a fix. _ From: Hyukjin Kwon Sent: Wednesday, February 14, 2018 6:49 PM Subject: Re: SparkR test script issue: unable to run run-te

Re: py4j.protocol.Py4JJavaError: An error occurred while calling o794.parquet

2018-01-10 Thread Felix Cheung
java.nio.BufferUnderflowException Can you try reading the same data in Scala? From: Liana Napalkova Sent: Wednesday, January 10, 2018 12:04:00 PM To: Timur Shenkao Cc: user@spark.apache.org Subject: Re: py4j.protocol.Py4JJavaError: An error occurred while callin

Re: Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0

2018-01-08 Thread Felix Cheung
And Hadoop-3.x is not part of the release and sign off for 2.2.1. Maybe we could update the website to avoid any confusion with "later". From: Josh Rosen Sent: Monday, January 8, 2018 10:17:14 AM To: akshay naidu Cc: Saisai Shao; Raj Adyanthaya; spark users Subje

Re: Passing an array of more than 22 elements in a UDF

2017-12-26 Thread Felix Cheung
array of more than 22 elements in a UDF To: Felix Cheung Cc: ayan guha , user What's the privilege of using that specific version for this? Please throw some light onto it. On Mon, Dec 25, 2017 at 6:51 AM, Felix Cheung <felixcheun...@hotmail.com> wrote: Or use it with

Re: Spark 2.2.1 worker invocation

2017-12-26 Thread Felix Cheung
I think you are looking for spark.executor.extraJavaOptions https://spark.apache.org/docs/latest/configuration.html#runtime-environment From: Christopher Piggott Sent: Tuesday, December 26, 2017 8:00:56 AM To: user@spark.apache.org Subject: Spark 2.2.1 worker inv
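
A minimal PySpark sketch of setting it (the JVM flags are placeholders; note that heap size must be set with spark.executor.memory, not with -Xmx here):

    from pyspark.sql import SparkSession

    # The flags below are example values, not recommendations
    spark = (SparkSession.builder
             .appName("executor-java-options-demo")
             .config("spark.executor.extraJavaOptions",
                     "-XX:+PrintGCDetails -Dsome.property=value")
             .getOrCreate())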

Re: Passing an array of more than 22 elements in a UDF

2017-12-24 Thread Felix Cheung
Or use it with Scala 2.11? From: ayan guha Sent: Friday, December 22, 2017 3:15:14 AM To: Aakash Basu Cc: user Subject: Re: Passing an array of more than 22 elements in a UDF Hi I think you are in correct track. You can stuff all your param in a suitable data st

Re: [Spark R]: dapply only works for very small datasets

2017-11-28 Thread Felix Cheung
:11 AM Subject: AW: [Spark R]: dapply only works for very small datasets To: Felix Cheung , Thanks for the fast reply. I tried it locally, with 1 - 8 slots on a 8 core machine w/ 25GB memory as well as on 4 nodes with the same specifications. When I shrink the data to around 100MB, it runs in

Re: [Spark R]: dapply only works for very small datasets

2017-11-27 Thread Felix Cheung
What's the number of executors and/or number of partitions you are working with? I'm afraid most of the problem is with the serialization/deserialization overhead between JVM and R... From: Kunft, Andreas Sent: Monday, November 27, 2017 10:27:33 AM To: user@spark

Re: using R with Spark

2017-09-24 Thread Felix Cheung
n/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba> Twitter: @BobLovesData<http://twitter.com/BobLovesData> From: Georg Heiler [mailto:georg.kf.hei...@gmail.com] Sent: Sunday, September 24, 2017 3:39 PM To: Felix Cheung ; Adaryl Wakefield ; user@spark.apache.org Subject: Re: using R with S

Re: using R with Spark

2017-09-24 Thread Felix Cheung
If you google it you will find posts or info on how to connect it to different cloud and hadoop/spark vendors. From: Georg Heiler Sent: Sunday, September 24, 2017 1:39:09 PM To: Felix Cheung; Adaryl Wakefield; user@spark.apache.org Subject: Re: using R with

Re: using R with Spark

2017-09-24 Thread Felix Cheung
Both are free to use; you can use sparklyr from the R shell without RStudio (but you probably want an IDE) From: Adaryl Wakefield Sent: Sunday, September 24, 2017 11:19:24 AM To: user@spark.apache.org Subject: using R with Spark There are two packages SparkR an

Re: graphframes on cluster

2017-09-20 Thread Felix Cheung
Could you include the code where it fails? Generally the best way to use gf is to use the --packages option with the spark-submit command From: Imran Rajjad Sent: Wednesday, September 20, 2017 5:47:27 AM To: user @spark Subject: graphframes on cluster Trying to run
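
A minimal PySpark sketch of loading GraphFrames via the packages mechanism (the version coordinate below is an example and must match your Spark/Scala versions):

    from pyspark.sql import SparkSession

    # Equivalent to `spark-submit --packages ...`; the coordinate is an example
    spark = (SparkSession.builder
             .appName("graphframes-demo")
             .config("spark.jars.packages",
                     "graphframes:graphframes:0.5.0-spark2.1-s_2.11")
             .getOrCreate())

    from graphframes import GraphFrame

    v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
    e = spark.createDataFrame([("a", "b", "friend")], ["src", "dst", "relationship"])
    g = GraphFrame(v, e)
    g.inDegrees.show()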

Re: Queries with streaming sources must be executed with writeStream.start()

2017-09-09 Thread Felix Cheung
What is newDS? If it is a Streaming Dataset/DataFrame (since you have writeStream there) then there seems to be an issue preventing toJSON from working. From: kant kodali Sent: Saturday, September 9, 2017 4:04:33 PM To: user @spark Subject: Queries with streaming sour

Re: How to convert Row to JSON in Java?

2017-09-09 Thread Felix Cheung
toJSON on Dataset/DataFrame? From: kant kodali Sent: Saturday, September 9, 2017 4:15:49 PM To: user @spark Subject: How to convert Row to JSON in Java? Hi All, How to convert Row to JSON in Java? It would be nice to have .toJson() method in the Row class. Tha
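
For illustration, the same call from PySpark (the Java API has the equivalent toJSON method on Dataset):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tojson-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

    # toJSON() converts each Row into a JSON string
    for s in df.toJSON().take(2):
        print(s)  # e.g. {"id":1,"label":"a"}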

Re: sparkR 3rd library

2017-09-04 Thread Felix Cheung
Can you include the code where you call spark.lapply? From: patcharee Sent: Sunday, September 3, 2017 11:46:40 PM To: user@spark.apache.org Subject: sparkR 3rd library Hi, I am using spark.lapply to execute an existing R script in standalone mode. This script

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Felix Cheung
Awesome! Congrats!! From: holden.ka...@gmail.com on behalf of Holden Karau Sent: Wednesday, July 12, 2017 12:26:00 PM To: user@spark.apache.org Subject: With 2.2.0 PySpark is now available for pip install from PyPI :) Hi wonderful Python + Spark folks, I'm exc

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Felix Cheung
And perhaps the error message can be improved here? From: Tathagata Das Sent: Monday, June 19, 2017 8:24:01 PM To: kaniska Mandal Cc: Burak Yavuz; user Subject: Re: How save streaming aggregations on 'Structured Streams' in parquet format ? That is not the write

Re: problem initiating spark context with pyspark

2017-06-10 Thread Felix Cheung
Curtis, assuming you are running a somewhat recent windows version you would not have access to c:\tmp, in your command example winutils.exe ls -F C:\tmp\hive Try changing the path to under your user directory. Running Spark on Windows should work :) From: Curt

Re: "java.lang.IllegalStateException: There is no space for new record" in GraphFrames

2017-04-28 Thread Felix Cheung
Can you allocate more memory to the executor? Also please open an issue with gf on its github From: rok Sent: Friday, April 28, 2017 1:42:33 AM To: user@spark.apache.org Subject: "java.lang.IllegalStateException: There is no space for new record" in GraphFrames Wh

Re: how to create List in pyspark

2017-04-28 Thread Felix Cheung
Why not use sql functions explode and split? Would perform better and be more stable than a udf From: Yanbo Liang Sent: Thursday, April 27, 2017 7:34:54 AM To: Selvam Raman Cc: user Subject: Re: how to create List in pyspark You can try with UDF, like the following code s
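
A minimal PySpark sketch of the explode/split approach (the column name and delimiter are assumptions):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    spark = SparkSession.builder.appName("explode-split-demo").getOrCreate()
    df = spark.createDataFrame([("a,b,c",), ("d,e",)], ["csv_col"])

    # split() turns the string into an array; explode() emits one row per element
    df.select(explode(split(df.csv_col, ",")).alias("item")).show()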

Re: Spark SQL - Global Temporary View is not behaving as expected

2017-04-22 Thread Felix Cheung
Cross-session in this context means multiple spark sessions from the same spark context. Since you are running two shells, you have different spark contexts. Do you have to use a temp view? Could you create a table? _ From: Hemanth Gudela <hemanth.gud...@qva

Re: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ?

2017-04-21 Thread Felix Cheung
Not currently - how are you planning to use the output from word2vec? From: Radhwane Chebaane Sent: Thursday, April 20, 2017 4:30:14 AM To: user@spark.apache.org Subject: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ? Hi, I've been experimenting wi

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Felix Cheung
Interesting! From: Robert Yokota Sent: Sunday, April 2, 2017 9:40:07 AM To: user@spark.apache.org Subject: Graph Analytics on HBase with HGraphDB and Spark GraphFrames Hi, In case anyone is interested in analyzing graphs in HBase with Apache Spark GraphFrames,

Re: Getting exit code of pipe()

2017-02-12 Thread Felix Cheung
ode of pipe() To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> Cool that's exactly what I was looking for! Thanks! How does one output the status into stdout? I mean, how does one capture the status output of pipe() command? On Sat, Feb 11,

Re: Getting exit code of pipe()

2017-02-11 Thread Felix Cheung
Do you want the job to fail if there is an error exit code? You could set checkCode to True spark.apache.org/docs/latest/api/python/pyspark.html?highlight=pipe#pyspark.RDD.pipe Otherwise maybe you want to
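
A minimal sketch of pipe() with checkCode=True (the piped command here is just `cat`, which simply echoes its input):

    from pyspark import SparkContext

    sc = SparkContext(appName="pipe-demo")
    rdd = sc.parallelize(["1", "2", "3"])

    # With checkCode=True, Spark raises an error if the piped
    # command exits with a non-zero status
    print(rdd.pipe("cat", checkCode=True).collect())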

Re: Examples in graphx

2017-01-29 Thread Felix Cheung
Which graph DB are you thinking about? Here's one for neo4j https://neo4j.com/blog/neo4j-3-0-apache-spark-connector/ From: Deepak Sharma Sent: Sunday, January 29, 2017 4:28:19 AM To: spark users Subject: Examples in graphx Hi There, Are there any examples of usi

Re: Creating UUID using SparksSQL

2017-01-18 Thread Felix Cheung
spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.functions.monotonically_increasing_id ? From: Ninad Shringarpure Sent: Wednesday
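
Note that monotonically_increasing_id() yields unique 64-bit integers (not UUIDs), and they are increasing but not consecutive. A minimal sketch:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import monotonically_increasing_id

    spark = SparkSession.builder.appName("unique-id-demo").getOrCreate()
    df = spark.createDataFrame([("x",), ("y",)], ["col"])

    # The generated IDs encode the partition ID in the upper bits,
    # so they are unique across the DataFrame but have gaps
    df.withColumn("id", monotonically_increasing_id()).show()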

Re: what does dapply actually do?

2017-01-18 Thread Felix Cheung
With Spark, the processing is performed lazily. This means nothing much is really happening until you call an "action" - an example that is collect(). Another way is to write the output in a distributed manner - see write.df() in R. With SparkR dapply() passing the data from Spark to R to proce

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
From: Ankur Srivastava Sent: Thursday, January 5, 2017 3:45:59 PM To: Felix Cheung; d...@spark.apache.org Cc: user@spark.apache.org Subject: Re: Spark GraphFrame ConnectedComponents Adding DEV mailing list to see if this is a defect with ConnectedComponent or if they can recommend any solution

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
ystem. I tried looking up how to update the default file system but could not find anything in that regard. Thanks Ankur On Thu, Jan 5, 2017 at 12:55 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> From the stack it looks to be an error from the explicit call to
> had

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
day, January 4, 2017 9:23 PM Subject: Re: Spark GraphFrame ConnectedComponents To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> This is the exact trace from the driver logs Exception in thread "main" java.lang.IllegalArgumentExceptio

Re: Spark GraphFrame ConnectedComponents

2017-01-04 Thread Felix Cheung
Do you have more of the exception stack? From: Ankur Srivastava Sent: Wednesday, January 4, 2017 4:40:02 PM To: user@spark.apache.org Subject: Spark GraphFrame ConnectedComponents Hi, I am trying to use the ConnectedComponent algorithm of GraphFrames but by de

Re: Issue with SparkR setup on RStudio

2017-01-02 Thread Felix Cheung
set in the Windows tests. _ From: Md. Rezaul Karim <rezaul.ka...@insight-centre.org> Sent: Monday, January 2, 2017 7:58 AM Subject: Re: Issue with SparkR setup on RStudio To: Felix Cheung <felixcheun...@hotmail.com> Cc: spark users

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Felix Cheung
csv to dataframe in Spark 1.6 To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> Hello Felix, I followed the instruction and ran the command:
    > $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
and I received the fo

Re: Spark Graphx with Database

2016-12-30 Thread Felix Cheung
You might want to check out GraphFrames - to load database data (as Spark DataFrame) and build graphs with them https://github.com/graphframes/graphframes _ From: balaji9058 <kssb...@gmail.com> Sent: Monday, December 26, 2016 9:27 PM Subject: Spark Graphx with

Re: Difference in R and Spark Output

2016-12-30 Thread Felix Cheung
Could you elaborate more on the huge difference you are seeing? From: Saroj C Sent: Friday, December 30, 2016 5:12:04 AM To: User Subject: Difference in R and Spark Output Dear All, For the attached input file, there is a huge difference between the Clusters i

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Felix Cheung
Have you tried the spark-csv package? https://spark-packages.org/package/databricks/spark-csv From: Raymond Xie Sent: Friday, December 30, 2016 6:46:11 PM To: user@spark.apache.org Subject: How to load a big csv to dataframe in Spark 1.6 Hello, I see there is
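
A minimal PySpark sketch for Spark 1.6, assuming the job was started with --packages com.databricks:spark-csv_2.11:1.5.0 (the header/schema options and the input path are assumptions):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="csv-load-demo")
    sqlContext = SQLContext(sc)

    # The spark-csv data source is addressed by its full format name
    df = (sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")        # file layout is an assumption
          .option("inferSchema", "true")
          .load("hdfs:///path/to/big.csv"))  # hypothetical path
    df.printSchema()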

Re: Issue with SparkR setup on RStudio

2016-12-29 Thread Felix Cheung
Any reason you are setting HADOOP_HOME? From the error it seems you are running into an issue with Hive config, likely with trying to load hive-site.xml. Could you try not setting HADOOP_HOME From: Md. Rezaul Karim Sent: Thursday, December 29, 2016 10:24:57 AM To

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
There is not a GraphLoader for GraphFrames but you could load and convert from GraphX: http://graphframes.github.io/user-guide.html#graphx-to-graphframe From: zjp_j...@163.com Sent: Sunday, December 18, 2016 9:39:49 PM To: Felix Cheung; user Subject: Re: Re

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
Or this is a better link: http://graphframes.github.io/quick-start.html _ From: Felix Cheung <felixcheun...@hotmail.com> Sent: Sunday, December 18, 2016 8:46 PM Subject: Re: GraphFrame not init vertices when load edges To: <zjp_j...@163.com>

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
Can you clarify? Vertices should be another DataFrame as you can see in the example here: https://github.com/graphframes/graphframes/blob/master/docs/quick-start.md From: zjp_j...@163.com Sent: Sunday, December 18, 2016 6:25:50 PM To: user Subject: GraphFrame n

Re: Spark Dataframe: Save to hdfs is taking long time

2016-12-15 Thread Felix Cheung
What is the format? From: KhajaAsmath Mohammed Sent: Thursday, December 15, 2016 7:54:27 PM To: user @spark Subject: Spark Dataframe: Save to hdfs is taking long time Hi, I am facing an issue while saving the dataframe back to HDFS. It's taking a long time to run.

Re: How to load edge with properties file useing GraphX

2016-12-15 Thread Felix Cheung
Have you checked out https://github.com/graphframes/graphframes? It might be easier to work with DataFrame. From: zjp_j...@163.com Sent: Thursday, December 15, 2016 7:23:57 PM To: user Subject: How to load edge with properties file useing GraphX Hi, I want t

Re: [GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Felix Cheung
That's correct - currently GraphFrame does not compute PageRank with weighted edges. _ From: Weiwei Zhang <wzhan...@dons.usfca.edu> Sent: Thursday, December 1, 2016 2:41 PM Subject: [GraphFrame, Pyspark] Weighted Edge in PageRank To: user <user@spark.apac

Re: PySpark to remote cluster

2016-11-30 Thread Felix Cheung
Spark 2.0.1 is running with a different py4j library than Spark 1.6. You will probably run into other problems mixing versions though - is there a reason you can't run Spark 1.6 on the client? _ From: Klaus Schaefers <klaus.schaef...@philips.com> Sent: Wednes

Re: How to propagate R_LIBS to sparkr executors

2016-11-17 Thread Felix Cheung
Have you tried spark.executorEnv.R_LIBS? spark.apache.org/docs/latest/configuration.html#runtime-environment _ From: Rodrick Brown <rodr...@orchard-app.com> Sent: Wednesday, November 16, 2016 1:01 PM Subject: How to propagate R_LIBS to sparkr executors To:

Re: Strongly Connected Components

2016-11-10 Thread Felix Cheung
It is possible it is dead. Could you check the Spark UI to see if there is any progress? _ From: Shreya Agarwal <shrey...@microsoft.com> Sent: Thursday, November 10, 2016 12:45 AM Subject: RE: Strongly Connected Components To: <user@spark.apache.org> B

Re: Issue Running sparkR on YARN

2016-11-09 Thread Felix Cheung
It may be that the Spark executor is running as a different user and it can't see where Rscript is? You might want to try adding the Rscript path to PATH. Also please see this for the config property to set for the R command to use: https://spark.apache.org/docs/latest/configuration.html#sparkr __

Re: Substitute Certain Rows a data Frame using SparkR

2016-10-19 Thread Felix Cheung
It's a bit less concise but this works:
    > a <- as.DataFrame(cars)
    > head(a)
      speed dist
    1     4    2
    2     4   10
    3     7    4
    4     7   22
    5     8   16
    6     9   10
    > b <- withColumn(a, "speed", ifelse(a$speed > 15, a$speed, 3))
    > head(b)
      speed dist
    1     3    2
    2     3   10
    3     3    4
    4     3   22
    5     3   16
    6     3   10
I think your example could be something

Re: SparkR execution hang on when handle a RDD which is converted from DataFrame

2016-10-13 Thread Felix Cheung
How big is the metrics_moveing_detection_cube table?
On Thu, Oct 13, 2016 at 8:51 PM -0700, "Lantao Jin" <jinlan...@gmail.com> wrote:
    sqlContext <- sparkRHive.init(sc)
    sqlString <- "SELECT key_id, rtl_week_beg_dt rawdate, gmv_plan_rate_amt value FROM metrics_moveing_detection_cube "
    df

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
rsion, Kerberos support etc) _ From: Benjamin Kim <bbuil...@gmail.com> Sent: Saturday, October 8, 2016 11:26 AM Subject: Re: Spark SQL Thriftserver with HBase To: Mich Talebzadeh <mich.talebza...@gmail.com> Cc: <user@spark.apache.or

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
! _ From: Benjamin Kim <bbuil...@gmail.com> Sent: Saturday, October 8, 2016 11:00 AM Subject: Re: Spark SQL Thriftserver with HBase To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> Felix, My goal is to use Spark SQL JDBC Thri

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
tab_sql_10). _ From: Benjamin Kim <bbuil...@gmail.com> Sent: Saturday, October 8, 2016 10:40 AM Subject: Re: Spark SQL Thriftserver with HBase To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> Felix, The only alternative way is to create

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
HBase has released support for Spark hbase.apache.org/book.html#spark And if you search you should find several alternative approaches. On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuil...@gmail.com> wrote: Does anyone know if Spark

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-18 Thread Felix Cheung
ay nicely on a 1.5 standalone cluster. On Saturday, September 10, 2016, Felix Cheung <felixcheun...@hotmail.com> wrote: You should be able to get it to work with 2.0 as uber jar. What type of cluster are you running on? YARN? And what distribution? On Sun, Sep 4, 2016 at 8:48 PM -0700, &
