Re: Using spark package XGBoost

2016-08-14 Thread Jacek Laskowski
Hi, I've never worked with the library and speaking about sbt setup only. It appears that the project didn't release 2.11-compatible jars (only 2.10) [1] so you need to build the project yourself and uber-jar it (using sbt-assembly plugin). [1] https://spark-packages.org/package/rotationsymmetry
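[Editor's note] A minimal sketch of the sbt-assembly setup Jacek describes, assuming you have cloned the sparkxgboost sources locally and cross-build them for 2.11 yourself; the plugin version and dependency coordinates below are illustrative, not taken from the project:

```scala
// project/plugins.sbt -- pulls in sbt-assembly for uber-jar packaging
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt -- build the locally cloned sources against Scala 2.11
scalaVersion := "2.11.8"
// Spark itself is provided by the cluster, so keep it out of the uber jar
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.0" % "provided"
// `sbt assembly` then produces a single jar you pass to --jars / spark.jars
```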

Re: Using spark package XGBoost

2016-08-14 Thread janardhan shetty
Any leads on how to achieve this? On Aug 12, 2016 6:33 PM, "janardhan shetty" wrote: > I tried using the sparkxgboost package in my build.sbt file but it failed. > Spark 2.0 > Scala 2.11.8 > > Error: > [warn] http://dl.bintray.com/spark-packages/maven/ > rotationsymmetry/sparkxgboost/0.2.1-s_2.10/

Re: Using spark package XGBoost

2016-08-12 Thread janardhan shetty
I tried using the sparkxgboost package in my build.sbt file but it failed. Spark 2.0 Scala 2.11.8 Error: [warn] http://dl.bintray.com/spark-packages/maven/rotationsymmetry/sparkxgboost/0.2.1-s_2.10/sparkxgboost-0.2.1-s_2.10-javadoc.jar [warn] :::

Using spark package XGBoost

2016-08-12 Thread janardhan shetty
Is there a DataFrame version of XGBoost in spark-ml? Has anyone used the sparkxgboost package?

Re: Grid Search using Spark MLLib Pipelines

2016-08-12 Thread Adamantios Corais
del for later use? The following command throws an error: cvModel.bestModel.save("/my/path") Also, is it possible to get the error (a collection of) for each combination of parameters? I am using spark 1.6.2 import org.apache.spark.ml.Pipeline import org.apache.spar

Re: Grid Search using Spark MLLib Pipelines

2016-08-12 Thread Bryan Cutler
gt; The following command throws an error: > > cvModel.bestModel.save("/my/path") > > Also, is it possible to get the error (a collection of) for each > combination of parameters? > > I am using spark 1.6.2 > > import org.apache.spark.ml.Pi

Grid Search using Spark MLLib Pipelines

2016-08-12 Thread Adamantios Corais
) for each combination of parameters? I am using spark 1.6.2 import org.apache.spark.ml.Pipeline import org.apache.spark.ml.classification.LogisticRegression import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator import org.apache.spark.ml.tuning.{ParamGridBuilder , CrossValidator} va

Using Spark 2.0 inside Docker

2016-08-04 Thread mhornbech
HTTPBroadcast option has been removed. Are there any alternative means of running a Spark 2.0 cluster in Docker? Morten -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-2-0-inside-Docker-tp27475.html Sent from the Apache Spark User List mailing list

Re: What are using Spark for

2016-08-02 Thread Daniel Siegmann
Yes, you can use Spark for ETL, as well as feature engineering, training, and scoring. ~Daniel Siegmann On Tue, Aug 2, 2016 at 3:29 PM, Mich Talebzadeh wrote: > Hi, > > If I may say, if you spend sometime going through this mailing list in > this forum and see the variety of topics that users

Re: What are using Spark for

2016-08-02 Thread Mich Talebzadeh
Hi, If I may say, if you spend some time going through this mailing list and see the variety of topics that users are discussing, then you may get plenty of ideas about Spark applications in real life. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: What are using Spark for

2016-08-02 Thread Karthik Ramakrishnan
We used Storm for ETL, now currently thinking Spark might be advantageous since some ML also is coming our way. - Karthik On Tue, Aug 2, 2016 at 1:10 PM, Rohit L wrote: > Does anyone use Spark for ETL? > > On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal wrote: > >> Hi Rohit, >> >> You can check th

Re: What are using Spark for

2016-08-02 Thread Deepak Sharma
Yes. I am using Spark for ETL and I am sure there are a lot of other companies who are using Spark for ETL. Thanks Deepak On 2 Aug 2016 11:40 pm, "Rohit L" wrote: > Does anyone use Spark for ETL? > > On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal wrote: > >> Hi Ro

Re: What are using Spark for

2016-08-02 Thread Rohit L
Does anyone use Spark for ETL? On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal wrote: > Hi Rohit, > > You can check the powered by spark page for some real usage of Spark. > > https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark > > > On Tuesday, August 2, 2016, Rohit L wrote: > >> Hi

Re: What are using Spark for

2016-08-02 Thread Sonal Goyal
Hi Rohit, You can check the powered by spark page for some real usage of Spark. https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark On Tuesday, August 2, 2016, Rohit L wrote: > Hi Everyone, > > I want to know the real world uses cases for which Spark is used and > hence ca

Re: What are using Spark for

2016-08-01 Thread Rodrick Brown
Each of our microservices logs events to a Kafka topic; we then use Spark to consume messages from that queue and write them into Elasticsearch. The data from ES is used by a number of support applications: graphs, monitoring, reports, dashboards for client services teams, etc. --

Re: What are using Spark for

2016-08-01 Thread Xiao Li
Hi, Rohit, The Spark summit has many interesting use cases. Hopefully, it can answer your question. https://spark-summit.org/2015/schedule/ https://spark-summit.org/2016/schedule/ Thanks, Xiao 2016-08-01 22:48 GMT-07:00 Rohit L : > Hi Everyone, > > I want to know the real world uses case

What are using Spark for

2016-08-01 Thread Rohit L
Hi Everyone, I want to know the real-world use cases for which Spark is used, and hence can you please share for what purpose you are using Apache Spark in your project? -- Rohit

Re: Visualization of data analysed using spark

2016-07-31 Thread Sivakumaran S
n. > > - Rerngvit > > On 30 Jul 2016, at 21:45, Tony Lane > <mailto:tonylane@gmail.com>> wrote: > > > > I am developing my analysis application by using spark (in eclipse as the > > IDE) > > > > what is a good way to visualize the data, takin

Re: Visualization of data analysed using spark

2016-07-30 Thread Gourav Sengupta
n > your application domain. > > - Rerngvit > > On 30 Jul 2016, at 21:45, Tony Lane wrote: > > > > I am developing my analysis application by using spark (in eclipse as > the IDE) > > > > what is a good way to visualize the data, taking into consideration i > have mul

Re: Visualization of data analysed using spark

2016-07-30 Thread Rerngvit Yanggratoke
eveloping my analysis application by using spark (in eclipse as the IDE) > > what is a good way to visualize the data, taking into consideration i have > multiple files which make up my spark application. > > I have seen some notebook demo's but not sure how to use my appli

Visualization of data analysed using spark

2016-07-30 Thread Tony Lane
I am developing my analysis application using Spark (in Eclipse as the IDE). What is a good way to visualize the data, taking into consideration that I have multiple files which make up my Spark application? I have seen some notebook demos but am not sure how to use my application with such note

[Error] : Save dataframe to csv using Spark-csv in Spark 1.6

2016-07-24 Thread Divya Gehlot
Hi, I am getting the below error when I am trying to save a dataframe using Spark-CSV > > final_result_df.write.format("com.databricks.spark.csv").option("header","true").save(output_path) java.lang.NoSuchMethodError: > scala.Predef$.$conform
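[Editor's note] The NoSuchMethodError on scala.Predef$.$conforms is the usual signature of a Scala 2.10 artifact loaded into a 2.11 runtime. A sketch of the fix, assuming a Scala 2.11 build of Spark 1.6 (note the _2.11 suffix in the package coordinate):

```scala
// launch with a Scala-2.11 build of the package, e.g.:
//   spark-shell --packages com.databricks:spark-csv_2.11:1.4.0
// then the original write works unchanged:
final_result_df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save(output_path)
```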

Re: How to recommend most similar users using Spark ML

2016-07-17 Thread Karl Higley
em. >> >> I would appreciate if somebody could suggest how to limit comparison of >> the >> user to top N neighbors, or some other algorithm that would work better in >> my use case. >> >> Thanks, >> Zoran >> >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-recommend-most-similar-users-using-Spark-ML-tp27342.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> - >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org >> >> >

Re: How to recommend most similar users using Spark ML

2016-07-15 Thread nguyen duc Tuan
m that would work better in > my use case. > > Thanks, > Zoran > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-recommend-most-similar-users-using-Spark-ML-tp27342.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > >

How to recommend most similar users using Spark ML

2016-07-14 Thread jeremycod
omparison of the user to top N neighbors, or some other algorithm that would work better in my use case. Thanks, Zoran -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-recommend-most-similar-users-using-Spark-ML-tp27342.html Sent from the Apache

Re: Send real-time alert using Spark

2016-07-12 Thread Sivakumaran S
for your language to send the alert. > > Marcin > > On Tue, Jul 12, 2016 at 9:25 AM, Priya Ch <mailto:learnings.chitt...@gmail.com>> wrote: > Hi All, > > I am building Real-time Anomaly detection system where I am using k-means to > detect anomaly. Now in-order t

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
>> The key is the execution here or rather the execution engine. >>>>> >>>>> In general >>>>> >>>>> The standard MapReduce as I know reads the data from HDFS, apply >>>>> map-reduce algorithm and writes back to HDFS.

Re: Send real-time alert using Spark

2016-07-12 Thread Priya Ch
wrote: > >> Hi All, >> >> I am building Real-time Anomaly detection system where I am using >> k-means to detect anomaly. Now in-order to send alert to mobile or an email >> alert how do i send it using Spark itself ? >> >> Thanks, >> Padma CH &

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
tion. On 12 July 2016 at 03:22, ayan guha wrote: > Hi Mich > > Thanks for showing examples, makes perfect sense. > > One question: "...I agree that on VLT (very large tables), the limitation > in available memory may be the overriding factor in using Spark"...have yo

Send real-time alert using Spark

2016-07-12 Thread Priya Ch
Hi All, I am building a real-time anomaly detection system where I am using k-means to detect anomalies. Now, in order to send an alert to a mobile device or an email, how do I send it using Spark itself? Thanks, Padma CH
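[Editor's note] One common pattern (a sketch, not from the thread; the endpoint URL is a placeholder, and `anomalies` stands for the DStream of detected anomalies): collect a small sample of each anomalous batch on the driver inside foreachRDD and hand it to any notification gateway, here an HTTP webhook:

```scala
// fire an HTTP webhook from the driver for each batch containing anomalies
anomalies.foreachRDD { rdd =>
  val hits = rdd.take(10)                        // cap the alert payload
  if (hits.nonEmpty) {
    import java.net.{HttpURLConnection, URL}
    val conn = new URL("https://alerts.example.com/notify")   // placeholder
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.getOutputStream.write(hits.mkString("\n").getBytes("UTF-8"))
    conn.getResponseCode                         // forces the request out
    conn.disconnect()
  }
}
```

An SMS or SMTP gateway slots in the same way; the point is that the side effect runs on the driver, once per batch, not per record.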

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
basically MR with DAG. With Spark you get DAG + in-memory > computing. Think of it as a comparison between a classic RDBMS like Oracle > and IMDB like Oracle TimesTen with in-memory processing. > > The outcome is that Hive using Spark as execution engine is pretty > impressive. Yo

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Jörn Franke
t, compared to the > classical MapReduce algorithm. > > Now Tez is basically MR with DAG. With Spark you get DAG + in-memory > computing. Think of it as a comparison between a classic RDBMS like Oracle > and IMDB like Oracle TimesTen with in-memory processing. > > The outc

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
ith in-memory processing. The outcome is that Hive using Spark as execution engine is pretty impressive. You have the advantage of Hive CBO + In-memory computing. If you use Spark for all this (say Spark SQL) but no Hive, Spark uses its own optimizer called Catalyst that does not have CBO yet plus in

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
gt; > > *From:* Mich Talebzadeh [mailto:mich.talebza...@gmail.com] > *Sent:* Monday, July 11, 2016 11:55 PM > *To:* user ; user @spark > *Subject:* Re: Using Spark on Hive with Hive also using Spark as its > execution engine > > > > In my test I did like for like keepi

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread ayan guha
Hi Mich, Thanks for showing examples, makes perfect sense. One question: "...I agree that on VLT (very large tables), the limitation in available memory may be the overriding factor in using Spark"...have you observed any specific threshold for VLT which tilts the favor against

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
: 8 minutes 19 seconds 370 msec > OK > 1 > Time taken: 202.333 seconds, Fetched: 1 row(s) > > So in summary > > Table      MR/sec    Spark/sec > Parquet    239.532   14.38 > ORC        202.333

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
14.38 ORC 202.333 17.77 Still, I would use Spark if I had a choice, and I agree that on VLT (very large tables), the limitation in available memory may be the overriding factor in using Spark. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profil

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
kedIn * >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> http://talebzadehmich.wordpress.com >> >>

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Michael Segel
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> >> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> >> >> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> >>

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Jörn Franke
by far the biggest in terms of community interaction > Tez, typically one thread in a month > Personally started building Tez for Hive from Tez source and gave up as it > was not working. This was my own build as opposed to a distro > if Hive says you should use Spark or Tez then using Spa

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Michael Segel
ution) on how they managed to bring both together. > > On 23 May 2016, at 01:42, Mich Talebzadeh <mailto:mich.talebza...@gmail.com>> wrote: > >> Hi, >> >> I have done a number of extensive tests using Spark-shell with Hive DB and >> ORC tables. >

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
a month 4. Personally started building Tez for Hive from Tez source and gave up as it was not working. This was my own build as opposed to a distro 5. if Hive says you should use Spark or Tez then using Spark is a perfectly valid choice 6. If Tez & LLAP offers you a Spark (DAG

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Ashok Kumar
Hi Mich, Regarding your recent presentation in London on this topic, "Running Spark on Hive or Hive on Spark": have you made any more interesting findings that you would like to bring up? If Hive is offering both Spark and Tez in addition to MR, what is stopping one from using Spark? I still don't get why TEZ + LLAP

Re: problem running spark with yarn-client not using spark-submit

2016-06-26 Thread Saisai Shao
cc: "user @spark"

Re: problem running spark with yarn-client not using spark-submit

2016-06-26 Thread sychungd
"user @spark" | | Subject| |[Spam][SMG] Re: problem running spark with yarn-client

Poor performance of using spark sql over gzipped json files

2016-06-24 Thread Shuai Lin
Hi, We have tried to use spark sql to process some gzipped json-format log files stored on S3 or HDFS. But the performance is very poor. For example, here is the code that I run over 20 gzipped files (total size of them is 4GB compressed and ~40GB when decompressed) gzfile = 's3n://my-logs-bucke
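[Editor's note] One likely cause, inferred from the file format rather than stated in the thread: gzip is not a splittable codec, so each .gz file is read and decompressed by a single task; 20 files means at most 20 parallel tasks regardless of cluster size. A hedged sketch of the usual workaround, repartitioning immediately after the read so the expensive JSON processing gets full parallelism:

```scala
// each gzipped file decompresses in one task; spread the rows out
// before doing the heavy work (200 is an illustrative figure)
val raw = sqlContext.read.json(gzfile)   // gzfile as in the original post
val df  = raw.repartition(200)
df.registerTempTable("logs")
```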

Re: problem running spark with yarn-client not using spark-submit

2016-06-24 Thread Mich Talebzadeh
Hi, "Trying to run spark with yarn-client not using spark-submit here": what are you using to submit the job? spark-shell, spark-sql, or anything else? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.

Re: Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread Mich Talebzadeh
ss.com >> >> >> >> On 24 June 2016 at 08:14, puneet kumar >> wrote: >> >>> >>> >>> I am getting below error thrown when I submit Spark Job using Spark >>> Submit on Yarn. Need a quick help o

problem running spark with yarn-client not using spark-submit

2016-06-24 Thread sychungd
Hello guys, trying to run Spark with yarn-client without using spark-submit here, but the jobs kept failing while the AM was launching executors. The error collected by YARN is like below. It looks like some environment setting is missing? Could someone help me out with this? Thanks in advance! HY Chung Java

Re: Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread Jeff Zhang
et kumar wrote: > >> >> >> I am getting below error thrown when I submit Spark Job using Spark >> Submit on Yarn. Need a quick help on what's going wrong here. >> >> 16/06/24 01:09:25 WARN AbstractLifeCycle: FAILED >

Re: Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread Mich Talebzadeh
?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 24 June 2016 at 08:14, puneet kumar wrote: > > > I am getting below error thrown when I submit Spark Job using Spark Submit > on Yarn. Need a quick help on what's going wrong here. > > 16/06/24 01:09:25 W

Error Invoking Spark on Yarn on using Spark Submit

2016-06-24 Thread puneet kumar
I am getting the below error thrown when I submit a Spark job using spark-submit on YARN. Need quick help on what's going wrong here. 16/06/24 01:09:25 WARN AbstractLifeCycle: FAILED org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter-791eb5d5: java.lang.IllegalStateException:

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Jacek Laskowski
me spending >> >> the weekend with Spark :)) >> >> >> >> Pozdrawiam, >> >> Jacek Laskowski >> >> >> >> https://medium.com/@jaceklaskowski/ >> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark &

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Ted Yu
ark :)) > >> > >> Pozdrawiam, > >> Jacek Laskowski > >> > >> https://medium.com/@jaceklaskowski/ > >> Mastering Apache Spark http://bit.ly/mastering-apache-spark > >> Follow me at https://twitter.com/jaceklaskowski > >>

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Jacek Laskowski
jaceklaskowski/ >> Mastering Apache Spark http://bit.ly/mastering-apache-spark >> Follow me at https://twitter.com/jaceklaskowski >> >> >> On Sat, Jun 18, 2016 at 11:53 AM, Jacek Laskowski wrote: >> > Hi, >> > >> > I'm trying to

Re: How to cause a stage to fail (using spark-shell)?

2016-06-18 Thread Burak Yavuz
laskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Sat, Jun 18, 2016 at 11:53 AM, Jacek Laskowski wrote: > > Hi, > > > > I'm trying to see some stats about failing stages in w

Re: How to cause a stage to fail (using spark-shell)?

2016-06-18 Thread Jacek Laskowski
t > to "create" few failed stages. Is this possible using spark-shell at > all? Which setup of Spark/spark-shell would allow for such a scenario. > > I could write a Scala code if that's the only way to have failing stages. > > Please guide. Thanks. > > /me

How to cause a stage to fail (using spark-shell)?

2016-06-18 Thread Jacek Laskowski
Hi, I'm trying to see some stats about failing stages in the web UI and want to "create" a few failed stages. Is this possible using spark-shell at all? Which setup of Spark/spark-shell would allow for such a scenario? I could write Scala code if that's the only way to have fai
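[Editor's note] One way to answer this, a sketch not taken from the thread's replies: a stage is marked failed once any of its tasks fails more than spark.task.maxFailures times, so deliberately throwing from inside a task in spark-shell is enough:

```scala
// paste into spark-shell; every attempt of every task throws,
// so the stage (and its job) shows up as failed in the web UI
sc.parallelize(1 to 4).map { _ =>
  sys.error("deliberate task failure")
}.count()
```

In local mode a single task failure fails the stage immediately; on a cluster the task is retried up to the configured maximum first.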

Re: how to load compressed (gzip) csv file using spark-csv

2016-06-16 Thread Vamsi Krishna
Thanks. It works. On Thu, Jun 16, 2016 at 5:32 PM Hyukjin Kwon wrote: > It will 'auto-detect' the compression codec by the file extension and then > will decompress and read it correctly. > > Thanks! > > 2016-06-16 20:27 GMT+09:00 Vamsi Krishna : > >> Hi, &

Re: how to load compressed (gzip) csv file using spark-csv

2016-06-16 Thread Hyukjin Kwon
It will 'auto-detect' the compression codec by the file extension and then will decompress and read it correctly. Thanks! 2016-06-16 20:27 GMT+09:00 Vamsi Krishna : > Hi, > > I'm using Spark 1.4.1 (HDP 2.3.2). > As per the spark-csv documentation ( > https://githu

how to load compressed (gzip) csv file using spark-csv

2016-06-16 Thread Vamsi Krishna
Hi, I'm using Spark 1.4.1 (HDP 2.3.2). As per the spark-csv documentation (https://github.com/databricks/spark-csv), I see that we can write to a csv file in compressed form using the 'codec' option. But I didn't see support for the 'codec' option to read a csv f

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-15 Thread swetha kasireddy
Hi Mich, No, I have not tried that. My requirement is to insert the data from an hourly Spark batch job. How is it different from trying to insert with the Hive CLI or beeline? Thanks, Swetha On Tue, Jun 14, 2016 at 10:44 AM, Mich Talebzadeh wrote: > Hi Swetha, > > Have you actually tried doing this in

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread Mich Talebzadeh
Hi Swetha, Have you actually tried doing this in Hive using Hive CLI or beeline? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http:/

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread Mich Talebzadeh
In all probability there is no user database created in Hive. Create a database yourself: sql("create database if not exists test") It would be helpful to grasp some concepts of Hive databases. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxia

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread swetha kasireddy
t; >>>>>> LinkedIn * >>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>>>> >>&

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread Sree Eedupuganti
Hi Spark users, I am new to Spark. I am trying to connect to Hive using a JavaSparkContext, but I am unable to connect to the database. By executing the below code I can see only the "default" database. Can anyone help me out? What I need is a sample program for querying Hive results using a JavaSparkContext. Need to

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread Bijay Pathak
dwick.com> wrote: >>> >>>> Hello, >>>> >>>> Looks like you are hitting this: >>>> https://issues.apache.org/jira/browse/HIVE-11940. >>>> >>>> Thanks, >>>> Bijay >>>> >>>>

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread swetha kasireddy
gt; Thanks, >>> Bijay >>> >>> >>> >>> On Thu, Jun 9, 2016 at 9:25 PM, Mich Talebzadeh < >>> mich.talebza...@gmail.com> wrote: >>> >>>> cam you provide a code snippet of how you are populating the target >>>

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread swetha kasireddy
; https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>> >>> >>> >>> http://talebzadehmich.wordpress.com >>> >>> &

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread swetha kasireddy
alebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> >> http://talebzadehmich.wordpress.com >&g

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-10 Thread Bijay Pathak
; > On 9 June 2016 at 23:43, swetha kasireddy > wrote: > >> No, I am reading the data from hdfs, transforming it , registering the >> data in a temp table using registerTempTable and then doing insert >> overwrite using Spark SQl' hiveContext. >> >> On Th

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread Mich Talebzadeh
8Pw>* http://talebzadehmich.wordpress.com On 9 June 2016 at 23:43, swetha kasireddy wrote: > No, I am reading the data from hdfs, transforming it , registering the > data in a temp table using registerTempTable and then doing insert > overwrite using Spark SQl' hiveContext. >

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread swetha kasireddy
No, I am reading the data from HDFS, transforming it, registering the data in a temp table using registerTempTable and then doing an insert overwrite using Spark SQL's hiveContext. On Thu, Jun 9, 2016 at 3:40 PM, Mich Talebzadeh wrote: > how are you doing the insert? from an existing table

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread Mich Talebzadeh
On 9 June 2016 at 21:16, Stephen Boesch wrote: > How many workers (/cpu cores) are assigned to this job? > > 2016-06-09 13:01 GMT-07:00 SRK : > >> Hi, >> >> How to insert data into 2000 partitions(directories) of ORC/parquet at a >> time using Spark SQL? I

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread swetha kasireddy
400 cores are assigned to this job. On Thu, Jun 9, 2016 at 1:16 PM, Stephen Boesch wrote: > How many workers (/cpu cores) are assigned to this job? > > 2016-06-09 13:01 GMT-07:00 SRK : > >> Hi, >> >> How to insert data into 2000 partitions(directories) of ORC/par

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread Stephen Boesch
How many workers (/cpu cores) are assigned to this job? 2016-06-09 13:01 GMT-07:00 SRK : > Hi, > > How to insert data into 2000 partitions(directories) of ORC/parquet at a > time using Spark SQL? It seems to be not performant when I try to insert > 2000 directories of Parquet/

How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread SRK
Hi, How do I insert data into 2000 partitions (directories) of ORC/Parquet at a time using Spark SQL? It does not seem to be performant when I try to insert 2000 directories of Parquet/ORC using Spark SQL. Did anyone face this issue? Thanks! -- View this message in context: http://apache-spark
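[Editor's note] A sketch of the two usual knobs for this, offered as assumptions since the thread never shows the failing code (the dt partition column and table name are illustrative): Hive's dynamic-partition caps must be raised above the 2000 target, and the write should be a single partitionBy fan-out rather than one insert per directory:

```scala
// raise Hive's dynamic-partition limits above the 2000 target
sqlContext.sql("SET hive.exec.dynamic.partition=true")
sqlContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
sqlContext.sql("SET hive.exec.max.dynamic.partitions=5000")

// one write that fans out into all partition directories at once
df.write.partitionBy("dt").format("orc").saveAsTable("target_table")
```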

Re: Stream reading from database using spark streaming

2016-06-02 Thread Mich Talebzadeh
OK, that is fine. So the source is an IMDB, something like Oracle TimesTen that I have worked with before. The second source is some organised data (I assume you mean structured tabular data). 1. Data is read from source one, the IMDB. The assumption is that within the batch interval that data

Re: Stream reading from database using spark streaming

2016-06-02 Thread Mich Talebzadeh
I don't understand this. How are you going to read from RDBMS database, through JDBC? How often are you going to sample the transactional tables? You may find that a JDBC connection will take longer than your sliding window length. Is this for real time analytics? Thanks Dr Mich Talebzadeh

Re: Stream reading from database using spark streaming

2016-06-02 Thread Ted Yu
http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/ https://spark.apache.org/docs/1.6.1/api/scala/index.html#org.apache.spark.rdd.JdbcRDD FYI On Thu, Jun 2, 2016 at 6:26 AM, Zakaria Hili wrote: > I want to use spark streaming to read data from RDBMS d

Stream reading from database using spark streaming

2016-06-02 Thread Zakaria Hili
I want to use Spark Streaming to read data from an RDBMS database like MySQL, but I don't know how to do this using JavaStreamingContext: JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.milliseconds(500)); DataFrame df = jssc. ?? I searched on the internet but I didn't find anythi
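[Editor's note] There is no built-in RDBMS streaming source, so the usual pattern (a sketch under that assumption; the URL, table, and incremental id column are placeholders) is to let the batch interval drive a fresh JDBC read each tick and keep a watermark on the driver:

```scala
// placeholders throughout: URL, credentials, table, and the id column
val url    = "jdbc:mysql://host:3306/db"
var lastId = 0L

// a constant one-element stream just supplies the batch clock
val tick = sc.parallelize(Seq(0))
val ticks = ssc.queueStream(
  scala.collection.mutable.Queue(tick), oneAtATime = false, defaultRDD = tick)

ticks.foreachRDD { _ =>
  // re-read only rows newer than the last watermark
  val df = sqlContext.read.format("jdbc")
    .option("url", url)
    .option("dbtable", s"(select * from events where id > $lastId) t")
    .load()
  if (df.count() > 0) {
    lastId = df.agg(org.apache.spark.sql.functions.max("id")).first().getLong(0)
    // ...process df...
  }
}
```

A plain scheduled batch job reading via JdbcRDD or DataFrameReader.jdbc achieves the same thing without the streaming machinery.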

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Mich Talebzadeh
sion of TEZ works as execution >>> engine with Hive. >>> >>> Vendors are divided on this (use Hive with TEZ) or use Impala instead of >>> Hive etc as I am sure you already know. >>> >>> Cheers, >>> >>> >>> >>> >

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Michael Segel
>> >>> >>> >>> Dr Mich Talebzadeh >>> >>> LinkedIn >>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>> >>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Jörn Franke
t; in-memory capability. >>>>> >>>>> It would be interesting to see what version of TEZ works as execution >>>>> engine with Hive. >>>>> >>>>> Vendors are divided on this (use Hive with TEZ) or use Impala instead of >&g

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Ovidiu-Cristian MARCU
t;>> Dr Mich Talebzadeh >>> >>> LinkedIn >>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>> >>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> >>

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Mich Talebzadeh
cPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> On 29 May 2016 at 20:19, Jörn Franke wrote: >> >>> Very interesting

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Michael Segel
com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> >> >> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> >> >> >> On 29 May 2016 at 20:19, Jörn Franke > <mailto:jornfra...@gmail.com>> wrote: >> Very in

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
ess.com > > > > On 29 May 2016 at 20:19, Jörn Franke wrote: > >> Very interesting do you plan also a test with TEZ? >> >> On 29 May 2016, at 13:40, Mich Talebzadeh >> wrote: >> >> Hi, >> >> I did another study of Hive using

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Jörn Franke
gt; > >> On 29 May 2016 at 20:19, Jörn Franke wrote: >> Very interesting do you plan also a test with TEZ? >> >>> On 29 May 2016, at 13:40, Mich Talebzadeh wrote: >>> >>> Hi, >>> >>> I did another study of Hive using Spark engi

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
test with TEZ? > > On 29 May 2016, at 13:40, Mich Talebzadeh > wrote: > > Hi, > > I did another study of Hive using Spark engine compared to Hive with MR. > > Basically took the original table imported using Sqoop and created and > populated a new ORC table partition

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Jörn Franke
Very interesting do you plan also a test with TEZ? > On 29 May 2016, at 13:40, Mich Talebzadeh wrote: > > Hi, > > I did another study of Hive using Spark engine compared to Hive with MR. > > Basically took the original table imported using Sqoop and created and > p

Re: How to run hive queries in async mode using spark sql

2016-05-24 Thread Mich Talebzadeh
Fine, give me an example where you have tried to turn on async for the query using Spark SQL: your actual code. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: How to run hive queries in async mode using spark sql

2016-05-24 Thread Raju Bairishetti
ny thoughts on this? >> >> In hive, it returns operation handle. This handle can be used for >> fetching the status of query. Is there any similar mechanism in spark sql? >> Looks like almost all the methods in the HiveContext are either protected >> or private. >> >

Re: How to run hive queries in async mode using spark sql

2016-05-24 Thread Mich Talebzadeh
there any similar mechanism in spark sql? Looks > like almost all the methods in the HiveContext are either protected or > private. > > On Wed, May 18, 2016 at 9:03 AM, Raju Bairishetti > wrote: > >> I am using spark sql for running hive queries also. Is there any way to >&g

Re: How to run hive queries in async mode using spark sql

2016-05-24 Thread Raju Bairishetti
Bairishetti wrote: > I am using spark sql for running hive queries also. Is there any way to > run hive queries in asyc mode using spark sql. > > Does it return any hive handle or if yes how to get the results from hive > handle using spark sql? > > -- > Thanks, > Raju Bair

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-24 Thread Mich Talebzadeh
seful stats. > > Did you have any benchmark for using Spark as backend engine for Hive vs > using Spark thrift server (and run spark code for hive queries)? We are > using later but it will be very useful to remove thriftserver, if we can. > > On Tue, May 24, 2016 at 9:51 A

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread ayan guha
Hi, Thanks for the very useful stats. Do you have any benchmark for using Spark as the backend engine for Hive vs. using the Spark Thrift Server (and running Spark code for Hive queries)? We are using the latter, but it would be very useful to remove the Thrift Server if we can. On Tue, May 24, 2016 at 9:51 AM, Jörn

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Jörn Franke
somewhere described how to manage bringing both together. You may check also Apache Bigtop (vendor neutral distribution) on how they managed to bring both together. > On 23 May 2016, at 01:42, Mich Talebzadeh wrote: > > Hi, > > I have done a number of extensive tests using Spa

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Ashok Kumar
same cluster (libraries clash, paths, etc.)? Most Spark users perform ETL and ML operations on Spark as well. So, we may have 3 Spark installations simultaneously. There are two distinct points here. Using Spark as a query engine. That is BAU and most forum members use it every day. You

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Mich Talebzadeh
Spark as well. So, > we may have 3 Spark installations simultaneously > > There are two distinct points here. > > Using Spark as a query engine. That is BAU and most forum members use it > everyday. You run Spark with either Standalone, Yarn or Mesos as Cluster > managers. You st
