Hi Raymond,
Is your problem how to pass those 100 fields to the .toDF() method?
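If so, one way to avoid spelling out 100 column names by hand is to build the schema programmatically and use createDataFrame instead of toDF. A minimal sketch against the Spark 1.6 API (the column names, path, and naive comma split are placeholders; substitute your real header and parsing):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical: generate the 100 field names; replace with your real header.
val fieldNames = (1 to 100).map(i => s"col$i")
val schema = StructType(fieldNames.map(name => StructField(name, StringType, nullable = true)))

// Parse each CSV line into a Row and attach the schema.
val rowRDD = sc.textFile("path/to/big.csv").map(_.split(",", -1)).map(fields => Row.fromSeq(fields))
val df = sqlContext.createDataFrame(rowRDD, schema)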
Sent from my Samsung device
Original message
From: Raymond Xie
Date: 31/12/2016 10:46 (GMT+08:00)
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6
Hello,
I s
--
Kranthi
PS: Sent from mobile, pls excuse the brevity and typos.
> On Dec 7, 2016, at 8:05 PM, Siddhartha Khaitan
> wrote:
>
>
Don't you need to provide your class name "JavaWordCount"?
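Something along these lines, where the jar name, master, and input argument are placeholders for your own (and --class must match the fully qualified name if the class is in a package):

/path/to/spark/bin/spark-submit \
  --class JavaWordCount \
  --master local[2] \
  wordcount.jar <input-file>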
Thanks,
Vasu.
> On Dec 7, 2016, at 3:18 PM, im281 wrote:
>
> Hello,
> I have a simple word count example in Java and I can run this in Eclipse
> (code at the bottom)
>
> I then create a jar file from it and try to run it from the cmd
Maybe you've already checked these out. Some basic questions that come to my
mind are:
1) is this library "foolib" or "foo-C-library" available on the worker node?
2) if yes, is it accessible by the user/program (rwx)?
Thanks,
Vasu.
> On Nov 26, 2016, at 5:08 PM, kant kodali wrote:
>
> If it
Hi,
We can use combineByKey to achieve this:
val finalRDD = tempRDD.combineByKey(
  (x: (Any, Any)) => x,                                  // createCombiner: the first value becomes the accumulator
  (acc: (Any, Any), x: (Any, Any)) => (acc, x),          // mergeValue: nest the next value into the accumulator
  (acc1: (Any, Any), acc2: (Any, Any)) => (acc1, acc2))  // mergeCombiners: nest two partial accumulators
finalRDD.collect.foreach(println)
(amazon,((book1,tech),(book2,tech)))
(barns&noble,(book,tech))
(eBa
Try to reproduce what the spark-submit shell script does, setting up the class
path etc.
Sent from my rotary phone.
> On Nov 9, 2015, at 7:07 AM, Tathagata Das wrote:
>
> You cannot submit from Eclipse to a cluster that easily. You can run locally
> (master set to local...), and it should
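A minimal sketch of the "master set to local" option mentioned above (the app name is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// local[*] runs Spark in-process, using all available cores.
val conf = new SparkConf().setAppName("MyApp").setMaster("local[*]")
val sc = new SparkContext(conf)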
While I have a preference for Scala ( not surprising as a Typesafe person), the
DataFrame API gives feature and performance parity for Python. The RDD API
gives feature parity.
So, use what makes you most successful for other reasons ;)
Sent from my rotary phone.
> On Oct 6, 2015, at 4:14 P
Okay but "how?" thats what I am trying to figure out π? Any command you would
suggest?
"Sent from my iPhone, plaese excuse any typos :)"
> On Aug 21, 2015, at 11:45 PM, Raghavendra Pandey
> wrote:
>
> You get the list of all the persisted RDDs using the SparkContext...
>> On Aug 21, 2015 12:06 A
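For reference, a minimal sketch of that suggestion: SparkContext.getPersistentRDDs returns a Map from RDD id to the cached RDD (the printout format here is illustrative):

sc.getPersistentRDDs.foreach { case (id, rdd) =>
  // name may be null unless it was set via rdd.setName(...)
  println(s"id=$id name=${rdd.name} storage=${rdd.getStorageLevel}")
}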
You are mixing the 1.0.0 Spark SQL jar with the Spark 1.4.0 jars in your build file.
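A consistent build would pin both artifacts to the same version, e.g. a sketch in sbt (adjust the version to whatever you actually target):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.4.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.4.0" % "provided"
)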
Sent from my rotary phone.
> On Jul 14, 2015, at 7:57 AM, ashwang168 wrote:
>
> Hello!
>
> I am currently using Spark 1.4.0, scala 2.10.4, and sbt 0.13.8 to try and
> create a jar file from a scala file (attached
There is no mechanism for keeping an RDD up to date with a changing source.
However, you could set up a stream that watches for changes to the directory and
processes the new files, or use the Hive integration in Spark SQL to run Hive
queries directly. (However, old query results will still grow stale.)
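For the directory-watching option, a minimal sketch with Spark Streaming (the batch interval, path, and println processing are placeholders):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(60))
// textFileStream only picks up files newly written into the watched directory.
val newFiles = ssc.textFileStream("hdfs:///data/incoming")
newFiles.foreachRDD(rdd => rdd.foreach(println)) // replace with real processing
ssc.start()
ssc.awaitTermination()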
Show us the code. This shouldn't happen for the simple process you described.
Sent from my rotary phone.
> On Mar 27, 2015, at 5:47 AM, jamborta wrote:
>
> Hi all,
>
> We have a workflow that pulls in data from CSV files; the original setup of
> the workflow was to parse the data as it
Akhil
You are right in your answer to what Mohit wrote. However, what Mohit seems to
be alluding to, but did not state clearly, might be different.
Mohit
You are wrong in saying that streaming "generally" works with HDFS and
Cassandra. Streaming typically works with a streaming or queuing source like
Kafka.
I have used various versions of Spark (1.0, 1.2.1) without any issues. Though I
have not used Kafka significantly with 1.3.0, preliminary testing revealed no
issues.
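For example, the receiver-based Kafka input in Spark Streaming 1.x looks roughly like this (the ZooKeeper address, group id, and topic are placeholders, and ssc is an existing StreamingContext):

import org.apache.spark.streaming.kafka.KafkaUtils

// One receiver thread consuming the "events" topic via ZooKeeper.
val kafkaStream = KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", Map("events" -> 1))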
- khanderao
> On Mar 18, 2015, at 2:38 AM, James King wrote:
>
> Hi All,
>
> Which build of Spark is best when using Kafka
Hi there,
Spark version: 1.2
/home/hadoop/spark/bin/spark-submit \
  --class com.litb.bi.CSLog2ES \
  --master yarn \
  --executor-memory 1G \
  --jars /mnt/external/kafka/target/spark-streaming-kafka_2.10-1.2.0.jar,/mnt/external/kafka/target/zkclient-0.3.jar,/mnt/external/kafka/target/metrics-core-2.2.0.jar,
Yes, I can implement it like:
sc.textFile("path").reduce(_ + _).split("!!").filter(x => x.trim.length > 0)
But the reduce operation is expensive! I tested these two methods on a 6 GB
file; the only operation on the created RDD was take(10).foreach(println). The
method using newAPIHadoopFile only ta
The value in the (key, value) pair returned by textFile is exactly one line of the input.
But what I want is the field between the two "!!" delimiters; hope this makes sense.
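The usual way to get that is newAPIHadoopFile with a custom record delimiter; a minimal sketch (the path is a placeholder):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val conf = new Configuration(sc.hadoopConfiguration)
// Tell TextInputFormat to split records on "!!" instead of newlines.
conf.set("textinputformat.record.delimiter", "!!")
val records = sc
  .newAPIHadoopFile("path", classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
  .map(_._2.toString)
  .filter(_.trim.nonEmpty)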
Cheng Chang
Computer Science Dept. Tsinghua Univ.
Mobile Phone: 13681572414
WeChat ID: cccjcl