Question on take function - Spark Java API

2015-08-25 Thread Pankaj Wahane
Hi community members, Apache Spark is fantastic and very easy to learn. Awesome work! Question: I have multiple files in a folder, and the first line in each file is the name of the asset that the file belongs to. The second line is a CSV header row, and the data starts from the third row.
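The file layout described above has the asset name on line 1, a CSV header on line 2, and data from line 3. Once a file's lines are available, it can be handled one file at a time. Below is a minimal plain-Java sketch of that per-file step. It assumes simple comma-separated fields with no quoting. The names AssetFileParser, AssetRow and parse, and the sample data, are hypothetical. In Spark, one possible way to keep each file's first line together with its data is JavaSparkContext.wholeTextFiles, which yields (path, content) pairs. Each file's content could then be split into lines and passed to parse. This is an assumption, not something proposed in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper for the layout described in the question:
 *   line 1   -> asset name
 *   line 2   -> CSV header row
 *   line 3.. -> CSV data rows
 * Assumes plain comma separation (no quoted fields containing commas).
 */
public class AssetFileParser {

    /** One data row tagged with the asset name from line 1 of its file. */
    public static final class AssetRow {
        public final String asset;
        public final List<String> header;
        public final List<String> values;

        AssetRow(String asset, List<String> header, List<String> values) {
            this.asset = asset;
            this.header = header;
            this.values = values;
        }
    }

    /** Parses the lines of a single file into rows tagged with that file's asset name. */
    public static List<AssetRow> parse(List<String> lines) {
        if (lines.size() < 2) {
            throw new IllegalArgumentException("expected an asset-name line and a header line");
        }
        String asset = lines.get(0).trim();
        List<String> header = Arrays.asList(lines.get(1).split(",", -1));
        List<AssetRow> rows = new ArrayList<>();
        // Data starts at the third line (index 2).
        for (String line : lines.subList(2, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            rows.add(new AssetRow(asset, header, Arrays.asList(line.split(",", -1))));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample file content, for illustration only.
        List<String> file = Arrays.asList(
                "PUMP-01",
                "timestamp,pressure",
                "t1,4.2",
                "t2,4.4");
        for (AssetRow row : parse(file)) {
            System.out.println(row.asset + " " + row.values);
        }
    }
}
```

Every data row carries its asset name, so rows from many files can later be mixed without losing track of which asset they belong to.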

Re: CHAID Decision Trees

2015-08-25 Thread Jatinpreet Singh
Hi Feynman, Thanks for the information. Is there a way to depict a decision tree as a visualization for large amounts of data using any other technique or library? Thanks, Jatin On Tue, Aug 25, 2015 at 11:42 PM, Feynman Liang wrote: > Nothing is in JIRA >

Re: use GraphX with Spark Streaming

2015-08-25 Thread ponkin
Hi, sure you can. StreamingContext has a sparkContext property (def sparkContext: SparkContext; see docs). Think of a DStream, the main abstraction in Spark Streaming, as a sequence of RDDs. Each DStream can be

Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Ted Yu
The error in #1 below was not informative. Are you able to get a more detailed error message? Thanks > On Aug 25, 2015, at 6:57 PM, Todd wrote: > > > Thanks Ted Yu. > > Following are the error message: > 1. The exception that is shown on the UI is : > Exception in thread "Thread-113" Excep

Re:Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Todd
I think the answer is no. I only see that message on the console, and #2 is the thread stack trace. My guess is that Spark SQL Perf forks many dsdgen processes to generate data when the scale factor is increased, which eventually exhausts the JVM. When the thread exception is thrown on the console a

Re: How to access Spark UI through AWS

2015-08-25 Thread Justin Pihony
I figured it all out after this: http://apache-spark-user-list.1001560.n3.nabble.com/WebUI-on-yarn-through-ssh-tunnel-affected-by-AmIpfilter-td21540.html The short version is that I needed to set SPARK_PUBLIC_DNS (not DNS_HOME) to ec2_publicdns; then the YARN proxy gets in the way, so I needed to go to:
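The first step in the reply is setting SPARK_PUBLIC_DNS, which is the hostname a Spark process advertises to other machines. It is commonly placed in conf/spark-env.sh. Below is a minimal sketch of that setting. The hostname is a placeholder built from a documentation-only IP range, standing in for the instance's actual EC2 public DNS name. The YARN proxy step that follows in the reply is not covered here.

```shell
# conf/spark-env.sh (sketch)
# Placeholder value: replace with the EC2 instance's public DNS name,
# so that links in the Spark web UI point at a publicly reachable host.
export SPARK_PUBLIC_DNS=ec2-203-0-113-10.compute-1.amazonaws.com
```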

Re: CHAID Decision Trees

2015-08-25 Thread Feynman Liang
For a single decision tree, the closest I can think of is printDebugString, which gives you a text representation of the decision thresholds and paths down the tree. I don't think there's anything in MLlib for visualizing GBTs or random forests On Tue, Aug 25, 2015 at 9:20 PM, Jatinpreet Singh w

reduceByKey not working on JavaPairDStream

2015-08-25 Thread Deepesh Maheshwari
Hi, I have applied mapToPair and then reduceByKey on a DStream to obtain a JavaPairDStream>. I have to apply flatMapToPair and reduceByKey on the DStream obtained above, but I do not see any logs from the reduceByKey operation. Can anyone explain why this is happening? My code is below: *
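The reduceByKey step described above can be modeled on a single batch with plain Java collections. This is not Spark code. The class ReduceByKeyModel, its methods and the sample keys are hypothetical, and a LinkedHashMap stands in for one RDD of a JavaPairDStream. The sketch illustrates one general property of reduceByKey that may be relevant to missing log output, though the thread does not establish that it is the cause. The function passed to reduceByKey is only called to merge two values that share a key, so a key that occurs only once never triggers it, and neither does any logging placed inside it.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/**
 * Local, single-batch model of what reduceByKey computes.
 * Not Spark code: plain Java collections stand in for one RDD of a JavaPairDStream.
 */
public class ReduceByKeyModel {

    /** Counts invocations of the reduce function (stands in for log output). */
    public static int reduceCalls = 0;

    /** Merges all values that share a key using func, keeping first-seen key order. */
    public static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> func) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            // func runs only when the key already has a value, i.e. to merge two values.
            result.merge(pair.getKey(), pair.getValue(), func);
        }
        return result;
    }

    /** Convenience factory for a key/value pair. */
    public static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> batch = new ArrayList<>();
        batch.add(pair("a", 1));
        batch.add(pair("b", 1));
        batch.add(pair("a", 1));

        Map<String, Integer> counts = reduceByKey(batch, (x, y) -> {
            reduceCalls++; // a log statement here would fire only for merges
            return x + y;
        });
        System.out.println(counts + " reduceCalls=" + reduceCalls);
    }
}
```

Running main prints {a=2, b=1} reduceCalls=1. Only key a occurs twice, so the reduce function runs once, and key b passes through without it.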

Re: SparkSQL saveAsParquetFile does not preserve AVRO schema

2015-08-25 Thread storm
Note: In the code (org.apache.spark.sql.parquet.DefaultSource) I've found this: val relation = if (doInsertion) { // This is a hack. We always set nullable/containsNull/valueContainsNull to true // for the schema of a parquet data. val df = sqlContext.createDataFrame(

BlockNotFoundException when running spark word count on Tachyon

2015-08-25 Thread Todd
I am using Tachyon in the Spark program below, but I encounter a BlockNotFoundException. Does someone know what's wrong, and is there a guide on how to configure Spark to work with Tachyon? Thanks! conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998") conf.set("spark.ex
