Re: hive.contrib.serde2.RegexSerDe not found

2015-07-28 Thread Gianluca Privitera
Try using: org.apache.hadoop.hive.serde2.RegexSerDe GP On 27 Jul 2015, at 09:35, ZhuGe <t...@outlook.com> wrote: Hi all: I am testing the performance of Hive on Spark SQL. The existing table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPE
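The suggestion above is to switch from the old contrib class to the built-in one. A minimal sketch of a table definition using the built-in SerDe (table, columns, and regex are hypothetical examples, not from the original thread):

```sql
-- Hypothetical example: the built-in SerDe lives under hive.serde2,
-- not hive.contrib.serde2, so no extra contrib jar is needed.
CREATE TABLE apache_log (
  host    STRING,
  request STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) (.*)"
)
STORED AS TEXTFILE;
```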

Re: Spark History Server pointing to S3

2015-06-16 Thread Gianluca Privitera
...@sigmoidanalytics.com> wrote: Not quite sure, but try pointing the spark.history.fs.logDirectory to your S3 Thanks Best Regards On Tue, Jun 16, 2015 at 6:26 PM, Gianluca Privitera <gianluca.privite...@studio.unibo.it> wrote: In Spark website it’s stated in the View After the

Spark History Server pointing to S3

2015-06-16 Thread Gianluca Privitera
On the Spark website it’s stated in the View After the Fact section (https://spark.apache.org/docs/latest/monitoring.html) that you can point the start-history-server.sh script to a directory in order to view the Web UI using the logs as data source. Is it possible to point that script to S3? Maybe
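The reply in the thread above suggests setting spark.history.fs.logDirectory directly. A sketch of what that configuration might look like (bucket name is hypothetical; this assumes the Hadoop S3 filesystem classes and AWS credentials are available on the history server's classpath):

```
# spark-defaults.conf -- hypothetical bucket, for illustration only
spark.eventLog.enabled           true
spark.eventLog.dir               s3n://my-spark-logs/events
spark.history.fs.logDirectory    s3n://my-spark-logs/events
```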

Spark Streaming w/ tshark exception problem on EC2

2014-07-15 Thread Gianluca Privitera
Hi, I’ve got a problem with Spark Streaming and tshark. While I’m running locally I have no problems with this code, but when I run it on an EC2 cluster I get the exception shown just under the code. def dissection(s: String): Seq[String] = { try { Process("hadoop command to create ./lo
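A common cause of "works locally, fails on EC2" with this pattern is the external binary (tshark here) not being installed on the worker nodes. A hedged, pure-Scala sketch of how the wrapper might guard against that (function shape and command are illustrative, not the original code):

```scala
import scala.sys.process._
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: run an external command and capture its stdout
// as lines, returning an empty Seq instead of throwing when the binary
// is missing on the executor -- a cluster-only failure mode.
def dissection(cmd: Seq[String]): Seq[String] =
  Try(cmd.!!) match {
    case Success(out) => out.split("\n").toSeq
    case Failure(_)   => Seq.empty  // e.g. tshark not installed on this node
  }
```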

Re: window analysis with Spark and Spark streaming

2014-07-04 Thread Gianluca Privitera
You should think about a custom receiver, in order to solve the problem of the “already collected” data. http://spark.apache.org/docs/latest/streaming-custom-receivers.html Gianluca On 04 Jul 2014, at 15:46, alessandro finamore <alessandro.finam...@polito.it> wrote: Hi, I have a large
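The custom-receiver approach from the linked guide can be sketched roughly as follows (the data source and `fetchBatch` helper are hypothetical placeholders for whatever re-reads the already-collected data):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hedged sketch following the custom-receivers guide linked above.
class ReplayReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("replay-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          fetchBatch().foreach(store)  // push records into Spark
          Thread.sleep(1000)
        }
      }
    }.start()
  }

  def onStop(): Unit = {}  // the thread checks isStopped(), nothing else to do

  // Hypothetical placeholder for reading the already-collected data.
  private def fetchBatch(): Seq[String] = Seq.empty
}
```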

Re: Where Can I find the full documentation for Spark SQL?

2014-06-26 Thread Gianluca Privitera
/api/java/org/apache/spark/sql/api/java/JavaSchemaRDD.html> sql(String sqlQuery) Executes a query expressed in SQL, returning the result as a JavaSchemaRDD, but what kind of sqlQuery can we execute? Is there any more documentation? Xiaobo Gu

Re: Where Can I find the full documentation for Spark SQL?

2014-06-25 Thread Gianluca Privitera
You can find something in the API, nothing more than that I think for now. Gianluca On 25 Jun 2014, at 23:36, guxiaobo1982 wrote: > Hi, > > I want to know the full list of functions, syntax, and features that Spark SQL > supports; is there some documentation? > > > Regards, > > Xiaobo Gu

Re: Increase storage.MemoryStore size

2014-06-12 Thread Gianluca Privitera
If you are launching your application with spark-submit you can manually edit the spark-class file to make it 1g as baseline. It’s pretty easy to do and to figure out how once you open the file. This worked for me even if it’s not a final solution of course. Gianluca On 12 Jun 2014, at 15:16, e
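Editing the spark-class file works, but the same baseline can be set less invasively through spark-submit options or the defaults file (values below are examples only):

```
# Example flags at launch time:
spark-submit --driver-memory 1g --executor-memory 1g ...

# Or, equivalently, in conf/spark-defaults.conf:
spark.driver.memory     1g
spark.executor.memory   1g
```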

Re: Access DStream content?

2014-06-12 Thread Gianluca Privitera
You can use foreachRDD and then access the RDD data. Hope this works for you. Gianluca On 12 Jun 2014, at 10:06, Wolfinger, Fred <fwolfin...@cyberpointllc.com> wrote: Good morning. I have a question related to Spark Streaming. I have reduced some data down to a simple count value (by window),
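The foreachRDD suggestion can be sketched as follows (a hedged example, not the original poster's code; collecting to the driver is only reasonable here because each windowed RDD holds a single count):

```scala
import org.apache.spark.streaming.dstream.DStream

// Sketch: pull the windowed count out of each batch with foreachRDD.
// The closure runs on the driver once per batch interval.
def dumpCounts(counts: DStream[Long]): Unit =
  counts.foreachRDD { rdd =>
    rdd.collect().foreach(c => println(s"window count: $c"))
  }
```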

Spark Streaming application not working on EC2 Cluster

2014-06-09 Thread Gianluca Privitera
Hi, I think I may have encountered some kind of bug that at the moment prevents my application from running correctly on an EC2 cluster. I'm saying that because the exact same code works wonderfully locally but behaves really strangely on the cluster. val uri = ssc.textFileStream(args(1) +

Spark Streaming window functions bug 1.0.0

2014-06-06 Thread Gianluca Privitera
Is anyone experiencing problems with window functions? dstream1.print() val dstream2 = dstream1.groupByKeyAndWindow(Seconds(60)) dstream2.print() In my application the first print() prints out all the strings and their keys, but after the window function everything is lost and nothing gets printed.
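One thing worth checking with this symptom: the window duration must be a multiple of the batch interval, otherwise the windowed stream cannot line up with the batches. A hedged sketch of a consistent setup (the interval values and conf variable are illustrative, not from the original post):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative values: a 60s window over a 10s batch interval (6 batches).
val ssc = new StreamingContext(conf, Seconds(10))
val dstream2 = dstream1.groupByKeyAndWindow(Seconds(60))
dstream2.print()
```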

Re: Spark Streaming, download a s3 file to run a script shell on it

2014-06-06 Thread Gianluca Privitera
ur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi <https://twitter.com/mayur_rustagi> On Fri, Jun 6, 2014 at 3:00 AM, Gianluca Privitera <gianluca.privite...@studio.unibo.it> wrote: Hi, I've got a weird question but maybe

Re: creating new ami image for spark ec2 commands

2014-06-06 Thread Gianluca Privitera
Hi, I think the best thing you could do is launch an instance from that AMI ID, add the stuff you want, then copy it through the AWS console, and then launch the ec2 script using the new AMI you just created. On 06/06/2014 09:20, Akhil Das wrote: Hi Matt, You will be needing the following on th

Spark Streaming, download a s3 file to run a script shell on it

2014-06-05 Thread Gianluca Privitera
Hi, I've got a weird question but maybe someone has already dealt with it. My Spark Streaming application needs to: download a file from an S3 bucket, run a script with the file as input, and create a DStream from this script's output. I've already got the second part done with the rdd.pipe() API th
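The rdd.pipe() part mentioned above might look roughly like this (a hedged sketch; the script path is hypothetical and would need to exist on every worker node):

```scala
// Sketch: pipe each batch's records through an external script and use
// its stdout lines as the transformed stream.
val piped = dstream.transform { rdd =>
  rdd.pipe("/home/hadoop/scripts/process.sh")  // hypothetical script path
}
```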

Re: Trouble launching EC2 Cluster with Spark

2014-06-04 Thread Gianluca Privitera
Hi, if you correctly set your access key ID and secret access key, then it's probably a problem related to the key.pem file. Try generating a new one, and make sure you are the only one with the right to read it, or it won't work. Gianluca On 04/06/2014 09:45, Sam Taylor Steyer wrote: Hi,

EC2 Simple Cluster

2014-06-02 Thread Gianluca Privitera
Hi everyone, I would like to set up a very simple cluster (specifically using 2 micro instances only) of Spark on EC2 and make it run a simple Spark Streaming application I created. Has someone actually managed to do that? Because after launching the scripts from this page: http://spark.apache.org/