Re: java.net.URISyntaxException: Relative path in absolute URI:

2016-08-05 Thread Flavio
This is the workaround for the code above: SparkConf conf = new SparkConf().set("spark.sql.warehouse.dir", "file:///C:/Users/marchifl/scalaWorkspace/SparkStreamingApp2/spark-warehouse"); SparkSession spark = SparkSession.builder()
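
For reference, a minimal Scala sketch of the same workaround (app name, master and warehouse path are placeholders, not the values from the post above):

    import org.apache.spark.sql.SparkSession

    // An explicit file:/// URI for the warehouse dir avoids the invalid
    // "file:C:/..." URI that Spark otherwise derives from a Windows working directory.
    val spark = SparkSession.builder()
      .appName("WarehouseDirWorkaround")                                   // placeholder app name
      .master("local[*]")                                                  // assumption: local run
      .config("spark.sql.warehouse.dir", "file:///C:/tmp/spark-warehouse") // placeholder path
      .getOrCreate()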

Re: java.net.URISyntaxException: Relative path in absolute URI:

2016-08-03 Thread Flavio
Just for clarification, this is the full stack trace: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 16/08/03 18:18:44 INFO SparkContext: Running Spark version 2.0.0 16/08/03 18:18:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... u

Re: java.net.URISyntaxException: Relative path in absolute URI:

2016-08-03 Thread flavio marchi
Hi Sean, thanks for the reply. I had just omitted the real path, which points to my file system. I have now posted the real one. On 3 Aug 2016 19:09, "Sean Owen" wrote: > file: "absolute directory" > does not sound like a valid URI > > On Wed, Aug 3, 2016 at 11

Re: java.net.URISyntaxException: Relative path in absolute URI:

2016-08-03 Thread flavio marchi
owse/SPARK-15899> ? > > On Wed, Aug 3, 2016 at 11:05 AM, Flavio wrote: > >> Hello everyone, >> >> I am trying to run a very simple example but unfortunately I am stuck on the >> following exception: >> >> Exception in thread "main" java.lang.Ill

java.net.URISyntaxException: Relative path in absolute URI:

2016-08-03 Thread Flavio
" + rmse); RandomForestRegressionModel rfModel = (RandomForestRegressionModel) (model.stages()[1]); System.out.println("Learned regression forest model:\n" + rfModel.toDebugString()); // $example off$ spark.stop();

Tuple join

2015-04-17 Thread Flavio Pompermaier
Hi to all, I have 2 RDDs, D1 and D2, like: D1: A,p1,a1 A,p2,a2 A,p3,X B,p3,Y B,p1,b1 D2: X,s,V X,r,2 Y,j,k I'd like to have a single RDD D3 (Tuple4) like A,X,a1,a2 B,Y,b1,null. Basically, joining where D1.f2 == D2.f0 and filling with null where there is no match. Is that possible, and how? Could you show me a simple snippet? Thanks in advance
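
A minimal sketch of the join mechanics being asked for, keying D1 on its third field and D2 on its first; sc is an existing SparkContext, and arranging the result into Tuple4 rows (including the null for B) is left to a later grouping step:

    // D1 and D2 as RDDs of (f0, f1, f2) triples.
    val d1 = sc.parallelize(Seq(("A", "p1", "a1"), ("A", "p2", "a2"), ("A", "p3", "X"),
                                ("B", "p3", "Y"), ("B", "p1", "b1")))
    val d2 = sc.parallelize(Seq(("X", "s", "V"), ("X", "r", "2"), ("Y", "j", "k")))

    // Join where D1.f2 == D2.f0; leftOuterJoin keeps D1 rows with no match in D2 as None,
    // which is the "fill with null" case.
    val joined = d1.keyBy(_._3).leftOuterJoin(d2.keyBy(_._1))
    joined.collect().foreach(println)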

Re: Batch of updates

2014-10-28 Thread Flavio Pompermaier
println("Flush legacy entities"); batch.clear } Iterator.empty }) } else { // return an empty Iterator of your return type Iterator.empty } Best, Flavio On Tue, Oct 28, 2014 at 1:26 PM, Kamal Banga wr

Batch of updates

2014-10-27 Thread Flavio Pompermaier
- how can I commit the remaining elements (in MapReduce, the elements still in the batch array are committed in the cleanup() method)? - if I have to create a connection to a server for pushing updates, is it better to use mapPartitions instead of map? Best, Flavio

Re: Dedup

2014-10-08 Thread Flavio Pompermaier
Maybe you could implement something like this (I don't know if something similar already exists in Spark): http://www.cs.berkeley.edu/~jnwang/papers/icde14_massjoin.pdf Best, Flavio On Oct 8, 2014 9:58 PM, "Nicholas Chammas" wrote: > Multiple values may be different, yet s

RE: Does HiveContext support Parquet?

2014-08-16 Thread Flavio Pompermaier
Hi to all, sorry for not being fully on topic but I have 2 quick questions about Parquet tables registered in Hive/Spark: 1) where are the created tables stored? 2) If I have multiple hiveContexts (one per application) using the same Parquet table, is there any problem if inserting concurrently fr

Re: Save an RDD to a SQL Database

2014-08-07 Thread Flavio Pompermaier
Isn't sqoop export meant for that? http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1 On Aug 7, 2014 7:59 PM, "Nicholas Chammas" wrote: > Vida, > > What kind of database are you trying to write to? > > For example, I found that for loading into Redshift, by far the ea
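
Besides sqoop export, a common in-Spark route is a plain JDBC batch insert from each partition. A sketch assuming an RDD[(Int, String)] named rdd; the JDBC URL, credentials and table are placeholders:

    import java.sql.DriverManager

    rdd.foreachPartition { rows =>
      // One connection and prepared statement per partition.
      val conn = DriverManager.getConnection("jdbc:postgresql://dbhost/mydb", "user", "pass")
      val stmt = conn.prepareStatement("INSERT INTO items (id, name) VALUES (?, ?)")
      try {
        rows.foreach { case (id, name) =>
          stmt.setInt(1, id)
          stmt.setString(2, name)
          stmt.addBatch()                // accumulate a JDBC batch for this partition
        }
        stmt.executeBatch()
      } finally {
        stmt.close()
        conn.close()
      }
    }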

Streaming on different store types

2014-07-30 Thread Flavio Pompermaier
Hi everybody, I have a scenario where I would like to stream data to different persistence types (i.e. SQL DB, graph DB, HDFS, etc.) and perform some filtering and transformation as the data comes in. The problem is to maintain consistency between all datastores (maybe some operation could fail)
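
A rough sketch of the fan-out part only (one transformed DStream written to two sinks); it does not solve cross-store consistency, which is the harder half of the question. lines is an assumed DStream[String] and both save functions are stubs:

    // Stubs standing in for the real SQL and graph DB writers.
    val saveToSql: Iterator[String] => Unit     = rows => rows.foreach(r => println(s"sql:   $r"))
    val saveToGraphDb: Iterator[String] => Unit = rows => rows.foreach(r => println(s"graph: $r"))

    val cleaned = lines.filter(_.nonEmpty).map(_.toLowerCase)   // shared filtering/transformation

    // Each output operation runs independently: a failure in one sink does not roll back
    // the other, so consistency needs its own handling (e.g. idempotent writes keyed by id).
    cleaned.foreachRDD(rdd => rdd.foreachPartition(saveToSql))
    cleaned.foreachRDD(rdd => rdd.foreachPartition(saveToGraphDb))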

Shark vs Impala

2014-06-22 Thread Flavio Pompermaier
mpala outperform Spark by so much when executing complex queries? Best, Flavio

Spark and RDF

2014-06-19 Thread Flavio Pompermaier
Hi guys, I'm analyzing the possibility of using Spark to analyze RDF files and define reusable Shark operators on them (custom filtering, transforming, aggregating, etc.). Is that possible? Any hints? Best, Flavio
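
If the RDF is in a line-oriented serialization such as N-Triples, the plain RDD API is already enough for simple filtering and aggregation. A sketch with a placeholder path; real N-Triples parsing needs more care than a whitespace split:

    // Each N-Triples line is roughly "<subject> <predicate> <object> ."
    val triples = sc.textFile("hdfs:///data/dataset.nt")        // placeholder path
      .map(_.split("\\s+", 3))
      .collect { case Array(s, p, o) => (s, p, o.trim.stripSuffix(".").trim) }

    // Example "operator": keep rdf:type statements and count them per subject.
    val typeCounts = triples
      .filter { case (_, p, _) => p == "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" }
      .map { case (s, _, _) => (s, 1) }
      .reduceByKey(_ + _)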

Re: Spark streaming and rate limit

2014-06-19 Thread Flavio Pompermaier
OK, I'll start from that when I try to implement it. Thanks again for the great support! Best, Flavio On Thu, Jun 19, 2014 at 10:57 AM, Michael Cutler wrote: > Hi Flavio, > > When your streaming job starts somewhere in the cluster the Receiver will > be sta

Re: Spark streaming and rate limit

2014-06-19 Thread Flavio Pompermaier
DStream.scala>..) is how to limit the external service call rate and manage the incoming buffer size (enqueuing). Could you give me some tips for that? Thanks again, Flavio On Thu, Jun 19, 2014 at 10:19 AM, Michael Cutler wrote: > Hello Flavio, > > It sounds to me like the best solutio
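
One way to cap the rate at the source is inside a custom Receiver; a crude sketch that paces store() calls with a fixed sleep (the generated events are placeholders for the real external source). Later Spark releases also added a spark.streaming.receiver.maxRate property for this:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // At most maxPerSecond records are stored per second, so downstream batches stay bounded.
    class ThrottledReceiver(maxPerSecond: Int)
        extends Receiver[String](StorageLevel.MEMORY_AND_DISK) {

      def onStart(): Unit = {
        new Thread("throttled-receiver") {
          override def run(): Unit = {
            var i = 0L
            while (!isStopped()) {
              store(s"event-$i")                  // placeholder: read from the real source here
              i += 1
              Thread.sleep(1000L / maxPerSecond)  // naive pacing between single records
            }
          }
        }.start()
      }

      def onStop(): Unit = ()                     // the loop above exits via isStopped()
    }

    // val stream = ssc.receiverStream(new ThrottledReceiver(100))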

Re: Spark streaming and rate limit

2014-06-18 Thread Flavio Pompermaier
but I was thinking that maybe this issue was already addressed in some way (of course there should be some buffer to process a high-rate stream)... or not? On Thu, Jun 19, 2014 at 4:48 AM, Soumya Simanta wrote: > Flavio - I'm new to Spark as well but I've done stream processing using

Re: Spark streaming and rate limit

2014-06-18 Thread Flavio Pompermaier
This component can control the input rate to Spark. > > > On Jun 18, 2014, at 6:13 PM, Flavio Pompermaier > wrote: > > > > Hi to all, > > in my use case I'd like to receive events and call an external service > as they pass through. Is it possible to limit the

Spark streaming and rate limit

2014-06-18 Thread Flavio Pompermaier
l the buffer of incoming events waiting to be processed? Best, Flavio

Re: Using Spark to analyze complex JSON

2014-05-22 Thread Flavio Pompermaier
Is there a way to query fields by similarity (like Lucene or using a similarity metric) to be able to query something like WHERE language LIKE "it~0.5"? Best, Flavio On Thu, May 22, 2014 at 8:56 AM, Michael Cutler wrote: > Hi Nick, > > Here is an illustrated example whi
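
Spark SQL has no fuzzy operator like Lucene's ~, but the effect can be approximated with an ordinary filter over a similarity function. A sketch assuming a pair RDD of (id, language) named records:

    // Plain Levenshtein edit distance.
    def editDistance(a: String, b: String): Int = {
      val d = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
        if (i == 0) j else if (j == 0) i else 0
      }
      for (i <- 1 to a.length; j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        d(i)(j) = math.min(math.min(d(i - 1)(j) + 1, d(i)(j - 1) + 1), d(i - 1)(j - 1) + cost)
      }
      d(a.length)(b.length)
    }

    // Normalized similarity in [0, 1]; 1.0 means identical strings.
    def similarity(a: String, b: String): Double =
      1.0 - editDistance(a, b).toDouble / math.max(a.length max b.length, 1)

    // Roughly "WHERE language LIKE 'it~0.5'": keep rows at least 50% similar to "it".
    val matches = records.filter { case (_, language) => similarity(language, "it") >= 0.5 }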

Spark and Solr indexing

2014-05-17 Thread Flavio Pompermaier
e for Solr or could give me some hint about that? Best, Flavio

Re: Schema view of HadoopRDD

2014-05-16 Thread Flavio Pompermaier
Is there any Spark plugin/add-on that facilitates querying JSON content? Best, Flavio On Thu, May 15, 2014 at 6:53 PM, Michael Armbrust wrote: > Here is a link with more info: > http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html > > > On Wed, May 7

Re: A new resource for getting examples of Spark RDD API calls

2014-05-13 Thread Flavio Pompermaier
Great work! Thanks! On May 13, 2014 3:16 AM, "zhen" wrote: > Hi Everyone, > > I found it quite difficult to find good examples for Spark RDD API calls. > So > my student and I decided to go through the entire API and write examples > for > the vast majority of API calls (basically examples for any

Re: RDD collect help

2014-04-18 Thread Flavio Pompermaier
t > want. But sure it is debatable and it's more my personal opinion. > > > 2014-04-17 23:28 GMT+02:00 Flavio Pompermaier : > > Thanks again Eugen! I don't get the point... why do you prefer to avoid Kryo >> serialization for closures? Is there any problem with that? >>

Re: RDD collect help

2014-04-17 Thread Flavio Pompermaier
tion you reference an object outside of it and it is > getting serialized with your task. To enable Kryo serialization for closures, set the > spark.closure.serializer property. But usually I don't, as that lets me > detect such unwanted references. > On 17 Apr 2014 22:17, "Flavio Pompermaier" wrote:

Re: RDD collect help

2014-04-17 Thread Flavio Pompermaier
Now I have another problem... I have to pass one of these non-serializable objects to a PairFunction and I received another non-serializable exception... it seems that Kryo doesn't work within Functions. Am I wrong, or is this a limit of Spark? On Apr 15, 2014 1:36 PM, "Flavio Pompermaier"
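
The usual workaround when the class itself cannot be made serializable is not to capture the instance in the closure at all, but to build it on the executor side, once per partition. A sketch with a stand-in class for the unmodifiable one:

    // Stand-in for the third-party class whose sources cannot be changed.
    class LegacyParser /* not Serializable */ {
      def parse(line: String): String = line.trim
    }

    // Capturing a LegacyParser created on the driver would fail with
    // NotSerializableException regardless of the serializer; building it
    // inside mapPartitions means it is never shipped with the task.
    val parsed = rdd.mapPartitions { lines =>
      val parser = new LegacyParser
      lines.map(parser.parse)
    }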

Re: RDD collect help

2014-04-15 Thread Flavio Pompermaier
Ok thanks for the help! Best, Flavio On Tue, Apr 15, 2014 at 12:43 AM, Eugen Cepoi wrote: > Nope, those operations are lazy, meaning it will create the RDDs but won't > trigger any "action". The computation is launched by operations such as > collect, count, save to HD

Re: RDD collect help

2014-04-14 Thread Flavio Pompermaier
utes it's due to the fact that Java serialization > does not serialize/deserialize attributes from classes that don't implement Serializable > (in your case the parent classes). > > > 2014-04-14 23:17 GMT+02:00 Flavio Pompermaier : > >> Thanks Eugen for the reply. Could you expla

Re: RDD collect help

2014-04-14 Thread Flavio Pompermaier
n.html > > Eugen > > > 2014-04-14 18:21 GMT+02:00 Flavio Pompermaier : > >> Hi to all, >> >> in my application I read objects that are not serializable because I >> cannot modify the sources. >> So I tried to do a workaround creating a dummy cla

RDD collect help

2014-04-14 Thread Flavio Pompermaier
0.9.0-incubating. Best, Flavio

Re: Spark operators on Objects

2014-04-10 Thread Flavio Pompermaier
? Is there any suggestion about how to start? On Wed, Apr 9, 2014 at 11:37 PM, Flavio Pompermaier wrote: > Any help about this...? > On Apr 9, 2014 9:19 AM, "Flavio Pompermaier" wrote: > >> Hi to everybody, >> >> In my current scenario I have complex objec

Re: Spark on YARN performance

2014-04-10 Thread Flavio Pompermaier
e and interoperable with other frameworks... am I wrong? Best, Flavio On Thu, Apr 10, 2014 at 5:55 PM, Mayur Rustagi wrote: > I've had better luck with standalone in terms of speed & latency. I think > there is an impact but not really very high. The bigger impact is towards being able > to m

Re: Spark operators on Objects

2014-04-09 Thread Flavio Pompermaier
Any help about this...? On Apr 9, 2014 9:19 AM, "Flavio Pompermaier" wrote: > Hi to everybody, > > In my current scenario I have complex objects stored as xml in an HBase > Table. > What's the best strategy to work with them? My final goal would be to > define
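
A sketch of the common way to pull HBase rows into an RDD through the Hadoop input format; once the XML payload is out, it can be parsed and compared like any other value. Table name, column family and qualifier are placeholders:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.hadoop.hbase.util.Bytes

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "documents")    // placeholder table name

    val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Extract the XML payload per row key; parsing and comparison happen downstream.
    val xmlByKey = rows.map { case (key, result) =>
      val xml = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("xml")))
      (Bytes.toString(key.get()), xml)
    }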

Spark operators on Objects

2014-04-09 Thread Flavio Pompermaier
some kind of comparison between those objects. What do you suggest? Is it possible? Best, Flavio

Spark on YARN performance

2014-04-09 Thread Flavio Pompermaier
Hi to everybody, I'm new to Spark and I'd like to know if running Spark on top of YARN or Mesos could affect its performance (and by how much). Is there any doc about this? Best, Flavio

Re: Spark and HBase

2014-04-08 Thread Flavio Pompermaier
Is it correct? Best, Flavio On Tue, Apr 8, 2014 at 6:05 PM, Bin Wang wrote: > Hi Flavio, > > I happen to be attending the 2014 Apache Conf, where I heard about a > project called "Apache Phoenix", which fully leverages HBase and is supposed to > be 1000x faster than Hive

Spark and HBase

2014-04-08 Thread Flavio Pompermaier
Hi everybody, in the last few days I have looked a bit at the recent evolution of the big data stacks, and it seems that HBase is somehow fading away in favour of Spark+HDFS. Am I correct? Do you think that Spark and HBase should work together or not? Best regards, Flavio