This is the workaround for the code above:
SparkConf conf = new SparkConf().set("spark.sql.warehouse.dir",
    "file:///C:/Users/marchifl/scalaWorkspace/SparkStreamingApp2/spark-warehouse");
SparkSession spark = SparkSession
    .builder()
    .config(conf)        // assumption: the truncated snippet continues with the usual builder calls
    .getOrCreate();
Just for clarification, this is the full stack trace:
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
16/08/03 18:18:44 INFO SparkContext: Running Spark version 2.0.0
16/08/03 18:18:44 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Hi Sean, thanks for the reply. I had just omitted the real path, which points
to my file system. I've now posted the real one.
On 3 Aug 2016 19:09, "Sean Owen" wrote:
> file: "absolute directory"
> does not sound like a valid URI
>
> On Wed, Aug 3, 2016 at 11
<https://issues.apache.org/jira/browse/SPARK-15899> ?
>
> On Wed, Aug 3, 2016 at 11:05 AM, Flavio wrote:
>
>> Hello everyone,
>>
>> I am trying to run a very simple example but unfortunately I am stuck on the
>> following exception:
>>
>> Exception in thread "main" java.lang.Ill
" +
rmse);
RandomForestRegressionModel rfModel =
    (RandomForestRegressionModel) (model.stages()[1]);
System.out.println("Learned regression forest model:\n" + rfModel.toDebugString());
// $example off$
spark.stop();
Hi to all,
I have 2 RDDs, D1 and D2, like:
D1:
A,p1,a1
A,p2,a2
A,p3,X
B,p3,Y
B,p1,b1
D2:
X,s,V
X,r,2
Y,j,k
I'd like to get a single RDD D3 (Tuple4) like:
A,X,a1,a2
B,Y,b1,null
basically joining when D1.f2 == D2.f0 and filling with null where a value is missing.
Is that possible, and how? Could you show me a simple snippet?
Thanks in advance
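A minimal sketch of one possible answer (the local-mode setup and the p1/p2/p3 labels are assumptions taken from the example data above, not code from the thread): group D1 by its first field, fill missing p1/p2 values with null, and keep the p3 value only when it matches a D2 key.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("d1-d2-join").setMaster("local[*]"))

// d1/d2 mirror the example data above.
val d1 = sc.parallelize(Seq(
  ("A", "p1", "a1"), ("A", "p2", "a2"), ("A", "p3", "X"),
  ("B", "p3", "Y"), ("B", "p1", "b1")))
val d2 = sc.parallelize(Seq(("X", "s", "V"), ("X", "r", "2"), ("Y", "j", "k")))

// D2 keys collected to the driver (assumed small enough for that).
val d2Keys = d2.map(_._1).distinct().collect().toSet

// Group D1 by its first field, keep the p3 value only when it is a D2 key (D1.f2 == D2.f0),
// and fill p1/p2 with null when they are missing.
val d3 = d1.groupBy(_._1).map { case (key, rows) =>
  val props = rows.map(r => r._2 -> r._3).toMap
  val link  = props.get("p3").filter(d2Keys.contains).orNull
  (key, link, props.getOrElse("p1", null), props.getOrElse("p2", null))
}

d3.collect().foreach(println)  // (A,X,a1,a2), (B,Y,b1,null)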
    println("Flush legacy entities")
    batch.clear()
  }
  Iterator.empty
})
} else {
  // return an empty Iterator of your return type
  Iterator.empty
}
Best,
Flavio
On Tue, Oct 28, 2014 at 1:26 PM, Kamal Banga wrote:
- how can I commit the remaining elements (in MapReduce, those are the elements
still in the batch array within the cleanup() method)?
- if I have to create a connection to a server for pushing updates, is it
better to use mapPartitions instead of map? (a sketch follows below)
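A hedged sketch of that pattern (the PushClient class is a hypothetical stand-in for the real service, and the batch size is arbitrary): one connection per partition, pushes in batches, and a final flush that plays the role of MapReduce's cleanup().

import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer

// Hypothetical client, standing in for whatever service receives the updates.
class PushClient {
  def push(records: Seq[String]): Int = records.size  // pretend the batch was sent
  def close(): Unit = ()
}

// One connection per partition, batched pushes, remainder flushed before the iterator ends.
def pushUpdates(rdd: RDD[String], batchSize: Int = 100): RDD[Int] =
  rdd.mapPartitions { iter =>
    val client = new PushClient()
    val batch  = new ArrayBuffer[String]()
    var pushed = 0
    iter.foreach { record =>
      batch += record
      if (batch.size >= batchSize) {
        pushed += client.push(batch)
        batch.clear()
      }
    }
    if (batch.nonEmpty) {   // the "cleanup()" step: commit the remaining elements
      pushed += client.push(batch)
      batch.clear()
    }
    client.close()
    Iterator.single(pushed)
  }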
Best,
Flavio
Maybe you could implement something like this (I don't know if something
similar already exists in Spark):
http://www.cs.berkeley.edu/~jnwang/papers/icde14_massjoin.pdf
Best,
Flavio
On Oct 8, 2014 9:58 PM, "Nicholas Chammas"
wrote:
> Multiple values may be different, yet s
Hi to all, sorry for not being fully on topic, but I have 2 quick questions
about Parquet tables registered in Hive/Spark:
1) Where are the created tables stored?
2) If I have multiple HiveContexts (one per application) using the same
Parquet table, is there any problem if inserting concurrently fr
Isn't Sqoop export meant for that?
http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1
On Aug 7, 2014 7:59 PM, "Nicholas Chammas"
wrote:
> Vida,
>
> What kind of database are you trying to write to?
>
> For example, I found that for loading into Redshift, by far the ea
Hi everybody,
I have a scenario where I would like to stream data to different
persistence targets (e.g. SQL DB, graph DB, HDFS, etc.) and perform some
filtering and transformation as the data comes in.
The problem is maintaining consistency between all the datastores (some
operation could fail).
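One pattern that could be sketched here (nothing Spark provides out of the box; the Sink trait and its methods are made up for illustration): write each micro-batch to every sink inside foreachRDD, keep the writes idempotent and keyed by batch time, and record the sinks that failed so the batch can be replayed for them.

import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Hypothetical sink abstraction; real implementations would wrap the SQL DB, graph DB, HDFS, ...
trait Sink extends Serializable {
  def name: String
  def writeIdempotent(batchTime: Time, records: Array[String]): Unit
}

def fanOut(stream: DStream[String], sinks: Seq[Sink]): Unit =
  stream.foreachRDD { (rdd, batchTime) =>
    val records = rdd.collect()  // assumption: batches small enough for the driver; otherwise write per partition
    sinks.foreach { sink =>
      try sink.writeIdempotent(batchTime, records)  // keyed by batchTime so a replay does not duplicate data
      catch {
        case e: Exception =>
          // record the (sink, batchTime) failure somewhere durable and replay it later
          println(s"Write to ${sink.name} failed for batch $batchTime: ${e.getMessage}")
      }
    }
  }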
…how can Impala outperform Spark so much when executing
complex queries?
Best,
Flavio
Hi guys,
I'm analyzing the possibility of using Spark to analyze RDF files and define
reusable Shark operators on them (custom filtering, transforming,
aggregating, etc.). Is that possible? Any hint?
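For illustration, a rough sketch of such operators, assuming the RDF is available as N-Triples text (the whitespace split below is a crude tokenizer, not a real parser, and the file path is a placeholder):

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("rdf-sketch").setMaster("local[*]"))

// Parse N-Triples lines into (subject, predicate, object) tuples.
val triples: RDD[(String, String, String)] =
  sc.textFile("data.nt").flatMap { line =>
    line.trim.stripSuffix(".").trim.split("\\s+", 3) match {
      case Array(s, p, o) => Some((s, p, o))
      case _              => None
    }
  }

// A reusable filtering operator: keep triples with a given predicate.
def byPredicate(pred: String)(rdd: RDD[(String, String, String)]): RDD[(String, String, String)] =
  rdd.filter(_._2 == pred)

// A reusable aggregating operator: count statements per subject.
def statementsPerSubject(rdd: RDD[(String, String, String)]): RDD[(String, Long)] =
  rdd.map(t => (t._1, 1L)).reduceByKey(_ + _)

statementsPerSubject(byPredicate("<http://www.w3.org/2000/01/rdf-schema#label>")(triples))
  .collect().foreach(println)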
Best,
Flavio
OK, I'll try to start from that when I implement it.
Thanks again for the great support!
Best,
Flavio
On Thu, Jun 19, 2014 at 10:57 AM, Michael Cutler wrote:
> Hi Flavio,
>
> When your streaming job starts somewhere in the cluster the Receiver will
> be sta
DStream.scala>..)
is how to limit the external service call rate and manage the incoming
buffer size (enqueuing).
Could you give me some tips for that?
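One relevant knob is the receiver-side rate limit; a small hedged sketch (the 100 records/second value is only an illustrative assumption):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Cap how many records per second each receiver pushes into the DStream,
// so the external service called downstream is not flooded while data queues upstream.
val conf = new SparkConf()
  .setAppName("rate-limited-stream")
  .setMaster("local[2]")
  .set("spark.streaming.receiver.maxRate", "100")  // assumed limit, in records per second

val ssc = new StreamingContext(conf, Seconds(5))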
Thanks again,
Flavio
On Thu, Jun 19, 2014 at 10:19 AM, Michael Cutler wrote:
> Hello Flavio,
>
> It sounds to me like the best solutio
but I was thinking that maybe this issue has already been addressed in
some way (of course there should be some buffer to process high-rate
streaming)... or not?
On Thu, Jun 19, 2014 at 4:48 AM, Soumya Simanta
wrote:
> Flavio - I'm new to Spark as well, but I've done stream processing using
This component can control the input rate to Spark.
>
> > On Jun 18, 2014, at 6:13 PM, Flavio Pompermaier
> wrote:
> >
> > Hi to all,
> > in my use case I'd like to receive events and call an external service
> as they pass through. Is it possible to limit the
…control the buffer of
incoming events waiting to be processed?
Best,
Flavio
Is there a way to query fields by similarity (like Lucene or using a
similarity metric), to be able to query something like WHERE language LIKE
"it~0.5"?
Best,
Flavio
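There is no fuzzy LIKE syntax like that, but a hedged sketch of an approximation: register a string-similarity function as a UDF and use it in the WHERE clause (the "docs" view and "language" column are placeholders, and the SparkSession API shown is from later releases).

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fuzzy-where").master("local[*]").getOrCreate()

// Levenshtein distance turned into a similarity score in [0, 1].
def similarity(a: String, b: String): Double = {
  val d = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
    if (i == 0) j else if (j == 0) i else 0
  }
  for (i <- 1 to a.length; j <- 1 to b.length) {
    val cost = if (a(i - 1) == b(j - 1)) 0 else 1
    d(i)(j) = math.min(math.min(d(i - 1)(j) + 1, d(i)(j - 1) + 1), d(i - 1)(j - 1) + cost)
  }
  1.0 - d(a.length)(b.length).toDouble / math.max(1, math.max(a.length, b.length))
}

spark.udf.register("sim", similarity _)

// "docs" is a placeholder temp view with a "language" column.
spark.sql("SELECT * FROM docs WHERE sim(language, 'it') >= 0.5").show()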
On Thu, May 22, 2014 at 8:56 AM, Michael Cutler wrote:
> Hi Nick,
>
> Here is an illustrated example whi
e for Solr or could give me some
hint about that?
Best,
Flavio
Is there any Spark plugin/add-on that facilitates querying JSON
content?
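For reference, a small sketch of the built-in JSON support (shown with the later DataFrame API; the file path and field names are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-query").master("local[*]").getOrCreate()

// Each line of people.json is expected to hold one JSON object; the schema is inferred.
val people = spark.read.json("people.json")
people.printSchema()

people.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 21").show()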
Best,
Flavio
On Thu, May 15, 2014 at 6:53 PM, Michael Armbrust wrote:
> Here is a link with more info:
> http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
>
>
> On Wed, May 7
Great work! Thanks!
On May 13, 2014 3:16 AM, "zhen" wrote:
> Hi Everyone,
>
> I found it quite difficult to find good examples for Spark RDD API calls.
> So my student and I decided to go through the entire API and write examples
> for the vast majority of API calls (basically examples for any
t
> want. But sure, it is debatable and it's more my personal opinion.
>
>
> 2014-04-17 23:28 GMT+02:00 Flavio Pompermaier :
>
>> Thanks again Eugen! I don't get the point... why do you prefer to avoid Kryo
>> serialization for closures? Is there any problem with that?
>>
tion you reference an object outside of it and it is
> getting serialized with your task. To enable Kryo serialization for closures, set the
> spark.closure.serializer property. But usually I don't, as it allows me to
> detect such unwanted references.
> On 17 Apr 2014 at 22:17, "Flavio Pompermaier" wrote:
Now I have another problem... I have to pass one of these non-serializable
objects to a PairFunction and I get another non-serializable
exception... it seems that Kryo doesn't work within Functions. Am I wrong, or
is this a limit of Spark?
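Following the suggestion quoted above, a hedged configuration sketch (MyNonSerializableClass is a placeholder, and spark.closure.serializer is a setting of the Spark versions of that era; later releases removed it):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder for the third-party class whose sources cannot be modified.
class MyNonSerializableClass

val conf = new SparkConf()
  .setAppName("kryo-config")
  // Kryo for data serialization:
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyNonSerializableClass]))
  // Kryo for closures too (the setting discussed in the quoted reply; removed in later releases):
  .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)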
On Apr 15, 2014 1:36 PM, "Flavio Pompermaier"
Ok thanks for the help!
Best,
Flavio
On Tue, Apr 15, 2014 at 12:43 AM, Eugen Cepoi wrote:
> Nope, those operations are lazy, meaning they will create the RDDs but won't
> trigger any "action". The computation is launched by operations such as
> collect, count, save to HD
utes it's due to the fact that Java serialization
> does not serialize/deserialize attributes from classes that don't implement
> Serializable (in your case the parent classes).
>
>
> 2014-04-14 23:17 GMT+02:00 Flavio Pompermaier :
>
>> Thanks Eugen for the reply. Could you expla
n.html
>
> Eugen
>
>
> 2014-04-14 18:21 GMT+02:00 Flavio Pompermaier :
>
>> Hi to all,
>>
>> in my application I read objects that are not serializable because I
>> cannot modify the sources.
>> So I tried to work around this by creating a dummy cla
0.9.0-incubating.
Best,
Flavio
? Is there any suggestion about how to start?
On Wed, Apr 9, 2014 at 11:37 PM, Flavio Pompermaier wrote:
> Any help about this...?
> On Apr 9, 2014 9:19 AM, "Flavio Pompermaier" wrote:
>
>> Hi to everybody,
>>
>> In my current scenario I have complex objec
e and interoperable with
other frameworks... am I wrong?
Best,
Flavio
On Thu, Apr 10, 2014 at 5:55 PM, Mayur Rustagi wrote:
> I've had better luck with standalone in terms of speed & latency. I think
> there is an impact but not really very high. The bigger impact is towards being
> able to m
Any help about this...?
On Apr 9, 2014 9:19 AM, "Flavio Pompermaier" wrote:
> Hi to everybody,
>
> In my current scenario I have complex objects stored as XML in an HBase
> table.
> What's the best strategy to work with them? My final goal would be to
> define
some kind of
comparison between those objects. What do you suggest? Is it possible?
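One common starting point would be something like the sketch below (the table name, column family and qualifier are placeholders, and the XML parsing is simply handed to scala-xml): scan the HBase table as an RDD, extract the XML payload per row, and define the comparisons on the parsed documents.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hbase-xml").setMaster("local[*]"))

// Scan the HBase table as an RDD of (row key, Result).
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "MyTable")  // placeholder table name

val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

// Pull the XML payload out of each row and parse it; comparisons between objects
// can then be defined on the parsed documents (or on case classes extracted from them).
val docs = rows.map { case (_, result) =>
  val raw = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("xml"))  // placeholder family/qualifier
  scala.xml.XML.loadString(Bytes.toString(raw))
}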
Best,
Flavio
Hi to everybody,
I'm new to Spark and I'd like to know if running Spark on top of YARN or
Mesos could affect (and how much) its performance. Is there any doc about
this?
Best,
Flavio
Is it correct?
Best,
Flavio
On Tue, Apr 8, 2014 at 6:05 PM, Bin Wang wrote:
> Hi Flavio,
>
> I am actually attending the 2014 Apache Conf, and I heard about a
> project called "Apache Phoenix", which fully leverages HBase and is supposed to
> be 1000x faster than Hive
Hi to everybody,
in recent days I have looked a bit at the evolution of the big data stacks,
and it seems that HBase is somehow fading away in favour of Spark+HDFS. Am
I correct?
Do you think that Spark and HBase should work together or not?
Best regards,
Flavio