Hello,
I have implemented a Java-based custom receiver, which consumes from a
messaging system, say JMS.
Once a message is received, I call store(object) ... I'm storing a Spark Row
object.
It runs for around 8 hrs and then goes OOM, and the OOM is happening on the
receiver nodes.
I also tried running multiple receivers, but I'm facing the same problem.
Thanks!
On Fri, May 19, 2017 at 4:50 PM, Tathagata Das
wrote:
> Should release by the end of this month.
>
> On Fri, May 19, 2017 at 4:07 PM, kant kodali wrote:
>
>> Hi Patrick,
>>
>> I am using 2.1.1 and I tried the above code you sent and I get
>>
>> "java.lang.UnsupportedOperationException:
Hi Spark community,
I'd like to announce a new release of GraphFrames, a Spark Package for
DataFrame-based graphs!
*We strongly encourage all users to use this latest release for the bug fix
described below.*
*Critical bug fix*
This release fixes a bug in indexing vertices. This may have affected
Hi,
I have a Hive external table whose S3 location has no files (but the S3
location directory does exist). When I try to use Spark SQL to count the
number of records in the table, it throws an error saying “File s3n://data/xyz
does not exist. null/0”.
select * from tablex limit
Should release by the end of this month.
On Fri, May 19, 2017 at 4:07 PM, kant kodali wrote:
> Hi Patrick,
>
> I am using 2.1.1 and I tried the above code you sent and I get
>
> "java.lang.UnsupportedOperationException: Data source kafka does not
> support streamed writing"
>
> so yeah this prob
I think that, like all other read operations, it is driven by the input format
used; some variation of CombineFileInputFormat is probably used by default.
You can test this by forcing a particular input format that gives one
file per split; then you should end up with the same number of partitions as
y
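To make that test concrete, here is a minimal sketch (the path and session
setup are my assumptions) comparing the partition count of spark.read against
an RDD built with the plain TextInputFormat, which for small non-splittable
files gives roughly one split per file:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-check").getOrCreate()
sc = spark.sparkContext

path = "hdfs:///data/many-small-files"  # hypothetical directory of small files

# Default DataFrame reader: small files may be coalesced into fewer partitions.
df = spark.read.text(path)
print("spark.read partitions:", df.rdd.getNumPartitions())

# Force the plain TextInputFormat: roughly one split per small file.
rdd = sc.newAPIHadoopFile(
    path,
    "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")
print("TextInputFormat partitions:", rdd.getNumPartitions())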
Hi Patrick,
I am using 2.1.1 and I tried the above code you sent and I get
"java.lang.UnsupportedOperationException: Data source kafka does not
support streamed writing"
so yeah this probably works only from Spark 2.2 onwards. I am not sure when
it officially releases.
Thanks!
On Fri, May 19,
Could I be experiencing the same thing?
https://www.dropbox.com/s/egtj1056qeudswj/sparkwut.png?dl=0
On Wed, Nov 16, 2016 at 10:37 AM, Shreya Agarwal
wrote:
> I think that is a bug. I have seen that a lot especially with long running
> jobs where Spark skips a lot of stages because it has pre-co
Hey y'all,
Trying to migrate from Spark 1.6.1 to 2.1.0.
I use EMR, and launched a new cluster using EMR 5.5, which runs Spark 2.1.0.
I updated my dependencies, and fixed a few API changes related to
accumulators, and presto! my application was running on the new cluster.
But the application UI
Part of the problem here is that the static dataframe is designed to be
used as a read-only abstraction in Spark, and updating it requires the user
to drop the dataframe holding the reference data and recreate it. And in
order for the join to use the recreated dataframe, the query has to be
restarted.
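A minimal sketch of that drop-and-recreate pattern (the path, column names,
and the rate source below are my assumptions, not a definitive recipe):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("refresh-static-join").getOrCreate()

def start_query():
    # Reload the static reference data; the previous DataFrame is simply dropped.
    reference = spark.read.parquet("/data/reference")   # hypothetical table with an 'id' column
    stream = (spark.readStream.format("rate")           # built-in test source (Spark 2.2+)
              .option("rowsPerSecond", 5).load())
    joined = stream.join(reference, stream["value"] == reference["id"])
    return joined.writeStream.format("console").start()

query = start_query()
# ... later, when the reference data changes on disk:
query.stop()             # the running query still holds the old DataFrame
query = start_query()    # recreate the static side and restart the join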
Well described, thanks!
On 04-May-2017 4:07 AM, "JayeshLalwani"
wrote:
> In any distributed application, you scale up by splitting execution up on
> multiple machines. The way Spark does this is by slicing the data into
> partitions and spreading them on multiple machines. Logically, an RDD is
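As a toy illustration of the slicing described above (session setup assumed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitions-demo").getOrCreate()
sc = spark.sparkContext

# Slice the data into 8 partitions; each partition is a unit of parallel work.
rdd = sc.parallelize(range(1000000), numSlices=8)
print(rdd.getNumPartitions())   # -> 8

# mapPartitions runs once per slice, wherever that slice is placed.
print(rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect())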
Hey all,
A reply on this would be great!
Thanks,
A.B.
On 17-May-2017 1:43 AM, "Daniel Siegmann"
wrote:
> When using spark.read on a large number of small files, these are
> automatically coalesced into fewer partitions. The only documentation I can
> find on this is in the Spark 2.0.0 release
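While waiting for a reply: in Spark 2.x that coalescing is governed by a
couple of session confs, so here is a hedged sketch for getting more
partitions out of many small files (the path is hypothetical and the values
shown are the defaults):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("small-files").getOrCreate()

# Lowering maxPartitionBytes, or raising openCostInBytes, packs fewer
# small files into each partition, i.e. yields more partitions.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(128 * 1024 * 1024))
spark.conf.set("spark.sql.files.openCostInBytes", str(4 * 1024 * 1024))

df = spark.read.parquet("/data/many-small-files")   # hypothetical path
print(df.rdd.getNumPartitions())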
Hi,
I am in a weird situation where the Spark job behaves abnormally. Sometimes it
runs very fast and sometimes it takes a long time to complete. Not sure what
is going on. The cluster is free most of the time.
The image below shows that the shuffle read is still taking more than 3 hours
to write data back into
Hi,
I am doing NLP (Natural Language Processing) on my data. The
data is in the form of files that can be of type PDF/Text/Word/HTML. These
files are stored in a directory structure on my local disk, even in nested
directories. My stand-alone Java-based NLP parser can read input files,
extract
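One way to feed such a directory tree into Spark is binaryFiles, shown in
this sketch (the parse function and paths are placeholders, not the poster's
actual parser):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nlp-files").getOrCreate()
sc = spark.sparkContext

def parse(path, data):
    # Hypothetical stand-in for the stand-alone parser; the real one would
    # dispatch on extension (PDF/Text/Word/HTML) and extract the text.
    return (path, len(data))

# binaryFiles yields (path, bytes) pairs, one record per file.
files = sc.binaryFiles("file:///data/docs/*")   # add deeper globs (e.g. /*/*) for nested dirs
results = files.map(lambda kv: parse(kv[0], kv[1])).collect()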
Hi!
Is this possible in Spark 2.1.1?
Sent from my iPhone
> On May 19, 2017, at 5:55 AM, Patrick McGloin
> wrote:
>
> # Write key-value data from a DataFrame to a Kafka topic specified in an
> option
> query = df \
> .selectExpr("CAST(userId AS STRING) AS key", "to_json(struct(*))
Hi,
I am trying multiclass text classification with a RandomForest classifier on
my local computer (16 GB RAM, 4 physical cores).
When I run with the parameters below, I am getting a
"java.lang.OutOfMemoryError: GC overhead limit exceeded" error.
spark-submit --driver-memory 1G --driver-memory
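For reference, a toy sketch of that kind of pipeline in spark.ml (the data,
feature size, and tree parameters below are invented; large feature vectors
combined with deep trees are a common source of exactly this GC-overhead
error):

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, StringIndexer
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf-text").getOrCreate()

# Tiny invented dataset standing in for the real corpus.
df = spark.createDataFrame(
    [("spark is fast", "tech"), ("the cat sat", "pets"), ("gpu training", "tech")],
    ["text", "category"])

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features", numFeatures=1 << 12),
    StringIndexer(inputCol="category", outputCol="label"),
    RandomForestClassifier(numTrees=50, maxDepth=10),  # bigger forests / deeper trees use much more memory
])
model = pipeline.fit(df)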
# Write key-value data from a DataFrame to a Kafka topic specified in an option
query = df \
    .selectExpr("CAST(userId AS STRING) AS key", "to_json(struct(*)) AS value") \
    .writeStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("topic", "topic1") \
    .start()
Is there a Kafka sink for Spark Structured Streaming?
Sent from my iPhone