Hi,

You need good monitoring tools to send you alarms about disk, network
or application errors, but I think that is general DevOps work, not
very specific to Spark or Hadoop.
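
For the disk alarms, even something simple run from cron can catch a
filling HDFS cluster early. Below is a minimal sketch in Java using the
Hadoop FileSystem API; the 80% threshold and the sendAlert() helper are
placeholders for whatever notification mechanism you already have:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class HdfsCapacityCheck {

    // Alert once the cluster is more than 80% full (placeholder value).
    private static final double USAGE_THRESHOLD = 0.80;

    public static void main(String[] args) throws IOException {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        FsStatus status = fs.getStatus();

        double used = (double) status.getUsed() / status.getCapacity();
        System.out.printf("HDFS usage: %.1f%% of %d bytes%n",
                used * 100, status.getCapacity());

        if (used > USAGE_THRESHOLD) {
            // Placeholder: wire this to email, Nagios, SNS, etc.
            sendAlert("HDFS is " + (int) (used * 100) + "% full");
        }
    }

    private static void sendAlert(String message) {
        System.err.println("ALERT: " + message);
    }
}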
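
For the silent write failures Andy describes below, another option is
to raise the alarm inside the output operation itself instead of hoping
the logs surface it. A rough sketch against Spark 1.6's Java API (the
class name, output path scheme and the System.err "alert" are just
illustrative):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.api.java.JavaDStream;

public class AlertingOutput {

    // Wraps the save so a failed write raises an alarm and fails the
    // batch instead of disappearing into the executor logs.
    public static void saveWithAlert(JavaDStream<String> tweets,
                                     final String outputPrefix) {
        tweets.foreachRDD(new VoidFunction<JavaRDD<String>>() {
            @Override
            public void call(JavaRDD<String> rdd) throws Exception {
                if (rdd.isEmpty()) {
                    return; // nothing arrived in this batch interval
                }
                String path = outputPrefix + "/" + System.currentTimeMillis();
                try {
                    rdd.saveAsTextFile(path);
                } catch (Exception e) {
                    // Placeholder alert hook: replace with your real
                    // notification mechanism.
                    System.err.println("ALERT: write to " + path
                            + " failed: " + e);
                    throw e; // fail loudly rather than run on silently
                }
            }
        });
    }
}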

BR,

Arkadiusz Bicz
https://www.linkedin.com/in/arkadiuszbicz

On Thu, Feb 11, 2016 at 7:09 PM, Andy Davidson
<a...@santacruzintegration.com> wrote:
> We recently started a Spark/Spark Streaming POC. We wrote a simple streaming
> app in Java to collect tweets. We chose Twitter because we knew we would get
> a lot of data and probably lots of bursts. Good for stress testing.
>
> We spun up a couple of small clusters using the spark-ec2 script. In one
> cluster we wrote all the tweets to HDFS; in a second cluster we wrote all
> the tweets to S3.
>
> We were surprised that our HDFS file system reached 100% of capacity in a
> few days. This resulted in “all data nodes dead”. We were surprised because
> the streaming app actually continued to run. We had no idea we had a problem
> until a day or two after the disk became full, when we noticed we were
> missing a lot of data.
>
> We ran into a similar problem with our S3 cluster. We had a permission
> problem and were unable to write any data, yet our streaming app continued
> to run.
>
>
> Spark generated mountains of logs. We are using the standalone cluster
> manager. All the log levels wind up in the “error” log, making it hard to
> find real errors and warnings using the web UI. Our app is written in Java,
> so my guess is the write errors must be unchecked exceptions, i.e. we did
> not know in advance that they could occur. They are basically undocumented.
>
>
>
> We are a small shop. Running something like Splunk would add a lot of
> expense and complexity for us at this stage of our growth.
>
> What are best practices?
>
> Kind Regards
>
> Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
