Hi Andy,

I suggest monitoring disk usage and, once it reaches 90% occupancy,
sending an alarm to your support team so they can resolve the problem
in time; you should not allow your production system to go down.
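
For example, a minimal sketch of such a check in Java (the path, the
90% threshold and the alert action are assumptions; wire the alert into
whatever paging or monitoring system your team already uses):

import java.io.File;

// Minimal disk-usage watchdog sketch; run it from cron or a scheduler.
public class DiskUsageCheck {
    private static final double THRESHOLD = 0.90; // alert at 90% occupancy

    public static void main(String[] args) {
        File volume = new File(args.length > 0 ? args[0] : "/");
        double used = 1.0 - (double) volume.getUsableSpace() / volume.getTotalSpace();

        if (used >= THRESHOLD) {
            // Placeholder alert: replace with email, pager, Graphite event, etc.
            System.err.printf("ALERT: %s is %.0f%% full%n", volume, used * 100);
            System.exit(1); // non-zero exit so the scheduler can escalate
        } else {
            System.out.printf("OK: %s is %.0f%% full%n", volume, used * 100);
        }
    }
}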

Regarding tools, you can try a stack such as collectd and Spark ->
Graphite -> Grafana -> https://github.com/pabloa/grafana-alerts. I
have not used grafana-alerts myself, but it looks promising.
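
If you go that route, getting a custom metric (for example the HDFS
usage reported by your streaming driver) into Graphite only takes one
line over Carbon's plaintext protocol. A rough sketch, assuming Carbon
listens on its default port 2003; the host and metric name below are
made up:

import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sends a single data point to Graphite's Carbon daemon.
public class GraphitePublisher {

    public static void send(String host, int port, String metric, double value)
            throws Exception {
        long epochSeconds = System.currentTimeMillis() / 1000L;
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8)) {
            // Carbon plaintext format: "<metric.path> <value> <timestamp>\n"
            out.write(metric + " " + value + " " + epochSeconds + "\n");
            out.flush();
        }
    }

    public static void main(String[] args) throws Exception {
        send("graphite.example.com", 2003, "streaming.hdfs.disk_used_ratio", 0.87);
    }
}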

BR,

Arkadiusz Bicz


On Fri, Feb 12, 2016 at 4:38 PM, Andy Davidson
<a...@santacruzintegration.com> wrote:
> Hi Arkadiusz
>
> Do you have any suggestions?
>
> As an engineer, I think that when I get disk-full errors I want the
> application to terminate. It's a lot easier for ops to realize there is a
> problem.
>
>
> Andy
>
>
> From: Arkadiusz Bicz <arkadiusz.b...@gmail.com>
> Date: Friday, February 12, 2016 at 1:57 AM
> To: Andrew Davidson <a...@santacruzintegration.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: best practices? spark streaming writing output detecting disk
> full error
>
> Hi,
>
> You need good monitoring tools to send you alarms about disk, network
> or application errors, but I think that is general DevOps work, not
> very specific to Spark or Hadoop.
>
> BR,
>
> Arkadiusz Bicz
> https://www.linkedin.com/in/arkadiuszbicz
>
> On Thu, Feb 11, 2016 at 7:09 PM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>
> We recently started a Spark/Spark Streaming POC. We wrote a simple streaming
> app in Java to collect tweets. We chose Twitter because we knew we would get
> a lot of data and probably lots of bursts. Good for stress testing.
>
> We spun up a couple of small clusters using the spark-ec2 script. In one
> cluster we wrote all the tweets to HDFS; in a second cluster we wrote all the
> tweets to S3.
>
> We were surprised that our HDFS file system reached 100% of capacity in a
> few days. This resulted in “all data nodes dead”. We were surprised
> because the streaming app actually continued to run. We had no idea we had a
> problem until a day or two after the disk became full, when we noticed we
> were missing a lot of data.
>
> We ran into a similar problem with our S3 cluster. We had a permission
> problem and were unable to write any data, yet our streaming app continued
> to run.
>
>
> Spark generated mountains of logs. We are using the standalone cluster
> manager. All the log levels wind up in the “error” log, making it hard to
> find real errors and warnings using the web UI. Our app is written in Java,
> so my guess is the write errors must be unchecked exceptions, i.e. we did
> not know in advance that they could occur. They are basically undocumented.
>
>
>
> We are a small shop. Running something like Splunk would add a lot of
> expense and complexity for us at this stage of our growth.
>
> What are the best practices?
>
> Kind Regards
>
> Andy
>
>

