>
> * Time series correlation and anomaly detection: AKA: I want an alert for
> that massive memcached bytes_out spike that doesn't also wake me up with
> false positives at 2AM.


Related: Abe Stanway gave a talk at BACON 2013 about Etsy's realtime
anomaly detection and correlation tools, Skyline and Oculus, which form the
Kale stack [0][1].

[0] http://devslovebacon.com/conferences/bacon-2013/talks/bring-the-noise-continuously-deploying-under-a-hailstorm-of-metrics
[1] https://codeascraft.com/2013/06/11/introducing-kale/
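
To make the quoted problem concrete, here is a minimal sketch of the simplest kind of check one might start from: flag a new sample when it sits more than a few standard deviations away from a trailing window. This is not Skyline's implementation and says nothing about how Skyline works internally; the function name, the `sigmas` threshold and the `min_points` guard are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, sigmas=3.0, min_points=10):
    """Return True if `latest` deviates from `history` by more than `sigmas` standard deviations.

    Hypothetical helper: `history` is a trailing window of recent samples
    (for example, memcached bytes_out readings), `latest` is the newest one.
    """
    # With too little history the estimate is noisy, so stay quiet
    # rather than page someone on a guess.
    if len(history) < min_points:
        return False
    mu = mean(history)
    sd = stdev(history)
    # A perfectly flat window has no spread; any change at all is a deviation.
    if sd == 0:
        return latest != mu
    return abs(latest - mu) > sigmas * sd


window = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101]
print(is_anomalous(window, 104))  # ordinary wobble
print(is_anomalous(window, 500))  # massive spike
```

A single fixed threshold like this is exactly what tends to produce the 2AM false positives on seasonal or bursty metrics, which is why the problem is listed as open.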


On Wed, Nov 5, 2014 at 4:22 PM, Toby Negrin <tneg...@wikimedia.org> wrote:
>
> Awesome -- thanks Ori.
>
> On Wed, Nov 5, 2014 at 12:56 AM, Ori Livneh <o...@wikimedia.org> wrote:
>
>> Facebook just published this summary of a summit for database researchers
>> held at Menlo Park last September. I recommend it. It contains a clear and
>> concise description of Facebook's data infrastructure, and a description of
>> the open problems they are thinking about, which is even more interesting.
>>
>>
>> https://research.facebook.com/blog/1522692927972019/facebook-s-top-open-data-problems/
>>
>> To whet your appetite, here are the problems (the summaries mostly my own
>> paraphrase):
>>
>> * Mobile: How should the shift toward mobile devices affect Facebook’s
>> data infrastructure?
>>
>> * Reducing replication: How can we reduce the number of round trips
>> between the application and data layers?
>>
>> * Impact of Caching on Availability (aka "oh no, we just restarted
>> memcached"): How do we harness the efficiency gains provided by caching
>> without being brought to our knees by a sudden drop in cache hit rate?
>>
>> * Sampling at logging time in a distributed environment: How should we
>> sample log streams if we want to maintain accuracy and flexibility to
>> answer post-hoc queries?
>>
>> * Trading storage space and CPU: TL;DR: gzip --best or gzip --fast?
>>
>> * Reliability of pipelines: Pipelines are less reliable than the sum of
>> their parts. A pipeline composed of two systems, each 0.999 reliable,
>> is about 0.998 reliable. Much sadness. What to do?
>>
>> * Globally distributed warehouse: consistency models and synchronization
>> problems.
>>
>> * Time series correlation and anomaly detection: AKA: I want an alert for
>> that massive memcached bytes_out spike that doesn't also wake me up with
>> false positives at 2AM.
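
On the pipeline reliability item, the arithmetic can be checked with a short sketch. It assumes the stages run in series and fail independently, so the combined reliability is the product of the per-stage values; two stages at 0.999 each multiply to roughly 0.998. The function names are hypothetical and chosen only for this example.

```python
from math import prod


def pipeline_reliability(stage_reliabilities):
    """Reliability of stages in series, assuming independent failures."""
    return prod(stage_reliabilities)


def stages_until_below(per_stage, target):
    """Smallest number of identical stages whose combined reliability falls below `target`."""
    if not 0 < per_stage < 1:
        raise ValueError("per_stage must be strictly between 0 and 1")
    count = 0
    combined = 1.0
    # Keep adding stages until the product drops under the target.
    while combined >= target:
        combined *= per_stage
        count += 1
    return count


print(f"{pipeline_reliability([0.999, 0.999]):.6f}")  # 0.998001
print(stages_until_below(0.999, 0.99))  # 11
```

Under the same assumption, it takes eleven stages at 0.999 each before the pipeline as a whole drops below 0.99.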
>>
>>
>>
>> _______________________________________________
>> Engineering mailing list
>> engineer...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/engineering
>>
>>
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech