Repository: samza
Updated Branches:
  refs/heads/master e5ea9bef1 -> 0b0a0cabf

Clean-up the case-studies page for Ebay, add a diagram

Author: Jagadish <jvenkatra...@linkedin.com>

Reviewers: Jagadish<jagad...@apache.org>

Closes #724 from vjagadish1989/website-reorg17

Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/0b0a0cab
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/0b0a0cab
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/0b0a0cab

Branch: refs/heads/master
Commit: 0b0a0cabf3a6edfe4cf9c9de26c2d5185b677d0f
Parents: e5ea9be
Author: Jagadish <jvenkatra...@linkedin.com>
Authored: Fri Oct 12 17:58:39 2018 -0700
Committer: Jagadish <jvenkatra...@linkedin.com>
Committed: Fri Oct 12 17:58:39 2018 -0700

----------------------------------------------------------------------
 docs/_case-studies/ebay.md                      |  57 ++++++++++++-------
 .../learn/documentation/case-study/ebay.png     | Bin 0 -> 27064 bytes
 2 files changed, 35 insertions(+), 22 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/samza/blob/0b0a0cab/docs/_case-studies/ebay.md
----------------------------------------------------------------------
diff --git a/docs/_case-studies/ebay.md b/docs/_case-studies/ebay.md
index e967558..96821f0 100644
--- a/docs/_case-studies/ebay.md
+++ b/docs/_case-studies/ebay.md
@@ -1,7 +1,7 @@
 ---
 layout: case-study
 hide_title: true # so we have control in case-study layout, but can still use page
-title: Low Latency Web Scale Fraud Prevention
+title: Low Latency Web-Scale Fraud Prevention
 study_domain: ebay.com
 menu_title: eBay
 excerpt_separator: <!--more-->
@@ -27,30 +27,43 @@ How Samza powers low-latency, web-scale fraud prevention at Ebay?
 
 <!--more-->
 
-eBay Enterprise is the world’s largest omni-channel commerce provider with
-hundreds millions of units shipped annually, as commerce gets more
-convenient and complex, so does fraud. The engineering team at eBay
-Enterprise selected Samza as the platform to build the horizontally
-scalable, realtime (sub-seconds) and fault tolerant abnormality detection
-system. For example, the system computes and evaluates key metrics to
-detect abnormal behaviors
+eBay Enterprise is the world’s largest omni-channel commerce provider. The engineering team at eBay chose Apache Samza to build _PreCog_, their
+horizontally scalable anomaly detection system.
 
-- Transaction velocity (#tnx/day) and change (#tnx/day vs #tnx/day over n days)
-- Amount velocity ($tnx/day) and change ($tnx/day vs $tnx/day over n days)
+_PreCog_ extensively leverages Samza's high-performance, fault-tolerant local storage. Its architecture had the following requirements, for which Samza perfectly fit the bill: <br/>
 
-A wide range of realtime and historical adjunct data from various sources
-including people, places, interests, social and connections are ingested
-through Kafka, and stored in local RocksDB state store with changelog
-enabled for recovery. Incoming transaction data is aggregated using
-windowing and then joined with adjunct data stores in multiple stages.
-The system generates potential fraud cases for review real time. Finally,
-the engineering team at eBay Enterprise has built an OpenTSDB and Grafana
-based monitoring system using metrics collected through JMX.
+_Web-scale:_ Scale to a large number of users and large volume of data per-user. Additionally, should be possible to add more commodity hardware and scale horizontally. <br/>
+_Low-latency:_ Process customer interactions real-time by reacting in milliseconds instead of hours. <br/>
+_Fault-tolerance:_ Gracefully tolerate and handle hardware failures. <br/>
 
-Key Samza features: *Stateful processing*, *Windowing*, *Kafka-integration*,
-*JMX-metrics*
+![diagram-large](/img/{{site.version}}/learn/documentation/case-study/ebay.png)
 
-More information
+The PreCog anomaly-detection system comprises of multiple tiers, with each tier consisting of multiple Samza jobs, which process the output of the previous tier.
+
+_Ingestion tier:_ In this tier, a variety of historical and realtime data from various
+sources including people, places etc., is ingested into Kafka.
+
+_Fanout tier:_ This tier consists of Samza jobs which process the Kafka events, fan them out and re-partition them based on various
+facets like email-address, ip-address, credit-card number, shipping address etc.
+
+_Compute tier:_ The Samza jobs in this tier consume messages from the fan-out tier and compute various key metrics and derived features. Features used to evaluate fraud include:
+
+1. Number of transactions per-customer per-day <br/>
+2. Change in the number of daily transactions over the past few days <br/>
+3. Amount value ($$) of each transaction per-day <br/>
+4. Change in the amount value of transactions over a sliding time-window <br/>
+5. Number of transactions per shipping-address
+
+_Assembly tier:_ This tier comprises of Samza jobs which join the output of the compute-tier with other additional data-sources
+and make a final determination on transaction-fraud.
+
+For monitoring the _PreCog_ pipeline, EBay leverages Samza's [JMXMetricsReporter](/learn/documentation/{{site.version}}/operations/monitoring.html) and ingests the reported metrics into OpenTSDB/ HBase. The metrics are then
+visualzed using [Grafana](https://grafana.com/).
+
+
+Key Samza features: *Stateful processing*, *Windowing*, *Kafka-integration*, *JMX-metrics*
+
+More information:
 
 - [https://www.slideshare.net/edibice/extremely-low-latency-web-scale-fraud-prevention-with-apache-samza-kafka-and-friends](https://www.slideshare.net/edibice/extremely-low-latency-web-scale-fraud-prevention-with-apache-samza-kafka-and-friends)
-- [http://ebayenterprise.com/](http://ebayenterprise.com/)
+- [http://ebayenterprise.com/](http://ebayenterprise.com/)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/samza/blob/0b0a0cab/docs/img/versioned/learn/documentation/case-study/ebay.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/documentation/case-study/ebay.png b/docs/img/versioned/learn/documentation/case-study/ebay.png
new file mode 100644
index 0000000..a9976ac
Binary files /dev/null and b/docs/img/versioned/learn/documentation/case-study/ebay.png differ