Nick Allen created METRON-594:
---------------------------------

             Summary: Replay Telemetry Data through Profiler
                 Key: METRON-594
                 URL: https://issues.apache.org/jira/browse/METRON-594
             Project: Metron
          Issue Type: Improvement
            Reporter: Nick Allen


The Profiler currently consumes live telemetry, in real time, as it is streamed
through Metron.  A useful extension of this functionality would allow the
Profiler to also consume archived telemetry.  Allowing a user to selectively
replay archived raw telemetry through the Profiler has a number of
applications.  The following use cases describe why this would be useful.

Use Case 1 - Model Development
When developing a new model, I often need a feature set of historical data on
which to train my model.  I can either wait days, weeks, or months for the
Profiler to generate this from live data, or I can re-run the raw,
historical telemetry through the Profiler and get started immediately.  It is
much simpler to use the same mechanism to create this historical data set than
to build a separate batch-driven tool that only approximates the historical
feature set.

Use Case 2 - Model Deployment 
When deploying an analytical model to a new environment, like production, on 
day 1 there is often no historical data for the model to work with.  This often 
leaves a gap between when the model is deployed and when that model is actually 
useful.  If I could replay raw telemetry through the Profiler, a historical
feature set could be created as part of the deployment process.  This would
allow my model to start functioning on day 1.

Use Case 3 - Profile Validation
When creating a Profile, it is difficult to understand how the configured 
profile might behave against the entire data set.  By creating the profile and 
watching it consume real-time streaming data, I only gain an understanding of
how it behaves on that small segment of data.  If I am able to replay historical
telemetry, I can quickly understand how it behaves on a much larger data set,
along with all the anomalies and exceptions that exist in all large data sets.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
