For what it is worth, here are my two bits: as Simon said, the best approach 
depends on your use case. If you are interested in JanusGraph, as Simon 
suggested, it is part of the Hortonworks distribution and is used internally by 
Atlas, so it should not be too difficult to integrate. I am not sure which 
version of HDP started including Atlas with a JanusGraph backend, but I am 
using 3.1, which does. How, or whether, that integrates with the Metron stack 
is another issue that will require some research or an answer from one of the 
Metron devs.

If you are only after network visualization with no substantial graph algorithm 
support, your starting point should be getting your data into an edge list. 
The easiest way would be a SQL query along the lines of SELECT src_ip, dst_ip, 
COUNT(*) FROM ... GROUP BY src_ip, dst_ip, plus a distinct list of source and 
destination IPs. Many graph representations use this format, whether as JSON, 
XML, or CSV. In any case, that would be the starting point for 
exporting/transforming your data into a useful format to be ingested by a graph 
database or visualization library.
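
To make the edge-list idea concrete, here is a minimal sketch in Python. It is 
only an illustration under assumptions: the sample records are made up, the 
field names src_ip / dst_ip simply follow the query above and may not match 
your schema, and the helper names (build_edge_list, build_node_list, 
edges_to_csv) and the source/target/weight CSV header are hypothetical choices 
of mine, not anything Metron provides.

```python
import csv
import io
from collections import Counter

# Hypothetical sample records. In practice these rows would come from
# your own query results; the field names are assumptions.
records = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.3"},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.1"},
]


def build_edge_list(rows):
    """Count connections per (src, dst) pair, like a GROUP BY on both columns."""
    counts = Counter((r["src_ip"], r["dst_ip"]) for r in rows)
    return sorted((src, dst, n) for (src, dst), n in counts.items())


def build_node_list(edges):
    """Distinct IPs that appear as either a source or a destination."""
    return sorted({ip for src, dst, _ in edges for ip in (src, dst)})


def edges_to_csv(edges):
    """Serialize the weighted edge list as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    writer.writerows(edges)
    return buf.getvalue()


edges = build_edge_list(records)
nodes = build_node_list(edges)
print(edges_to_csv(edges))
```

Many graph tools accept a weighted edge list in roughly this shape, though the 
expected column names vary, so check what your tool of choice wants before 
exporting.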

> On Jan 2, 2019, at 07:36, Simon Elliston Ball <si...@simonellistonball.com> 
> wrote:
> 
> Graph enables a number of interesting use cases, and it really depends on 
> what you’re after as to which tech makes sense. 
> 
> Spark GraphX is a strong contender for analytics of things like betweenness 
> and community linkage on HDFS indexed data. That would tend to be batch and 
> through something like Zeppelin. The very latest Zeppelin also supports a 
> network visualisation method which gives a graph-like visual option.
> 
> For more interactive, streaming graph work and alerting on graphs, an actual 
> graph database makes more sense. I’ve seen some work done around Metron 
> stacks with JanusGraph, which leans on Solr and HBase and so avoids adding 
> too much complexity. JanusGraph is not an Apache project, but should be 
> includable. At present I’ve only seen that used in Metron-based 
> distributions rather than Metron core.
> 
> Simon 
> 
> 
>> On 2 Jan 2019, at 11:59, Otto Fowler <ottobackwa...@gmail.com> wrote:
>> 
>> Pieter,
>> Can you create a Jira with your use case?  It is important to capture.  We 
>> have some outstanding Jiras around graph support.
>> 
>> 
>>> On January 2, 2019 at 04:40:23, Stefan Kupstaitis-Dunkler 
>>> (stefan....@gmail.com) wrote:
>>> 
>>> Hi Pieter,
>>> 
>>>  
>>> 
>>> Happy new year!
>>> 
>>>  
>>> 
>>> I believe that always depends on a lot of factors, and this applies to any 
>>> kind of visualization problem with large amounts of data:
>>> 
>>> - How fast do you need the visualisations available?
>>> - How up-to-date do they need to be?
>>> - How complex?
>>> - How beautiful/custom modified?
>>> - How familiar are you with these frameworks? (could be a reason not to 
>>>   use a lib if they are otherwise equal in capabilities)
>>>  
>>> 
>>> It sounds like you want to create a simple histogram across the full 
>>> history of stored data. So I’ll throw in another option that is commonly 
>>> used for such use cases:
>>> 
>>> Zeppelin notebook:
>>> - Access data stored in HDFS via Hive.
>>> - A bit of preparation in Hive is required (and can be scheduled), e.g. 
>>>   creating external tables and converting data into a more efficient 
>>>   format, such as ORC.
>>>  
>>> 
>>> Best,
>>> 
>>> Stefan
>>> 
>>>  
>>> 
>>> From: Pieter Baele <pieter.ba...@gmail.com>
>>> Reply-To: "user@metron.apache.org" <user@metron.apache.org>
>>> Date: Wednesday, 2. January 2019 at 07:50
>>> To: "user@metron.apache.org" <user@metron.apache.org>
>>> Subject: Graphs based on Metron or PCAP data
>>> 
>>>  
>>> 
>>> Hi,
>>> 
>>>  
>>> 
>>> (and good New Year to all as well!)
>>> 
>>>  
>>> 
>>> What would you consider the easiest approach to create a graph based 
>>> primarily on ip_dst and ip_src addresses and the number of connections 
>>> between them?
>>> 
>>>  
>>> 
>>> I was thinking:
>>> 
>>> - graph functionality in the Elastic stack, but limited (e.g. only recent 
>>> data in one index?)
>>> 
>>> - interfacing with Neo4J
>>> 
>>> - GraphX using Spark?
>>> 
>>> - using R on data stored in HDFS?
>>> 
>>> - using Python: plotly? Pandas?
>>> 
>>>  
>>> 
>>>  
>>> 
>>>  
>>> 
>>> Sincerely
>>> 
>>> Pieter
