Re: Question from a data analytics/log management dude

Maximilian Michels Wed, 25 May 2016 06:00:02 -0700

Hi Stephan,

I can certainly imagine future DSLs on top of Apache Beam. However,
condensing all the features of the Beam API into a DSL is not that easy.
You will likely end up with something about as complex to use
as the existing API :)


There are projects that try to simplify Big Data processing and visualization:

Apache NiFi
https://nifi.apache.org/

Apache Zeppelin (incubating)
https://zeppelin.incubator.apache.org/

I would love to see those integrate with Apache Beam. Both of these
projects have integrated with Apache Flink in the past.

Best,
Max

On Wed, May 25, 2016 at 1:43 PM, Stephan Buys wrote:
> Hi all,
>
> Hope I'm in the right forum. I'm someone with about a decade's worth of log
> management/event analytics experience; for the last 2 years, though, we've
> been building our own solutions based on a variety of open source
> technologies. As some of you might appreciate, whenever you want to
> do something interesting, or at scale, with timeseries/event data, a lot of
> the tools are lacking.
>
> I started off working in Splunk, and it sort of spoiled me with
> end-user/administrator functionality from the get-go (even if it is
> prohibitively expensive and slow). In Splunk the 'sandpit' that you play in
> has all the toys a non-developer can ask for: built-in map/reduce +
> streaming, and manipulation of results/streams through a simple DSL familiar
> to anyone with a bit of Unix CLI/Bash experience (i.e. search something |
> filter | map | eval | visualise, see
> http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsearchlanguagesyntax).
>
> At the moment we spend our days in logstash + elasticsearch (and sundry).
>
> I looked into Beam and Flink a bit, and from a technical perspective they seem
> like the ideal direction to go: combining many sources of data (such as
> elasticsearch, influxdb, rethinkdb, etc.) with many analytics use cases. The
> only gotcha seems to be that, from what I can see, the target audience is
> almost always developers. This isn't a problem for me, but ideally I
> would want to bolt a simple DSL (submittable via simple interfaces, such as
> a CLI) on top of my datasets while keeping all of the stream/batch processing
> capabilities that projects like Flink allow.
>
> Is anyone aware of projects/efforts along these lines? Any ideas on how we
> could get there from a project such as Apache Beam? (Am I being naive?)
>
> Your input/perspectives are most welcome!
>
> Kind regards,
> Stephan Buys

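To make the idea of a pipe-style DSL concrete, here is a minimal, hypothetical sketch in plain Python: each `|`-separated command in a query string is compiled into a stage that transforms a stream of events, and the stages are chained in order. The command names (`search`, `filter`, `fields`), the `field=value` argument syntax, and the dict-based event shape are illustrative assumptions only. They are not Splunk's SPL nor any Beam or Flink API. In a real system each stage would presumably be translated into a transform of an engine such as Beam or Flink rather than run as a Python generator.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical event shape: a flat dict of string fields.
Event = Dict[str, str]
# A stage consumes a stream of events and yields a new stream.
Stage = Callable[[Iterable[Event]], Iterator[Event]]


def search(term: str) -> Stage:
    # Keep events whose "message" field contains the search term.
    def run(events: Iterable[Event]) -> Iterator[Event]:
        return (e for e in events if term in e.get("message", ""))
    return run


def filter_eq(arg: str) -> Stage:
    # "field=value": keep events where the field equals the value exactly.
    field, value = arg.split("=", 1)
    def run(events: Iterable[Event]) -> Iterator[Event]:
        return (e for e in events if e.get(field) == value)
    return run


def fields(arg: str) -> Stage:
    # "a,b,c": project each event onto the listed fields (missing ones are skipped).
    names = [n.strip() for n in arg.split(",")]
    def run(events: Iterable[Event]) -> Iterator[Event]:
        return ({n: e[n] for n in names if n in e} for e in events)
    return run


# Registry mapping command names to stage factories (hypothetical commands).
COMMANDS: Dict[str, Callable[[str], Stage]] = {
    "search": search,
    "filter": filter_eq,
    "fields": fields,
}


def compile_query(query: str) -> Stage:
    # Split on "|" and turn each "command arg" into a stage, in order.
    stages: List[Stage] = []
    for part in query.split("|"):
        name, _, arg = part.strip().partition(" ")
        if name not in COMMANDS:
            raise ValueError(f"unknown command: {name!r}")
        stages.append(COMMANDS[name](arg.strip()))

    def run(events: Iterable[Event]) -> Iterator[Event]:
        # Chain the stages lazily: the output of one feeds the next.
        stream: Iterator[Event] = iter(events)
        for stage in stages:
            stream = stage(stream)
        return stream

    return run


if __name__ == "__main__":
    events = [
        {"host": "web1", "level": "ERROR", "message": "disk full"},
        {"host": "web2", "level": "INFO", "message": "disk ok"},
        {"host": "web1", "level": "ERROR", "message": "timeout"},
    ]
    query = compile_query("search disk | filter level=ERROR | fields host,message")
    print(list(query(events)))  # [{'host': 'web1', 'message': 'disk full'}]
```

Because every stage has the same stream-in, stream-out signature, new commands can be added to the registry without touching the parser. This is also where Max's caveat bites: each extra Beam feature (windowing, triggers, joins) needs its own command syntax.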