I think it would be a good idea if you could add a message or something to the front page of the GitHub repo stating that this project will soon be discontinued. If I were someone looking for a new tool to use, I would most definitely be impressed and go ahead and start using it, only to find out that it's going to be discontinued!

Regards

On 3 June 2016 at 19:03, Rob Miller <rmil...@mozilla.com> wrote:
> As I mentioned in the first message, Hindsight carries forward a lot of
> the ideas that made Heka so useful, and pretty much any Lua code that was
> written for Heka will work in Hindsight with little or no change.
>
> Another idea that I might explore is generating a stripped-down subset of
> Heka that only includes inputs, along with a Hindsight input that works
> with this stripped-down subset. It's not ideal in that it would be two
> processes instead of one, but it could provide a bridge so that any input
> code that hasn't yet been ported to Hindsight could still be used to feed
> into a Hindsight-centered pipeline.
>
> -r
>
> On 06/03/2016 04:19 AM, Mac Stork wrote:
>> Hi all,
>>
>> Ali, I share your opinion concerning Heka's strengths. I also think that
>> Heka stands out because of the flexibility of its filters. There are few,
>> if any, lightweight data collectors/shippers that can process events
>> with that many decoders/filters/encoders and also allow chaining them.
>> The numerous filtering possibilities were what made us use Heka.
>>
>> As for the alternative to Heka, i.e. Elastic's Beats: there is
>> obviously a lack of outputs. However, things might take a turn, and you
>> should look at (and might even participate in) this recent ticket about
>> having community-maintained outputs:
>> https://github.com/elastic/beats/pull/1681
>>
>> Vincent
>>
>> On 2 June 2016 at 22:22, Ali <h...@alijnabavi.info> wrote:
>>
>>     Thanks, Rob!
>>
>>     I have to say, I'm EXTREMELY DISAPPOINTED to hear this.
>>
>>     I have been away from Heka for a while (working on other projects at
>>     work) and am now able to refocus on designing our new data
>>     collection/analysis/reporting system. Once I read this e-mail, I
>>     started looking around to see what else was out there and what has
>>     changed over the last several months.
>>     Elastic's Beats <https://www.elastic.co/products/beats> project,
>>     particularly Filebeat
>>     <https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html>,
>>     seemed like a really interesting and welcome development. However,
>>     compared to the flexibility of Heka's ins and outs, Filebeat seems
>>     to be sorely wanting.
>>
>>     Suffice it to say, Heka still seems to stand alone in this space.
>>     Its flexibility is amazing. (Again, mostly talking about inputs and
>>     outputs here.) The closest I can come to it is nxlog
>>     <http://nxlog-ce.sourceforge.net/about>, and I just really dislike
>>     that it's not more transparent and open-source.
>>
>>     Anyway, I understand the rationale behind this decision and am
>>     hopeful that another org will continue work on this project. Thanks
>>     for all of your efforts, Rob et al!
>>
>>     -Ali
>>
>>     P.S. If anyone's interested, here's my situation right now:
>>     https://www.reddit.com/r/bigdata/comments/4m81vo/which_log_collectors_to_use_for_robust_handling/
>>     and
>>     https://discuss.elastic.co/t/how-can-i-get-data-from-filebeat-to-flume/51734
>>
>>     On Fri, May 6, 2016 at 12:51 PM Rob Miller <rmil...@mozilla.com> wrote:
>>
>>         Hi everyone,
>>
>>         I'm loooong overdue in sending out an update about the current
>>         state of and plans for Heka. Unfortunately, what I have to share
>>         here will probably be disappointing for many of you, and it might
>>         impact whether or not you want to continue using it, as all signs
>>         point to Heka getting less support and fewer updates moving
>>         forward.
>>
>>         The short version is that Heka has some design flaws that make it
>>         hard to incrementally improve it enough to meet the high
>>         throughput and reliability goals that we were hoping to achieve.
>>         While it would be possible to do a major overhaul of the code to
>>         resolve most of these issues, I don't have the personal bandwidth
>>         to do that work, since most of my time is consumed working on
>>         Mozilla's immediate data processing needs rather than
>>         general-purpose tools these days. Hindsight
>>         (https://github.com/trink/hindsight), built around the same Lua
>>         sandbox technology as Heka, doesn't have these issues, and
>>         internally we're using it more and more instead of Heka, so
>>         there's no organizational imperative for me (or anyone else) to
>>         spend the time required to overhaul the Go code base.
>>
>>         Heka is still in use here, though, especially on our edge nodes,
>>         so it will see a bit more improvement and at least a couple more
>>         releases. Most notably, it's on my list to switch to using the
>>         most recent Lua sandbox code, which will move most of the
>>         protobuf processing to custom C code, and will likely improve
>>         performance as well as remove a lot of the problematic cgo code,
>>         which is what's currently keeping us from being able to upgrade
>>         to a recent Go version.
>>
>>         Beyond that, however, Heka's future is uncertain. The code that's
>>         there will still work, of course, but I may not be doing any
>>         further improvements, and my ability to keep up with support
>>         requests and PRs, already on the decline, will likely continue
>>         to wane.
>>
>>         So what are the options? If you're using a significant amount of
>>         Lua-based functionality, you might consider transitioning to
>>         Hindsight. Any Lua code that works in Heka will work in
>>         Hindsight. Hindsight is a much leaner and more solid foundation.
>>         Hindsight has far fewer i/o plugins than Heka, though, so for
>>         many it won't be a simple transition.
>>
>>         Also, if there's someone out there (an organization, most likely)
>>         that has a strong interest in keeping Heka's codebase alive,
>>         through funding or coding contributions, I'd be happy to support
>>         that endeavor. Some restrictions apply, however; the work that
>>         needs to be done to improve Heka's foundation is not
>>         beginner-level work, and my time to help is very limited, so I'm
>>         only willing to support folks who demonstrate that they are up to
>>         the task. Please contact me off-list if you or your organization
>>         is interested.
>>
>>         Anyone casually following along can probably stop reading here.
>>         Those of you interested in the gory details can read on to hear
>>         more about what the issues are and how they might be resolved.
>>
>>         First, I'll say that I think there's a lot that Heka got right.
>>         The basic composition of the pipeline (input -> split -> decode
>>         -> route -> process -> encode -> output) seems to hit a sweet
>>         spot for composability and reuse. The Lua sandbox, and especially
>>         the use of LPEG for text parsing and transformation, has proven
>>         to be extremely efficient and powerful; it's the most important
>>         and valuable part of the Heka stack. The routing infrastructure
>>         is efficient and solid. And, perhaps most importantly, Heka is
>>         useful; there are a lot of you out there using it to get work
>>         done.
>>
>>         There was one fundamental mistake made, however, which is that
>>         we shouldn't have used channels. There are many competing
>>         opinions about Go channels. I'm not going to get into whether or
>>         not they're *ever* a good idea, but I will say unequivocally that
>>         their use as the means of pushing messages through the Heka
>>         pipeline was a mistake, for a number of reasons.
>>
>>         First, they don't perform well enough. While Heka performs many
>>         tasks faster than some other popular tools, we've consistently
>>         hit a throughput ceiling thanks to all of the synchronization
>>         that channels require. And this ceiling, sadly, is generally
>>         lower than is acceptable for the amount of data that we at
>>         Mozilla want to push through our aggregators single system.
>>
>>         Second, they make it very hard to prevent message loss. If
>>         unbuffered channels are used everywhere, performance plummets
>>         unacceptably due to context-switching costs. But using buffered
>>         channels means that many messages are in flight at a time, most
>>         of which are sitting in channels waiting to be processed. Keeping
>>         track of which messages have made it all the way through the
>>         pipeline requires complicated coordination between chunks of code
>>         that are conceptually quite far away from each other.
>>
>>         Third, the buffered channels mean that Heka consumes much more
>>         RAM than would otherwise be needed, since we have to pre-allocate
>>         a pool of messages. If the pool size is too small, then Heka
>>         becomes susceptible to deadlocks, with all of the available packs
>>         sitting in channel queues, unable to be processed because some
>>         plugin is blocked waiting for an available pack. But cranking up
>>         the pool size causes Heka to use more memory, even when it's
>>         idle.
>>
>>         Hindsight avoids all of these problems by using disk queues
>>         instead of RAM buffers between all of the processing stages. It's
>>         a bit counterintuitive, but at high throughput, performance is
>>         actually better than with RAM buffers, because a) there's no need
>>         for synchronization locks and b) the data is typically read
>>         quickly enough after it's written that it stays in the disk
>>         cache.
>>
>>         There's much less chance of message loss, because every plugin
>>         is holding on to only one message in memory at a time, while
>>         using a written-to-disk cursor file to track the current position
>>         in the disk buffer. If the plug is pulled mid-process, some
>>         messages that were already processed might be processed again,
>>         but nothing will be lost, and there's no need for complex
>>         coordination between different stages of the pipeline.
>>
>>         Finally, there's no need for a pool of messages. Each plugin is
>>         holding some small number of packs (possibly as few as one) in
>>         its own memory space, and those packs never escape that plugin's
>>         ownership. RAM usage doesn't grow, and pool-exhaustion-related
>>         deadlocks are a thing of the past.
>>
>>         For Heka to have a viable future, it would basically need to be
>>         updated to work almost exactly like Hindsight. First, all of the
>>         APIs would need to be changed to no longer refer to channels.
>>         (The fact that we exposed channels to the APIs is another mistake
>>         we made... it's now generally frowned upon in Go land to expose
>>         channels as part of your public APIs.) There's already a
>>         non-channel-based API for filters and outputs, but most of the
>>         plugins haven't yet been updated to use the new API, which would
>>         need to happen.
>>
>>         Then the hard work would start: a major overhaul of Heka's
>>         internals, to switch from channel-based message passing to
>>         disk-queue-based message passing. The work that's been done to
>>         support disk buffering for filters and outputs is useful, but not
>>         quite enough, because it's not scalable for each plugin to have
>>         its own queue; the number of open file descriptors would grow
>>         very quickly. Instead it would need to work like Hindsight, where
>>         there's one queue that all of the inputs write to, and another
>>         that filters write to. Each plugin reads through its specified
>>         input queue, looking for messages that match its message matcher,
>>         writing its location in the queue back to the shared cursors
>>         file.
>>
>>         There would also be some complexity in reconciling Heka's
>>         breakdown of the input stage into input/splitter/decoder with
>>         Hindsight's encapsulation of all of these stages into a single
>>         sandbox.
>>
>>         Ultimately I think this would be at least 2-3 months of full-time
>>         work for me. I'm not the fastest coder around, but I know where
>>         the bodies are buried, so I'd guess it would take anyone else at
>>         least as long, possibly longer if they're not already familiar
>>         with how everything is put together.
>>
>>         And that's about it. If you've gotten this far, thanks for
>>         reading. Also, thanks to everyone who's contributed to Heka in
>>         any way, be it by code, doc fixes, bug reports, or even just
>>         appreciation. I'm sorry for those of you using it regularly that
>>         there's not a more stable future.
>>
>>         Regards,
>>
>>         -r
>>         _______________________________________________
>>         Heka mailing list
>>         Heka@mozilla.org
>>         https://mail.mozilla.org/listinfo/heka