GitHub user usbrandon added a comment to the discussion: Hop Training, Commercial Support & Production Deployments
Hello there,

I am glad that you are taking a serious look at Hop. Having done a lot of major deployments using Hop and the product suite that gave rise to it more than 10 years ago, I can point to a few things that I would do given present advances in technology and improvements in connectivity over the years.

I recommend that you pick a good, universal orchestrator, for example Apache Airflow. The reasons are complete visibility of what has run and how long it took, and the flexibility to add things at the beginning, middle or end of the work that Hop is doing. You will also want scheduled entries to be easy to add and edit. For an example of how not to do it, look at AWS EventBridge: when you have 20 pages of single-line schedules with submenus, it is not the best experience.

An orchestrator also opens the door to other things that you may eventually find you need: a universal catalog of all data assets, and lineage from source to destination, covering not only what becomes tables but also what ends up as columns consumed by content. A data catalog might be something like DataHub. Column-level lineage could be achieved by posting to OpenLineage the right way.

Hop is enormously flexible and to a large extent can do all of the acrobatics. Still, there are certain things, like SLAs and killing halted processes, that tools like Airflow handle well. We love having a UI to visualize our workflows and pipelines.

Another aspect of production deployment is idempotence. Committing to running Hop workflows and pipelines in a container will force good practices to take shape. You might want to clone your project from a repo before the Hop container runs, and make sure the container gets its roles or permissions from the environment it runs in, which keeps security concerns separate from what is inside the container. Even if you run the containers locally on a single host, the design philosophy will let you scale.

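To make the point about killing halted processes concrete, here is a minimal Python sketch of the behavior an orchestrator gives you: start a command, wait up to a deadline, and kill it if it is still running. The names `run_with_deadline` and `RunResult` are made up for this illustration and are not part of Hop or Airflow; Airflow has its own mechanism for this (for example the `execution_timeout` setting on a task). The command below is only a stand-in for whatever launches your Hop container.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    """Outcome of one run (hypothetical helper type for this sketch)."""
    exit_code: Optional[int]  # None when the process had to be killed
    timed_out: bool


def run_with_deadline(cmd: Sequence[str], timeout_s: float) -> RunResult:
    """Run cmd and kill it if it is still alive after timeout_s seconds."""
    proc = subprocess.Popen(cmd)
    try:
        code = proc.wait(timeout=timeout_s)
        return RunResult(exit_code=code, timed_out=False)
    except subprocess.TimeoutExpired:
        proc.kill()  # hard stop for the process we started
        proc.wait()  # reap it so no zombie is left behind
        return RunResult(exit_code=None, timed_out=True)


if __name__ == "__main__":
    # Stand-in for a long-running job; replace with your own launch command.
    stuck_job = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_deadline(stuck_job, timeout_s=1.0))
```

Note that `proc.kill()` only stops the process that was started directly. Anything that process spawned itself may survive, which is one more reason to run the whole job inside a container that can be stopped as a unit.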
Hop's projects and environments, and the ability to let the entire configuration spring to life from a JSON file of key-value pairs, are enormously powerful. Designing your workflows and pipelines to be dev, test and prod agnostic is part of the idempotence story: the environment is just an input. There are features worth exploring, like logging pipelines, which let you craft your logging and output it into various systems for analytics. Injecting configuration into components that execute on the canvas is also a superpower.

Be aware of certain current weaknesses too. For example, it is difficult to get lineage out of the box. Column-level lineage does not exist like it does in dbt. If you execute processes outside of Hop, your pipelines and workflows can and will hang forever if those processes do not die or clean up by themselves. A lot of these things require design behaviors and patterns to deal with. Not good, not bad, just objectively so.

Know.Bi is good and helpful, and so is a growing community of experts who want to help. You can enjoy Hop quite a lot.

Brandon

On Tue, Nov 26, 2024 at 2:09 PM zoomingrocket ***@***.***> wrote:
> Dear Team,
>
> We are currently evaluating Apache Hop as a key technology for our revised
> data engineering practice. One of the primary concerns from our leadership
> is ensuring that the initial foundation is set up correctly to provide a
> robust platform for future growth. With this in mind, could you please
> provide recommendations for key Hop fundamentals training, commercial
> support, or implementation vendors to assist with the initial setup?
>
> Additionally, we have questions regarding the production readiness of
> Apache Hop. The project appears to be in its third year and seems to have
> originated from a strong Pentaho foundation. Could you share insights on
> how other users are deploying Apache Hop in active production environments?
>
> Thank you for your inputs in advance.

GitHub link: https://github.com/apache/hop/discussions/4626#discussioncomment-11408750
