GitHub user usbrandon added a comment to the discussion: Hop Training,
Commercial Support & Production Deployments

Hello there,

I am glad that you are taking a serious look at Hop.  Having done a lot of
major deployments using Hop and the product suite that gave rise to it more
than 10 years ago, there are a few things that I would do given present
advances in technology and improvements in connectivity over the years.

I recommend that you pick a good, universal orchestrator, for example
Apache Airflow.  It gives you complete visibility into what has run and how
long it took, plus the flexibility to add things at the beginning, middle,
or end of the work that Hop is doing.  You will also want scheduled entries
to be easy to add and edit.  For an example of how not to do it, look at
AWS EventBridge: when you have 20 pages of single-line schedules with
submenus, it is not the best experience.

An orchestrator like that also opens the door for other things that you
may find you eventually need: a universal catalog of all data assets, and
lineage from source to destination, not only for what become tables but
also for what end up as columns consumed by content.  A data catalog might
be something like DataHub.  Column-level lineage could be achieved by
posting to OpenLineage in the right way.

Hop is enormously flexible and to a large extent can do all of the
acrobatics.  Still, there are certain things, like SLAs and killing halted
processes, that tools like Airflow handle well.  We love having a UI to
visualize our workflows and pipelines.

Another aspect of production deployment is the concept of idempotence.
Committing to run Hop workflows and pipelines in a container will force
good practices to take shape.  You might want to clone from a repo in front
of the Hop container, and make sure the environment that the container
runs in gets its roles or permissions from your environment, segregating
security concerns from what is within the containers, etc.  Even if you run
them locally on a single host, the design philosophy will let you scale.

Hop's projects, environments, and the ability to let the entire
configuration spring to life from a JSON file of key/value pairs are
enormously powerful.  Designing your workflows and pipelines to be dev,
test, and prod agnostic is part of the idempotence portion; the environment
is just an input.  There are certain features to explore, like logging
pipelines, so you can craft your logging and output it into various systems
for analytics.  Injecting configuration into components that execute on the
canvas is also a superpower.

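
To make the key/value idea concrete, here is a minimal sketch (nothing
official) that generates one environment file per stage in Python.  The
`variables` list of name/value/description entries mirrors what I understand
Hop environment configuration files to look like; treat that layout, the
variable names, and the file names as assumptions and compare them against a
file exported from your own Hop installation.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical settings: only the values differ between stages,
# the workflows and pipelines themselves stay identical.
STAGES: Dict[str, Dict[str, str]] = {
    "dev": {"DB_HOST": "localhost", "DB_NAME": "sales_dev"},
    "test": {"DB_HOST": "test-db.internal", "DB_NAME": "sales_test"},
    "prod": {"DB_HOST": "prod-db.internal", "DB_NAME": "sales"},
}


def build_env_config(stage: str) -> dict:
    """Return the key/value configuration for one stage as a dict."""
    settings = STAGES[stage]
    return {
        "variables": [
            {"name": name, "value": value, "description": f"{stage} setting"}
            for name, value in sorted(settings.items())
        ]
    }


def write_env_configs(target_dir: Path) -> List[Path]:
    """Write one JSON file per stage and return the paths written."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for stage in STAGES:
        path = target_dir / f"{stage}-config.json"
        path.write_text(json.dumps(build_env_config(stage), indent=2))
        written.append(path)
    return written
```

The point of the design is that the stage name is the only input that
changes: the container picks up the matching file, and nothing inside the
workflows needs to know whether it is running in dev, test, or prod.
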
Be aware of certain current weaknesses too.  For example, it is difficult
to get lineage out of the box.  Column-level lineage does not exist like it
does in dbt.  If you execute processes outside of Hop, your pipelines and
jobs can and will hang forever if those processes do not die or clean up by
themselves.  A lot of these things require design behaviors and patterns to
deal with.  Not good, not bad, just objectively so.  Know.Bi is good and
helpful, and so is a growing community of experts that want to help.  You
can enjoy Hop quite a lot.
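
As one example of such a pattern for the hanging external processes
mentioned above, a small wrapper can give every external process a hard
deadline so that a stuck child cannot block the caller forever.  This is a
minimal Python sketch using only the standard library; the function name
`run_with_deadline` and the timeout values are made up for illustration, and
it is not a Hop feature.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_deadline(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    try:
        # When the timeout expires, subprocess.run kills the direct child,
        # waits for it to terminate, and then raises TimeoutExpired.
        # Grandchildren spawned by the child are not covered by this.
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode
    except subprocess.TimeoutExpired:
        return None
```

Calling `run_with_deadline([sys.executable, "-c", "import time; time.sleep(60)"], 1.0)`
returns `None` after about a second instead of waiting the full minute, and
the caller can then decide whether to retry, alert, or fail the run.
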

Brandon





On Tue, Nov 26, 2024 at 2:09 PM zoomingrocket ***@***.***>
wrote:

> Dear Team,
>
> We are currently evaluating Apache Hop as a key technology for our revised
> data engineering practice. One of the primary concerns from our leadership
> is ensuring that the initial foundation is set up correctly to provide a
> robust platform for future growth. With this in mind, could you please
> provide recommendations for key Hop fundamentals training, commercial
> support, or implementation vendors to assist with the initial setup?
>
> Additionally, we have questions regarding the production readiness of
> Apache Hop. The project appears to be in its third year and seems to have
> originated from a strong Pentaho foundation. Could you share insights on
> how other users are deploying Apache Hop in active production environments?
>
> Thank you for your inputs in advance.


GitHub link: 
https://github.com/apache/hop/discussions/4626#discussioncomment-11408750
