Re: [I] [Feature Request]: Build ETL/ELT alternative (hop)
mattcasters commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4193221044 We're always allowed to brainstorm. What you describe is already being done in Hop by the Beam implementation; see the `HopPipelineMetaToBeamPipelineConverter`. Granted, it's easier there since Beam and Hop are alike, but it can certainly be done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
mhamedbenjmaa commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4193188343 Are we allowed to brainstorm here? If yes, I agree with @hansva: there is no need to switch to Python or use it. Java is more than enough, especially since all the connectors and related pieces are already in place. Just read the XML of the pipeline and generate SQL. Apache Calcite is already there and ready to take standard SQL and adapt it for any vendor. The only things left to do are: 1) validate that the pipeline is ELT-ready (no funny stuff: no multiple sources, no CSV destination, no unsupported functions, etc.); 2) generate standard SQL; 3) adapt it to the destination vendor (using Calcite or any other ready-to-use Java library); 4) push it to the destination. We can start simple, say by only supporting simple pipelines like SELECT FROM, JOIN, copy, merge (UNION) and INSERT INTO, and add more step by step later. How about that?
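Step 1 of this proposal (checking that a pipeline is ELT-ready before any SQL is generated) could be sketched roughly as below. It is illustrative only: the `TransformInfo` record, the whitelist, and the transform type names (modeled on Hop transform IDs such as `TableInput` and `TableOutput`) are assumptions, not Hop's real metadata classes. A real plugin would inspect Hop's own pipeline metadata instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class EltReadinessCheck {

    // Hypothetical, simplified view of one transform in a pipeline (not Hop's real metadata API).
    record TransformInfo(String name, String type) {}

    // Hypothetical whitelist of transform types a first version could translate to SQL.
    static final Set<String> SUPPORTED =
        Set.of("TableInput", "Calculator", "FilterRows", "SortRows", "TableOutput");

    // Returns the reasons why the pipeline cannot run in ELT mode; an empty list means "ELT-ready".
    static List<String> check(List<TransformInfo> transforms) {
        List<String> problems = new ArrayList<>();
        long sources = transforms.stream().filter(t -> t.type().equals("TableInput")).count();
        if (sources != 1) {
            problems.add("expected exactly one table source, found " + sources);
        }
        for (TransformInfo t : transforms) {
            if (!SUPPORTED.contains(t.type())) {
                problems.add("unsupported transform: " + t.name() + " (" + t.type() + ")");
            }
        }
        long targets = transforms.stream().filter(t -> t.type().equals("TableOutput")).count();
        if (targets != 1) {
            problems.add("expected exactly one table destination, found " + targets);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<TransformInfo> csvPipeline = List.of(
            new TransformInfo("read T1", "TableInput"),
            new TransformInfo("write file", "TextFileOutput"));
        // Reports the unsupported text-file output and the missing table destination.
        check(csvPipeline).forEach(System.out::println);
    }
}
```

Returning a list of reasons rather than a boolean lets the UI explain why the "ELT mode" option is unavailable for a given pipeline.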
mattcasters commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4193112085 I wouldn't mind adding more support for `dbt`, though. I'm not sure what that would look like.
hansva commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4192769975 That would allow you to construct a pipeline, but it won't allow you to create a plugin. I think multi-language pipelines are not on our roadmap.
mattcasters commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4192422004 @CarlosJuncher03 Not yet. PyHop is work in progress: https://github.com/mattcasters/hop/blob/8fec313419c65873c61f2d7c20f8ea3a043a34a3/docs/hop-user-manual/modules/ROOT/pages/hop-tools/hop-python/hop-python.adoc
CarlosJuncher03 commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4192306553 Currently, there isn't an SDK available for me to develop a Python plugin for Apache Hop, right? Only Java?
mattcasters commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4190567431 These are all great ideas. Perhaps someone will write the code for it.
CarlosJuncher03 commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4189857988 One idea would be to develop a plugin that reads the pipeline XML and generates the SQL for each transform type. I don't know if it's the easiest way, but it's an idea.
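As a starting point for such a plugin, the transforms and hops can be read from the pipeline file with the JDK's built-in DOM parser. The sketch below assumes a simplified `.hpl`-like layout (`<transform>` elements with `<name>` and `<type>`, and `<hop>` elements with `<from>` and `<to>` under `<order>`); verify the element names against real Hop pipeline files before relying on them. It assumes a linear pipeline with a single starting transform and returns the transforms in hop order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PipelineXmlReader {

    // Returns "name:type" for each transform, following the hops of a simple linear pipeline.
    static List<String> linearOrder(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Map transform name -> transform type.
        Map<String, String> types = new HashMap<>();
        NodeList transforms = doc.getElementsByTagName("transform");
        for (int i = 0; i < transforms.getLength(); i++) {
            Element t = (Element) transforms.item(i);
            types.put(text(t, "name"), text(t, "type"));
        }

        // Map hop source -> hop target; the transform that is never a target is the start.
        Map<String, String> next = new HashMap<>();
        Set<String> targets = new HashSet<>();
        NodeList hops = doc.getElementsByTagName("hop");
        for (int i = 0; i < hops.getLength(); i++) {
            Element h = (Element) hops.item(i);
            next.put(text(h, "from"), text(h, "to"));
            targets.add(text(h, "to"));
        }
        String current = types.keySet().stream()
            .filter(n -> !targets.contains(n)).findFirst().orElseThrow();

        // Walk the chain; the "seen" set stops the loop if the hops form a cycle.
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        while (current != null && seen.add(current)) {
            order.add(current + ":" + types.get(current));
            current = next.get(current);
        }
        return order;
    }

    // Text of the first descendant element with the given tag.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = """
            <pipeline>
              <order>
                <hop><from>read T1</from><to>filter D</to></hop>
                <hop><from>filter D</from><to>write T2</to></hop>
              </order>
              <transform><name>read T1</name><type>TableInput</type></transform>
              <transform><name>filter D</name><type>FilterRows</type></transform>
              <transform><name>write T2</name><type>TableOutput</type></transform>
            </pipeline>
            """;
        System.out.println(linearOrder(xml));
        // prints [read T1:TableInput, filter D:FilterRows, write T2:TableOutput]
    }
}
```

Each type in the resulting list could then be dispatched to a per-transform SQL generator; branching pipelines (joins, unions) would need a proper graph traversal instead of this single chain walk.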
mhamedbenjmaa commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4186332043 I'm just suggesting; I would love to participate as well. @hansva, actually, under the hood, most vendors that apply this technique don't generate SQL directly; they generate dbt code, which is more reliable and technology-agnostic. Again, it can be super handy when we deal with cloud platforms and want to avoid traffic costs. DataStage, for instance, has been at the top of Gartner's rankings for years now; they don't implement things without a reason. So if you like the idea, I'll be happy to discuss and contribute.
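To make the dbt route concrete: a dbt model is a `.sql` file holding a `SELECT` plus Jinja such as `{{ config(materialized='table') }}` and `{{ source('...', '...') }}`, and dbt takes care of the `CREATE TABLE ... AS` wrapping for the target warehouse. A hypothetical Hop exporter could emit such files instead of vendor-specific SQL. The sketch below only builds the model text; the method name and the source name `raw` are assumptions, and the source must be declared in the dbt project's sources YAML.

```java
import java.util.List;

public class DbtModelSketch {

    // Builds the text of a dbt model file for a simple select/derive/filter chain.
    // "sourceName" must match a source declared in the dbt project (assumed here).
    static String model(String sourceName, String table, List<String> columns, String whereClause) {
        StringBuilder sb = new StringBuilder();
        sb.append("{{ config(materialized='table') }}\n\n");
        sb.append("select\n    ").append(String.join(",\n    ", columns)).append('\n');
        sb.append("from {{ source('").append(sourceName).append("', '").append(table).append("') }}\n");
        if (whereClause != null && !whereClause.isBlank()) {
            sb.append("where ").append(whereClause).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(model("raw", "T1",
            List.of("A", "B", "C", "D", "A * B as derivation"), "D is not null"));
    }
}
```

In dbt the target table is named after the model file, so saving this output as `models/t2.sql` and running `dbt run` would build a table called `t2` in the configured target schema.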
mhamedbenjmaa commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4185141755 Actually, this is a killer feature: it removes the need to think about dbt or tools like that. Here is how it's supposed to work. Let's assume a simple job: input T1 --> derivation A*B --> sort by C --> filter (remove nulls in D) --> destination T2. You run the job in the default ETL mode: all the work happens in the engine, and then the rows are inserted into T2. So far so good. Now we want to add an ELT mode (for whatever reason: we are using Snowflake, or we want to avoid egress, etc.). If we check the box 'Run this job in ELT mode', Apache Hop will analyse the job components and, instead of working as ETL, generate this SQL: `INSERT INTO T2 SELECT A, B, C, D, A*B AS derivation FROM T1 WHERE D IS NOT NULL ORDER BY C ASC` and push it to the destination. BINGO, we are in ELT mode now: no engine, no data movement, nothing. Major ELT vendors are offering this feature now; here is an example: https://dataplatform.cloud.ibm.com/docs/content/dstage/dsnav/topics/elt-mode.html?context=cpdaas
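The translation in this example can be sketched as folding a linear chain of steps into a single `INSERT ... SELECT`. The step records (`Source`, `Derive`, `Filter`, `Sort`, `Target`) are hypothetical and cover only the operations in the example; they assume filter conditions reference source columns only, since a `SELECT` alias is not visible in `WHERE`. Filtering and sorting commute, so the order of those two steps in the pipeline does not change the result. Also note that most databases do not guarantee that an `ORDER BY` inside `INSERT ... SELECT` determines the order in which rows are later read back.

```java
import java.util.ArrayList;
import java.util.List;

public class EltSqlSketch {

    // Hypothetical step model for a linear pipeline (not Hop's real metadata classes).
    interface Step {}
    record Source(String table, List<String> columns) implements Step {}
    record Derive(String expression, String alias) implements Step {}
    record Filter(String condition) implements Step {}
    record Sort(String column, boolean ascending) implements Step {}
    record Target(String table) implements Step {}

    // Folds the steps into one INSERT ... SELECT statement.
    static String toSql(List<Step> steps) {
        String from = null;
        String target = null;
        List<String> select = new ArrayList<>();
        List<String> where = new ArrayList<>();
        List<String> orderBy = new ArrayList<>();
        for (Step step : steps) {
            if (step instanceof Source s) {
                from = s.table();
                select.addAll(s.columns());
            } else if (step instanceof Derive d) {
                select.add(d.expression() + " AS " + d.alias());
            } else if (step instanceof Filter f) {
                where.add(f.condition());
            } else if (step instanceof Sort o) {
                orderBy.add(o.column() + (o.ascending() ? " ASC" : " DESC"));
            } else if (step instanceof Target t) {
                target = t.table();
            }
        }
        if (from == null || target == null) {
            throw new IllegalArgumentException("pipeline needs a source and a target");
        }
        StringBuilder sql = new StringBuilder("INSERT INTO " + target
            + " SELECT " + String.join(", ", select) + " FROM " + from);
        if (!where.isEmpty()) sql.append(" WHERE ").append(String.join(" AND ", where));
        if (!orderBy.isEmpty()) sql.append(" ORDER BY ").append(String.join(", ", orderBy));
        return sql.toString();
    }

    public static void main(String[] args) {
        List<Step> pipeline = List.of(
            new Source("T1", List.of("A", "B", "C", "D")),
            new Derive("A*B", "derivation"),
            new Sort("C", true),
            new Filter("D IS NOT NULL"),
            new Target("T2"));
        System.out.println(toSql(pipeline));
        // prints INSERT INTO T2 SELECT A, B, C, D, A*B AS derivation FROM T1 WHERE D IS NOT NULL ORDER BY C ASC
    }
}
```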
mattcasters commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4185335003 Most of these databases are CPU-bound these days, and in practice they generally always have been. This is mainly because databases are licensed per core. I have seen ETL operations run faster in a database, but in general it's simply not true. It's a fairy tale told by the likes of Oracle to sell more expensive contracts.
hansva commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4185306408 It would be another engine. I like the idea, but someone would have to write the engine. Another question is whether we could write all that translation in ANSI SQL or whether it would need different engine types for Snowflake, PostgreSQL, ... So the main question is: is this something you are willing to work on/develop, or is this something for the idea box until a developer shows up who wants to do the job?
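On the ANSI-versus-per-database question, one common design is a single SQL generator that calls into a small dialect interface, with one implementation per target, rather than a separate engine per database (Apache Calcite's `SqlDialect` classes follow this idea). The sketch below is hypothetical and covers only two real differences: identifier quoting (ANSI double quotes versus MySQL backticks) and row limiting (ANSI `FETCH FIRST n ROWS ONLY` versus `LIMIT n`).

```java
public class SqlDialectSketch {

    // Hypothetical per-database hooks used by one dialect-neutral SQL generator.
    interface Dialect {
        String quote(String identifier);
        String limit(String select, int rows);
    }

    // ANSI/ISO SQL: double-quoted identifiers (embedded quotes doubled) and FETCH FIRST.
    static final Dialect ANSI = new Dialect() {
        public String quote(String id) { return "\"" + id.replace("\"", "\"\"") + "\""; }
        public String limit(String select, int rows) { return select + " FETCH FIRST " + rows + " ROWS ONLY"; }
    };

    // MySQL with default settings: backtick-quoted identifiers (embedded backticks doubled) and LIMIT.
    static final Dialect MYSQL = new Dialect() {
        public String quote(String id) { return "`" + id.replace("`", "``") + "`"; }
        public String limit(String select, int rows) { return select + " LIMIT " + rows; }
    };

    // The generator only talks to the Dialect interface, never to a specific database.
    static String preview(Dialect d, String table, String column, int rows) {
        return d.limit("SELECT " + d.quote(column) + " FROM " + d.quote(table), rows);
    }

    public static void main(String[] args) {
        System.out.println(preview(ANSI, "T1", "A", 10));  // SELECT "A" FROM "T1" FETCH FIRST 10 ROWS ONLY
        System.out.println(preview(MYSQL, "T1", "A", 10)); // SELECT `A` FROM `T1` LIMIT 10
    }
}
```

With this split, supporting Snowflake or PostgreSQL means adding another `Dialect` implementation (for example, to account for how each folds the case of unquoted identifiers) instead of writing another engine.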
mattcasters commented on issue #6907: URL: https://github.com/apache/hop/issues/6907#issuecomment-4180795100 I must be missing what exactly you're suggesting. ELT in the context of Hop would be running a pipeline in Apache Spark, as an example.
