GitHub user HaruspexSan created a discussion: Using Hop to transform data and write to another database
Hello Apache Hop Community,

Disclaimer: I am new to Apache Hop and using it for my Master's thesis, so I appreciate any guidance!

The Goal: I am building a pipeline to populate a BI fact table from a production Postgres database.

Source: A Users table containing raw transactional data (granularity: 1 row per user).
  Fields: ID, created_at, deleted_at, status, etc.
Target: A Daily_Stats BI table (granularity: 1 row per day).
  Desired fields: Date, NewUsersThatDay, TotalUserCount (cumulative).

The Problem: I am struggling to transform the row-level user data into the aggregated daily format within a single pipeline, especially since the different parts of the stream end up with different numbers of input fields. Furthermore, users whose deleted_at field is not null are deleted after 30 days. This means that once a user's status changes from active to deleted, I need to reduce the running count by 1, since the user would otherwise still be included in the rolling count even though it was deleted.

So my questions are:
1. Is all of this possible in one big pipeline, or do I have to split it up into several?
2. How do I use scripting, SQL, or any other transform to calculate the rolling count, especially since I need the last rolling count as input from the same database and table I will be writing my output to?
3. How do I calculate the NewUsersThatDay field, given that I have to aggregate not within a single row but across all rows from that day?

Thank you so much in advance!

GitHub link: https://github.com/apache/hop/discussions/6308
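
For concreteness, the aggregation asked about in questions 2 and 3 could be sketched in plain Postgres SQL along the following lines. This is only a rough sketch, not a Hop pipeline: the table and column names are taken from the description above, the CTE names (events, daily) are placeholders, and it assumes the running count is decremented on the deleted_at date rather than 30 days later.

```sql
-- Rough sketch in plain Postgres SQL (assumes Postgres 9.4+ for FILTER).
WITH events AS (
    -- one +1 event per signup and one -1 event per deletion
    SELECT created_at::date AS day, 1 AS delta
    FROM Users
    UNION ALL
    SELECT deleted_at::date AS day, -1 AS delta
    FROM Users
    WHERE deleted_at IS NOT NULL
),
daily AS (
    -- collapse the events to one row per calendar day
    SELECT day,
           COUNT(*) FILTER (WHERE delta = 1) AS new_users,
           SUM(delta)                        AS net_change
    FROM events
    GROUP BY day
)
SELECT day                                 AS "Date",
       new_users                           AS "NewUsersThatDay",
       SUM(net_change) OVER (ORDER BY day) AS "TotalUserCount"  -- running total
FROM daily
ORDER BY day;
```

Days without any signups or deletions do not appear in this output and would need a calendar table or a generate_series() join to fill in. For an incremental load, the running total would also have to be seeded with the last TotalUserCount already stored in Daily_Stats, which is what question 2 is about.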
