GitHub user HaruspexSan created a discussion: Using Hop to transform data and 
write to another database

Hello Apache-Hop Community,

Disclaimer: I am new to Apache Hop and am using it for my Master's thesis. I 
appreciate any guidance!

The Goal: I am building a pipeline to populate a BI fact table from a 
production Postgres database.

Source: A Users table containing raw transactional data (Granularity: 1 row per 
user).
Fields: ID, created_at, deleted_at, status, etc.

Target: A Daily_Stats BI table (Granularity: 1 row per day).
Desired Fields: Date, NewUsersThatDay, TotalUserCount (Cumulative).
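
To make this concrete, here is a simplified sketch of both tables. The column 
types are approximate and I have left out the extra columns:

    -- Source: one row per user (types guessed / simplified)
    CREATE TABLE Users (
        ID          BIGINT PRIMARY KEY,
        created_at  TIMESTAMP NOT NULL,
        deleted_at  TIMESTAMP,         -- NULL while the user is still active
        status      VARCHAR(20)
        -- ... further columns omitted
    );

    -- Target: one row per day
    CREATE TABLE Daily_Stats (
        "Date"            DATE PRIMARY KEY,
        "NewUsersThatDay" INTEGER,     -- users created on that day
        "TotalUserCount"  INTEGER      -- cumulative count of non-deleted users
    );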

The Problem: I am struggling to transform the row-level user data into the 
aggregated daily format within a single pipeline, especially because the 
different parts of the stream carry different numbers of input fields.

Furthermore, users whose deleted_at field is not null are deleted after 30 
days. This means that once a user's status changes from active to deleted, I 
need to reduce the running count by 1, because the user would otherwise still 
be counted in the "rolling count" even though they have been deleted.
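
In plain SQL, the result I am after would look roughly like the query below. 
This is only a sketch to illustrate the logic (I do not have it running); my 
question is how to express the equivalent with Hop transforms:

    WITH created AS (
        -- users created per day
        SELECT created_at::date AS day, COUNT(*) AS new_users
        FROM Users
        GROUP BY created_at::date
    ),
    removed AS (
        -- users deleted per day (by their deleted_at date)
        SELECT deleted_at::date AS day, COUNT(*) AS deleted_users
        FROM Users
        WHERE deleted_at IS NOT NULL
        GROUP BY deleted_at::date
    )
    SELECT COALESCE(c.day, r.day)       AS "Date",
           COALESCE(c.new_users, 0)     AS "NewUsersThatDay",
           -- running total of (created minus deleted) up to and including each day
           SUM(COALESCE(c.new_users, 0) - COALESCE(r.deleted_users, 0))
               OVER (ORDER BY COALESCE(c.day, r.day)) AS "TotalUserCount"
    FROM created c
    FULL OUTER JOIN removed r ON r.day = c.day
    ORDER BY 1;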

So my questions are:
Is all of this possible in one big pipeline, or do I have to split it up into 
several?
How do I use a scripting or SQL transform (or any other transform) to calculate 
the rolling count, especially since I need the last rolling count (sketched 
below) as input from the same database and table I will be writing my output 
to?
How do I calculate the NewUsersThatDay field, since I have to aggregate not 
within a single row but across all rows from that day?
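
For the second question, by "the last rolling count" I simply mean the value a 
query like this would return against my Daily_Stats target table (again just a 
sketch):

    -- most recent cumulative user count already written to the target
    SELECT "TotalUserCount"
    FROM Daily_Stats
    ORDER BY "Date" DESC
    LIMIT 1;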

Thank you so much in advance

GitHub link: https://github.com/apache/hop/discussions/6308
