Actually, they could use Drools (and, in theory, JavaScript or Groovy rule
engines), right? Even if these aren't supported directly, they could be
added through some of the scripting modules.

Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/


On Wed, Jun 15, 2022 at 10:27 AM Matt Casters <[email protected]>
wrote:

> Great topic!
>
> So, to add to that: data profiling can already be done using the common
> statistical features, but it's geared towards operational data profiling.
> Everything you can aggregate (min, max, count all, count non-null, ...) or
> checksum can be tracked, and I would consider it a best practice to do this
> in scenarios where, for example, data is staged before processing. That way
> you can set alerts on the profiling data (counts) to verify that not too
> many records are being rejected. Another example would be to put alerts on
> certain fields in data sets to check that they're not more than 80% null.
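
A minimal sketch of the operational checks described above, written in plain Python rather than as a Hop pipeline: it keeps count all, count non-null, min and max per field, flags fields whose null ratio exceeds 80%, and flags a run whose rejected-record ratio is too high. The names (`FieldProfile`, `profile`, `null_alerts`, `rejection_alert`) and the 5% rejection threshold are hypothetical illustrations, not part of Hop's API; inside Hop the same aggregates would typically come from aggregation transforms such as Group By.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class FieldProfile:
    """Running aggregates for one field: count all, count non-null, min, max."""
    count_all: int = 0
    count_non_null: int = 0
    min_value: Optional[Any] = None
    max_value: Optional[Any] = None

    def update(self, value: Any) -> None:
        # Every row counts towards count_all; only non-null values feed min/max.
        self.count_all += 1
        if value is None:
            return
        self.count_non_null += 1
        if self.min_value is None or value < self.min_value:
            self.min_value = value
        if self.max_value is None or value > self.max_value:
            self.max_value = value

    @property
    def null_ratio(self) -> float:
        # Fraction of rows where the field was null (0.0 for an empty data set).
        if self.count_all == 0:
            return 0.0
        return (self.count_all - self.count_non_null) / self.count_all


def profile(rows: Iterable[dict], fields: list[str]) -> dict[str, FieldProfile]:
    """Build one FieldProfile per field; missing keys are treated as null."""
    profiles = {name: FieldProfile() for name in fields}
    for row in rows:
        for name in fields:
            profiles[name].update(row.get(name))
    return profiles


def null_alerts(profiles: dict[str, FieldProfile], max_null_ratio: float = 0.8) -> list[str]:
    """Names of fields that are more than max_null_ratio null (80% by default)."""
    return [name for name, p in profiles.items() if p.null_ratio > max_null_ratio]


def rejection_alert(total: int, rejected: int, max_reject_ratio: float = 0.05) -> bool:
    """True when too many records were rejected (hypothetical 5% threshold)."""
    return total > 0 and rejected / total > max_reject_ratio
```

For example, profiling six rows where `email` is null in five of them gives `email` a null ratio of about 0.83, so `null_alerts` reports it, while `id` (never null) is not flagged. Storing these profiles per run is what makes alerting on trends over time possible.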
>
> I kept this sort of "operational data profiling" in mind when I was
> architecting the new monitoring and logging functionality for the post-2.0
> versions.
>
> As for user interfaces for "online data profiling" (typically used to
> profile input data sources and the like), I'm going to join Bart in
> inviting the community to submit requirements. I'm convinced there's a lot
> we can do with little effort, but I still think it's always better to
> start from fresh requirements.
>
> Thanks in advance!
> Matt
>
> On Wed, Jun 15, 2022 at 4:58 PM Bart Maertens <[email protected]>
> wrote:
>
>> Hi Kevin,
>>
>> There are no dedicated data profiling/quality transforms in Hop (yet), but
>> at the same time, all of the existing functionality can be used to build
>> data profiling and quality checks.
>> You can build your own data quality checks and profiling in a Hop project
>> or framework. We'll probably do more on both quality and profiling in
>> future releases, but that functionality is not available yet.
>> Feel free to create an improvement ticket in JIRA so we can keep track of
>> it.
>>
>> Regards,
>> Bart
>>
>>
>>
>> On Wed, Jun 15, 2022 at 4:48 PM Kevin L Kitts <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I saw a reference to “Data Profiling” in the “Getting Started” section of
>>> the documentation. I’d like to find more information on how data profiling
>>> and data quality tasks are accomplished in Hop. Is there a section of the
>>> documentation that describes the data profiling/data quality features of
>>> Hop?
>>>
>>> Thanks!
>>>
>>
>
> --
> Neo4j Chief Solutions Architect
