Feature Request: Filtered Output Option for riot --validate

Adrian Gschwend Fri, 24 Jan 2025 08:30:29 -0800

Hi group,

I've been using riot --validate regularly to identify issues in RDF datasets, and it has been a great tool for ensuring data quality. I’ve noticed that it currently doesn’t offer a way to produce a "cleaned" version of a dataset as output. Unless I’m overlooking something, this could be a helpful addition.

What I’m envisioning is an option to generate a reduced dataset containing only valid triples. Ideally, this could be implemented in two modes:

1. "Super strict" mode: Filters out everything that triggers warnings or errors.

2. "Clean" mode: Strips out only the triples with errors while retaining those with warnings.

This would be particularly useful for scenarios where a "strict" version of a dataset is required. Currently, I resort to some creative grep scripting to manually filter out problematic triples based on the issues flagged by riot --validate, but this is slow and far from ideal.


In this proposed mode, it would also be great if riot could:

- Avoid stopping on errors and simply log them instead.

- Optionally write warnings and/or error triples to a separate file for later analysis and fixes at the source.
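
The two modes and the separate reject file could be sketched roughly as follows. This is only an illustration of the requested behaviour, not of how riot works internally. It assumes a line-based format such as N-Triples (one triple per line) and assumes the line numbers and severities have already been collected from the validator's messages. The names filter_ntriples, issues, and the mode strings are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical severity labels; riot's real log output is not parsed here.
ERROR = "error"
WARNING = "warning"


def filter_ntriples(
    lines: Iterable[str],
    issues: Dict[int, str],
    mode: str = "clean",
) -> Tuple[List[str], List[str]]:
    """Split line-based RDF into (kept, rejected) lines.

    issues maps a 1-based line number to the worst severity reported
    for that line. "strict" drops warnings and errors, "clean" drops
    errors only. Processing never stops on a flagged line.
    """
    if mode not in ("strict", "clean"):
        raise ValueError(f"unknown mode: {mode}")
    drop = {ERROR, WARNING} if mode == "strict" else {ERROR}

    kept: List[str] = []
    rejected: List[str] = []
    for number, line in enumerate(lines, start=1):
        if issues.get(number) in drop:
            rejected.append(line)  # candidate for the separate analysis file
        else:
            kept.append(line)
    return kept, rejected
```

The rejected list corresponds to the optional side file: writing it out would keep the problematic triples available for fixing at the source, while the kept list is the cleaned dataset.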

I understand this may not align with everyone's use cases, but for those of us who often need to work with cleaned datasets for downstream processing, this could be a very helpful enhancement.

In all these years, I have not found another tool that is as useful as riot --validate.


What do you think?

regards

Adrian
