Hi group,

I've been using riot --validate regularly to identify issues in RDF datasets, and it has been a great tool for ensuring data quality. I’ve noticed that it currently doesn’t offer a way to produce a "cleaned" version of a dataset as output. Unless I’m overlooking something, this could be a helpful addition.

What I’m envisioning is an option to generate a reduced dataset containing only valid triples. Ideally, this could be implemented in two modes:

1. "Super strict" mode: Filters out every triple that triggers a warning or an error.

2. "Clean" mode: Strips out only the triples with errors while retaining those with warnings.

This would be particularly useful for scenarios where a "strict" version of a dataset is required. Currently, I resort to some creative grep scripting to filter out problematic triples based on the issues flagged by riot --validate, but this is slow and far from ideal.

In this proposed mode, it would also be great if riot could:

- Avoid stopping on errors and simply log them instead.
- Optionally write the triples that trigger warnings and/or errors to a separate file, for later analysis and fixes at the source.
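
To make the request concrete, here is a minimal sketch of the behaviour I have in mind, written as a post-processing step outside riot. It rests on several assumptions: the input is line-based (N-Triples or N-Quads, one statement per line), the validator log has already been captured as text, and each log entry carries a level (ERROR or WARN) plus a "[line: N, col: M]" marker. The function names and the regular expression are hypothetical illustrations, not riot options or API, and the pattern may need adjusting to the actual log output.

```python
import re

# Assumed log entry shape: "... ERROR ... [line: 3, col: 15] message".
# This pattern is an assumption; adjust it to the real validator output.
LOG_PATTERN = re.compile(r"\b(ERROR|WARN)\b.*?\[line:\s*(\d+),\s*col:\s*\d+\s*\]")


def flagged_lines(log_text):
    """Map each 1-based input line number to the worst level reported for it."""
    flagged = {}
    for entry in log_text.splitlines():
        match = LOG_PATTERN.search(entry)
        if match is None:
            continue  # not a line-addressed warning or error
        level, line_no = match.group(1), int(match.group(2))
        # ERROR always wins; WARN is only recorded if nothing is stored yet.
        if level == "ERROR" or line_no not in flagged:
            flagged[line_no] = level
    return flagged


def split_dataset(nt_lines, flagged, strict=False):
    """Split input lines into (kept, rejected).

    strict=True  -> "super strict": drop lines with warnings or errors.
    strict=False -> "clean": drop only lines with errors.
    """
    drop_levels = {"ERROR", "WARN"} if strict else {"ERROR"}
    kept, rejected = [], []
    for line_no, line in enumerate(nt_lines, start=1):
        if flagged.get(line_no) in drop_levels:
            rejected.append(line)
        else:
            kept.append(line)
    return kept, rejected
```

The kept list would become the cleaned dataset and the rejected list the separate file for later analysis. This only works for line-based syntaxes and only if validation runs to the end of the input, which is why built-in support in riot would be so much nicer.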

I understand this may not align with everyone's use cases, but for those of us who often need to work with cleaned datasets for downstream processing, this could be a very helpful enhancement.

In all these years, I have not found another tool as useful as riot --validate.

What do you think?

regards

Adrian
