Hi,
In a context of ugly data, I am trying to find an efficient way to parse a
Kafka stream of CSV lines into a clean data model and route the lines in
error to a specific topic.

Generally I do this:
1. First a map to split my lines with the separator character (";")
2. Then a filter where I put all my conditions (number of fields...)
3. Then subtract the second from the first to get the lines in error and
save them to a dedicated topic (rough sketch below)
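
Roughly, it looks like this. This is only a sketch of the current approach,
assuming Spark's RDD API (since I use subtract); the expected field count is
a made-up placeholder:

    import org.apache.spark.rdd.RDD

    val expectedFields = 12  // hypothetical arity, adjust to the real schema

    def splitAndFilter(lines: RDD[String]): (RDD[Array[String]], RDD[String]) = {
      // 1. split each line on the separator, keeping trailing empty fields
      val fields = lines.map(_.split(";", -1))
      // 2. keep only the lines that pass the structural checks
      val valid = fields.filter(_.length == expectedFields)
      // 3. re-join the valid lines and subtract them from the raw lines
      //    to isolate the lines in error
      val errors = lines.subtract(valid.map(_.mkString(";")))
      (valid, errors)
    }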

The problem with this approach is that I cannot efficiently test the parsing
of String fields into other types such as Int or Date. I would like to:
- test for incomplete lines (array length < x)
- test empty fields
- test field casting into Int, Long...
- distinguish errors that should evict the line from those that shouldn't
(use Try with getOrElse? see my rough sketch below)
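
Something like the following is what I have in mind, parsing each line once
into an Either and routing each side. The Record case class, the field layout
and the defaults are made up, and "lines" is the RDD[String] of raw lines from
the sketch above; I'm not sure this is idiomatic:

    import scala.util.{Failure, Success, Try}

    // Hypothetical target model; names, positions and arity are made up.
    case class Record(id: Long, amount: Int, label: String)

    def parse(line: String): Either[String, Record] = {
      val cols = line.split(";", -1)
      if (cols.length < 3)
        Left(s"incomplete line (${cols.length} fields): $line")   // evicting error
      else {
        // Non-evicting error: an empty label just falls back to a default value.
        val label = Some(cols(2).trim).filter(_.nonEmpty).getOrElse("unknown")
        // Evicting errors: an unparsable id or amount rejects the whole line.
        val attempt = for {
          id     <- Try(cols(0).toLong)
          amount <- Try(cols(1).toInt)
        } yield Record(id, amount, label)
        attempt match {
          case Success(record) => Right(record)
          case Failure(t)      => Left(s"${t.getMessage}: $line")
        }
      }
    }

    // Splitting the results to route each side to its own topic
    // (the actual write to Kafka is left out):
    val parsed = lines.map(parse)
    val good   = parsed.collect { case Right(record) => record }
    val bad    = parsed.collect { case Left(error)   => error }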

How do you generally achieve this? I cannot find any good data-cleaning
examples...
