Hi, in the context of messy input data, I am trying to find an efficient way to parse a Kafka stream of CSV lines into a clean data model, and to route malformed lines to a dedicated error topic.
Generally I do this:

1. First a map to split my lines on the separator character (";")
2. Then a filter where I put all my validity conditions (number of fields, ...)
3. Then subtract the second stream from the first to get the lines in error, and save them to a topic

The problem with this approach is that I cannot efficiently test the parsing of String fields into other types like Int or Date. I would like to:

- test for incomplete lines (array length < x)
- test for empty fields
- test field casting into Int, Long, ...
- handle the fact that some errors are evicting while others are not (use Try with getOrElse?)

How do you generally achieve this? I cannot find any good data cleaning example...
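For context, here is the kind of thing I have been experimenting with: parsing each line into an Either (error description on the Left, clean record on the Right) and splitting the stream in one pass, instead of filter-then-subtract. This is only a sketch in plain Scala 2.13 collections; the Order model, the field count, and the error messages are made-up placeholders, and the same per-line function could be plugged into a Kafka consumer loop:

```scala
import scala.util.Try

// Hypothetical clean data model for illustration only.
final case class Order(id: Long, quantity: Int, label: String)

object CsvParser {
  val Separator = ";"
  val ExpectedFields = 3

  // Left = reason the line is rejected, Right = clean record.
  def parse(line: String): Either[String, Order] = {
    // limit = -1 keeps trailing empty fields, so "1;2;" yields 3 fields.
    val fields = line.split(Separator, -1)
    if (fields.length < ExpectedFields)
      Left(s"incomplete line: expected $ExpectedFields fields, got ${fields.length}")
    else if (fields.exists(_.trim.isEmpty))
      Left("empty field")
    else
      for {
        // Evicting errors: a bad cast rejects the whole line.
        id  <- Try(fields(0).trim.toLong).toEither.left.map(_ => s"not a Long: '${fields(0)}'")
        qty <- Try(fields(1).trim.toInt).toEither.left.map(_ => s"not an Int: '${fields(1)}'")
        // A non-evicting error would instead fall back to a default, e.g.:
        //   val qty = Try(fields(1).trim.toInt).getOrElse(0)
      } yield Order(id, qty, fields(2).trim)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val lines = List("1;2;foo", "x;2;bar", "3;;baz", "4;5")
    // One pass: rejected lines (with their reason) go left, clean records go right.
    val (errors, clean) =
      lines.map(l => CsvParser.parse(l).left.map(reason => (l, reason)))
           .partitionMap(identity)
    println(clean)   // parsed Order records -> would go to the clean topic
    println(errors)  // (line, reason) pairs -> would go to the error topic
  }
}
```

The nice property is that the cast itself is the validity test, so there is no duplicated parsing between the filter and the error branch.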