Re: UTF-8 well-formedness for SimpleTextCodec

2023-12-19 Thread Adrien Grand
Hey Michael, Writing well-formed UTF-8 with the SimpleText format sounds desirable indeed; for instance, your PR makes sense. I don't think we would want to be heroic about it, but if we can serialize the same information easily, then it sounds like something we should do. Thanks for improving SimpleTextCodec!
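
To illustrate the point about serializing the same information easily, here is a minimal Java sketch. It is not taken from the PR discussed in this thread. It assumes that hex encoding would be an acceptable text form for an ID made of arbitrary bytes, and the class and method names (SegmentIdHexSketch, isWellFormedUtf8, toHex, fromHex) are hypothetical. It shows that arbitrary bytes can fail strict UTF-8 decoding, while their hex form is plain ASCII and round-trips to the same bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; not part of Lucene or of the PR mentioned in the thread.
public class SegmentIdHexSketch {

  // Returns true only if the bytes decode as strict UTF-8 (malformed input is reported, not replaced).
  static boolean isWellFormedUtf8(byte[] bytes) {
    try {
      StandardCharsets.UTF_8
          .newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  // Encodes each byte as two lowercase hex digits, so the output is always ASCII.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(Character.forDigit((b >> 4) & 0xF, 16));
      sb.append(Character.forDigit(b & 0xF, 16));
    }
    return sb.toString();
  }

  // Inverse of toHex; assumes an even-length string of valid hex digits.
  static byte[] fromHex(String hex) {
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int hi = Character.digit(hex.charAt(2 * i), 16);
      int lo = Character.digit(hex.charAt(2 * i + 1), 16);
      out[i] = (byte) ((hi << 4) | lo);
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up sample bytes standing in for an ID; 0xC3 followed by 0x28 is malformed UTF-8.
    byte[] id = {(byte) 0xC3, 0x28, 0x00, (byte) 0xFF};
    String hex = toHex(id);
    System.out.println("raw bytes well-formed: " + isWellFormedUtf8(id));
    System.out.println("hex form well-formed: "
        + isWellFormedUtf8(hex.getBytes(StandardCharsets.UTF_8)));
    System.out.println("round trip preserved: " + Arrays.equals(id, fromHex(hex)));
  }
}
```

Hex doubles the size of the field, which seems an acceptable trade-off for a format meant to be read by humans; other text encodings would serve the same purpose.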

UTF-8 well-formedness for SimpleTextCodec

2023-12-18 Thread Michael Froh
Hi there, I was recently writing up a short Lucene file format tutorial (https://msfroh.github.io/lucene-university/docs/DirectoryFileContents.html), using SimpleTextCodec for educational purposes. I found that SimpleTextSegmentInfo tries to output the segment ID as raw bytes, which will often