First of all: you did not mention any important information, such as which tesseract version and which language model you use.
Next: " -c tessedit_write_image=1" produces "Could not set option: tessedit_write_image=1" ;-) The parameter name is tessedit_write_images (plural).

Next: If you want to avoid tesseract's binarization (Otsu), you must provide a really binarized image [1] as input, i.e. one stored at 1 bit per pixel. Your my_bin.png uses a 256-color / 8 bits-per-pixel format.

And last: I am not able to reproduce your problem with the latest tesseract code:

    tesseract real_bin.png real_bin2 -c tessedit_write_images=1 -l chi_tra

See the attached tessinput.tif - it is different from your tess_my_bin.tif.

[1] https://github.com/tesseract-ocr/tesseract/blob/e910b3c20b831017b3152378bdaa4c567e62c65a/src/ccmain/thresholder.cpp#L185-L199

Zdenko

On Thu, 2 Jul 2020 at 11:54, xian <chenux...@gmail.com> wrote:

> For Chinese text, I found that binarization in tesseract gives really
> bad results.
> I use -c tessedit_write_image=1 to get the result image from tesseract's
> binarization.
>
> As attachments:
> original
> tess_bin -> tesseract's binarization of original.png
> my_bin -> my preprocessing of original.png
> tess_my_bin -> tesseract's binarization of my_bin.png
>
> You can see that some characters disappear.
> Before I pass the images to tesseract, I want to apply my own
> function (pre-processing) first.
> But tesseract's binarization makes the result worse.
>
> I want to handle the image preprocessing part by myself.
> How can I disable tesseract's image preprocessing? Or is the only way
> to do this to modify the source code?
> Thanks!!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/fe0850ae-6138-4736-a855-fb691b16056co%40googlegroups.com
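As a follow-up to the point in the reply about the input format, here is a minimal sketch for checking whether a PNG is stored at 1 bit per pixel before passing it to tesseract. It uses only the Python standard library and reads the bit depth from the PNG IHDR header. The function names are hypothetical and are not part of tesseract. The sketch only inspects the file header; it assumes, based on the thresholder code linked as [1], that a 1 bit-per-pixel input is what lets tesseract skip its own Otsu step.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bit_depth_and_color_type(data: bytes) -> tuple:
    """Return (bit_depth, color_type) read from the IHDR chunk of PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk: 4-byte length, 4-byte type, then 13 data bytes.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    # IHDR data starts with width, height, bit depth, color type.
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth, color_type


def is_one_bit_png(data: bytes) -> bool:
    """True if the PNG header declares 1 bit per pixel.

    Hypothetical helper: it does not look at pixel values, so an 8-bit image
    that only contains black and white still returns False here.
    """
    bit_depth, _color_type = png_bit_depth_and_color_type(data)
    return bit_depth == 1
```

For a file like the my_bin.png described above (256 colors, 8 bits per pixel), is_one_bit_png(open("my_bin.png", "rb").read()) would be expected to return False, which suggests re-saving the image as 1 bit per pixel in whatever tool does the preprocessing.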