If you increase the iterations, then the "plus" type of training will not give
good results, i.e. the other letters will lose accuracy.
You can try to reduce the training text size while still keeping all the
characters that you need as part of the training text.
On Tue, Jun 18, 2019 at 2:24 AM
Yes, each iteration is one line.
For eng, the langdata training text is about 80 lines, and you add 15
symbols for plus-minus. With 30 fonts, you will have about 2400 lines. So
in 3600 iterations, all samples will be seen and trained.
For chi_sim, with its larger training text, it will be different.
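The coverage arithmetic above can be sketched in shell (the 80-line and 30-font figures are the ones quoted in the message; each iteration consumes one training line):

```shell
# Rough check: do 3600 iterations cover every rendered line at least once?
text_lines=80    # approximate lines in the eng langdata training text
fonts=30         # number of training fonts
iterations=3600

lines_total=$(( text_lines * fonts ))   # about 2400 rendered lines
echo "total rendered lines: $lines_total"

if [ "$iterations" -ge "$lines_total" ]; then
  echo "all samples are seen at least once"
fi
```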
I guess the content of the training text is important when you add new
characters. I had the same issue at first, and then shree suggested
a larger text and more iterations. I thought variation in the text would
matter as well. I'm getting good results after preparing a good training
text.
For now, a Raspberry Pi 3B is enough for me. It takes 1 to 2 days depending
on the training.
On Tuesday, June 18, 2019 at 7:50:04 AM UTC+9, Mox Betex wrote:
>
> I was thinking of paying for a Dedicated Server on
> https://www.germanvps.com/hg-linux-kvm-hosting.php to train data.
>
> Can someone tell me whether this server is enough
>
Can someone tell me whether this server is enough to train data fast? How
long would training last with this specification?
- 8 Core Intel Xeon 2.60GHz, 32GB DDR4
When I checked with --debug_interval -1, I found that although ± is in the
GROUND TRUTH, it always showed up as + or something else, never ±, in the
BEST OCR TEXT. What can I do in this situation?
On Monday, June 17, 2019 at 2:16:31 PM UTC-4, shree wrote:
>
> How big was your training text? How many iterations? Did
I was only using two different fonts, and the training only achieved a
lowest error rate of 11.271. Does this mean I really need to increase the
iterations?
On Monday, June 17, 2019 at 2:16:31 PM UTC-4, shree wrote:
>
> How big was your training text? How many iterations? Did the fonts you use
> for
The training text was only about 2200 lines (200 kB), and I ran 3600
iterations. The fonts I used support ±.
What do you mean by "whether ± is being picked for training"? When I set
--debug_interval -1, I found that every iteration outputs only one line.
Does that mean that in every iteration
Can I "bump" this?
Even if I only get a high-level description of the process?
- How to make a box file (for v4) of Unicode chars
- How to make the training size invariant?
Etc.
Many thanks!
On Tuesday, May 21, 2019 at 10:09:57 AM UTC-4, Jason wrote:
>
> I would like to be able to detect
How big was your training text? How many iterations? Did the fonts you use
for training support the plus minus sign?
You can run training with a --debug_interval of -1 so that you can see
whether the plus-minus is being picked for training in the console messages.
On Mon, 17 Jun 2019, 23:29 Jingjing
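shree's suggestion above can be sketched as a command line. This is a sketch only: the paths, model name, and list file below are placeholder assumptions, not from the thread; `--debug_interval` is the lstmtraining flag being referred to.

```shell
# Sketch of a fine-tuning run that prints per-iteration debug output
# (GROUND TRUTH vs. BEST OCR TEXT) to the console.
# All file paths here are placeholders for your own setup.
if command -v lstmtraining >/dev/null 2>&1; then
  lstmtraining \
    --continue_from eng.lstm \
    --traineddata eng/eng.traineddata \
    --train_listfile data/eng.training_files.txt \
    --model_output output/plusminus \
    --max_iterations 3600 \
    --debug_interval -1 || true
else
  echo "lstmtraining not installed; this is an illustration only"
fi
```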
Thanks. It works. The new character I added was there.
Do you have any idea why, after fine-tuning, Tesseract still couldn't
recognize the new character I added? When I tried to add '±' to eng it
worked, but when I tried to add '±' to chi_sim it didn't (explained
below). Is there anything
I don't think you need training to improve results.
You need to pre-process the image and straighten it. Use a separate tool to
identify each cell of data and then OCR that. You will get the best results
that way.
On Mon, Jun 17, 2019 at 6:07 PM phucp...@gmail.com wrote:
> Thanks shree for your
combine_tessdata -u new.traineddata new.
This will unpack the traineddata file. Check new.lstm-unicharset in it.
On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>
> I tried to fine tune the model and add a new character via training, but
> it seems it still couldn't recognize
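The unpack-and-check step above can be sketched as follows (a sketch only: `new.traineddata` is the filename used in the thread, and the grep check for '±' is an assumed way to confirm the character is present):

```shell
# Unpack the generated traineddata and check that the new character
# made it into the LSTM unicharset.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f new.traineddata ]; then
  combine_tessdata -u new.traineddata new.
  # Each unicharset entry is on its own line; look for the new symbol.
  grep -F '±' new.lstm-unicharset && echo "± is in the unicharset"
else
  echo "combine_tessdata or new.traineddata not available; illustration only"
fi
```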
I tried to fine-tune the model and add a new character via training, but it
seems it still couldn't recognize this new character using the new
traineddata generated. To debug, I want to check whether this new character
is in the .unicharset in the new traineddata generated. Is there a way to
do
Thanks shree for your reply. I see that you are very busy answering a lot
of questions here. Thanks again for taking some time for me.
>
> Your files have prefix of jpn, so I assume you are training for Japanese,
> but the image in question has only numbers in it.
>
Well I forgot to mention, my
Can Tesseract (or any other software) extract words or lines from images in
image form, not text form?
I have a lot of scanned images, and for training data I need to extract
words and lines from those images in order to make tiff/txt files for
training. Is there a way to do that with some
Your files have prefix of jpn, so I assume you are training for Japanese,
but the image in question has only numbers in it.
Getting good results on eval data but bad results on OCR could be the
result of overfitting the model, if you have used a small sample and
trained for a large number of