to statistical inference on only a few observations (small amount of
> > bytes). :)
Sorry, can't tell what the question is?
-----Original Message-----
From: question.answer...@gmail.com [mailto:question.answer...@gmail.com]
Sent: Wednesday, September 14, 2016 11:50 AM
To: Allison, Timothy B. <talli...@mitre.org>
Subject: Re: Correction: Garbled characters when reading EUC or Shift-JIS encoded HTML with Apache Tika
From: question.answer...@gmail.com [mailto:question.answer...@gmail.com]
Sent: Wednesday, September 14, 2016 11:06 AM
To: user@tika.apache.org
Cc: Allison, Timothy B. <talli...@mitre.org>
Subject: Re: Correction: Garbled characters when reading EUC or Shift-JIS encoded HTML with Apache Tika
Thank you for your answer.
I cannot determine in advance whether a file's character encoding is EUC,
Shift-JIS, UTF-8, etc.
I would like a Java library, or Tika, to determine it for me.
I want to know how to do this detection.
私は、ファイルの文字コードがEUCやShift-JIS、UTF-8などを事前に判断できない。
私は、JAVAのライブラリか、Tikaに判断してほしい。
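
One way to see why the encoding of a file cannot always be decided from its bytes alone is to check which candidate encodings can decode those bytes without error. The following is a minimal sketch that uses only the JDK's java.nio.charset classes, not Tika's detectors. The class name CharsetCandidates, the method plausibleCharsets, and the candidate list are hypothetical names chosen for illustration. It does strict validation rather than statistical detection, so it can only rule encodings out, never pick a winner.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

class CharsetCandidates {

    // Candidate encodings to try. This list is an assumption for illustration.
    static final String[] CANDIDATES = {"UTF-8", "Shift_JIS", "EUC-JP"};

    // Returns the candidates under which the bytes decode with no
    // malformed or unmappable input. More than one name in the result
    // means the bytes alone do not settle the question.
    static List<String> plausibleCharsets(byte[] data) {
        List<String> result = new ArrayList<>();
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue; // skip encodings this JVM does not provide
            }
            try {
                Charset.forName(name)
                        .newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                result.add(name);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this encoding; rule it out.
            }
        }
        return result;
    }
}
```

Calling CharsetCandidates.plausibleCharsets on Shift_JIS-encoded Japanese text rules out UTF-8, because a Shift_JIS kanji lead byte is not a legal start of a UTF-8 sequence. A short or mostly ASCII file, on the other hand, can pass validation under several candidates at once, which is the situation where a detector has too little evidence to work with.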
Again, relying on Google Translate.
The problem with these files is that they don't self-identify their encoding
via HTTP meta headers, and they contain very little content, so Mozilla's
UniversalChardet and ICU4J don't have enough to work with. IE, Chrome and
Firefox all fail on these files,