AutoDetectParser is not parsing UTF-16 content types

2012-08-29 Thread chraj007
t it's printing along with all the HTML tags. Thank you, Rajesh Chejerla -- View this message in context: http://lucene.472066.n3.nabble.com/AutoDetectParser-is-not-parsing-UTF-16-content-types-tp4004075.html Sent from the Apache Tika - Development mailing list archive at Nabble.com.

Re: AutoDetectParser is not parsing UTF-16 content types

2012-08-29 Thread chraj007
http://lucene.472066.n3.nabble.com/file/n4004078/test.html test.html I've uploaded the file I'm trying to parse as well.

Re: AutoDetectParser is not parsing UTF-16 content types

2012-08-29 Thread Jukka Zitting
Hi, On Wed, Aug 29, 2012 at 6:02 PM, chraj007 wrote: > http://lucene.472066.n3.nabble.com/file/n4004078/test.html test.html Looks like that file has an incorrect http-equiv declaration: The encoding of the file is not UTF-16. Can you file a TIKA issue about this? Tika should be able to a
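Jukka's point is that the page's http-equiv declaration claims UTF-16 while the file itself is not UTF-16-encoded. The JDK-only sketch below shows what happens when a decoder trusts such a declaration, plus one possible sanity check. The sample page and the `looksLikeUtf16` helper are hypothetical illustrations, not Tika code, and the check assumes that genuine UTF-16 either starts with a byte-order mark or, for mostly-ASCII markup, contains many zero bytes.

```java
import java.nio.charset.StandardCharsets;

final class DeclaredCharsetCheck {

    // Hypothetical heuristic, not part of Tika: UTF-16 text usually starts
    // with a byte-order mark, and mostly-ASCII markup encoded as UTF-16
    // has a zero byte in roughly every other position.
    static boolean looksLikeUtf16(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return true; // UTF-16 big-endian or little-endian BOM
            }
        }
        int zeros = 0;
        for (byte b : bytes) {
            if (b == 0) {
                zeros++;
            }
        }
        // Assumed threshold: at least a quarter of the bytes are zero.
        return bytes.length > 0 && zeros * 4 >= bytes.length;
    }

    public static void main(String[] args) {
        // A plain ASCII page whose meta tag wrongly claims UTF-16.
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=UTF-16\"></head>"
                + "<body>Hello</body></html>";
        byte[] bytes = page.getBytes(StandardCharsets.UTF_8);

        // Trusting the declaration pairs single bytes into 16-bit code units,
        // so none of the original markup characters survive decoding.
        String asDeclared = new String(bytes, StandardCharsets.UTF_16);
        System.out.println("decoded as UTF-16 contains <body>: "
                + asDeclared.contains("<body>"));

        System.out.println("looks like UTF-16: " + looksLikeUtf16(bytes));
        System.out.println("UTF-16 bytes look like UTF-16: "
                + looksLikeUtf16(page.getBytes(StandardCharsets.UTF_16)));
    }
}
```

Run as-is, it prints `false`, `false` and `true` for the three checks. The zero-byte ratio only helps for markup that is mostly ASCII; UTF-16 text dominated by, say, CJK characters has few zero bytes, so without a BOM the heuristic would reject it.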

Re: AutoDetectParser is not parsing UTF-16 content types

2012-08-30 Thread Ken Krugler
On Aug 29, 2012, at 9:24am, Jukka Zitting wrote: > Hi, > > On Wed, Aug 29, 2012 at 6:02 PM, chraj007 wrote: >> http://lucene.472066.n3.nabble.com/file/n4004078/test.html test.html > > Looks like that file has an incorrect http-equiv declaration: > > > > The encoding of the file is not UT

Re: AutoDetectParser is not parsing UTF-16 content types

2012-08-30 Thread Ken Krugler
On Aug 29, 2012, at 8:55am, chraj007 wrote: > Hello, > I'm trying to parse a file whose content type is UTF-16. I'm unable to > parse the document using the following code. Please help me. > > ContentHandler textHandler = new BodyContentHandler(); > TeeContentHandler teeHandler
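The quoted code breaks off right after it starts building a `TeeContentHandler`, which passes every SAX event on to several handlers at once. As a rough illustration of that pattern, here is a JDK-only sketch that assumes well-formed XHTML input; `SimpleTee`, `TextCollector` and `ElementCounter` are hypothetical classes written for this example, not Tika's `TeeContentHandler` or `BodyContentHandler`.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TeeSketch {

    // Hypothetical tee: replays each SAX event on every delegate in order.
    static final class SimpleTee extends DefaultHandler {
        private final List<DefaultHandler> delegates;

        SimpleTee(DefaultHandler... delegates) {
            this.delegates = Arrays.asList(delegates);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            for (DefaultHandler h : delegates) {
                h.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            for (DefaultHandler h : delegates) {
                h.endElement(uri, localName, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            for (DefaultHandler h : delegates) {
                h.characters(ch, start, length);
            }
        }
    }

    // Records character data only, so tags never appear in its output.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Counts start tags: a second, independent consumer of the same events.
    static final class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            elements++;
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hello</p><p>world</p></body></html>";
        TextCollector text = new TextCollector();
        ElementCounter counter = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xhtml)),
                new SimpleTee(text, counter));
        System.out.println("text: " + text.text);
        System.out.println("elements: " + counter.elements);
    }
}
```

Run as-is, it should print `text: Helloworld` and `elements: 4`, since the collector only sees character data and the counter only sees start tags.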