Hello,

I am running Nutch 1.19 and I am getting the following error:

2023-07-21 14:55:38,013 ERROR o.a.n.p.t.TikaParser [parse-0] Error parsing file:/RMS/sha256/a0/ec/b0/a0/e0/ef/80/74/a0ecb0a0e0ef80747871563e2060b028c3abd330cb644ef7ee86fa9b133cbc67
org.apache.tika.exception.TikaException: Failed to parse an email message
        at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:110) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289) ~[tika-core-2.3.0.jar:2.3.0]
        at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:151) ~[?:?]
        at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:90) ~[?:?]
        at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:34) ~[apache-nutch-1.19.jar:?]
        at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:23) ~[apache-nutch-1.19.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: org.apache.james.mime4j.io.MaxHeaderLimitException: Maximum header limit (1000) exceeded
        at org.apache.james.mime4j.stream.MimeEntity.nextField(MimeEntity.java:254) ~[?:?]
        at org.apache.james.mime4j.stream.MimeEntity.advance(MimeEntity.java:296) ~[?:?]
        at org.apache.james.mime4j.stream.MimeTokenStream.next(MimeTokenStream.java:374) ~[?:?]
        at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:176) ~[?:?]
        at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:98) ~[?:?]


Is there a way to increase the header limit in nutch-site.xml or elsewhere?
I looked through nutch-default.xml and didn't see a matching property, but
maybe I missed it?

Thanks,
Steve Cohen
