Yah Giuseppe!
++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.go
Hello All,
Please welcome Giuseppe Totaro as he joins us as the latest Tika committer and
PMC Member.
He's recently been VOTEd in and now has his account all set up so is ready to
roll!
Giuseppe, please feel free to say a bit about yourself as an introduction to
the group.
Welcome aboard,
Da
[
https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487708#comment-14487708
]
Konstantin Gribov commented on TIKA-1519:
-
It can be either refinement or not. E.g.
[
https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487696#comment-14487696
]
Ken Krugler commented on TIKA-1519:
---
After thinking about this more, I don't think it's a
[
https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487673#comment-14487673
]
Tim Allison commented on TIKA-1519:
---
In the above example, would we want the Content-Type
[
https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reopened TIKA-1519:
---
The fix for this led to ignoring valid encoding detection when the overall id
was identified as {{applicat
> From: Allison, Timothy B.
> Sent: April 9, 2015 9:02:44am PDT
> To: dev@tika.apache.org
> Subject: RE: [VOTE] Release Apache Tika 1.8 Candidate #1
>
> I just finished the run against govdocs1 with 1.7 vs. 1.8-rc1, and all looks good
> with one major change... on first glance.
>
> Because of my "f
Hi, Tim.
I think whitelisting on the content-type from the meta tag can be a solution. We
can whitelist "text/html" + options (like "text/html; charset=...") and
"application/xhtml+xml" + options. So users who had a valid (text/html or
application/xhtml+xml in ) will have
the same behavior as in 1.7 a
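A rough sketch of the whitelist check described in this message, for illustration only. It assumes "+ options" means any parameters after a semicolon can be ignored when matching the base type. The class and method names (MetaContentTypeWhitelist, isWhitelisted) are hypothetical and are not part of Tika.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: decides whether a content-type value
// taken from an HTML meta tag is on the whitelist proposed in the message.
final class MetaContentTypeWhitelist {

    // The two base types named in the message; parameters ("options") are ignored.
    private static final Set<String> ALLOWED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("text/html", "application/xhtml+xml")));

    private MetaContentTypeWhitelist() {
    }

    // Returns true when the base type (the part before any ';') is whitelisted.
    // Assumption: matching is case-insensitive and tolerant of surrounding spaces.
    static boolean isWhitelisted(String metaContentType) {
        if (metaContentType == null) {
            return false;
        }
        int semicolon = metaContentType.indexOf(';');
        String base = semicolon >= 0
                ? metaContentType.substring(0, semicolon)
                : metaContentType;
        return ALLOWED.contains(base.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isWhitelisted("text/html; charset=UTF-8"));
        System.out.println(isWhitelisted("application/pdf"));
    }
}
```

Under these assumptions, "text/html; charset=UTF-8" and "application/xhtml+xml" pass the check, while a value such as "application/pdf" does not. What a detector should do with a non-whitelisted value is left open here.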
For those who want to take a look at the reports (much more work is needed on
processing stack traces for SORT_STACK_TRACE):
https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_7_v_1_8-rc1.zip
I just finished the run against govdocs1 with 1.7 vs. 1.8-rc1, and all looks good
with one major change... on first glance.
Because of my "fix" on TIKA-1519 and the law of unintended consequences, files
that start like so:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
http://www.w3.org
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487012#comment-14487012
]
Julien Nioche commented on TIKA-1599:
-
FWIW we've just added a JSoup based parser to
s