[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194543#comment-14194543
]
Tim Allison commented on TIKA-1302:
---
[~anjackson], the google docs link is down at the
Hong-Thai Nguyen created TIKA-1463:
--
Summary: TesseractOCRParser does work in Windows
Key: TIKA-1463
URL: https://issues.apache.org/jira/browse/TIKA-1463
Project: Tika
Issue Type: Bug
Tim Barrett created TIKA-1464:
-
Summary: Too many open files in system when parsing thousands of
files
Key: TIKA-1464
URL: https://issues.apache.org/jira/browse/TIKA-1464
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194685#comment-14194685
]
Nick Burch commented on TIKA-1464:
--
Firstly, make sure you're closing the InputStream /
[
https://issues.apache.org/jira/browse/TIKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194691#comment-14194691
]
Tim Barrett commented on TIKA-1464:
---
I double checked the input stream closing thoroughly
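For reference, the stream-closing advice in this thread can be sketched with try-with-resources. This is a minimal sketch only: `StreamConsumer` and `BatchReader` are hypothetical names standing in for whatever parser call is in use, and they are not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical stand-in for a parser call; not a Tika interface.
interface StreamConsumer {
    void consume(InputStream in) throws IOException;
}

final class BatchReader {
    // Opens one file at a time and hands the stream to the consumer.
    // try-with-resources closes the stream before the next file is opened,
    // even if the consumer throws, so descriptors do not accumulate.
    static int processAll(List<Path> files, StreamConsumer consumer) throws IOException {
        int processed = 0;
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                consumer.consume(in);
                processed++;
            }
        }
        return processed;
    }
}
```

If descriptors still leak with this pattern in place, the leak is more likely inside the code that consumes the stream (for example temporary files or child processes) than in the calling loop.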
[
https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194694#comment-14194694
]
Hong-Thai Nguyen commented on TIKA-1463:
Fixed in r1636382
TesseractOCRParser
[
https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hong-Thai Nguyen updated TIKA-1463:
---
Summary: TesseractOCRParser does not work in Windows (was:
TesseractOCRParser does work in
[
https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hong-Thai Nguyen updated TIKA-1463:
---
Description:
STR:
* Case 1:
** Setting tesseractPath to a common installation path of
[
https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hong-Thai Nguyen closed TIKA-1463.
--
Resolution: Fixed
TesseractOCRParser does not work in Windows
[
https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194712#comment-14194712
]
Hudson commented on TIKA-1463:
--
SUCCESS: Integrated in tika-trunk-jdk1.7 #297 (See
[
https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194733#comment-14194733
]
Hudson commented on TIKA-1463:
--
SUCCESS: Integrated in tika-trunk-jdk1.6 #277 (See
[
https://issues.apache.org/jira/browse/TIKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194753#comment-14194753
]
Tim Barrett commented on TIKA-1464:
---
Built using 1.7-SNAPSHOT from
On Oct. 31, 2014, 3:22 p.m., Lewis McGibbney wrote:
File Attachment: GribParser - GribParser.java
https://reviews.apache.org/r/27414/#fcomment48
Is this always available? What happens if we read an InputStream and
not a File? Can we still populate Metadata.RESOURCE_NAME_KEY?
Lewis John McGibbney created TIKA-1465:
--
Summary: Implement extraction of non-global variables from netCDF3
and netCDF4
Key: TIKA-1465
URL: https://issues.apache.org/jira/browse/TIKA-1465
On Nov. 2, 2014, 5:39 p.m., Tyler Palsulich wrote:
File Attachment: GribParser - GribParser.java
https://reviews.apache.org/r/27414/#fcomment51
Need a corresponding `xhtml.endElement(ul);`.
Corrected!
- Vineet Ghatge
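
The review point about a missing `endElement` can be illustrated with a small sketch. It uses the JDK's SAX `TransformerHandler` as a stand-in, not Tika's XHTML handler or the GribParser code under review; the class name `BalancedListSketch` is hypothetical. Every `startElement` is matched by an `endElement` in reverse order.

```java
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

final class BalancedListSketch {
    // Serializes the items as a <ul> list through SAX events.
    static String writeList(String... items) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        AttributesImpl noAttrs = new AttributesImpl();

        handler.startDocument();
        handler.startElement("", "ul", "ul", noAttrs);
        for (String item : items) {
            handler.startElement("", "li", "li", noAttrs);
            handler.characters(item.toCharArray(), 0, item.length());
            handler.endElement("", "li", "li");
        }
        // The closing event that pairs with the opening <ul> above.
        handler.endElement("", "ul", "ul");
        handler.endDocument();
        return out.toString();
    }
}
```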
---
On Nov. 2, 2014, 5:39 p.m., Tyler Palsulich wrote:
File Attachment: GribParser - GribParser.java
https://reviews.apache.org/r/27414/#fcomment52
Need a corresponding `xhtml.endElement(ul);`.
Corrected!
- Vineet Ghatge
---
On Oct. 31, 2014, 3:22 p.m., Lewis McGibbney wrote:
File Attachment: GribParser - GribParser.java
https://reviews.apache.org/r/27414/#fcomment49
The formatting and the TikaException message are not correct. I would suggest
that we stick to "GRIB parse error".
Additionally, I don't
On Nov. 2, 2014, 3:01 a.m., Chris Mattmann wrote:
trunk/tika-parsers/pom.xml, line 84
https://reviews.apache.org/r/27414/diff/1/?file=745304#file745304line84
Shouldn't this replace the above dependency?
I am not sure whether there are components which depend on it. I know that netcdf
[
https://issues.apache.org/jira/browse/TIKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195206#comment-14195206
]
Luis Filipe Nassif commented on TIKA-1464:
--
You can attach the file leak detector
[
https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195222#comment-14195222
]
Luis Filipe Nassif commented on TIKA-1463:
--
Curious, because here I can run
Dear All,
I am Phuong Linh,
I am using Tika to extract content from HTML files for search, but HtmlParser
does not parse all HTML tags. (I fetch the HTML pages with Nutch, then use Tika to
extract the important information, and then use Solr to search.)
Can you tell me what I can do to parse all tags of
From: Linh Tang
Sent: November 3, 2014 2:30:46pm PST
To: dev@tika.apache.org
Subject: Parse Html with Tika
Dear All,
I am Phuong Linh,
I am using Tika to extract content from HTML files for search, but HtmlParser
does not parse all HTML tags.
I'm not sure what you mean by cannot
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27562/
---
Review request for tika, Lewis McGibbney, Chris Mattmann, Tyler Palsulich, and
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27562/
---
(Updated Nov. 4, 2014, 5:17 a.m.)
Review request for tika, Lewis McGibbney,
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27562/#review59728
---
Patch is looking good. I am testing
- Lewis McGibbney
On Nov. 4,
On Nov. 4, 2014, 5:23 a.m., Lewis McGibbney wrote:
Patch is looking good. I am testing
Yes @mattmann, the unit test passes for me
- Vineet Ghatge
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27414/
---
(Updated Nov. 4, 2014, 5:36 a.m.)
Review request for tika, Lewis McGibbney,
On Nov. 4, 2014, 5:23 a.m., Lewis McGibbney wrote:
Patch is looking good. I am testing
Vineet Ghatge Hemantkumar wrote:
Yes @mattmann, the unit test passes for me
What is the GRIB file, please? Where can I find it?
- Lewis
On Nov. 4, 2014, 5:23 a.m., Lewis McGibbney wrote:
Patch is looking good. I am testing
Vineet Ghatge Hemantkumar wrote:
Yes @mattmann, the unit test passes for me
Lewis McGibbney wrote:
What is the GRIB file, please? Where can I find it?
This is the grib file -
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27562/#review59734
---
OK, test is also failing for me with Tika trunk as follows
1
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27414/
---
(Updated Nov. 4, 2014, 5:48 a.m.)
Review request for tika, Lewis McGibbney,
On Nov. 4, 2014, 5:45 a.m., Lewis McGibbney wrote:
OK, test is also failing for me with Tika trunk as follows:
---
Test set: org.apache.tika.parser.grib.GribParserTest
Hi Linh
You can specify a mapper to control which elements the HTML parser keeps or filters out.
See
https://github.com/DigitalPebble/storm-crawler/commit/27364cb7ddb3998f973ab6e09f384e28cc5b7639
for an example.
Julien
On Monday, 3 November 2014, Linh Tang ttplinh2...@gmail.com wrote:
Dear All,
I am