OK, subject to the two security safeguards discussed, and if people are OK with
this, could the 'fileUrl' functionality please be scheduled to be added back in
the next release?
-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: 14 September 2016 17:55
To:
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kaleb Akalework updated TIKA-2080:
--
Attachment: nihao2.pdf
This is the input file I used
> PDFParser tika-parsers-1.13.jar not
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494305#comment-15494305
]
Kaleb Akalework commented on TIKA-2080:
---
Opened a ticket at PDFBOX on Tim Allison's advice
>
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494300#comment-15494300
]
Kaleb Akalework commented on TIKA-2080:
---
On Tim Allison's advice, I opened a ticket under PDFBOX
>
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494287#comment-15494287
]
Tim Allison commented on TIKA-2080:
---
Under More->Attach Files. Make sure to share it on the PDFBox
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494272#comment-15494272
]
Kaleb Akalework commented on TIKA-2080:
---
How can I share it? I don't see how to upload a file.
>
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494269#comment-15494269
]
Tim Allison commented on TIKA-2080:
---
Please open a new issue on PDFBox's JIRA and link it to this one.
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494056#comment-15494056
]
Kaleb Akalework edited comment on TIKA-2080 at 9/15/16 5:45 PM:
Thanks. I
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494056#comment-15494056
]
Kaleb Akalework edited comment on TIKA-2080 at 9/15/16 5:44 PM:
Thanks. I
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494056#comment-15494056
]
Kaleb Akalework commented on TIKA-2080:
---
Thanks. I still see the problem with the new PDFBox2.0.3
Hi Sergey,
I definitely get the challenges. In fact recently we merged the PDF
module into the Multimedia module due to the tight coupling around the
TesseractOCR[1] [2]. We could look into separating the PDF parser out
again but I'm a bit short on a simple way to do it with TesseractOCR in
[
https://issues.apache.org/jira/browse/TIKA-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493943#comment-15493943
]
Hudson commented on TIKA-2055:
--
SUCCESS: Integrated in Jenkins build Tika-trunk #1100 (See
[
https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493924#comment-15493924
]
Nick Burch commented on TIKA-2069:
--
Yes! If you wrote a VB Script, and zipped it up, it'd be a
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493904#comment-15493904
]
Tim Allison commented on TIKA-2080:
---
I just updated our wiki (see link above) to include the literal
[
https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493861#comment-15493861
]
Kaleb Akalework commented on TIKA-2080:
---
So far I have been using the parser directly from Tika, but
[
https://issues.apache.org/jira/browse/TIKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493888#comment-15493888
]
Tim Allison commented on TIKA-1194:
---
Y, this is still failing.
> Missing text from MS Word (DOC) file
>
[
https://issues.apache.org/jira/browse/TIKA-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1437.
---
Resolution: Cannot Reproduce
Accents seem to work as expected with trunk. This may have been fixed
[
https://issues.apache.org/jira/browse/TIKA-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493869#comment-15493869
]
Tim Allison commented on TIKA-1829:
---
When would the parseContext be null? Sorry for our delay!
>
[
https://issues.apache.org/jira/browse/TIKA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493838#comment-15493838
]
Tim Allison commented on TIKA-1760:
---
We've upgraded to PDFBox 2.0 as of Tika 1.13. Can you confirm that
Kaleb Akalework created TIKA-2080:
-
Summary: PDFParser tika-parsers-1.13.jar not parsing Japanese and
Chinese Characters correctly
Key: TIKA-2080
URL: https://issues.apache.org/jira/browse/TIKA-2080
Sergey, your point is well taken.
Y, you'd need most parsers, but you can _probably_ live without advanced or
scientific (sorry, Chris!).
I'd be hesitant to change the structure much. We should definitely document
this well, though!
-Original Message-
From: Sergey Beryozkin
The Apache Jenkins build system has built tika-2.x-windows (build #46)
Status: Still Failing
Check console output at https://builds.apache.org/job/tika-2.x-windows/46/ to
view the results.
Hi All
As Tim educated me, PDF (and indeed other formats) may have all sorts of
embedded attachments.
In my demo I've been working with Tika 2.0-SNAPSHOT which offers a nice
option for users to pick up only individual parsers. So I've added
PDFParser & OpenDocumentParser and tika-core to the
[
https://issues.apache.org/jira/browse/TIKA-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1864.
---
Resolution: Won't Fix
Question for users list.
> org.apache.poi.hssf.record.formula.UnaryPlusPtg
[
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493792#comment-15493792
]
Tim Allison commented on TIKA-1997:
---
[~gagravarr], any recommendations on this one?
> Problem in
[
https://issues.apache.org/jira/browse/TIKA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1838.
---
Resolution: Won't Fix
Question for users' list
> Just a quick question regarding compatibility
>
[
https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493780#comment-15493780
]
Tim Allison commented on TIKA-2069:
---
Makes sense, although I'd prefer to write one parser rather than
[
https://issues.apache.org/jira/browse/TIKA-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-2055.
---
Resolution: Fixed
Fix Version/s: 1.14
2.0
> Exception on parsing .docx file
Let me touch back in a month. ;)
Looks like PDFBox 2.0.3 and POI-3.15-beta3 or POI-3.15-final will be out
shortly.
Any blockers/wishes on 1.14?
-Original Message-
From: lewis john mcgibbney [mailto:lewi...@apache.org]
Sent: Friday, August 12, 2016 7:51 PM
To: dev@tika.apache.org
[
https://issues.apache.org/jira/browse/TIKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493303#comment-15493303
]
Tim Allison commented on TIKA-2079:
---
First five bytes of the attached files: 00 01 00 00 B4
> Unknown
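For anyone wanting to repeat the byte check quoted above on their own extracted files, here is a minimal sketch in plain Java. It is not taken from the issue: the class name FirstBytes and its two helper methods are hypothetical, and it uses only the standard library rather than any Tika API. It prints the leading bytes of a file as hex. Note that 00 01 00 00 is the sfnt version tag that TrueType fonts begin with; this sketch only dumps the bytes and does not try to decide whether the file is a valid font.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: dumps the first bytes of a file as hex.
public class FirstBytes {

    // Read up to n bytes from the start of the file (fewer if the file is shorter).
    static byte[] readPrefix(Path file, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < n) {
                int read = in.read(buf, total, n - total);
                if (read < 0) {
                    break; // end of file reached before n bytes
                }
                total += read;
            }
        }
        return Arrays.copyOf(buf, total);
    }

    // Format bytes as space-separated, two-digit, upper-case hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FirstBytes <file>");
            return;
        }
        // Five bytes, to match the length of the prefix quoted in the comment.
        System.out.println(toHex(readPrefix(Paths.get(args[0]), 5)));
    }
}
```

Run it as `java FirstBytes <path-to-extracted-file>` and compare the output with the prefix quoted in the comment.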
[
https://issues.apache.org/jira/browse/TIKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2079:
--
Description:
We recently modified how we're extracting OLE wrapped embedded objects within
ppts. On a
[
https://issues.apache.org/jira/browse/TIKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2079:
--
Attachment: Root Entry_46.ttf
Root Entry_44.ttf
Root Entry_41.ttf
Tim Allison created TIKA-2079:
-
Summary: Unknown embedded image file in ppt
Key: TIKA-2079
URL: https://issues.apache.org/jira/browse/TIKA-2079
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492536#comment-15492536
]
Tim Barrett commented on TIKA-2058:
---
private void processFileEmbeddedInMsg(InformationGranule msgGranule,
[
https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492526#comment-15492526
]
Tim Barrett commented on TIKA-2058:
---
Note the poifsSileSyetm.close that is commented out there. I think
[
https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492522#comment-15492522
]
Tim Barrett commented on TIKA-2058:
---
private void processMsgEmbeddedInMsg(InformationGranule msgGranule,
36 matches