Re: Jar packaging issue

2013-02-04 Thread Nick Burch
On Mon, 4 Feb 2013, karl.wri...@nokia.com wrote: We recently ran into something people might not be fully aware of. Specifically, because codec jars require META-INF/services files in order to be discovered, and each codec has the same files, it's not a straightforward operation to glom all

RE: Cannot instantiate SPI class

2013-01-09 Thread Nick Burch
On Wed, 9 Jan 2013, Igal Sapir wrote: The syntax is CFML / CFScript (ColdFusion Script). Railo is an open source, high performance, ColdFusion server. http://getrailo.org/ I will re-download the Lucene jars and try again. I'll let you know what I find. It may be worth double-checking that

RE: SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-15 Thread Nick Burch
On Mon, 14 Dec 2009, Uwe Schindler wrote: Can you open an issue? This is a problem in SnowballAnalyzer missing to add the set ctor. Sure, I have done - http://issues.apache.org/jira/browse/LUCENE-2165 Nick

SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-14 Thread Nick Burch
Hi All, I'm upgrading my code from 2.4 to 2.9, and I've hit an issue with deprecations. My old code was: new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS); Looking at the JavaDocs, I'd expected that the new format would be: new

Re: What does out of order mean?

2009-11-30 Thread Nick Burch
On Mon, Nov 30, 2009 at 12:22 PM, Stefan Trcek wzzelfz...@abas.de wrote: I'd do, but was not successful to get the svn repo some months ago. I have to claim the sys admin for any svn repo to open a door through the firewall. Gave up due to $ nmap -p3690 svn.apache.org PORT STATE

Re: Does Lucene Java 2.3.2 supports parsing of Microsoft office 2007 documents...

2008-06-28 Thread Nick Burch
On Fri, 27 Jun 2008, Hasan Diwan wrote: The new ODF-compatible Office 2007 is not supported by POI. Actually, it is, just not the version in trunk. You can download nightly builds of the ooxml branch from http://encore.torchbox.com/poi-svn-build/OOXML-Branch/ And there ought to be a

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-12 Thread Nick Burch
On Mon, 12 May 2008, Lukas Vlcek wrote: I need to find a reliable way how to extract content out of Word, Excel and PowerPoint formats prior to indexing and I am not sure if POI is the best way to go. Can anybody share experience with POI and/or other [commercial] Java library for text

Re: PowerPoint Extraction

2007-09-12 Thread Nick Burch
On Wed, 12 Sep 2007, Krista Leopold wrote: I realize that I am asking a just barely Lucene question, but I am certain someone on this list knows the answer to what I am on a quest for. I want to use the HSLF portion of apache's POI to do text extraction for my index, but I am having a really

Re: Exchange/PST/Mail parsing

2007-07-02 Thread Nick Burch
On Sun, 1 Jul 2007, Grant Ingersoll wrote: Anyone have any recommendations on a decent, open (doesn't have to be Apache license, but would prefer non-GPL if possible), extractor for MS Exchange and/or PST files? There has been an offer to contribute a PST parser to Apache POI. We're hoping

Re: Indexing MS Powerpoint files with Lucene

2006-09-07 Thread Nick Burch
On Thu, 7 Sep 2006, Tomi NA wrote: On 9/7/06, Venkateshprasanna [EMAIL PROTECTED] wrote: Is there any filter available for extracting text from MS Powerpoint files and indexing them? The lucene website suggests the POI project, which, it seems does not support PPT files as of now.

Re: Word files Build vs. Buy?

2006-02-14 Thread Nick Burch
On Thu, 9 Feb 2006, Christiaan Fluit wrote: Yes, that's exactly what I'm doing. Having this in POI would benefit me a lot though, as I hardly understand the POI basics to be honest (my fault, not POI's). OK, that's now in POI (you'll need a scratchpad build from late yesterday or today, see

Re: Word files Build vs. Buy?

2006-02-09 Thread Nick Burch
On Thu, 9 Feb 2006, Christiaan Fluit wrote: My experience is that the WordDocument class crashes on about 25% of the documents, i.e. it throws some sort of Exception. I've tested POI 2.5.1-final as well as the current code in CVS, but both produce this result. I even suspect the output to be

Re: http://www.textmining.org/ is hacked

2005-11-25 Thread Nick Burch
On Thu, 24 Nov 2005, Guilherme Barile wrote: The project seems somehow abandoned Ryan (the guy behind it) has gone to work for a firm that has the full word format documentation from Microsoft, so he's no longer able to contribute to open source projects working with word documents. Also

Re: hslf ppt files

2005-08-23 Thread Nick Burch
On Tue, 23 Aug 2005, Derya Kasapoglu wrote: is there anybody who have the poi hslf classes to extract text from Power Point files. I know the classes are on the poi sites but they are not packaged in a jar! You'll need to either download it yourself from CVS and compile with ant, or grab a