lucene and ejb applications

2004-08-20 Thread Rupinder Singh Mazara
Hi all, purely due to a policy decision, we would like to host our Lucene search application in a J2EE container, preferably by means of an EJB. Since access to java.io is restricted by the EJB specification, what would be the best way to design the application? I have taken a look at

Re: lucene and ejb applications

2004-08-20 Thread Erik Hatcher
What would be the best way? Use Lucene outside of EJB. It's quite silly to make such a decision "purely due to a policy decision" when the technicalities of it show that it is an unwise decision. You're going to navigate Hits through a session bean? And as you said, the EJB spec says not to

RE: lucene and ejb applications

2004-08-20 Thread Rupinder Singh Mazara
Hi Erik, thanks for the warning and the code. Let me rephrase the question: I have an index generated by Lucene, and I need the search capability to be highly available. What solution would be optimal? Currently I have two scenarios in mind: a) set up an RMI-based app that o

pdf search

2004-08-20 Thread Santosh
Hi, I am a newbie to Lucene. I have downloaded the zip file. Now how can I give my own list of words to Lucene? In the demo I saw that Lucene is automatically creating an index if we run the Java program, but I want to give my own search words. How is that possible? Regards, Santosh Kumar, SoftPro Systems Hy

Fw: pdf search

2004-08-20 Thread Santosh
How can I search through PDFs? - Original Message - From: Santosh To: Lucene Users List Sent: Friday, August 20, 2004 5:59 PM Subject: pdf search Hi, I am a newbie to Lucene. I have downloaded the zip file. Now how can I give my own list of words to Lucene? In the demo I saw that Lucene is au

Re: Fw: pdf search

2004-08-20 Thread Ben Litchfield
In order to search through a PDF document the text must be extracted from the PDF document. There are several libraries to do that, including http://www.pdfbox.org After you have the text from the PDF document you just add it to the lucene index like any other text document. You should go thr

RE: pdf search

2004-08-20 Thread David Townsend
Hi Santosh, Lucene doesn't search PDFs per se. To make anything searchable you first have to extract the content and then put it into Lucene in a form it understands (i.e., Document objects). So in order to search your PDFs you first need to extract the info from the PDFs using something like PDF

pdfboxhelp

2004-08-20 Thread Santosh
Hi, I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me. Regards, Santosh Kumar, SoftPro Systems, Hyderabad. "The harder you train in peace, the lesser you bleed in war" -

Re: pdfboxhelp

2004-08-20 Thread Don Vaillancourt
What are your intentions with PDFBox? You want to use it to index PDF files? Santosh wrote: Hi, I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me. Regards, Santosh Kumar

RE: pdf search

2004-08-20 Thread Karthik N S
Hi, what is it that you intend to search, and what are these 'search words' of your own? First explain your requirement properly to the forum to get the intended results. With regards, Karthik -Original Message- From: Santosh [mailto:[EMAIL PROTECTED] Sent: Friday, August 20, 2004 5:59 PM To: Lucene Users L

RE: lucene and ejb applications

2004-08-20 Thread Otis Gospodnetic
Option b) sounds simpler and sufficient to me. I don't see why you would need to involve RMI for something as simple as this. I use something similar to your b) option for some indices behind http://www.simpy.com/ . I don't store IndexSearcher in the servlet context, though - I just have some lo

Re: Debian build problem with 1.4.1

2004-08-20 Thread Otis Gospodnetic
Hello Jeff, I don't have Debian to try this out, and this is going to be a stupid question and suggestion, but where/how is the CLASSPATH set? Are any of those commands actually using Lucene's build.xml? I'm asking, because it looks like your compiler is not finding Reader and IOException classe

Re: pdf search

2004-08-20 Thread Santosh
Hi Karthik, I have a website with some items, each containing HTML and PDF documents. I have to store keywords against each item; whenever a user enters a search word that matches one of the existing keywords, the site should show the link to that particular item. - Original Message

Re: pdfboxhelp

2004-08-20 Thread Santosh
Exactly, that is what I need. - Original Message - From: Don Vaillancourt To: Lucene Users List Sent: Friday, August 20, 2004 6:39 PM Subject: Re: pdfboxhelp What are your intentions with PDFBox? You want to use it to index PDF files? Santosh wrote: hi, I have

Re: Lucene Search Applet

2004-08-20 Thread Simon mcIlwaine
I'm a new Lucene user and I'm not too familiar with applets either, but I've been doing a bit of testing on Java applet security, and if I'm correct in saying that applets can read anything below their codebase, then my problem is not a security restriction. The error reads java.lang.NoClassDef

Re: pdfboxhelp

2004-08-20 Thread Don Vaillancourt
Here is the super simple code required.

    import java.io.File;
    import org.apache.lucene.document.Document;
    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Returns the parsed PDF file as a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

Santosh wrot

Re: pdfboxhelp

2004-08-20 Thread Santosh
- Original Message - From: Don Vaillancourt To: Lucene Users List Sent: Friday, August 20, 2004 7:37 PM Subject: Re: pdfboxhelp Here is the super simple code required. import org.pdfbox.searchengine.lucene.*; File pdfFile = new File("/path/to/the/file.pdf");

continuous index updates

2004-08-20 Thread Crump, Michael
Hello, I am currently working on a server app that will require the ability to make index additions/deletions at any time. I want to cache/reuse index searchers and readers. I know that once an index has changed only newly opened readers will see the changes. Creating a new reader to see the

Re: pdfboxhelp

2004-08-20 Thread Don Vaillancourt
Did I leave you speechless!?  :-) Santosh wrote: - Original Message - From: Don Vaillancourt To: Lucene Users List Sent: Friday, August 20, 2004 7:37 PM Subject: Re: pdfboxhelp Here is the super simple code required. import org.pdfbox.searchengine.lucene.*; Fi

Re: pdfboxhelp

2004-08-20 Thread Santosh
I am sorry, the previous mail was sent accidentally. - Original Message - From: Don Vaillancourt To: Lucene Users List Sent: Friday, August 20, 2004 8:02 PM Subject: Re: pdfboxhelp Did I leave you speechless!? :-) Santosh wrote: - Original Message - From: Don Vailla

Indexing and Searching Database Values in Lucene Search Engine

2004-08-20 Thread sivalingam T
How to index and search database values using Lucene Search Engine? By T.Sivalingam. Sivalingam T

Indexing and Searching Database in Lucene

2004-08-20 Thread sivalingam T
Hi, can we index and search a database with the Lucene search engine? If anybody has done this, please send a reply. With warm regards, Sivalingam.T Sai Eswar Innovations (P) Ltd, Chennai-92

RE: Indexing and Searching Database in Lucene

2004-08-20 Thread Aviran
You need to create a Lucene index from the database. Just index the columns and the records from the database. It is also useful to have a field in Lucene that contains the database's primary key, so you can retrieve the actual record from the database. Aviran -Original Message- From
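
A minimal, library-free sketch of the idea described above: tokenize the text columns of each row and keep the row's primary key with every term, so that a hit can be mapped back to the database record. This is a plain-Java stand-in and not Lucene's API; the class name `RowIndex`, its methods, and the sample values in the usage note are hypothetical. In Lucene the same role would be played by a key field stored on each indexed document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> primary keys of the rows containing it.
class RowIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Index the text columns of one database row under its primary key.
    void addRow(long primaryKey, String... columnValues) {
        for (String value : columnValues) {
            if (value == null) {
                continue; // NULL columns contribute no terms
            }
            for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) {
                    continue; // split() can yield a leading empty token
                }
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(primaryKey);
            }
        }
    }

    // Return the primary keys of all rows containing the term (case-insensitive).
    Set<Long> search(String term) {
        Set<Long> keys = postings.get(term.toLowerCase(Locale.ROOT));
        return keys == null
                ? Collections.<Long>emptySet()
                : Collections.unmodifiableSet(keys);
    }
}
```

After `addRow(1L, "Lucene in Action", "Erik Hatcher")`, a call to `search("lucene")` returns the key 1, which the application would then use in an ordinary primary-key lookup against the database.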

Re: Debian build problem with 1.4.1

2004-08-20 Thread Jeff Breidenbach
Hi Otis, >I'm asking, because it looks like your compiler is not finding Reader >and IOException classes, both of which are in java.io.* package, which >I see imported in StandardTokenizer.java as 'import java.io.*;'. In my copy of StandardTokenizer.java, there is no 'import java.io.*;' (and i

Re: Indexing and Searching Database in Lucene

2004-08-20 Thread Don Vaillancourt
Funny thing is I was thinking of doing something like this just today. This is especially good when you perform a lot of queries using the LIKE statement. Lucene would increase search performance a great deal. Aviran wrote: You need to create a lucene index from the database. Just index t

Re: lucene and ejb applications

2004-08-20 Thread Erik Hatcher
On Aug 20, 2004, at 7:54 AM, Rupinder Singh Mazara wrote: Hi Erik, thanks for the warning and the code. Let me rephrase the question: I have an index generated by Lucene, and I need the search capability to be highly available. What solution would be optimal? I'm guessing from y

Re: Debian build problem with 1.4.1

2004-08-20 Thread Erik Hatcher
On Aug 20, 2004, at 11:12 AM, Jeff Breidenbach wrote: Hi Otis, I'm asking, because it looks like your compiler is not finding Reader and IOException classes, both of which are in java.io.* package, which I see imported in StandardTokenizer.java as 'import java.io.*;'. In my copy of StandardTokeniz

Re: lucene and ejb applications

2004-08-20 Thread Praveen Peddi
In fact we do the exact same thing. A session bean method called search() delegates to a POJO SearchService. We lazy-load the IndexSearcher, cache it in memory, and invalidate that object when someone else modifies the index. This trick works wonderfully for us. The search has become faster after caching
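
The lazy-load, cache, and invalidate pattern described here can be sketched without any Lucene types. The generic holder `CachedResource<T>` and its `Supplier` factory are hypothetical names; in the setup above, `T` would be the searcher opened by the factory, and `invalidate()` would be called by whatever code modifies the index. The sketch does not close the old instance, so whether to close it explicitly or leave it to garbage collection remains a separate decision.

```java
import java.util.function.Supplier;

// Hypothetical holder: opens the resource on first use and reuses it until invalidated.
class CachedResource<T> {
    private final Supplier<T> factory;
    private T current; // guarded by "this"

    CachedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    // Return the cached instance, creating it lazily if there is none.
    synchronized T get() {
        if (current == null) {
            current = factory.get();
        }
        return current;
    }

    // Drop the cached instance; the next get() opens a fresh one.
    synchronized void invalidate() {
        current = null;
    }
}
```

A search service would call `get()` on every request and the indexing code would call `invalidate()` after each batch of changes, so readers pay the cost of reopening only once per index modification.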

Re: Debian build problem with 1.4.1

2004-08-20 Thread Jeff Breidenbach
>I don't understand this. StandardTokenizer.java hasn't changed since >last year. I have packaged Lucene such that 'ant javacc' is called at package build time. I now see the problem - 'import java.io.*;' has been removed from StandardTokenizer.jj in Lucene 1.4.1. When I put that line back in

memory leek in lucene?

2004-08-20 Thread iouli . golovatyi
When running queries against Lucene I run into a memory problem, i.e. it looks like it's not giving memory back after the query has been executed. I use ParallelMultiSearcher and call the close method after results are displayed.

    hits=null; // Hits class
    if (ms!=null) ms.close(); //ParallelMultiSearch

Re: Debian build problem with 1.4.1

2004-08-20 Thread Jeff Breidenbach
Ok, Lucene 1.4.1 has been uploaded to Debian. Hopefully it will have enough time to percolate before the sarge release. Now that that is taken care of, I'm curious about the status of gcj compilation. Packaging Lucene as a native library might be useful for projects such as PyLucene, and it is al

Re: Debian build problem with 1.4.1

2004-08-20 Thread Erik Hatcher
On Aug 20, 2004, at 12:36 PM, Jeff Breidenbach wrote: I don't understand this. StandardTokenizer.java hasn't changed since last year. I have packaged Lucene such that 'ant javacc' is called at package build time. I now see the problem - 'import java.io.*;' has been removed from StandardTokenizer.

Re: continuous index updates

2004-08-20 Thread Otis Gospodnetic
I just create a new IndexSearcher, leave the old IndexSearcher alone, and JVM's garbage collection cleans it up. Otis --- "Crump, Michael" <[EMAIL PROTECTED]> wrote: > Hello, > > > > I am currently working on a server app that will require the ability > to > make index additions/deletions at

RE: continuous index updates

2004-08-20 Thread Crump, Michael
So the finalizer on the underlying reader closes file handles? -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, August 20, 2004 2:41 PM To: Lucene Users List Subject: Re: continuous index updates I just create a new IndexSearcher, leave the old IndexSearc

Re: NegativeArraySizeException when creating a new IndexSearcher

2004-08-20 Thread Doug Cutting
Looks to me like you're using an older version of Lucene on your Linux box. The code is back-compatible, it will read old indexes, but Lucene 1.3 cannot read indexes created by Lucene 1.4, and will fail in the way you describe. Doug Sven wrote: Hi! I have a problem to port a Lucene based knowl

Lucene with English and Spanish Best Practice?

2004-08-20 Thread Chad Small
Hello, I'm interested in any feedback from anyone who has worked through implementing Internationalization (I18N) search with Lucene or has ideas for this requirement. Currently, we're using Lucene with straight English and are looking to add Spanish to the mix (with maybe more languages to fo

Re: Debian build problem with 1.4.1

2004-08-20 Thread Doug Cutting
I can successfully use gcc 3.4.0 with Lucene as follows:

    ant jar jar-demo
    gcj -O3 build/lucene-1.5-rc1-dev.jar build/lucene-demos-1.5-rc1-dev.jar -o indexer --main=org.apache.lucene.demo.IndexHTML
    ./indexer -create docs

It runs pretty snappy too! However I don't know if there's much mileage in p

speeding up queries (MySQL faster)

2004-08-20 Thread Yonik Seeley
Hi, I'm trying to figure out how to speed up queries to a large index. I'm currently getting 133 req/sec, which isn't bad, but isn't too close to MySQL, which is getting 500 req/sec on the same hardware with the same set of documents. Setup info & Stats: - 4.3M documents, 12 keyword fields per do

Custom filter

2004-08-20 Thread roy-lucene-user
Hi guys! I was hoping someone here could help me out with a custom filter. We have an index of emails and do some searches on the text of an email message and also searches based on the email addresses in a To, From or CC. Since we also do searches on a bunch of emails, we created a custom filt

Re: Custom filter

2004-08-20 Thread Erik Hatcher
Have you considered using the built-in QueryFilter for this? Why isn't it sufficient for your needs? Erik On Aug 20, 2004, at 6:32 PM, [EMAIL PROTECTED] wrote: Hi guys! I was hoping someone here could help me out with a custom filter. We have an index of emails and do some searches on t

Re: Debian build problem with 1.4.1

2004-08-20 Thread Jeff Breidenbach
>It's easy enough for folks to compile Lucene this way I'm having trouble, warnings and error messages appended. This is for Lucene 1.4.1. One of the few Debian specific changes was to call the jarball 1.4 instead of the default 1.5-rc1-dev designation in build.xml. rode:~> gcj --version gcj (G

Re: Custom filter

2004-08-20 Thread roy-lucene-user
We're currently in lucene 1.2... haven't moved to 1.3 yet. Roy. On Fri, 20 Aug 2004 18:46:29 -0400, Erik Hatcher wrote > Have you considered using the built-in QueryFilter for this? Why > isn't it sufficient for your needs?

RE: memory leek in lucene?

2004-08-20 Thread Terence Lai
Are you calling ParallelMultiSearcher.search(Query query, Sort sort) to do your search? If so, I am currently having a similar problem. Terence > > Doing query against lucene I run into memory problem, i.e. it looks like > it's not giving memory back after the > query have been executed. >

Re: Custom filter

2004-08-20 Thread Erik Hatcher
On Aug 20, 2004, at 6:48 PM, [EMAIL PROTECTED] wrote: We're currently in lucene 1.2... haven't moved to 1.3 yet. Skip 1.3 and go straight to 1.4.1 :) Upgrade - why not? Erik

Re: speeding up queries (MySQL faster)

2004-08-20 Thread Otis Gospodnetic
The bottleneck seems to be disk IO. Since this is a read-only index, why not spread some of the frequently scanned index files over multiple disks, or put the index on SCSI disks hooked up in a RAID? Maybe this is already the case, but you didn't mention it. Oh, I already answered a similar quest

Re: Lucene Search Applet

2004-08-20 Thread Jon Schuster
I have Lucene working in an applet and I've seen this problem only when the jar file really was not available (typo in the jar name), which is what you'd expect. It's possible that the classpath for your application is not the same as the classpath for the applet; perhaps they're using differen

Re: speeding up queries (MySQL faster)

2004-08-20 Thread Yonik Seeley
--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > The bottleneck seems to be disk IO. But it's not. Linux is caching the whole file, and there really isn't any disk activity at all. Most of the threads are blocked on InputStream.refill, not waiting for the disk, but waiting for their turn into
