RE: clear index

2007-08-21 Thread Chris Hostetter
: I'm just seeing if there's an easy/performant way of doing it with Solr. : For a solution with raw Lucene, creating a new index with the same : directory cleared out an old index (even on Windows with its file : locking) quickly. there has been talk of optimizing delete by query in the case of

RE: Commit performance

2007-08-21 Thread Chris Hostetter
: chance to look into it deeper. What I have noticed is when there are : Searchers registered commits take a lot longer. Perhaps looking at that's probably the warming time taken to reopen the new searcher ... waitSearcher="false" should cause those commits to return much faster (the down
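A minimal sketch of the commit message discussed above, assuming Solr's XML update format. The update URL (http://localhost:8983/solr/update) and the helper names are hypothetical; only the waitSearcher and waitFlush attribute names come from the thread.

```python
import urllib.request
from xml.etree import ElementTree as ET


def build_commit(wait_searcher=False, wait_flush=True):
    """Build a <commit/> update message as a string.

    With waitSearcher="false" the request is expected to return without
    blocking until the new searcher has been warmed and registered.
    """
    commit = ET.Element("commit")
    commit.set("waitSearcher", "true" if wait_searcher else "false")
    commit.set("waitFlush", "true" if wait_flush else "false")
    return ET.tostring(commit, encoding="unicode")


def post_update(xml_body, url="http://localhost:8983/solr/update"):
    """POST an update message to Solr (URL is an assumption, adjust it)."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the message; call post_update() against a real instance.
    print(build_commit(wait_searcher=False))
```

The trade-off is that a commit which does not wait for the searcher returns before the changes are necessarily visible to queries.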

Replacing existing documents

2007-08-21 Thread Lance Norskog
Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id. We have a few use cases in this area and I'm researching whether it is effective to check for a document via Solr queries, or wh

Re: Structured Lucene documents

2007-08-21 Thread Pieter Berkel
On 21/08/07, Pierre-Yves LANDRON <[EMAIL PROTECTED]> wrote: > > It seems the highlight fields must be specified, and that I can't use the > * completion to do so. > Am I right? Is there a way to get around this requirement? As far as I know, dynamic fields are used mainly during indexing and

RE: clear index

2007-08-21 Thread Lance Norskog
It might be worthwhile to have a "hibernate" mode for Solr, where it waits until all requests are finished, then closes all files and rejects all new requests. Later a command would bring it back online. During this time, a remotely controlled job could remove the data directory. This "hibernate" mo

RE: clear index

2007-08-21 Thread Sundling, Paul
Trying the query approach with a 3GB index takes over a minute to clear the index. The reason not to stop the servlet container and delete the files manually is that in a particular environment the person testing may not have access to the filesystem directly. Usually you want to do perfor
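For reference, the "query approach" to clearing an index can be sketched as a delete-by-query update message followed by a commit. This assumes Solr's XML update format and that the match-all query `*:*` is accepted by the Solr version in use; the function names are hypothetical.

```python
from xml.etree import ElementTree as ET


def build_delete_by_query(query="*:*"):
    """Build a <delete><query>...</query></delete> update message.

    The default query *:* is assumed to match every document.
    """
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


def build_clear_index_messages():
    """Messages to POST, in order, to the update handler."""
    return [build_delete_by_query("*:*"), "<commit/>"]


if __name__ == "__main__":
    for message in build_clear_index_messages():
        print(message)
```

Because this goes through the normal update handler, it needs no filesystem access, which is the constraint described above; the cost is that it can be slow on a large index.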

RE: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Teruhiko Kurosaka
Christian, This is interesting. I have always thought that Solr shouldn't be in the business of parsing; it's the responsibility of the Solr client. But what Peter suggested, adding a parsing capability to Solr as a request handler, does make sense. One thing that I noticed this approach ca

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
I can't find the documentation, but I believe Apache's max URL is 8192, so I would assume a lot of other apps like Tomcat and Jetty would be similar. I haven't run into any problems yet. Maybe shoot Eric an email and see if he would be interested in adapting the code to take XML as well so that you

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Vish D.
On 8/21/07, Vish D. <[EMAIL PROTECTED]> wrote: > > On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote: > > > > I am a little confused how you have things set up, so these meta data > > files contain certain information and there may or may not be a pdf, > > xls, doc that it is associated with? > > >

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Vish D.
On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote: > > I am a little confused how you have things set up, so these meta data > files contain certain information and there may or may not be a pdf, > xls, doc that it is associated with? Yes, you have it right. If that is the case, if it were me I w

Re: Index HotSwap

2007-08-21 Thread Chris Hostetter
: I'm wondering what's the best way to completely change a big index : without losing any requests. use the snapinstaller script -- or adopt the same atomic copying approach it uses. : - Between the two mv's, the directory dir does not exist, which can : cause some solr failure. this shoul

Re: Using MMapDirectory instead of FSDirectory

2007-08-21 Thread Chris Hostetter
: Is there a way to use a MMapDirectory instead of FSDirectory within Solr ? i'm not very familiar with MMapDirectory but according to the javadocs... To use this, invoke Java with the System property org.apache.lucene.FSDirectory.class set to org.apache.lucene.store.MM

Re: Commit performance

2007-08-21 Thread Chris Hostetter
: How long should a commit take? I've got about 9.8G of data for 9M of : records. (Yes, I'm indexing too much data.) My commits are taking 20-30 the low levels of updating aren't my forte, but as i recall the dominant factor in how long it takes to execute a commit is the number of deleted documents (i

Re: solved: quering UTF-8 encoded CSV files

2007-08-21 Thread Chris Hostetter
: The conclusion is that setting URIEncoding="UTF-8" in the <Connector> : section in server.xml is not enough : : I also needed to add -Dfile.encoding=UTF-8 to the tomcat’s java : startup options (in catalina.bat) seeing how you resolved this problem has got me thinking ... how did you index the CSV file

RE: Commit performance

2007-08-21 Thread Gunther, Andrew
I've seen even longer commit times with our 2GB index and have not had a chance to look into it deeper. What I have noticed is when there are Searchers registered commits take a lot longer. Perhaps looking at the optional attributes for commit (waitSearcher, waitFlush) would help. Since we

RE: Index HotSwap

2007-08-21 Thread Gunther, Andrew
I guess the first question is why you have to swap in a big index, instead of rsync'ing or another method. I've entertained the idea of putting a load balancer in front of two Solr instances. In this scenario, take one off-line, swap in the index, bring it back on and then bring down the other. N

Index HotSwap

2007-08-21 Thread Jérôme Etévé
Hi all, I'm wondering what's the best way to completely change a big index without losing any requests. This is how I do it at the moment: the solr index is a soft link to a directory dir. When I want to install a new index (in dir.new), I do a mv dir dir.old ; mv dir.new dir Then I ask for a relo
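The two-step mv above leaves a moment in which dir does not exist. A minimal sketch of one alternative, assuming a POSIX filesystem: since the index path is already a soft link, repoint the link itself with a single rename instead of moving directories. This is an illustration only, not a description of what snapinstaller does, and the function name is hypothetical.

```python
import os


def swap_symlink(link_path, new_target):
    """Repoint link_path at new_target without a window where it is missing.

    A temporary link is created next to the real one and then renamed over
    it; rename replaces the destination atomically on POSIX filesystems.
    """
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted swap
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)  # renames the link itself, not its target


if __name__ == "__main__":
    import tempfile

    base = tempfile.mkdtemp()
    old_dir = os.path.join(base, "index.old")
    new_dir = os.path.join(base, "index.new")
    os.mkdir(old_dir)
    os.mkdir(new_dir)
    link = os.path.join(base, "index")
    os.symlink(old_dir, link)
    swap_symlink(link, new_dir)
    print(os.readlink(link))
```

Solr would still need to open a new searcher afterwards to see the new files, and searchers that are already open keep reading the old directory until then.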

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
I am a little confused how you have things set up, so these meta data files contain certain information and there may or may not be a pdf, xls, doc that it is associated with? If that is the case, if it were me I would write something to parse the meta data files, and if there is a binary file asso

Using MMapDirectory instead of FSDirectory

2007-08-21 Thread Jérôme Etévé
Hi! Is there a way to use a MMapDirectory instead of FSDirectory within Solr? Our index is quite big and it takes a long time to load into the OS cache. I'm wondering if an MMapDirectory could help to have our data in memory quicker (our index on disk is bigger than our memory availabl

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Vish D.
Pete, Thanks for the great explanation. Thinking through my process, I am not sure how to use it: I have a bunch of docs that pretty much contain a lot of meta-data, some of which include full-text files (.pdf, .ppt, etc...). I use these docs correctly to index/update into Solr. The next step no

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
Installing the patch requires downloading the latest solr via subversion and applying the patch to the source. Eric has updated his patch with various revisions of subversion. To make sure it will compile I suggest getting the revision he lists. As for using the features of this patch. This is

RE: How to read values of a field efficiently

2007-08-21 Thread Ard Schrijvers
> > > > I am deeply hurt by your distrust. > > > > :-) > > Shame on me :-$ haha :-) >

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Vish D.
There seems to be some code out for Tika now (not packaged/announced yet, but...). Could someone please take a look at it and see if that could fit in? I am eagerly waiting for a reply back from tika-dev, but no luck yet. http://svn.apache.org/repos/asf/incubator/tika/trunk/src/main/java/org/apach

RE: How to read values of a field efficiently

2007-08-21 Thread Martin Grotzke
On Tue, 2007-08-21 at 11:52 +0200, Ard Schrijvers wrote: > > > you're missing the key piece that Ard alluded to ... the > > there is one > > > ordered list of all terms stored in the index ... a TermEnum lets you > > > iterate over this ordered list, and the > > IndexReader.terms(Term) method > >

Re: Embedded solr - reload searcher

2007-08-21 Thread Erik Hatcher
For other Solr instances (whether embedded or not) to refresh their index searchers, send a message to them. Erik On Aug 21, 2007, at 7:33 AM, sinking wrote: Hello, I have tried to use the EmbeddedSolr (http://wiki.apache.org/solr/ EmbeddedSolr) because i want to work directly with

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
Christian, Eric Pugh implemented this functionality for a project we were doing and has released the code on JIRA. We have had very good results with it. If I can be of any help using it beyond the Java code itself let me know. The last revision I used with it was 552853, so if the build

Embedded solr - reload searcher

2007-08-21 Thread sinking
Hello, I have tried to use the EmbeddedSolr (http://wiki.apache.org/solr/EmbeddedSolr) because I want to work directly with the document. When I index a document (using the EmbeddedSolr) and call commit(), searching with EmbeddedSolr works perfectly (apparently reloads the searchers e

RE: How to read values of a field efficiently

2007-08-21 Thread Ard Schrijvers
> > you're missing the key piece that Ard alluded to ... the > there is one > > ordered list of all terms stored in the index ... a TermEnum lets you > > iterate over this ordered list, and the > IndexReader.terms(Term) method > > lets you efficiently start at an arbitrary term. if you are only
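The idea quoted above (one ordered list of terms, plus a way to start iterating at an arbitrary term) can be modelled in a few lines. This is a toy model in plain Python, not Lucene's API: the class and method names are hypothetical, and terms are represented as (field, text) tuples kept in sorted order.

```python
import bisect


class TermIndex:
    """Toy stand-in for an index's single ordered term list."""

    def __init__(self, terms):
        # Terms sort by field first, then by text, so all terms of one
        # field sit next to each other in the list.
        self._terms = sorted(set(terms))

    def terms_from(self, field, text=""):
        """Iterate the ordered list starting at the first term >= (field, text)."""
        start = bisect.bisect_left(self._terms, (field, text))
        for position in range(start, len(self._terms)):
            yield self._terms[position]


def values_of_field(index, field):
    """Collect every term text of one field, stopping at the next field."""
    values = []
    for term_field, term_text in index.terms_from(field):
        if term_field != field:
            break  # passed the last term of the requested field
        values.append(term_text)
    return values


if __name__ == "__main__":
    index = TermIndex([("title", "b"), ("author", "x"), ("title", "a"), ("zip", "1")])
    print(values_of_field(index, "title"))
```

Seeking with a binary search and stopping at the first term of another field means only the requested field's terms are visited, which is the efficiency point made in the quoted message.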

solved: quering UTF-8 encoded CSV files

2007-08-21 Thread Ben Shlomo, Yatir
My problem is resolved. The problem happened on Tomcat running on Windows XP when indexing UTF-8 encoded CSV files. The conclusion is that setting URIEncoding="UTF-8" in the <Connector> section in server.xml is not enough; I also needed to add -Dfile.encoding=UTF-8 to Tomcat’s Java startup options (in ca
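A short illustration of why the platform default encoding matters here. The sketch assumes the JVM on that Windows machine was defaulting to a single-byte code page such as cp1252 (an assumption, the thread does not say which one); reading UTF-8 bytes with that default garbles every non-ASCII character.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong encoding.

    This mimics a process that receives UTF-8 data but reads it using its
    platform default encoding.
    """
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


def roundtrip(text):
    """Decode with the same encoding that was used to encode: no damage."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    print(misdecode("é"))
    print(roundtrip("é"))
```

Setting -Dfile.encoding=UTF-8 changes the JVM default so that code relying on the default reads the bytes as UTF-8, which corresponds to the second case.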

RE: How to read values of a field efficiently

2007-08-21 Thread Martin Grotzke
On Mon, 2007-08-20 at 11:41 -0700, Chris Hostetter wrote: > : > TermEnum terms = searcher.getReader().terms(new Term(field, "")); > : > while (terms.term() != null && terms.term().field() == field){ > : > //do things > : > terms.next(); > : > } > > : while( te.next() )

Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Christian Klinger
Hi Solr Users, I have set up a Solr server with a custom schema. Now I have updated the index with some content from XML files. Now I am trying to update the contents of a folder. The folder consists of various document types (pdf, doc, xls, ...). Is there a howto anywhere on how I can parse the documents,