: I'm just seeing if there's an easy/performant way of doing it with Solr.
: For a solution with raw Lucene, creating a new index with the same
: directory cleared out an old index (even on Windows with its file
: locking) quickly.
there has been talk of optimizing delete by query in the case of
: chance to look into it deeper. What I have noticed is when there are
: Searchers registered, commits take a lot longer. Perhaps looking at
that's probably the warming time taken to reopen the new searcher ...
waitSearcher="false" should cause those commits to return much faster (the
down
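For reference, the flags being discussed ride on the commit message itself; a sketch of the update message (attribute names are the ones mentioned above):

```xml
<commit waitFlush="false" waitSearcher="false"/>
```

This gets POSTed to the update handler (e.g. http://localhost:8983/solr/update with Content-type: text/xml); the port and path here are the example-distribution defaults, so adjust for your install.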
Recently someone mentioned that it would be possible to have a 'replace
existing document' feature rather than just dropping and adding documents
with the same unique id. We have a few use cases in this area and I'm
researching whether it is effective to check for a document via Solr
queries, or wh
On 21/08/07, Pierre-Yves LANDRON <[EMAIL PROTECTED]> wrote:
>
> It seems the highlight fields must be specified, and that I can't use the
> * completion to do so.
> Am I right? Is there a way to work around this requirement?
As far as I know, dynamic fields are used mainly during indexing and
It might be worthwhile to have a "hibernate" mode for solr, where it
waits until all requests are finished, then closes all files and rejects all
new requests. Later a command would bring it back online. During
this time, a remotely controlled job could remove the data directory. This
"hibernate" mo
Trying the query approach with a 3GB indexing takes over a minute to
clear the index.
The reason not to stop the servlet container and delete the files
manually is that in a particular environment the person testing may not
have access to the filesystem directly. Usually you want to do
perfor
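For context, the delete-by-query message used to clear an index looks like this (a sketch; whether `*:*` is accepted depends on the query parser version — with older parsers a range query on the unique key, e.g. `id:[* TO *]`, does the same job):

```xml
<delete><query>*:*</query></delete>
```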
Christian,
This is interesting. I have always thought that Solr shouldn't
be in the business of parsing; that's the responsibility of the Solr client.
But
what Peter suggested, adding a parsing capability to Solr
as a request handler, does make sense.
One thing that I noticed this approach ca
I can't find the documentation, but I believe Apache's max URL length is 8192,
so I would assume a lot of other apps like Tomcat and Jetty would be
similar. I haven't run into any problems yet.
Maybe shoot Eric an email and see if he would be interested in
adapting the code to take XML as well so that you
On 8/21/07, Vish D. <[EMAIL PROTECTED]> wrote:
>
> On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote:
> >
> > I am a little confused how you have things set up, so these meta data
> > files contain certain information and there may or may not be a pdf,
> > xls, doc that it is associated with?
>
>
>
On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote:
>
> I am a little confused how you have things set up, so these meta data
> files contain certain information and there may or may not be a pdf,
> xls, doc that it is associated with?
Yes, you have it right.
If that is the case, and it were me, I w
: I'm wondering what's the best way to completely change a big index
: without losing any requests.
use the snapinstaller script -- or adopt the same atomic copying approach
it uses.
: - Between the two mv's, the directory dir does not exist, which can
: cause some solr failure.
this shoul
: Is there a way to use a MMapDirectory instead of FSDirectory within Solr ?
i'm not very familiar with MMapDirectory but according to the javadocs...
To use this, invoke Java with the System property
org.apache.lucene.FSDirectory.class set to
org.apache.lucene.store.MM
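Putting the javadoc's advice together, the JVM startup line would look something like this (the flag name is from the javadocs quoted above; `start.jar` assumes the Jetty example distribution):

```
java -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory -jar start.jar
```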
: How long should a commit take? I've got about 9.8G of data for 9M of
: records. (Yes, I'm indexing too much data.) My commits are taking 20-30
the low levels of updating aren't my forte, but as i recall the dominant
factor in how long it takes to execute a commit is the number of deleted
documents (i
: The conclusion is that setting URIEncoding="UTF-8" in the
: <Connector> section in server.xml is not enough
:
: I also needed to add -Dfile.encoding=UTF-8 to the tomcat's java
: startup options (in catalina.bat)
seeing how you resolved this problem, has got me thinking ... how did you
index the CSV file
I've seen even longer commit times with our 2GB index and have not had a
chance to look into it deeper. What I have noticed is when there are
Searchers registered, commits take a lot longer. Perhaps looking at
the optional attributes for commit (waitSearcher, waitFlush) would help.
Since we
I guess the first question is why you have to swap in a big index, instead of
rsync'ing or another method. I've entertained the idea of putting a load
balancer in front of two solr instances. In this scenario, take one off-line,
swap in the index, bring it back online, and then bring down the other. N
Hi all,
I'm wondering what's the best way to completely change a big index
without losing any requests.
Here's how I do it at the moment:
The solr index is a soft link to a directory dir.
When I want to install a new index (in dir.new), I do a
mv dir dir.old ; mv dir.new dir
Then I ask for a relo
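The window between the two mv's can be closed by renaming a symlink over the old one instead: rename(2) is atomic, so the link always resolves to one index or the other. A minimal sketch, using the directory names above (`mv -T` is GNU coreutils, so this assumes Linux):

```shell
set -e
cd "$(mktemp -d)"          # scratch area for the demonstration
mkdir dir.old dir.new      # dir.old = live index, dir.new = replacement
ln -s dir.old index        # "index" is the soft link Solr points at
# Swap: point a temporary link at the new directory, then rename it over
# the live link. The rename is atomic, so "index" is never missing.
ln -s dir.new index.tmp
mv -T index.tmp index      # -T: treat "index" as the destination, not a dir
echo "index -> $(readlink index)"   # prints: index -> dir.new
```

Without `-T`, mv would follow the existing symlink and move index.tmp *into* dir.old, which is exactly the mistake to avoid.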
I am a little confused how you have things set up, so these meta data
files contain certain information and there may or may not be a pdf,
xls, doc that it is associated with?
If that is the case, and it were me, I would write something to parse
the meta data files, and if there is a binary file asso
Hi !
Is there a way to use a MMapDirectory instead of FSDirectory within Solr ?
Our index is quite big and it takes a long time to get loaded into the OS
cache. I'm wondering if an MMapDirectory could help get
our data in memory quicker (our index on disk is bigger than our
memory availabl
Pete,
Thanks for the great explanation.
Thinking through my process, I am not sure how to use it:
I have a bunch of docs that pretty much contain a lot of meta-data, some of
which include full-text files (.pdf, .ppt, etc...). I currently use these docs
to index/update into Solr. The next step no
Installing the patch requires downloading the latest solr via
subversion and applying the patch to the source. Eric has updated his
patch for various subversion revisions. To make sure it will
compile I suggest getting the revision he lists.
As for using the features of this patch, this is
> >
> > I am deeply hurt by your distrust.
> >
> > :-)
>
> Shame on me :-$
haha :-)
>
There seems to be some code out for Tika now (not packaged/announced yet,
but...). Could someone please take a look at it and see if that could fit
in? I am eagerly waiting for a reply back from tika-dev, but no luck yet.
http://svn.apache.org/repos/asf/incubator/tika/trunk/src/main/java/org/apach
On Tue, 2007-08-21 at 11:52 +0200, Ard Schrijvers wrote:
> > > you're missing the key piece that Ard alluded to ... the
> > there is one
> > > ordered list of all terms stored in the index ... a TermEnum lets you
> > > iterate over this ordered list, and the
> > IndexReader.terms(Term) method
> >
For other Solr instances (whether embedded or not) to refresh their
index searchers, send a message to them.
Erik
On Aug 21, 2007, at 7:33 AM, sinking wrote:
Hello,
I have tried to use the EmbeddedSolr (http://wiki.apache.org/solr/
EmbeddedSolr) because i want to work directly with
Christian,
Eric Pugh implemented this functionality for a project we were
doing and has released the code on JIRA. We have had very good results
with it. If I can be of any help using it beyond the Java code itself
let me know. The last revision I used with it was 552853, so if the
build
Hello,
I have tried to use the EmbeddedSolr
(http://wiki.apache.org/solr/EmbeddedSolr) because i want to work
directly with the document.
When I index a document (using the EmbeddedSolr) and call commit(),
searching with EmbeddedSolr works perfectly (apparently it reloads
the searchers e
> > you're missing the key piece that Ard alluded to ... the
> there is one
> > ordered list of all terms stored in the index ... a TermEnum lets you
> > iterate over this ordered list, and the
> IndexReader.terms(Term) method
> > lets you efficiently start at an arbitrary term. if you are only
My problem is resolved:
The problem happened on Tomcat running on Win XP
when indexing UTF-encoded CSV files.
The conclusion is that setting URIEncoding="UTF-8" in the <Connector> section
in server.xml is not enough.
I also needed to add -Dfile.encoding=UTF-8 to the tomcat's java startup options
(in ca
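For reference, the two settings described above look like this (the port and protocol values here are illustrative, not from the thread):

```xml
<!-- server.xml: add URIEncoding to the HTTP Connector element -->
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>
```

and in catalina.bat:

```
set JAVA_OPTS=%JAVA_OPTS% -Dfile.encoding=UTF-8
```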
On Mon, 2007-08-20 at 11:41 -0700, Chris Hostetter wrote:
> : > TermEnum terms = searcher.getReader().terms(new Term(field, ""));
> : > while (terms.term() != null && terms.term().field() == field){
> : > //do things
> : > terms.next();
> : > }
>
> : while( te.next() )
Hi Solr Users,
I have set up a Solr server with a custom schema.
Now I have updated the index with some content from
xml-files.
Now I am trying to index the contents of a folder.
The folder consists of various document-types
(pdf,doc,xls,...).
Is there a howto anywhere on how I can parse the
documents,