AW: solr + carrot2

2007-08-01 Thread Burkamp, Christian
Hi, In my opinion the results from carrot2 clustering could be used in the same way that facet results are used. That's the way I'm planning to use them. The user of the search application can narrow the search by selecting one of the facets presented in the search result presentation. These f

AW: Highlighting in large text fields

2007-06-25 Thread Burkamp, Christian
or me to implement this right now). --Christian -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Monday, 25 June 2007 19:34 To: solr-user@lucene.apache.org Subject: Re: Highlighting in large text fields On 25-Jun-07, at 4:59 AM, Burkamp, Christian wrote: >

Highlighting in large text fields

2007-06-25 Thread Burkamp, Christian
Hi list, Highlighting does not work for words that are not located near the beginning of a text field. In my index the whole text is stored in a text field for highlighting purposes. This field is just stored but not indexed. The maxFieldLength was set to 10. The document content can be retriev

Problem with surrogate characters in utf-8

2007-06-14 Thread Burkamp, Christian
Hi all, I have a problem after updating to solr 1.2. I'm using the bundled jetty that comes with the latest solr release. Some of the contents that are stored in my index contain characters from the unicode private section above 0x10. (They are used by some proprietary software and the text ex
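
To make the surrogate issue concrete, here is a small Python sketch (not part of the original thread) of how a code point beyond the Basic Multilingual Plane is represented. The code point U+100000 is an arbitrary stand-in chosen for illustration; the actual characters from the post are not known. Java strings are UTF-16, so such a character occupies two code units (a surrogate pair), while valid UTF-8 encodes it as a single 4-byte sequence.

```python
def describe(ch: str) -> dict:
    """Return the UTF-8 bytes and UTF-16 code units for a single character."""
    utf16 = ch.encode("utf-16-be")
    # Split the big-endian UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code_point": ord(ch),
        "utf8": ch.encode("utf-8"),
        "utf16_units": units,
    }


# U+100000 is a hypothetical example character, not taken from the post.
info = describe(chr(0x100000))
print(hex(info["code_point"]), info["utf8"].hex(), [hex(u) for u in info["utf16_units"]])
```

A component that encodes each surrogate on its own, instead of the pair as one character, would produce byte sequences that are not valid UTF-8. Whether that is what happened here cannot be confirmed from the truncated message.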

AW: SOLR Indexing/Querying

2007-05-31 Thread Burkamp, Christian
Hi there, It looks a lot like using Solr's standard "WordDelimiterFilter" (see the sample schema.xml) does what you need. It splits on alphabetic-to-numeric boundaries and on the various kinds of intra-word delimiters like "-", "_" or ".". You can decide whether the parts are put together agai
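
The splitting behaviour described above can be approximated in a few lines of Python. This is only a rough sketch of the idea, not Solr's WordDelimiterFilter itself: the function name `split_word` is made up for illustration, it handles ASCII letters and digits only, and it ignores the filter's options for recombining parts.

```python
import re


def split_word(token: str) -> list[str]:
    """Split a token on delimiters and on letter/digit boundaries."""
    parts = []
    # First break on anything that is not a letter or digit ("-", "_", "." ...).
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        # Then break each chunk wherever letters and digits meet.
        parts.extend(re.findall(r"[A-Za-z]+|[0-9]+", chunk))
    return parts


print(split_word("SD-500_v2.jpg"))
```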

AW: Re[2]: add and delete docs at same time

2007-05-25 Thread Burkamp, Christian
Thierry, If you always start from scratch you could even reset the index completely (i.e. delete the index directory). Solr will create a new index automatically at startup. If you would rather not delete the files, another approach would be to use a query that returns all documents. You do not nee
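
As a sketch of the delete-by-query approach, the snippet below builds the XML update message that would be posted to Solr's update handler. The match-all query `*:*` is an assumption: the message is cut off before it names a query, and older Solr versions may need a different expression that matches every document. The helper name `delete_by_query_xml` is hypothetical.

```python
from xml.sax.saxutils import escape


def delete_by_query_xml(query: str) -> str:
    """Build a Solr delete-by-query update message for the given query."""
    # Escape &, < and > so the query is safe inside the XML element.
    return f"<delete><query>{escape(query)}</query></delete>"


# "*:*" is assumed here as the query that matches all documents.
print(delete_by_query_xml("*:*"))
```

The deletion only becomes visible to searchers after a `<commit/>` message is sent to the same handler.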

AW: UTF-8 2-byte vs 4-byte encodings

2007-05-02 Thread Burkamp, Christian
Gereon, The four bytes do not look like a valid utf-8 encoded character. 4-byte characters in utf-8 start with the binary sequence "11110...". (For reference see the excellent Wikipedia article on utf-8 encoding). Your problem looks like someone interpreted your valid 2-byte utf-8 encoded chara
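
A quick way to check such bytes is to look at the lead byte: in UTF-8 the high bits of the first byte give the sequence length, and a 4-byte sequence starts with the bits 11110. The Python sketch below (not from the thread) classifies a lead byte and then shows one common way a 2-byte character turns into four bytes, namely decoding the UTF-8 bytes as Latin-1 and encoding the result again. That double-encoding scenario is an assumption, since the message is truncated before it says what went wrong.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte."""
    if lead >> 7 == 0b0:
        return 1  # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3  # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4  # 11110xxx
    raise ValueError(f"0x{lead:02x} is not a valid UTF-8 lead byte")


# Assumed failure mode: valid UTF-8 read as Latin-1, then encoded again.
original = "ü".encode("utf-8")
mangled = original.decode("latin-1").encode("utf-8")
print(original.hex(), mangled.hex(), utf8_sequence_length(mangled[0]))
```

The mangled result is four bytes long, but its lead byte announces a 2-byte sequence, so it is really two 2-byte characters rather than one 4-byte character.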

AW: Help with Setup

2007-04-27 Thread Burkamp, Christian
Hi, You can use curl with a file if you put the "@" char in front of its name. (Otherwise curl expects the data on the command line). curl http://localhost:8080/solr/update --data-binary @articles.xml -Original Message- From: Sean Bowman [mailto:[EMAIL PROTECTED] Sent: Thurs
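
For anyone without curl at hand, roughly the same request can be built with Python's standard library. This is a sketch under assumptions: the `text/xml` content type is a guess at what the update handler expects (the curl command above does not set one explicitly), and `build_update_request` is a made-up helper name. The URL and file name are the ones from the curl command.

```python
import urllib.request


def build_update_request(url: str, path: str) -> urllib.request.Request:
    """Build a POST request whose body is the raw content of a file."""
    # Read the file as bytes, like curl's --data-binary @file.
    with open(path, "rb") as fh:
        body = fh.read()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed content type
        method="POST",
    )


# Sending the request needs a running Solr instance, so it is left commented out:
# req = build_update_request("http://localhost:8080/solr/update", "articles.xml")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```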

AW: Avoiding caching of special filter queries

2007-04-23 Thread Burkamp, Christian
EMAIL PROTECTED] Sent: Friday, 20 April 2007 22:33 To: solr-user@lucene.apache.org Subject: Re: Avoiding caching of special filter queries On 4/20/07, Burkamp, Christian <[EMAIL PROTECTED]> wrote: > Hi Erik, > > No, what I need to do is > > &q="my funny query

AW: Avoiding caching of special filter queries

2007-04-20 Thread Burkamp, Christian
, 20 April 2007 15:43 To: solr-user@lucene.apache.org Subject: Re: Avoiding caching of special filter queries On Apr 20, 2007, at 7:11 AM, Burkamp, Christian wrote: > I'm using filter queries to implement document level security with > solr. > The caching mechanism for filters sep

Avoiding caching of special filter queries

2007-04-20 Thread Burkamp, Christian
Hi, I'm using filter queries to implement document level security with solr. The caching mechanism for filters separate from queries comes in handy and the system performs well once all the filters for the users of the system are stored in the cache. However, I'm storing full document content in t
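
As an illustration of the setup described here, the sketch below builds a select URL that carries the user's query in `q` and a per-user access restriction in `fq`, Solr's filter query parameter. The field name `acl`, the group names, the base URL, and the helper `build_select_url` are all hypothetical; the post does not say how the security filter is actually expressed.

```python
from urllib.parse import urlencode


def build_select_url(base: str, user_query: str, allowed_groups: list[str]) -> str:
    """Combine a user query with a filter query restricting visible documents."""
    # Hypothetical schema: each document lists the groups allowed to see it in "acl".
    acl_filter = "acl:(" + " OR ".join(allowed_groups) + ")"
    return base + "?" + urlencode({"q": user_query, "fq": acl_filter})


url = build_select_url(
    "http://localhost:8080/solr/select", "my funny query", ["sales", "staff"]
)
print(url)
```

Because the filter is sent separately from the main query, Solr can cache its document set on its own and reuse it for every search by the same user.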

AW: Leading wildcards

2007-04-19 Thread Burkamp, Christian
Hi there, Solr does not support leading wildcards, because it uses Lucene's standard QueryParser class without changing the defaults. You can easily change this by inserting the line parser.setAllowLeadingWildcard(true); in QueryParsing.java line 92. (This is after creating a QueryParser inst

AW: Index arbitrary xml-elments in only one field without copying

2007-03-14 Thread Burkamp, Christian
You can even put multiple entries into one document. The text field needs to be defined as multi-valued for this to work. You can put each chunk of data into its own text field. Perhaps this approach is best suited for what you want to do? --Christian -Original Message- From: Er

AW: solr performance

2007-02-20 Thread Burkamp, Christian
I do agree. There's probably no need to go to the index directly. My current solr test server has more than 5M documents and a size of about 60GB. I still index at 13 docs per second and this still includes filtering of the documents. (If you have your content ready in XML format performance will

AW: highlight search keywords on html page

2007-02-19 Thread Burkamp, Christian
I was thinking about the same thing. It shouldn't be too difficult to subclass SolrRequestHandler and build a special HighlightingRequestHandler that uses the built-in highlighting utils to do the job. I wonder if it's possible to get access to the HTTP request body inside a SolrRequestHandler su

AW: Using solr on windows

2007-02-14 Thread Burkamp, Christian
Hi, if you have a copy of curl installed this script should work as a windows replacement for post.sh. You could name it post.bat. Don't forget to adjust the hostname and port if you don't have Solr running on the local machine. --snip--- rem echo off setlocal set URL="http://localhost:

Re: performance testing practices

2007-02-05 Thread Burkamp, Christian
Hi there, I am working on some performance numbers too. This is part of my evaluation of solr. I'm planning to replace a legacy search engine and have to find out if this is possible with solr. I have loaded 1.1 million documents into solr by now. Indexing speed is not a big concern for me. I h