Re: WordDelimiterFilter and acronyms normalization

2009-11-26 Thread AHMET ARSLAN
> Is there any ready-for-use filter which performs acronyms normalization such as "I.N.C."->"INC"?
>
> I see that Lucene's StandardFilter can do this but we can't use it as we're using WhitespaceTokenizer instead of StandardTokenizer.
I am bad at regular expressions but if you can wri

Re: Re: Sending Tika parse result to Solr

2009-11-26 Thread Daniel Knapp
>> Hello,
>>
>> I want to send the Tika parse results of my data to my Solr-Server.
>> My File-Server is not my Solr-Server, so Solr Cell is not an option for me.
>>
>> In Lucene I can pass my Reader Object (as a result of the parsing) to a Lucene Document for indexing.
>>
>> Is this also p

Re: Retrieving large num of docs

2009-11-26 Thread Andrey Klochkov
Hi,
We obtain ALL documents for every query; the index size is about 50k. We use a number of stored fields. Often the result set is several thousand docs. We did the following to make it faster:
1. Use EmbeddedSolrServer
2. Patch Solr to avoid unnecessary marshalling while usi

Re: Wildcard searches within phrases to use proximity

2009-11-26 Thread AHMET ARSLAN
> That'd be great. Please open an issue in Jira and attach a patch. See
> http://wiki.apache.org/solr/HowToContribute
Hi Shalin, I opened an issue (SOLR-1604) and attached a patch as well as a maven project to enable this feature without applying the patch. I couldn't consume ComplexPhraseQ

Re: Deduplication in 1.4

2009-11-26 Thread Martijn v Groningen
Two sites that use field collapsing:
1) www.ilocal.nl
2) www.welke.nl
I'm not sure what you mean by double-tripping. The sites mentioned do not have performance problems caused by field collapsing. Field collapsing currently only supports quasi-distributed field collapsing (as I have de

Retrieving large num of docs

2009-11-26 Thread Raghuveer Kancherla
Hi, I am using Solr 1.4 to search through half a million documents. The problem is that I want to retrieve nearly 200 documents for each search query. The query time in the Solr logs shows 0.02 seconds and I am fairly happy with that. However, Solr is taking a long time (4 to 5 secs) to return the r

Maximum number of fields allowed in a Solr document

2009-11-26 Thread Alex Wang
Hi, We are in the process of designing a Solr app where we might have millions of documents, and within each document we might have thousands of dynamic fields. These fields are small and only contain an integer, which needs to be retrievable and sortable. My questions are:
1. Is the

Re: Multi-Term Synonyms

2009-11-26 Thread Patrick Jungermann
Hi Brad, I was trying this, too, and there is a way to get multi-term synonyms to work properly. I already posted my solution to this list. It was as follows: [cite] After your hints, which partially confirmed my considerations, I made some tests with the FieldQParser.

Re: configure solr

2009-11-26 Thread dipti khullar
Hi,
1. Issue with Jetty: When you start the Jetty server by running start.jar, check the logs to verify whether Jetty has started successfully. At times, the port you are using to start Jetty (8983 in your case) may already be used by another application, which can cause problems at start-up.
2.
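The port conflict from point 1 can be checked before launching start.jar. A minimal sketch (the class and method names are hypothetical, not part of Jetty or Solr) that simply tries to bind a server socket on the port; if the bind fails, another process is most likely already listening there:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    // Tries to bind the port; returns false if the bind fails for any reason
    // (typically "address already in use", but also e.g. missing permission).
    public static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true; // bind succeeded; the socket is closed again right away
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 8983 is the port mentioned in the reply; pass another one as the first argument
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8983;
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```

A `false` result is only a hint: a failed bind can also come from lacking permission (ports below 1024 on Unix), so the Jetty log remains the authoritative place to look.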

Re: Creating Facets

2009-11-26 Thread dipti khullar
Examples can be found at: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
A simple configuration works by setting facet=true&facet.field=xyz.
Thanks
Dipti
On Wed, Nov 25, 2009 at 3:29 AM, Lance Norskog wrote:
> There is nothing special to configu
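The two request parameters above are all that is needed to switch faceting on for one field. A small sketch that builds such a request URL, assuming the default example server at http://localhost:8983/solr/select; `xyz` is the placeholder field from the reply and the helper names are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetUrl {
    // URL-encodes a parameter value (spaces become '+', ':' becomes %3A, etc.)
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds a select URL with faceting enabled on a single field
    public static String facetUrl(String selectUrl, String query, String field) {
        return selectUrl + "?q=" + encode(query)
                + "&facet=true&facet.field=" + encode(field);
    }

    public static void main(String[] args) {
        // prints: http://localhost:8983/solr/select?q=*%3A*&facet=true&facet.field=xyz
        System.out.println(facetUrl("http://localhost:8983/solr/select", "*:*", "xyz"));
    }
}
```

Facet counts then come back in the `facet_counts` section of the response next to the normal results.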

Re: solr+jetty logging to syslog?

2009-11-26 Thread Marc Sturlese
With 1.4:
- Add log4j jars to Solr
- Configure the SyslogAppender with something like:
log4j.appender.solrLog=org.apache.log4j.net.SyslogAppender
log4j.appender.solrLog.Facility=LOCAL0
log4j.appender.solrLog.SyslogHost=127.0.0.1
log4j.appender.solrLog.layout=org.apache.log4j.PatternLayout
log4j.appe

Intensive querying give odd results in search

2009-11-26 Thread jmsm
Hi all, I have a problem with intensive querying. I'm using the SolrJ client over HTTP on the client side, and Solr 1.4 with Tomcat 6.0.20 on the server side. My goal is to execute 3 different queries for each word in a list of words and get the number of results. On the client *

WordDelimiterFilter and acronyms normalization

2009-11-26 Thread Andrey Klochkov
Hi all! Is there any ready-for-use filter which performs acronyms normalization such as "I.N.C."->"INC"? I see that Lucene's StandardFilter can do this but we can't use it as we're using WhitespaceTokenizer instead of StandardTokenizer.
--
Andrew Klochkov
Senior Software Engineer, Grid Dynamics
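Lucene's StandardFilter only strips dots from tokens that StandardTokenizer has typed as acronyms, which is why it has no effect after WhitespaceTokenizer. The normalization itself is small; below is a minimal pattern-based sketch in plain Java (the class name is hypothetical and there is no Lucene dependency). Using it in Solr would still require wrapping it in a custom TokenFilter and factory, which are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AcronymNormalizer {
    // Single letters separated by dots, optional trailing dot: "I.N.C.", "U.S.A"
    private static final Pattern ACRONYM = Pattern.compile("^(?:\\p{L}\\.)+\\p{L}\\.?$");

    // Removes the dots from acronym-shaped tokens and leaves everything else untouched
    public static String normalize(String token) {
        if (ACRONYM.matcher(token).matches()) {
            return token.replace(".", "");
        }
        return token;
    }

    public static void main(String[] args) {
        // Whitespace splitting mimics what WhitespaceTokenizer hands to a filter
        String text = "Acme I.N.C. and U.S.A 3.14 Mr. Smith";
        List<String> out = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            out.add(normalize(token));
        }
        // prints: Acme INC and USA 3.14 Mr. Smith
        System.out.println(String.join(" ", out));
    }
}
```

The pattern is deliberately narrow: numbers such as "3.14" and abbreviations such as "Mr." are left alone, but "e.g." also matches and becomes "eg".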

RE: schema-based Index-time field boosting

2009-11-26 Thread Ian Smith
Hi Chris, thanks for replying! OK, now I'm going to take the bait ;) I am talking about field boosting rather than document boosting, i.e. I would like some fields (e.g. title) to be "louder" than others, across ALL documents. I believe you are at least partially talking about document boos

SolrException caused by illegal character

2009-11-26 Thread György Frivolt
Hi, I upgraded to Solr 1.4 and tried to reindex the data. After a few thousand reindexed documents an exception is thrown; I never ran into this with 1.3. Do you have any idea what caused the problem? Thanks.
SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR,
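The message suggests the XML update parser rejected a control character in the document text. XML 1.0 allows only tab, line feed and carriage return below U+0020, and excludes the surrogate range plus U+FFFE and U+FFFF. Assuming the data is sent as XML update messages, one workaround is to strip such characters on the client before indexing. A minimal sketch (a hypothetical helper, not a Solr or SolrJ API):

```java
public class XmlCharFilter {
    // True for code points permitted by the XML 1.0 "Char" production
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point XML 1.0 cannot carry; lone surrogates are dropped too,
    // because codePoints() reports them as values in the excluded D800-DFFF range
    public static String stripInvalidXmlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
                .filter(XmlCharFilter::isXmlChar)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "title\u0001 with a\u000B control char";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

Stripping silently loses data, so it may be worth logging which documents were affected in order to find the source of the control characters.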

Re: param "version" and differences in /admin/ping response

2009-11-26 Thread Nestor Oviedo
Thank you Chris... I didn't see it. I was looking for something related to the PingRequestHandler.
Regards.
Nestor Oviedo
On Wed, Nov 25, 2009 at 7:09 PM, Chris Hostetter wrote:
> : Hi everyone!
> : Can anyone tell me what's the meaning of the param "version" ?? There isn't anything about it

Re: Deduplication in 1.4

2009-11-26 Thread Otis Gospodnetic
Hi Martijn,
- Original Message
> From: Martijn v Groningen
> To: solr-user@lucene.apache.org
> Sent: Thu, November 26, 2009 3:19:40 AM
> Subject: Re: Deduplication in 1.4
>
> Field collapsing has been used by many in their production environment.
Got any pointers to public sites

Re: Looking for Best Practices: Analyzers vs. UpdateRequestProcessors?

2009-11-26 Thread Shalin Shekhar Mangar
On Wed, Nov 25, 2009 at 9:52 PM, Andreas Kahl wrote:
> Hello,
>
> are there any general criteria when to use Analyzers to implement an indexing function and when it is better to use UpdateRequestProcessors?
>
> The main difference I found in the documentation was that UpdateRequestProcessors

Re: Fulltext crawler

2009-11-26 Thread Shalin Shekhar Mangar
On Thu, Nov 26, 2009 at 1:54 PM, Jörg Agatz wrote:
> *Hey guys*, I am looking for a full-text crawler for Solr to index HTML, OpenOffice and MS Office documents, PDF and many more formats.
> How do you index the data?
>
> Maybe you can help me find a crawler.
If you need a web crawler, look at Nutch. Ot

Lock on old index files

2009-11-26 Thread Branca Marco
Hi everybody, I'm experiencing a problem with my Solr-based web application running on Sun Solaris. It seems that the application still holds file descriptors to index files even after those files have been removed. This can be observed mainly when the snapinstaller script is executed, but we can

Re: Fulltext crawler

2009-11-26 Thread Christian Weyand
As far as I know "Nutch" will satisfy your needs, although I haven't tested it myself yet.
> *Hey guys*, I am looking for a full-text crawler for Solr to index HTML, OpenOffice and MS Office documents, PDF and many more formats. How do you index the data? Maybe you can help me find a crawler. King
-- ek

Fulltext crawler

2009-11-26 Thread Jörg Agatz
*Hey guys*, I am looking for a full-text crawler for Solr to index HTML, OpenOffice and MS Office documents, PDF and many more formats. How do you index the data? Maybe you can help me find a crawler. King

Re: Deduplication in 1.4

2009-11-26 Thread Martijn v Groningen
Field collapsing has been used by many in their production environment. Over the last few months the patch has become more stable, as quite a few bugs were fixed. The only big feature currently missing is caching of the collapsing algorithm. I'm working on that and will put it in a new patch in