Re: indexing Chienese langage

2009-06-04 Thread Fer-Bj
We are trying SOLR 1.3 with the Paoding Chinese Analyzer, and after reindexing the index size went from 1.5 GB to 2.7 GB. Is that expected behavior? Is there any switch or trick to avoid nearly doubling the index file size? Koji Sekiguchi-2 wrote: CharFilter can normalize (convert)

Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
FastLRUCache is designed to be lock free so it is well suited for caches which are hit several times in a request. I guess there is no harm in using FastLRUCache across all the caches. On Thu, Jun 4, 2009 at 3:22 AM, Robert Purdy rdpu...@gmail.com wrote: Hey there, Anyone got any advice on

Re: Token filter on multivalue field

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Isn't it better to use an UpdateProcessor for this? On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, It's ugly, but the first thing that came to mind was ThreadLocal. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -

RE: Strange behaviour with copyField

2009-06-04 Thread Radha C.
What is the defaultOperator set in your solrconfig.xml? Are you sure that it matches for au and not author? -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, June 04, 2009 2:53 AM To: solr-user@lucene.apache.org Subject: Re: Strange behaviour with

Re: Field Compression

2009-06-04 Thread Fer-Bj
Is it correct to assume that using field compression will cause performance issues if we decide to allow search over this field? i.e.: <field name="id" type="sint" indexed="true" stored="true" required="true" /> <field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>

Re: spell checking

2009-06-04 Thread Michael Ludwig
Yao Ge schrieb: Maybe we should call this alternative search terms or suggested search terms instead of spell checking. It is misleading as there is no right or wrong in spelling, there is only popular (term frequency?) alternatives. I had exactly the same difficulty in understanding the

HashDocSet's maxSize and loadFactor

2009-06-04 Thread Marc Sturlese
Hey there, I am trying to optimize the setup of HashDocSet. I have read the documentation here: http://wiki.apache.org/solr/SolrPerformanceFactors#head-2de2e9a6f806ab8a3afbd73f1d99ece48e27b3ab but I can't quite understand it. Does it mean that the maxSize should be 0.005 x NumberDocsOfMyIndex or

Re: indexing Chienese langage

2009-06-04 Thread Erick Erickson
Hmmm, are you quite sure that you emptied the index first and didn't just add all the documents a second time to the index? Also, when you say the index almost doubled, were you looking only at the size of the *directory*? SOLR might have been holding a copy of the old index open while you built a

Re: Field Compression

2009-06-04 Thread Erick Erickson
Warning: This is from a Lucene perspective. I don't think it matters. I'm pretty sure that COMPRESS only applies to *storing* the data, not putting the tokens in the index (the latter is what's searched)... It *will* cause performance issues if you load that field for a large number of

SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: | If you use spellcheck.q parameter for specifying | the spelling query, then the field's analyzer will | be used [...] If you use the q parameter, then the | SpellingQueryConverter is used. http://markmail.org/message/k35r7qmpatjvllsc - message

Re: spell checking

2009-06-04 Thread Walter Underwood
query suggest --wunder On 6/4/09 1:25 AM, Michael Ludwig m...@as-guides.com wrote: Yao Ge schrieb: Maybe we should call this alternative search terms or suggested search terms instead of spell checking. It is misleading as there is no right or wrong in spelling, there is only popular

Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi, Any help/pointers on the following message would really help me. Thanks, Surfer --- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote: From: Silent Surfer silentsurfe...@yahoo.com Subject: Questions regarding IT search solution To: solr-user@lucene.apache.org Date: Tuesday, June

Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Yonik Seeley
2009/6/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com: FastLRUCache is designed to be lock free so it is well suited for caches which are hit several times in a request. I guess there is no harm in using FastLRUCache across all the caches. Gets are cheaper, but evictions are more

Re: Questions regarding IT search solution

2009-06-04 Thread Walter Underwood
Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed to not scale. People were already trying different approaches ten years ago. wunder On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote: Hi,

Faceting on text fields

2009-06-04 Thread Yao Ge
I am indexing a database with over 1 million rows. Two of the fields contain unstructured text, but the size of each field is limited (256 characters). I came up with an idea to visualize the text fields as a text cloud by turning the two text fields into facets. The font weight and size of each

statistics about word distances in solr

2009-06-04 Thread Jens Fischer
Hi, I was wondering if there's an option to return statistics about distances from the query terms to the most frequent terms in the result documents. At present I return the most frequent terms using facetSearch which returns for each word in the result documents the number of occurrences

Re: Field Compression

2009-06-04 Thread Grant Ingersoll
On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote: It *will* cause performance issues if you load that field for a large number of documents on a particular search. I know Lucene itself has lazy field loading that helps in this case, but I don't know how to persuade SOLR to use it (it may even

Re: SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Shalin Shekhar Mangar
On Thu, Jun 4, 2009 at 7:24 PM, Michael Ludwig m...@as-guides.com wrote: Shalin Shekhar Mangar wrote: | If you use spellcheck.q parameter for specifying | the spelling query, then the field's analyzer will | be used [...] If you use the q parameter, then the | SpellingQueryConverter is

Re: Faceting on text fields

2009-06-04 Thread Yonik Seeley
Are you using Solr 1.3? You might want to try the latest 1.4 test build - faceting has changed a lot. -Yonik http://www.lucidimagination.com On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote: I am indexing a database with over 1 million rows. Two of the fields contain unstructured text

Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Robert Purdy
Thanks for the Good information :) Well I haven't had any evictions in any of the caches in years, but the hit ratio is 0.51 in queryResultCache, 0.77 in documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache. So in your opinion should the documentCache and queryResultCache use

Re: HashDocSet's maxSize and loadFactor

2009-06-04 Thread Yonik Seeley
On Thu, Jun 4, 2009 at 7:52 AM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I am trying to optimize the setup of HashDocSet. Be aware that in the latest versions of Solr 1.4, HashDocSet is no longer used by Solr. https://issues.apache.org/jira/browse/SOLR-1169 Have read the

Index Comma Separated numbers

2009-06-04 Thread Jianbin Dai
Hi, One of the fields to be indexed is price, which is comma separated, e.g., 12,034.00. How can I index it as a number? I am using DIH to pull the data. Thanks.
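
A minimal Java sketch of the parsing this question asks about, assuming US-style grouping commas. `PriceParser` and `parsePrice` are hypothetical names used only for illustration and are not part of Solr or DIH; the sketch just shows how `java.text.NumberFormat` can turn a string such as 12,034.00 into a number before it is indexed.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Solr or DIH.
class PriceParser {

    // Parses a price with grouping commas (e.g. "12,034.00"), assuming US conventions.
    static double parsePrice(String raw) {
        String trimmed = raw.trim();
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(trimmed, position);
        // parse() stops at the first character it cannot read, so reject
        // input that was only partially consumed, such as "12,034.00abc".
        if (parsed == null || position.getIndex() != trimmed.length()) {
            throw new IllegalArgumentException("Not a number: " + raw);
        }
        return parsed.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("12,034.00")); // prints 12034.0
    }
}
```

The locale is fixed to `Locale.US` here because the example value uses a comma for grouping and a dot for decimals; data in another convention would need a different locale.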

Re: Faceting on text fields

2009-06-04 Thread Yao Ge
Yes. I am using 1.3. When is 1.4 due for release? Yonik Seeley-2 wrote: Are you using Solr 1.3? You might want to try the latest 1.4 test build - faceting has changed a lot. -Yonik http://www.lucidimagination.com On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote: I am

How to disable posting updates from a remote server

2009-06-04 Thread ashokc
Hi, I find that I am freely able to post to my production SOLR server, from any other host that can run the post command. So somebody can wipe out the whole index by posting a delete query. Is there a way SOLR can be configured so that it will take updates ONLY from the server on which it is

Re: How to disable posting updates from a remote server

2009-06-04 Thread Eric Pugh
Take a look at the security section in the wiki; you could do this with firewall rules or password access. On Thursday, June 4, 2009, ashokc ash...@qualcomm.com wrote: Hi, I find that I am freely able to post to my production SOLR server, from any other host that can run the post command. So

Re: Customizing results

2009-06-04 Thread Otis Gospodnetic
Hello, If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has

Re: Questions regarding IT search solution

2009-06-04 Thread Alexandre Rafalovitch
I would also be interested to know what other solutions exist. Splunk's advantage is that it does extraction of the fields with advanced searching functionality (it has lexers/parsers for multiple content types). I believe that's the Solr functionality desired in the original posting. At the

Re: Is there Downside to a huge synonyms file?

2009-06-04 Thread Yonik Seeley
On Tue, Jun 2, 2009 at 11:28 PM, anuvenk anuvenkat...@hotmail.com wrote: I'm using query time synonyms. These don't currently work if the synonyms expand to more than one option, and those options have a different number of words. -Yonik http://www.lucidimagination.com

Re: indexing Chienese langage

2009-06-04 Thread Otis Gospodnetic
I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Fer-Bj fernando.b...@gmail.com To:

Re: Customizing results

2009-06-04 Thread Otis Gospodnetic
Aha, so you really want to rename the field at response time? I wonder if this is something that could be done with (or should be added to) response writers. That's where I'd go look first. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message

Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi, As Alex correctly pointed out, my main intention is to figure out whether Solr/Lucene offers functionality to replicate what Splunk is doing in terms of building indexes etc. to enable search capabilities. We evaluated Splunk, but it is not a very cost-effective solution for us as we may

Re: Questions regarding IT search solution

2009-06-04 Thread Otis Gospodnetic
My guess is Solr/Lucene would work. Not sure how well/fast, but it would, esp. if you avoid range queries (or use tdate), and esp. if you shard/segment indices smartly, so that at query time you send (or distribute if you have to) the query to only those shards that have the data (if your

Determining Search Query Category

2009-06-04 Thread ram_sj
Hi, I have more than 20 categories for my search application. I'm interested in dynamically finding the category of the query entered by the user instead of asking the user to filter the results through a long list of categories. It's a general question, not specific to Solr, though; any suggestion

Re: how to do exact serch with solrj

2009-06-04 Thread Jianbin Dai
I still have a problem with exact matching. query.setQuery("title:\"hello the world\""); This will return all docs with a title containing "hello the world", i.e., "hello the world, Jack" will also be matched. What I want is exactly "hello the world". Setting this field to string instead of text doesn't

Re: indexing Chienese langage

2009-06-04 Thread Fer-Bj
What we usually do to reindex is: 1. stop solr 2. rm -r data (that is, to remove everything in /opt/solr/data/) 3. mkdir data 4. start solr 5. start reindex. With this we're sure about not having old copies or indexes. To check the index size we do: cd data; du -sh Otis Gospodnetic

Re: Field Compression

2009-06-04 Thread Fer-Bj
Here is what we have: for all the documents we have a field called small_body, which is a text field of at most 60 characters where we store the abstract for each article. We have about 8,000,000 documents indexed, and usually we display this small_body on our listing pages. For each listing page we

Re: Questions regarding IT search solution

2009-06-04 Thread silentsurfer77
Hi, It is encouraging to know that a Solr/Lucene solution may work. Can anyone using Solr/Lucene for such a scenario confirm that the solution is used and working fine? That would be really helpful, as I started looking into the Solr/Lucene solution only a couple of days back and might be

Re: Questions regarding IT search solution

2009-06-04 Thread Jeff Hammerbacher
Hey, Your system sounds similar to the work done by Stu Hood at Rackspace in their Mailtrust unit. See http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data for more details and inspiration. Regards, Jeff On Thu, Jun 4, 2009 at 4:58 PM,

using UpdateRequestProcessor from a custom analyzer

2009-06-04 Thread Kir4
Is it possible to create a custom analyzer (index time) that uses UpdateRequestProcessor to add new fields to posts, based on the tokens generated by the other analyzers that have been run (before my custom analyzer)? The content of said fields must differ from post to post based on the tokens

Re: Determining Search Query Category

2009-06-04 Thread Otis Gospodnetic
Ram, Typical queries are short, so they are hard to categorize using statistical approaches. Maybe categorization of queries would work with a custom set of rules applied to queries? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From:

Re: how to do exact serch with solrj

2009-06-04 Thread Otis Gospodnetic
I don't think there is anything ready to be used in Solr (but it would be easy to add), but if you index your text with custom beginning-of-string and end-of-string anchors, you'll be able to get your exact matching working. For example, convert "hello the world" to "$hello the world$" before indexing

Re: how to do exact serch with solrj

2009-06-04 Thread Otis Gospodnetic
I re-read your original request. Here is the recipe that should work:
* Define a new field type that:
  - Uses KeywordTokenizer
  - Uses LowerCaseFilter
* Make your field be of the above type.
* Use those begin/end anchor characters at index and search time.
I believe that should work. Please
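
The anchoring step of this recipe could be sketched in plain Java roughly as below. Everything named here is an assumption for illustration: `ExactMatchAnchors`, the field name `title_exact`, and the choice of `$` as the anchor character are hypothetical, and lowercasing is left to the LowerCaseFilter on the field type rather than done in code.

```java
// Hypothetical helper for illustration; names and anchor character are assumptions.
class ExactMatchAnchors {

    // Assumed anchor marking both the beginning and the end of the value.
    static final String ANCHOR = "$";

    // Wraps a value in anchors; apply this to the field value before indexing.
    static String anchor(String value) {
        return ANCHOR + value.trim() + ANCHOR;
    }

    // Builds a quoted query string for the anchored value, escaping
    // backslashes and double quotes so the phrase stays well formed.
    static String exactQuery(String field, String value) {
        String escaped = anchor(value)
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(anchor("hello the world"));
        // prints $hello the world$
        System.out.println(exactQuery("title_exact", "hello the world"));
        // prints title_exact:"$hello the world$"
    }
}
```

The resulting string could be passed to SolrJ's `query.setQuery(...)`; the same `anchor` call has to be applied to the data at index time, otherwise the anchored query will not match anything.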

Re: indexing Chienese langage

2009-06-04 Thread James liu
First: you don't have to restart Solr. You can replace the old data with new data and tell Solr to search the new data; you can find something for this in the shell scripts that come with Solr. Second: you don't have to restart Solr; just keep the id the same. Example: old id:1, title:hi; new id:1, title:welcome. Just index the new data, and it will

Re: indexing Chienese langage

2009-06-04 Thread James liu
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun revas...@gmail.com wrote: Hi, When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Are you sure your analyzer handles it well? If not sure, you can use

Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Jun 4, 2009 at 11:29 PM, Robert Purdy rdpu...@gmail.com wrote: Thanks for the Good information :) Well I haven't had any evictions in any of the caches in years, but the hit ratio is 0.51 in queryResultCache, 0.77 in documentCache, 1.00 in the fieldValueCache, and 0.99 in the

Re: Determining Search Query Category

2009-06-04 Thread Avlesh Singh
If you haven't already given this a thought, you may want to try out an auto-complete feature, suggesting those categories upfront. Cheers Avlesh On Fri, Jun 5, 2009 at 3:56 AM, ram_sj rpachaiyap...@gmail.com wrote: Hi, I have more than 20 categories for my search application. I'm

Re: Index Comma Separated numbers

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you try the NumberFormatTransformer? On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai djian...@yahoo.com wrote: Hi, One of the fields to be indexed is price which is comma separated, e.g., 12,034.00. How can I index it as a number? I am using DIH to pull the data. Thanks. --

Re: Determining Search Query Category

2009-06-04 Thread Walter Underwood
Can you analyze the logs to see which categories people choose for each query? When there are enough queries and a clear preference, you can highlight that choice. wunder On 6/4/09 9:21 PM, Avlesh Singh avl...@gmail.com wrote: If you haven't already given this a thought, you may want to try
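
A rough Java sketch of the log analysis suggested here, under stated assumptions: the log has already been reduced to (query, chosen category) pairs, and "enough queries" and "a clear preference" are expressed as two thresholds picked by the caller. `QueryCategoryStats` and its methods are hypothetical names, not an existing Solr facility.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not an existing Solr facility.
class QueryCategoryStats {

    // normalized query -> (category -> number of times users chose it)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    private static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    // Records one log entry: the user ran this query and picked this category.
    void record(String query, String category) {
        counts.computeIfAbsent(normalize(query), key -> new HashMap<>())
              .merge(category, 1, Integer::sum);
    }

    // Returns the category to highlight, or empty when there are too few
    // observations or no category reaches the required share of choices.
    Optional<String> preferredCategory(String query, int minQueries, double minShare) {
        Map<String, Integer> perCategory = counts.get(normalize(query));
        if (perCategory == null) {
            return Optional.empty();
        }
        int total = 0;
        int bestCount = 0;
        String best = null;
        for (Map.Entry<String, Integer> entry : perCategory.entrySet()) {
            total += entry.getValue();
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        if (total < minQueries || bestCount < minShare * total) {
            return Optional.empty();
        }
        return Optional.of(best);
    }

    public static void main(String[] args) {
        QueryCategoryStats stats = new QueryCategoryStats();
        for (int i = 0; i < 8; i++) {
            stats.record("ipod", "electronics");
        }
        stats.record("iPod", "music");
        stats.record("ipod ", "music");
        // 8 of 10 choices went to "electronics", which clears a 70% threshold.
        System.out.println(stats.preferredCategory("ipod", 10, 0.7));
    }
}
```

The two thresholds are the design choice that matters: set too low, rare queries get a category highlighted on the strength of one or two clicks.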

Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
How are you accessing Solr? SolrJ? Does this help? https://issues.apache.org/jira/browse/SOLR-1129 On Fri, Jun 5, 2009 at 3:00 AM, Manepalli, Kalyan kalyan.manepa...@orbitz.com wrote: Otis, With that solution, the client has to accept all type location fields (location_de_de,

Re: Customizing results

2009-06-04 Thread Avlesh Singh
Nice suggestion Noble! If you are using SolrJ, then this particular binding can be an answer to your question. Cheers Avlesh 2009/6/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com How are you accessing Solr? SolrJ? does this help? https://issues.apache.org/jira/browse/SOLR-1129 On

Re: how to do exact serch with solrj

2009-06-04 Thread Avlesh Singh
And the field should be of type, text, right Otis? Does one still need those anchors if the type is string with the filters you suggested? Cheers Avlesh On Fri, Jun 5, 2009 at 6:35 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I re-read your original request. Here is the recipe

Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Otis, is it a good idea to provide an aliasing feature for Solr, similar to the SQL 'as'? In SQL we can do: select location_da_dk as location. Solr may have: fl.alias=location_da_dk:location --Noble On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Aha,

Re: Customizing results

2009-06-04 Thread Avlesh Singh
Generally a good idea, but be prepared to entertain requests that should also ask you to be able to perform the query using those aliases. I mean when you talk about something similar to aliases in SQL, those aliases can be used in SQL scripts in the where clause too. Cheers Avlesh 2009/6/5

Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Jun 5, 2009 at 10:20 AM, Avlesh Singh avl...@gmail.com wrote: Generally a good idea, but be prepared to entertain requests that should also ask you to be able to perform the query using those aliases. I mean when you talk about something similar to aliases in SQL, those aliases can be