We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after reindexing
the index size went from 1.5 GB to 2.7 GB.
Is that expected behavior?
Is there any switch or trick to avoid the index size more than doubling?
Koji Sekiguchi-2 wrote:
CharFilter can normalize (convert)
FastLRUCache is designed to be lock free so it is well suited for
caches which are hit several times in a request. I guess there is no
harm in using FastLRUCache across all the caches.
On Thu, Jun 4, 2009 at 3:22 AM, Robert Purdy rdpu...@gmail.com wrote:
Hey there,
Anyone got any advice on
Isn't it better to use an UpdateProcessor for this?
On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hello,
It's ugly, but the first thing that came to mind was ThreadLocal.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
-
What is the defaultOperator set in your schema.xml? Are you sure that it
matches for au and not author?
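For reference, in Solr 1.3 the default operator is configured in schema.xml (not solrconfig.xml); a minimal fragment:

```xml
<!-- schema.xml: with OR, a query like "au:foo bar" matches if either term
     matches; with AND, both terms must match -->
<solrQueryParser defaultOperator="OR"/>
```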
-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Thursday, June 04, 2009 2:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour with
Is it correct to assume that using field compression will cause performance
issues if we decide to allow search over this field?
i.e.:
<field name="id" type="sint" indexed="true" stored="true" required="true" />
<field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>
Yao Ge schrieb:
Maybe we should call this "alternative search terms" or
"suggested search terms" instead of spell checking. It is
misleading, as there is no right or wrong in spelling; there
are only popular (term frequency?) alternatives.
I had exactly the same difficulty in understanding the
Hey there, I am trying to optimize the setup of HashDocSet.
Have read the documentation here:
http://wiki.apache.org/solr/SolrPerformanceFactors#head-2de2e9a6f806ab8a3afbd73f1d99ece48e27b3ab
But can't exactly understand it.
Does it mean that the maxSize should be 0.005 x NumberDocsOfMyIndex or
Hmmm, are you quite sure that you emptied the index first and didn't just add
all the documents a second time to the index?
Also, when you say the index almost doubled, were you looking only
at the size of the *directory*? SOLR might have been holding a copy
of the old index open while you built a
Warning: This is from a Lucene perspective
I don't think it matters. I'm pretty sure that COMPRESS only applies to
*storing* the data, not putting the tokens in the index
(this latter is what's searched)...
It *will* cause performance issues if you load that field for a large
number of
Shalin Shekhar Mangar wrote:
| If you use spellcheck.q parameter for specifying
| the spelling query, then the field's analyzer will
| be used [...] If you use the q parameter, then the
| SpellingQueryConverter is used.
http://markmail.org/message/k35r7qmpatjvllsc - message
query suggest --wunder
On 6/4/09 1:25 AM, Michael Ludwig m...@as-guides.com wrote:
Yao Ge schrieb:
Maybe we should call this "alternative search terms" or
"suggested search terms" instead of spell checking. It is
misleading, as there is no right or wrong in spelling; there
are only popular
Hi,
Any help/pointers on the following message would really help me..
Thanks,Surfer
--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:
From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June
2009/6/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
FastLRUCache is designed to be lock free so it is well suited for
caches which are hit several times in a request. I guess there is no
harm in using FastLRUCache across all the caches.
Gets are cheaper, but evictions are more
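As a concrete illustration of the trade-off (the sizes below are made up): FastLRUCache tends to pay off on caches with high hit ratios, since gets are cheap, while the plain LRUCache may be safer when evictions dominate. Switching a cache is a one-line change in solrconfig.xml:

```xml
<!-- solrconfig.xml: hypothetical filterCache using FastLRUCache;
     size/autowarmCount values here are illustrative, not recommendations -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```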
Why build one? Don't those already exist?
Personally, I'd start with Hadoop instead of Solr. Putting logs in a
search index is guaranteed to not scale. People were already trying
different approaches ten years ago.
wunder
On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:
Hi,
I am indexing a database with over 1 million rows. Two of the fields contain
unstructured text, but the size of each field is limited (256 characters).
I came up with the idea of visualizing the text fields as a text cloud by
turning the two text fields into facets. The font weight and size of
each
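One common way to derive font sizes for such a cloud (an illustrative sketch, not from the original post) is linear interpolation on log-scaled facet counts, so a few very frequent terms don't flatten everything else:

```java
public class CloudScale {
    // Map a facet count onto a font size between 12px and 36px using
    // log-scaled linear interpolation. Counts must be >= 1.
    static int fontSize(long count, long minCount, long maxCount) {
        if (maxCount <= minCount) return 24; // degenerate range: pick a middle size
        double lo = Math.log(minCount);
        double hi = Math.log(maxCount);
        double t = (Math.log(count) - lo) / (hi - lo);
        return (int) Math.round(12 + t * (36 - 12));
    }

    public static void main(String[] args) {
        System.out.println(fontSize(1, 1, 100));   // rarest term -> smallest font
        System.out.println(fontSize(100, 1, 100)); // most frequent -> largest font
    }
}
```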
Hi,
I was wondering if there's an option to return statistics about distances
from the query terms to the most frequent terms in the result documents.
At present I return the most frequent terms using facetSearch which returns
for each word in the result documents the number of occurrences
On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
It *will* cause performance issues if you load that field for a large
number of documents on a particular search. I know Lucene itself
has lazy field loading that helps in this case, but I don't know how
to persuade SOLR to use it (it may even
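For what it's worth, recent Solr versions do expose Lucene's lazy field loading through solrconfig.xml; with it enabled, large stored fields are only read from disk when actually requested:

```xml
<!-- solrconfig.xml, inside the <query> section -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
```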
On Thu, Jun 4, 2009 at 7:24 PM, Michael Ludwig m...@as-guides.com wrote:
Shalin Shekhar Mangar wrote:
| If you use spellcheck.q parameter for specifying
| the spelling query, then the field's analyzer will
| be used [...] If you use the q parameter, then the
| SpellingQueryConverter is
Are you using Solr 1.3?
You might want to try the latest 1.4 test build - faceting has changed a lot.
-Yonik
http://www.lucidimagination.com
On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote:
I am indexing a database with over 1 million rows. Two of the fields contain
unstructured text
Thanks for the good information :) Well, I haven't had any evictions in any of
the caches in years, but the hit ratio is 0.51 in the queryResultCache, 0.77 in
the documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache. So
in your opinion should the documentCache and queryResultCache use
On Thu, Jun 4, 2009 at 7:52 AM, Marc Sturlese marc.sturl...@gmail.com wrote:
Hey there, I am trying to optimize the setup of HashDocSet.
Be aware that in the latest versions of Solr 1.4, HashDocSet is no
longer used by Solr.
https://issues.apache.org/jira/browse/SOLR-1169
Have read the
Hi, one of the fields to be indexed is price, which is comma separated, e.g.,
12,034.00. How can I index it as a number?
I am using DIH to pull the data. Thanks.
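Under the hood this is just locale-aware number parsing; a minimal standalone sketch of what a DIH transformer would do with such a value (the class name here is illustrative):

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class PriceParser {
    // Parse a comma-grouped price like "12,034.00" with a US-locale
    // NumberFormat, which understands both the grouping comma and the
    // decimal point.
    public static double parsePrice(String raw) throws ParseException {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.US);
        return nf.parse(raw.trim()).doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parsePrice("12,034.00")); // prints 12034.0
    }
}
```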
Yes. I am using 1.3. When is 1.4 due for release?
Yonik Seeley-2 wrote:
Are you using Solr 1.3?
You might want to try the latest 1.4 test build - faceting has changed a
lot.
-Yonik
http://www.lucidimagination.com
On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote:
I am
Hi,
I find that I am freely able to post to my production SOLR server, from any
other host that can run the post command. So somebody can wipe out the whole
index by posting a delete query. Is there a way SOLR can be configured so
that it will take updates ONLY from the server on which it is
Take a look at the security section in the wiki; you could do this with
firewall rules or password access.
On Thursday, June 4, 2009, ashokc ash...@qualcomm.com wrote:
Hi,
I find that I am freely able to post to my production SOLR server, from any
other host that can run the post command. So
Hello,
If you know what language the user specified (or is associated with), then you
just have to ensure the fl URL parameter contains that field (and any other
fields you want returned). So if the language/locale is de_de, then make sure
the request has
I would also be interested to know what other solutions exist.
Splunk's advantage is that it does extraction of the fields with
advanced searching functionality (it has lexers/parsers for multiple
content types). I believe that's the Solr functionality desired in the
original posting. At the
On Tue, Jun 2, 2009 at 11:28 PM, anuvenk anuvenkat...@hotmail.com wrote:
I'm using query time synonyms.
These don't currently work if the synonyms expand to more than one
option, and those options have a different number of words.
-Yonik
http://www.lucidimagination.com
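To illustrate the failing case (the example terms are mine, not from the thread): a query-time synonym entry whose options have different word counts, such as a one-word term expanding to a multi-word phrase, is exactly what breaks:

```
# synonyms.txt: one-word term expanding to a four-word phrase --
# this kind of uneven expansion does not work at query time
usa, united states of america
```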
I can't tell what that analyzer does, but I'm guessing it uses n-grams?
Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Fer-Bj fernando.b...@gmail.com
To:
Aha, so you really want to rename the field at response time? I wonder if this
is something that could be done with (or should be added to) response writers.
That's where I'd go look first.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
Hi,
As Alex correctly pointed out my main intention is to figure out whether
Solr/lucene offer functionalities to replicate what Splunk is doing in terms of
building indexes etc for enabling search capabilities.
We evaluated Splunk, but it is not a very cost-effective solution for us as we
may
My guess is Solr/Lucene would work. Not sure how well/fast, but it would, esp.
if you avoid range queries (or use tdate), and esp. if you shard/segment
indices smartly, so that at query time you send (or distribute if you have to)
the query to only those shards that have the data (if your
Hi,
I have more than 20 categories for my search application. I'm interested in
finding the category of the query entered by the user dynamically, instead of
asking the user to filter the results through a long list of categories.
It's a general question, not specific to Solr though; any suggestion
I still have a problem with exact matching.
query.setQuery("title:\"hello the world\"");
This will return all docs with a title containing "hello the world", i.e.,
"hello the world, Jack" will also be matched. What I want is exactly "hello the
world". Setting this field to string instead of text doesn't
What we usually do to reindex is:
1. stop Solr
2. rm -r data (that is, to remove everything in /opt/solr/data/)
3. mkdir data
4. start Solr
5. start reindexing. With this we're sure about not having old copies of the
index.
To check the index size we do:
cd data
du -sh
Otis Gospodnetic
Here is what we have:
for all the documents we have a field called small_body, which is a 60
chars max text field where we store the abstract for each article.
We have about 8,000,000 documents indexed, and usually we display this
small_body on our listing pages.
For each listing page we
Hi,
It is encouraging to know that a Solr/Lucene solution may work.
Can anyone using Solr/Lucene for such a scenario confirm that the solution is
used and working fine? That would be really helpful, as I just started looking
into the Solr/Lucene solution only a couple of days back and might be
Hey,
Your system sounds similar to the work done by Stu Hood at Rackspace in their
Mailtrust unit. See
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
for more details and inspiration.
Regards,
Jeff
On Thu, Jun 4, 2009 at 4:58 PM,
Is it possible to create a custom analyzer (index time) that uses
UpdateRequestProcessor to add new fields to posts, based on the tokens
generated by the other analyzers that have been run (before my custom
analyzer)?
The content of said fields must differ from post to post based on the tokens
Ram,
Typical queries are short, so they are hard to categorize using statistical
approaches. Maybe categorization of queries would work with a custom set of
rules applied to queries?
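A toy sketch of such a rule set (the categories and keywords are invented for illustration): map keyword patterns to categories, first hit wins, with a catch-all fallback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryCategorizer {
    // Invented rules for illustration: a keyword found in the query
    // maps it to a category; otherwise fall back to "general".
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("laptop", "electronics");
        RULES.put("camera", "electronics");
        RULES.put("novel", "books");
        RULES.put("shoes", "apparel");
    }

    public static String categorize(String query) {
        String q = query.toLowerCase();
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (q.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "general";
    }

    public static void main(String[] args) {
        System.out.println(categorize("cheap gaming laptop")); // electronics
        System.out.println(categorize("mystery novel"));       // books
        System.out.println(categorize("garden hose"));         // general
    }
}
```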
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From:
I don't think there is anything ready to be used in Solr (but it would be easy
to add), but if you indexed your field with custom beginning-of-string and
end-of-string anchors, you'd be able to get your exact matching working.
For example, convert "hello the world" to "$hello the world$" before indexing
I re-read your original request. Here is the recipe that should work:
* Define a new field type that:
  * uses KeywordTokenizer
  * uses LowerCaseFilter
* Make your field be of the above type.
* Use those begin/end anchor characters at index and search time.
I believe that should work. Please
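A standalone sketch of the anchoring idea (the `$` anchor character follows the earlier example; `anchor` is a hypothetical helper, not a Solr API): KeywordTokenizer keeps the whole value as one token, so after lowercasing and anchoring, only a character-for-character match succeeds and substrings can no longer match:

```java
public class ExactMatchDemo {
    // Apply the same normalization at index and query time: lowercase
    // (mimicking LowerCaseFilter on the single KeywordTokenizer token)
    // and wrap in begin/end anchors so partial matches are impossible.
    static String anchor(String value) {
        return "$" + value.toLowerCase() + "$";
    }

    public static void main(String[] args) {
        String indexed = anchor("Hello the World");
        System.out.println(indexed.equals(anchor("hello the world")));       // true
        System.out.println(indexed.equals(anchor("hello the world, Jack"))); // false
    }
}
```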
First: you don't have to restart Solr; you can use new data to replace the old
data and tell Solr to use the new index. You can find something in the shell
scripts that ship with Solr.
Second: you don't have to restart Solr; just keep the id the same. For example:
old doc id:1, title:hi; new doc id:1, title:welcome. Just index the new data
and it will
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun revas...@gmail.com wrote:
Hi,
When I index Chinese content using the Chinese tokenizer and analyzer in Solr
1.3, some of the Chinese text files are getting indexed but others are not.
Are you sure your analyzer can handle it well?
If not sure, you can use
On Thu, Jun 4, 2009 at 11:29 PM, Robert Purdy rdpu...@gmail.com wrote:
Thanks for the Good information :) Well I haven't had any evictions in any of
the caches in years, but the hit ratio is 0.51 in queryResultCache, 0.77 in
documentCache, 1.00 in the fieldValueCache, and 0.99 in the
If you haven't already given this a thought, you may want to try out an
auto-complete feature, suggesting those categories upfront.
Cheers
Avlesh
On Fri, Jun 5, 2009 at 3:56 AM, ram_sj rpachaiyap...@gmail.com wrote:
Hi,
I have more than 20 categories for my search application. I'm
Did you try the NumberFormatTransformer?
On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai djian...@yahoo.com wrote:
Hi, one of the fields to be indexed is price, which is comma separated, e.g.,
12,034.00. How can I index it as a number?
I am using DIH to pull the data. Thanks.
--
Can you analyze the logs to see which categories people choose for
each query? When there are enough queries and a clear preference,
you can highlight that choice.
wunder
On 6/4/09 9:21 PM, Avlesh Singh avl...@gmail.com wrote:
If you haven't already given this a thought, you may want to try
How are you accessing Solr? SolrJ?
does this help?
https://issues.apache.org/jira/browse/SOLR-1129
On Fri, Jun 5, 2009 at 3:00 AM, Manepalli, Kalyan
kalyan.manepa...@orbitz.com wrote:
Otis,
With that solution, the client has to accept all types of location fields
(location_de_de,
Nice suggestion Noble!
If you are using SolrJ, then this particular binding can be an answer to
your question.
Cheers
Avlesh
2009/6/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
How are you accessing Solr? SolrJ?
does this help?
https://issues.apache.org/jira/browse/SOLR-1129
On
And the field should be of type "text", right Otis?
Does one still need those anchors if the type is "string" with the filters
you suggested?
Cheers
Avlesh
On Fri, Jun 5, 2009 at 6:35 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
wrote:
I re-read your original request. Here is the recipe
Hi Otis,
is it a good idea to provide an aliasing feature for Solr similar to
SQL's 'as'?
In SQL we can do:
select location_da_dk as location
Solr may have:
fl.alias=location_da_dk:location
--Noble
On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Aha,
Generally a good idea, but be prepared to entertain requests asking you to also
support querying with those aliases. I mean, when you talk about something
similar to aliases in SQL, those aliases can be used in the WHERE clause of
SQL scripts too.
Cheers
Avlesh
2009/6/5
On Fri, Jun 5, 2009 at 10:20 AM, Avlesh Singh avl...@gmail.com wrote:
Generally a good idea, but be prepared to entertain requests that should
also ask you to be able to perform the query using those aliases. I mean
when you talk about something similar to aliases in SQL, those aliases can
be