Erick,
Many thanks for your suggestions and pointers. I am proceeding with my study
and looking forward to doing a POC with Solr.
Thanks again.
On Sun, Sep 25, 2011 at 7:40 PM, Erick Erickson erickerick...@gmail.com wrote:
Well, this is not a neutral forum G...
A common use-case for Solr is
Hello,
We use solr.UUIDField to generate unique ids. Using the latest trunk (change
list 1163767) seems to throw an error: "Document is missing mandatory uniqueKey
field: id". The schema is set up to generate an id field on updates:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" />
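For reference, a minimal sketch of the matching pieces in schema.xml (assuming the stock solr.UUIDField type; names mirror the snippet above):

```xml
<!-- sketch: UUID type plus an id field that auto-generates a value on add -->
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>
<uniqueKey>id</uniqueKey>
```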
Hi,
I am replicating Solr and getting this error. I am unable to make out the
cause, so please kindly help.
26 Sep, 2011 8:00:14 AM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
SEVERE: Error during auto-warming of
Sorry for the somewhat lengthy post. I would like to make clear that I covered
my bases here and am looking for an alternative solution, because the more
trivial solutions don't seem to work for my use case.
Consider bars, museums, etc.
These places have multiple opening hours that can depend on:
Hi all,
I have a text field named *textForQuery*.
The following content has been indexed into Solr in the field textForQuery:
*Coke Studio at MTV*
When I fired the query *textForQuery:(coke studio at mtv)*, the results showed 0 documents.
After running the same query in debug mode I got the following
Thanks for your response; we will try dynamic fields for this.
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, September 24, 2011 21:33
To: solr-user@lucene.apache.org
Subject: Re: How to map database table for faceted search?
In general, you
On Sun, 2011-09-25 at 22:00 +0200, Ikhsvaku S wrote:
Documents: We have close to ~12 million XML docs of varying sizes, average
size 20 KB. These documents have 150 fields, which should be searchable/
indexed. [...] Approximately ~6,000 such documents are updated and 400-800 new
ones are added
Hi,
We have 500K web documents and are using Solr (trunk) to index them. We have a
special analyzer which is a little heavy on CPU.
Our machine config:
32 x CPU
32 GB RAM
SAS HD
We are sending documents with 16 reduce clients (from Hadoop) to the
standalone Solr server. The problem is we couldn't get
Tirthankar,
are you indexing 1. smaller docs or 2. books?
If 1, your caches are too big for your memory, as Erick already said.
Try to allocate 10GB for the JVM, leave 14GB for your HDD cache, and make your
caches smaller.
If 2, read the blog posts on hathitrust.com.
I won't guarantee this is the 'best algorithm', but here's what we use. (This
is in a final class with only static helper methods):
// Set of characters / strings Solr treats as having special meaning in a
// query, and the corresponding escaped versions.
// Note that the actual operators
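A sketch of such a helper (the character set below mirrors what the Lucene query parser treats as special, and is an assumption, not the poster's exact list; note that SolrJ also ships org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars, which does essentially this):

```java
// Static helper that backslash-escapes Lucene/Solr query metacharacters.
public final class QueryEscaper {
    // Characters the Lucene query parser assigns special meaning.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    private QueryEscaper() {}

    public static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```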
Hi Alonso, Gora,
I ran into the same problem with the MailEntityProcessor.
I have an email folder called Test. Inside there are only two messages.
When I run the DIH everything looks fine, except that the two emails don't
get indexed.
Is there any additional information on this problem?
I'm
On 9/24/11 12:17 PM, Erick Erickson wrote:
What version of Solr?
I am using solr 3.2
When you copied the default, did you set up
default values for MLT?
This is what I need help with.
How should the request handler / solrconfig be setup?
Showing us the request you used
The request is
Just to bring closure on this one, we were slurping data from the
wrong DB (hardly a desktop-class machine)...
Solr did not cough on 41M records @ 34k updates/sec, single threaded.
Great!
On Sat, Sep 24, 2011 at 9:18 PM, eks dev eks...@yahoo.co.uk wrote:
just looking for hints where to
This is a pretty serious issue.
Bill Bell
Sent from mobile
On Sep 26, 2011, at 4:09 AM, Isan Fulia isan.fu...@germinait.com wrote:
Hi all,
I have a text field named* textForQuery* .
Following content has been indexed into solr in field textForQuery
*Coke Studio at MTV*
when i fired the
Please don't say it's just like the example. If it was,
then it would most likely be working.
If you don't take the time to show us what you've tried,
and the results you get back, then there's not much we
can do to help.
Best
Erick
On Mon, Sep 26, 2011 at 7:18 AM, dan whelan d...@adicio.com
Hi everyone,
Sorry if this issue has been discussed before, but I'm new to the list.
I have a solr (3.4) instance running with 20 cores (around 4 million docs
each).
The instance has 13GB allocated on a 16GB RAM server. If I run several sets
of queries sequentially in each of the cores, the I/O
Is there any limitation, be it technical or for sanity reasons, on the
number of shards that can be part of a solr cloud implementation?
OK. This is exactly what I did.
With a fresh download of Solr 3.2:
unpack and go to the example directory
start Solr: java -jar start.jar
then go to exampledocs and run: ./post.sh *.xml
Then go here:
Hi Isan,
Does your search return any documents when you remove the 'at' keyword and
just search for Coke studio MTV ?
Also, can you please provide the snippet of schema.xml file where you have
mentioned this field name and its type description ?
On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia
Hi,
Do we have any DIH plugin for MongoDB?
Regards,
kiwi
On 9/26/2011 9:33 AM, Bictor Man wrote:
Hi everyone,
Sorry if this issue has been discussed before, but I'm new to the list.
I have a solr (3.4) instance running with 20 cores (around 4 million docs
each).
The instance has allocated 13GB in a 16GB RAM server. If I run several sets
of queries
You have not said how big your index is, but I suspect that allocating 13GB for
your 20 cores is starving the OS of memory for caching file data. Have you
tried 6GB with 20 cores? I suspect you will see the same performance as 6GB
with 10 cores.
Generally it is better to allocate just enough memory
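As an illustration only (the numbers here are made up, not from the thread), the idea is to cap the JVM heap and leave the remainder of a 16GB box to the OS page cache:

```shell
# illustrative: 6GB heap, ~10GB left for the OS to cache index files
java -Xms6g -Xmx6g -jar start.jar
```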
Hi all.
How can we do a query similar to SQL's 'like'?
If I have this phrase as a single token in the index: "This phrase has
various words" (using KeywordTokenizerFactory),
and I would like an exact match of "phrase has various" or "various words", for
instance...
How can I do this?
Thanks a lot.
:
: Unfortunately the facet fields are not static. The field are dynamic SOLR
: fields and are generated by different applications.
: The field names will be populated into a data store (like memcache) and
: facets have to be driven from that data store.
:
: I need to write a Custom
Hello guys,
I need to implement some functionality which requires something similar
to aggregate functions in SQL. My Solr schema looks like this:
-doc_id: integer
-date: date
-value1: integer
-value2: integer
Basically the index contains some numerical values (value1, value2,
etc) per doc
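Depending on which aggregates are needed, Solr's StatsComponent (sum, min, max, mean, stddev per numeric field, optionally broken down by a facet field) may already cover this. A sketch of such a request, assuming a default core on localhost and the field names above:

```shell
# hypothetical: aggregate stats over value1, grouped by date
curl 'http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=value1&stats.facet=date'
```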
You can replicate it with the example app by replacing the id definition in
schema.xml with
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" />
removing the id field in one of the example doc.xml files, and posting it to Solr.
Thanks
Viswa
On Sep 26, 2011, at 12:15 AM, Viswa S
Dan:
The disconnect here seems to be that these examples urls on the
MoreLikeThisHandler wiki page assume a /mlt request handler exists, but
no handler by that name has ever actually existed in the solr example
configs. (the wiki page doesn't explicitly state that those URLs will
work with
500 / second would be 1,800,000 per hour (much more than 500K documents).
1) how big is each document?
2) how big are your index files?
3) as others have recently written, make sure you don't give your JRE so much
memory that your OS is starved for memory to use for file system cache.
JRJ
Hello,
While indexing there are certain urls/ids I'd never want to appear in the
search results (so they should not be indexed). Is there already a 'supported
by design' mechanism for that you could point me to, or should I just create
this blacklist as a processor in the update chain?
--
Regards,
K. Gabriele
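An update-chain processor is probably the cleanest route. Stripped of the Solr plumbing, the core decision is just a membership test; in a real UpdateRequestProcessor this check would sit in processAdd(), which simply skips super.processAdd() for blacklisted ids (a standalone sketch, all names hypothetical):

```java
import java.util.Set;

// Standalone sketch (names hypothetical): the heart of a blacklist update
// processor is a membership test on the document id/url. In a real Solr
// UpdateRequestProcessor this check would live in processAdd(), which
// would not call super.processAdd() for blacklisted documents.
public final class Blacklist {
    private final Set<String> blockedIds;

    public Blacklist(Set<String> blockedIds) {
        this.blockedIds = blockedIds;
    }

    // true = drop this document instead of indexing it
    public boolean shouldDrop(String id) {
        return id != null && blockedIds.contains(id);
    }
}
```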
Hi all,
I am new to Solr and have a doubt about boosting exact terms to the top
on a particular field.
For example:
I have a text field named ts_category and I want to give more boost to
this field than to other fields, so in my query I pass the following in
the qf params: qf=body^4.0
If you need those kinds of searches then you should probably not be using
the KeywordTokenizerFactory. Is there any reason why you can't switch to a
WhitespaceTokenizer, for example? Then you could use a simple phrase query
for your search case. If you need everything as a single token, you could use a
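The kind of fieldType that would replace the KeywordTokenizerFactory one might look like this (a sketch; the type name and the LowerCaseFilter are assumptions, not from the original schema):

```xml
<!-- sketch: whitespace-tokenized type that supports phrase queries -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```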
We used copyField to copy the address to two fields:
1. One which contains just the first token, up to the first whitespace.
2. One which copies all of it, but translates it to lower case.
Then our users can enter either a street number, a street name, or both. We
copied all of it to the second field
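In schema.xml, the described setup might look roughly like this (a sketch; all names, and the PatternTokenizer approach for grabbing the first token, are assumptions rather than the poster's actual config):

```xml
<!-- sketch: two derived views of a source "address" field -->
<fieldType name="first_token" class="solr.TextField">
  <analyzer>
    <!-- emit only the first whitespace-separated token -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^(\S+).*" group="1"/>
  </analyzer>
</fieldType>
<fieldType name="lowercased" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="address_number" type="first_token" indexed="true" stored="false"/>
<field name="address_full"   type="lowercased"  indexed="true" stored="true"/>
<copyField source="address" dest="address_number"/>
<copyField source="address" dest="address_full"/>
```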
Are you batching the documents before sending them to the Solr server? Are
you doing a commit only at the end? Also, since you have 32 cores, you can
try upping the number of concurrent updaters from 16 to 32.
Jaeger, Jay - DOT wrote:
500 / second would be 1,800,000 per hour (much more than
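The batching pattern itself, independent of SolrJ (a generic sketch: the sender callback stands in for a call like server.add(docs), and a single commit would be issued once after the final flush):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic sketch of the batching pattern (names hypothetical): buffer
// documents and hand them to a sender in groups instead of one at a time.
// With SolrJ the sender would wrap something like server.add(docs), and a
// single commit would follow the final flush().
public final class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender;
    private final List<T> buffer = new ArrayList<>();

    public Batcher(int batchSize, Consumer<List<T>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    // Send whatever is buffered; call once more at the end of the run.
    public void flush() {
        if (buffer.isEmpty()) return;
        sender.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```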
: References:
: cafwsjvnqkaufwspqrkm4sckb-0gvak-vktkfrnmfwgzwltm...@mail.gmail.com
: In-Reply-To:
: cafwsjvnqkaufwspqrkm4sckb-0gvak-vktkfrnmfwgzwltm...@mail.gmail.com
: Subject: how to implemente a query like like '%pattern%'
https://people.apache.org/~hossman/#threadhijack
Thread
: Subject: Re: Unique Key error on trunk
:
:
: You can replicate it with the example app by replacing the id definition in
schema.xml with
:
: <field name="id" type="uuid" indexed="true" stored="true" default="NEW"
/>
thanks for reporting this Viswa, I've filed a bug to track it...
I have a use case where I would like to search across two fields but I
do not want to weight a document that has a match in both fields higher
than a document that has a match in only 1 field.
For example.
Document 1
- Field A: Foo Bar
- Field B: Foo Baz
Document 2
- Field A: Foo Blarg
-
: Hi Erick, The problem I am trying to solve is to filter invalid entities.
: Users might mispell or enter a new entity name. This new/invalid entities
: need to pass through a KeepWordFilter so that it won't pollute our
: autocomplete result.
how are you doing autocomplete?
if you are using
Hi guys,
thanks for your replies. Indeed the filesystem caching seems to be the
difference. Sadly I can't add more memory, and the 6GB/20-core combination
doesn't work, so I'll just try to tweak it as much as I can.
Thanks a lot.
2011/9/26 François Schiettecatte fschietteca...@gmail.com
You
Is the UpdateProcessor triggered when updating an existing document, or also
for new documents?
On Tue, Sep 27, 2011 at 6:00 AM, Chris Hostetter-3 [via Lucene]
ml-node+s472066n3371110...@n3.nabble.com wrote:
: Hi Erick, The problem I am trying to solve is to filter invalid entities.
: Users
I found the answer to my question...
Basically it works only with a complete match.
--
View this message in context:
http://lucene.472066.n3.nabble.com/external-file-field-partial-data-match-in-key-field-tp3368547p3371328.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi guys,
Do you have any plans to support function queries on the score field? For
example, sort=floor(product(score, 100)+0.5) desc?
So far I am getting the following error:
undefined field score
I can't use a subquery in this case because I am trying to use secondary
sorting; however I will be
Hi Mark,
Eh, I don't have Lucene/Solr source code handy, but I *think* for that you'd
need to write custom Lucene similarity.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
From: Mark
Hi Gabriele,
Either the latter option, or just treat them as stop words if you just want to
remove those urls/ids from indexed docs (may still get highlighted).
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
If I were you, I would probably try defining two fields:
1. ts_category as a string type
2. ts_category1 as a text_en type
Make sure to copy ts_category to ts_category1.
You can use the following as qf in your dismax:
qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
or something like that.
YH
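In schema.xml, the pairing suggested above might look like this (a sketch; the stored/indexed flags are assumptions):

```xml
<!-- sketch: exact-match string field plus an analyzed copy for the dismax qf -->
<field name="ts_category"  type="string"  indexed="true" stored="true"/>
<field name="ts_category1" type="text_en" indexed="true" stored="false"/>
<copyField source="ts_category" dest="ts_category1"/>
```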
Hi,
Here is a 1 month old thread I found on search-lucene -- didn't even have to do
a search, I got it as a suggestion from AutoComplete when I started typing the
word mongodb :)
http://search-lucene.com/m/8AEE31AaTd32
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Aha! See, it was the DB after all! ;) Thanks for following up, I was curious.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
From: eks dev eks...@yahoo.co.uk
To: solr-user
Hello,
PS: solr streamindex is not an option because we need to submit javabin...
If you are referring to StreamingUpdateSolrServer, then the above statement
makes no sense and you should give SUSS a try.
Are you sure your 16 reducers produce more than 500 docs/second?
I think somebody already
Rajat,
What version? If 3.4.0, I'd try 3.4.0 first.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
From: shinkanze rajatrastogi...@gmail.com
To: solr-user@lucene.apache.org
Sent:
Hi Roland,
Have a look at hit #1
here: http://search-lucene.com/?q=manifoldcffc_project=Solr
I think this is what you are after.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
From:
Wow, this search engine is powerful!
Too bad after looking through it, I still have no solution.
Seems like I need to get my hands dirty and make one :)
kiwi
On Tue, Sep 27, 2011 at 12:08 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hi,
Here is a 1 month old thread I found on
Hi
Do you mean to copy the string field to a text field, or the reverse?
This is the approach I am currently following:
Step 1: Created a fieldType
<fieldType name="string_lower" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer
The following should help with size estimation:
http://search-lucene.com/?q=estimate+memoryfc_project=Solr
http://issues.apache.org/jira/browse/LUCENE-3435
I'll just add that with that much RAM you'll be more than fine.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene
From: Kiwi de coder kiwio...@gmail.com
wow, this search engine is powerful !
Thanks, glad it helps.
too bad after look throught it, still got not solution.
seem like I need to get my hand dirty to make one :)
:)
Please consider contributing: http://wiki.apache.org/solr/HowToContribute
Otis
I have been able to set up the Solr spell checker on my web application. It is a
file-based spell checker that I have implemented. I would like to add that
it isn't that accurate, since I haven't applied any specific algorithm
for having the most relevant search results. Kindly do let me know in
Firstly, just to make it clear: the dictionary is made out of already-indexed
terms; rather, it is built upon them if you are using *<str
name="classname">solr.IndexBasedSpellChecker</str>*, which you are.
Next, a lot of changes are required in your *solrconfig.xml*:
1. <str name="field">spell</str> is the name of
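Pieced together, the spellchecker entry being described might look like this in solrconfig.xml (a sketch for Solr 3.x; the component name and buildOnCommit flag are assumptions):

```xml
<!-- sketch: index-based spellchecker built from the "spell" field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```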
Hi Rahul,
I also tried searching Coke Studio MTV but no documents were returned.
Here is the snippet of my schema file:
<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer
On Tue, Sep 27, 2011 at 10:51 AM, nagarjuna nagarjuna.avul...@gmail.com wrote:
Hi everybody.
Right now I have a little bit of an idea about Solr queries, but I am not
clear about the delta query.
What is it, and how do I write one? Any sample delta query?
http://lmgtfy.com/?q=solr+delta+query
There
Hi Gora, can you please quit giving answers like these?
I may get the perfect answer from anybody but not you, so kindly
please be quiet.
I already googled and saw many links; as a beginner I am unable to get the
main intention behind using the delta query, even though we have query. And I
I'm interested in the stopwords solution as it sounds like less work, but I'm
not sure I understand how it works. By having msn.com as a stopword, it doesn't
mean I won't get msn.com as a result for, say, 'hotmail'. My understanding is that
msn.com will never make it to the similarity function and