Thanks a lot: this tip was very important for me.
I tried with PHP curl to send from Windows to Mac OS. After one day I
discovered that @filename doesn't work on Windows; the error was
"26 failed creating formpost data", and the reason is that the Windows PHP
curl (I don't know whe
> 1) How can I get rid of underscores('_') without using the
> wordDelimiter
> Filter (which gets rid of other syntax I need)?
Before the TokenizerFactory you can apply a MappingCharFilterFactory that will
replace "_" with " " or "", depending on your needs.
mapping.txt will contain:
"_" => "" or
"_" => " "
Hi,
I'd like to use SOLR to create indices for deployment with katta. I'd like to
install a SOLR server on each crawler. The crawling script then sends the
content directly to the local SOLR server. Every 5-10 minutes I'd like to take
the current SOLR index, add it to katta and let SOLR start w
> It appears the parameter default
> setting in
> solrconfig.xml does not take effect.
Where did you put it? In ... section?
You need to put it into :
10
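For reference, a sketch of what that usually looks like in solrconfig.xml (the handler name is a placeholder, not from this thread):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parameter defaults belong inside the handler's defaults list -->
    <int name="hl.maxAlternateFieldLength">10</int>
  </lst>
</requestHandler>
```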
I'll try removing the '-'. I do need to search it now. The other option
would be to ask the user which language to query, but in my region we
use Italian and German in equal measure, so it would end up
querying both languages all the time. Or did you mean a more performant
solution of
Hi,
the parameter for WordDelimiterFilterFactory is catenateAll;
you should set it to 1.
Cheers,
Sven
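As a sketch, the filter line in schema.xml would look something like this (the other attributes shown are common settings from the example schema, not from this thread):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1"
        catenateAll="1"/>
```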
--On Wednesday, 10 February 2010 16:37 -0800 Yu-Shan Fung
wrote:
Check out the configuration of WordDelimiterFilterFactory in your
schema.xml.
Depending on your settings, it's probably
> I am using SOLR 1.3 and my server is
> embedded and accessed using SOLRJ.
> I would like to setup my searches so that exact matches are
> the first
> results returned, followed by near matches, and finally
> token based
> matches.
> For example, if I have a summary field in schema which is
> crea
> - A TokenFilter would allow me to tap into the existing analysis pipeline so
> I get the tokens for free but I can't access the document.
https://issues.apache.org/jira/browse/SOLR-1536
On Fri, Jan 29, 2010 at 12:46 AM, Mike Perham wrote:
> We'd like to implement a profanity detector for docume
Hi,
I defined a requestHandler like this:
dismax
title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8
title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8
title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8
0.1
content* fields are tokenized. The content comes from Nutch. As it
On Thu, Feb 11, 2010 at 8:39 AM, Ahmet Arslan wrote:
>> I am using SOLR 1.3 and my server is
>> embedded and accessed using SOLRJ.
>> I would like to setup my searches so that exact matches are
>> the first
>> results returned, followed by near matches, and finally
>> token based
>> matches.
>> Fo
Hello Everyone,
If I have a large data set which needs to be indexed, what strategy can I
take to build the index fast?
1. Split the input into multiple XML files, then open different shells
and post each split XML file? Will this work and help me build the index
faster than one large XML fi
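The splitting step can be sketched like this (all names here are hypothetical, not from the thread; it assumes the documents are available as (id, text) pairs and produces one Solr `<add>` payload per shell):

```python
# Split a batch of documents into N smaller Solr XML payloads that can
# each be written to a file and posted from a separate shell.
from xml.sax.saxutils import escape

def split_into_xml_chunks(docs, n_chunks):
    """docs: list of (id, text) pairs; returns n_chunks <add> payloads."""
    # round-robin split keeps the chunks roughly equal in size
    chunks = [docs[i::n_chunks] for i in range(n_chunks)]
    payloads = []
    for chunk in chunks:
        body = "".join(
            '<doc><field name="id">%s</field>'
            '<field name="text">%s</field></doc>' % (escape(i), escape(t))
            for i, t in chunk
        )
        payloads.append("<add>%s</add>" % body)
    return payloads

payloads = split_into_xml_chunks([("1", "a"), ("2", "b"), ("3", "c")], 2)
```

Each payload could then be posted to /update from its own process.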
Hi,
we're trying to implement another sort-by algorithm which is calculated outside of
our Solr server.
Is there a limit on the number of lines in that external file? We sometimes have
1.5 million lines in some situations.
Also, is 1.5 million rows a performance killer?
Most of the other files
Why don't you try the DIH approach?
http://wiki.apache.org/solr/DataImportHandler
Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211
>
> Hello Everyone,
>
> If I have a large data set which needs to be indexed, what strategy I can
> take
Thanks, really useful article.
I am wondering about this statement in the article
"Keep in mind that Solr does not calculate universal term/doc frequencies.
At a large scale, its not likely to matter that tf/idf is calculated at the
shard level - however, if your collection is heavily skewed in
This sounds like an ideal use case for payloads. You could attach a boost value
to each term in your "keywords" field.
See
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
Another common workaround is to create, say, 8 multi-valued fields with boosts
0.5, 1.0, 1.5,
My point is that I WANT the AT, DOT to be indexed, to avoid these being treated
the same: foo-...@brown.fox and foo-bar.brown.fox
By using the LowerCaseFilterFactory before the replacements, you actually
ensure that a search for email:at will not give a match because the query will
be lower-case
Hi again,
I would still keep all fields in the original schema of the global Solr, just
for the sake of simplicity.
For custom sort order, you can look at ExternalFileField which is a text file
that you can add to your local Solr index independently of the pre-built index.
However, this only s
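A sketch of what an ExternalFileField looks like (the field name and values are hypothetical; the data file lives in the index data directory and is named external_<fieldname>):

```xml
<!-- in schema.xml -->
<fieldType name="file" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="ranking" type="file"/>
```

The external file itself is plain `key=value` lines, e.g. `doc1=1.5`, one per document, which you can regenerate without rebuilding the index.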
Can you show us how you configured spell check?
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com
On 10. feb. 2010, at 11.48, michaelnazaruk wrote:
>
> Hello,all!
> I have a problem with spellcheck! I downloaded, built and connected a
> dictionary (~500,000 words)! It works fine! But
Yao Ge wrote:
> It appears the hl.maxAlternateFieldLength parameter default setting in
> solrconfig.xml does not take effect. I can only get it to work by explicitly
> sending the parameter via the client request. It is not big deal but it
> appears to be a bug.
>
Are you sure? It's handled the s
How about a field indextime_dt filled with "NOW"? Then do a facet query to get
the monthly stats for the last 12 months:
http://localhost:8983/solr/select/?q=*:*&rows=0&facet=true&facet.date=indextime_dt&facet.date.start=NOW/MONTH-12MONTHS&facet.date.end=NOW/MONTH%2B1MONTH&facet.date.gap=%2B1MONTH
To get
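The query string above can be built programmatically; a small sketch (the host and field name come from the message above) that also shows why "+1MONTH" must be sent as %2B1MONTH — a raw "+" in a URL would be decoded as a space:

```python
# Build the facet.date query string; urlencode percent-encodes "+" and "/".
from urllib.parse import urlencode

params = urlencode({
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.date": "indextime_dt",
    "facet.date.start": "NOW/MONTH-12MONTHS",
    "facet.date.end": "NOW/MONTH+1MONTH",
    "facet.date.gap": "+1MONTH",
})
url = "http://localhost:8983/solr/select/?" + params
```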
There is already a patch available to address that short-coming in
distributed search:
http://issues.apache.org/jira/browse/SOLR-1632
On Feb 11, 2010, at 6:56 AM, abhishes wrote:
Thanks really useful article.
I am wondering about this statement in the article
"Keep in mind that Solr
Hi,
Did you add spellcheck.extendedResults=true to your query? This will, among other
things, tell you whether Solr thinks it has been spelled correctly or not. However, if you have
specified spellcheck.onlyMorePopular=true, you may get suggestions even if it
has been spelled correctly.
Don't let the onlyMorePopu
Regarding hi-jacking, that was a false alarm. Apple Mail fooled me to believe
it was part of another thread. Sorry Jose.
I think the "properties" field approach is clean. It relies on index-time
classification which is where such heavy-lifting should preferably be done.
Faceting on a multi-val
> I might be able to try this out though in general the
> project has a
> policy about only using released code (no trunk/unstable).
> https://issues.apache.org/jira/browse/SOLR-1604
> It looks like the kind of searching I want to do is not
> really
> supported in SOLR by default though. Is that co
Let me understand the issue... Have you added spellchecking parameters
to the /itas mapping in solrconfig.xml? If so, you should be able to
do /itas?q=mispeled&wt=xml and see the suggestions in the response.
If you've gotten that far you'll be able to navigate to them using the
object na
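A sketch of adding spellcheck to such a handler's defaults and components (the names here follow the example solrconfig, not the poster's actual /itas config):

```xml
<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <!-- the spellcheck search component must run after the query -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```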
On Thu, Feb 11, 2010 at 6:56 AM, abhishes wrote:
>
> Thanks really useful article.
>
> I am wondering about this statement in the article
>
> "Keep in mind that Solr does not calculate universal term/doc frequencies.
> At a large scale, its not likely to matter that tf/idf is calculated at the
>
Things are done :-)
We have now finished the UIMA CAS consumer for Solr;
we are making it public, more news soon.
We have also been developing some filters based on payloads.
One of the filters removes words whose payloads are in a list; the
other one keeps only those tokens
Sven & Yu-Shan - thank you for your advice.
It doesn't seem to work for me for some reason however,
this is what I was trying to get working last night before sending
my message out.
I'll try to explain in more detail what my setup is like.
I use a multiValued text field as a sort of holder for
hi there
i am trying to get familiar with solr while setting it up on my local pc and
indexing and retrieving some sample data .. a couple of things i am having
trouble with
1 - in my schema if i dont use the copyField to copy data from some fields
to the text field .. they are not searchable .. so
Unfortunately, the underscore is being quite resilient =(
I tried the solr.MappingCharFilterFactory and know the mapping is working as
I am changing "c" => "q" just fine. But the underscore refuses to go!
I am baffled . . .
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.co
Here is a simple query:
http://estyledesign:8983/request/select?q=popular&spellcheck=true&qt=keyrequest&spellcheck.extendedResults=true
result:
'populars'! But 'popular' is a correct word! Maybe I must change some properties
in solrconfig! Here are my configs for keyrequest:
dismax
true
Hmm... I think I'm onto something.
It may be the stop word removal of "the".
When I changed my query analyzer for "text" to set
enablePositionIncrements="false" instead of true,
the query seems to find what I'm expecting. I'll keep
looking into this.
Is there any information available on what Pos
In an UpdateRequestProcessor (processing an AddUpdateCommand), I have
a SolrInputDocument with a field 'content' that has termVectors="true"
in schema.xml. Is it possible to get access to that field's term
vector in the URP?
On Thu, Feb 11, 2010 at 1:52 PM, Ahmet Arslan wrote:
>> What I really want is the equivalent of a match like this
>> along with
>> the normal tokenized matching (where the query has been
>> lowercased and
>> trimmed as well):
>> select * from blah where lowercase(column) like '%query%';
>> I think
Thanks for the help!
Yes, we are doing a commit following the update. We will try
IndexWriter.setInfoStream
Below are the environments we are testing on:
Ubuntu Hardy, Kernel 2.6.16-xenU i386
Amazon EC2, US East Region
Embedded Jetty
Java 1.6.0_16
Solr 1.4
Server B
Ubuntu Hardy, Kernel 2.
Hi,
Check my earlier reply. You have explicitly set onlyMorePopular to true, thus
you will most likely always get suggestions even if the term was spelled
correctly. You'll only get no suggestions if the term is spelled correctly and
it is the most `popular` term.
You can opt for keeping onlyM
Mike Perham wrote:
In an UpdateRequestProcessor (processing an AddUpdateCommand), I have
a SolrInputDocument with a field 'content' that has termVectors="true"
in schema.xml. Is it possible to get access to that field's term
vector in the URP?
You cannot get term vector info of a document b
On 2010-02-11 17:04, Mike Perham wrote:
In an UpdateRequestProcessor (processing an AddUpdateCommand), I have
a SolrInputDocument with a field 'content' that has termVectors="true"
in schema.xml. Is it possible to get access to that field's term
vector in the URP?
No, term vectors are created
We use the PatternTokenizerFactory. We have the following in our
schema:
And to get rid of '_' we just remove it from the pattern.
-Original Message-
From:
solr-user-return-32434-laurent.vauthrin=disney@lucene.apache.org
[mailto:solr-user-return-32434-laurent.vauthrin=disney@l
You did not say how frequently you need to update the index, whether this is a
batch-type operation, or whether you also have some real-time requirements after the
initial load.
Your ETL could use SolrJ and the StreamingUpdateSolrServer for high throughput.
You could try multiple threads pushing in parallel
I will run the index update once a day.
Regards,
Abhishek
--Original Message--
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Re: Posting Concurrently to Solr
Sent: Feb 11, 2010 22:17
You did not say how frequent you need to update
Have you played around with the "option httpclose" or the "option
forceclose" configuration options in HAProxy (both documented here:
http://haproxy.1wt.eu/download/1.3/doc/configuration.txt)?
-Tim
On Wed, Feb 10, 2010 at 10:05 AM, Ian Connor wrote:
> Thanks,
>
> I bypassed haproxy as a test and
On Jan 28, 2010, at 4:46 PM, Mike Perham wrote:
> We'd like to implement a profanity detector for documents during indexing.
> That is, given a file of profane words, we'd like to be able to mark a
> document as safe or not safe if it contains any of those words so that we
> can have something si
Not yet - but thanks for the link.
I think that the OS also has a timeout that keeps it around even after this
event and with heavy traffic I have seen this build up. Having said all
this, the performance impact after testing was negligible for us but I
thought I would post that haproxy can cause
I changed the config, but I get the same result!
dismax
false
false
true
external
query
spellcheck
mlt
--
View this message in context:
http://old.nabble.com/spellcheck-tp27527425p27550755.html
Sent from the Solr - U
Hi everyone,
I'm trying to enhance a more like this search I'm conducting by boosting the
documents that have a date close to the original. I would like to do something
like a parabolic function centered on the date (would make tuning a little more
effective), though a linear function would pro
My problem was that spellcheck component was missing from /itas handler.
With that in place, I could use
$response.response.spellcheck.suggestions.collation (no idea why I needed
$response.response?) to pick up the spellcheck.
Now it works quite well:
http://ec2-79-125-69-12.eu-west-1.compute.
thanks.
Hi,
I see you use an `external` dictionary. I've no idea what that is or how it
works, but it looks like the dictionary believes `populars!` is a term, which
obviously is not equal to `popular`. If this is an external index under your
manual control, how about adding `popular` to the dictionary?
> > If you use string type for summaryExact you can run
> this query summaryExact:my\ item* It will bring you all
> documents begins with my item.
>
> Actually it won't. The data I am indexing has extra spaces
> in front
> and is capitalized. I really need to be able to filter it
> through the
>
Hello,
I have a log-search-like application which requires indexed log events to be
searchable within a minute,
and which uses facets and the StatsComponent.
Some stats:
- The log events are indexed every 10 seconds with a "commitWithin" of 60
seconds.
- 1M events / day (~75% are updates to previous eve
Can you show us your field definitions and the exact query string you are
using, and what you expect to see?
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com
On 11. feb. 2010, at 15.31, adeelmahmood wrote:
>
> hi there
> i am trying to get familiar with solr while setting it
> Unfortunately, the underscore is
> being quite resilient =(
>
> I tried the solr.MappingCharFilterFactory and know the
> mapping is working as
> I am changing "c" => "q" just fine. But the underscore
> refuses to go!
>
> I am baffled . . .
I just activated name="textCharNorm" in example schem
The idea is that the log currently looks like:
Completed in 1290ms (View: 152, DB: 75) | 200 OK [
http://localhost:3000/search?q=nik+gene+cluster&view=2]
I want to extend it to also track the Solr query times and time spent in
solr-ruby like:
Completed in 1290ms (View: 152, DB: 75, Solr: 334) |
On Thu, Feb 11, 2010 at 13:07, Ian Connor wrote:
> The idea is that in the log is currently like:
>
> Completed in 1290ms (View: 152, DB: 75) | 200 OK [
> http://localhost:3000/search?q=nik+gene+cluster&view=2]
>
> I want to extend it to also track the Solr query times and time spent in
> solr-rub
On Feb 11, 2010, at 12:21 PM, Jan Høydahl / Cominvent wrote:
With that in place, I could use
$response.response.spellcheck.suggestions.collation (no idea why I
needed $response.response?) to pick up the spellcheck.
$response.response is needed in the Velocity templates because
$response i
Hi,
I'm currently evaluating the following solution: My crawler sends all docs to
a SOLR core named "WHATEVER". Every 5 minutes a new SOLR core with the same
name WHATEVER is created, but with a new datadir. The datadir contains a
timestamp in its name.
Now I can check for datadirs that are ol
This seems to allow you to log each query - which is a good start.
I was thinking of something that would add all the ms together and report it
in the "completed at" line so you can get a higher level view of which
requests take the time and where.
Ian.
On Thu, Feb 11, 2010 at 1:13 PM, Mat Brown
Oh - indeed - sorry, didn't read your email closely enough : )
Yeah that would probably involve some pretty crufty monkey patching /
use of globals...
On Thu, Feb 11, 2010 at 13:22, Ian Connor wrote:
> This seems to allow you to log each query - which is a good start.
>
> I was thinking of somet
...and probably break stuff - that might be why it hasn't been done.
On Thu, Feb 11, 2010 at 1:28 PM, Mat Brown wrote:
> Oh - indeed - sorry, didn't read your email closely enough : )
>
> Yeah that would probably involve some pretty crufty monkey patching /
> use of globals...
>
> On Thu, Feb 11
On 10.02.2010, at 16:41, Lukas Kahwe Smith wrote:
> There is a solution to update via DIH, but is there also a way to define a
> query that fetches id's for documents that should be removed?
Or to phrase the question a bit more open. I have a file with id's of documents
to delete (one per lin
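For each id in such a file you can post a delete command to the /update handler; the XML form is (the id value here is hypothetical):

```xml
<delete><id>12345</id></delete>
```

One such command per id, or a single `<delete><query>...</query></delete>` if the documents to remove can be expressed as a query.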
Janne,
I usually just turn the caches nearly off for frequent commits.
Jason
On Thu, Feb 11, 2010 at 9:35 AM, Janne Majaranta
wrote:
> Hello,
>
> I have a log search like application which requires indexed log events to be
> searchable within a minute
> and uses facets and the statsc
So I got it to work by running the drupal cron.php.
I was originally trying to use the exampledocs, indexing that content, and
making that index available to the Drupal solr.
But it might just be that they are different indexes? And that's why I
wasn't getting responses.
One quick question, the Dru
Hey Jason,
Do you use faceting with frequent commits ?
And by turning off the caches you mean setting autowarmcount to zero ?
I did try to turn off autowarming with a 36M documents instance but getting
facets over those documents takes over 10 seconds.
With a warm cache it takes 200ms ...
-Janne
On Thu, Feb 11, 2010 at 3:21 PM, Janne Majaranta
wrote:
> Hey Jason,
>
> Do you use faceting with frequent commits ?
> And by turning off the caches you mean setting autowarmcount to zero ?
>
> I did try to turn off autowarming with a 36M documents instance but getting
> facets over those document
Xavier Schepler wrote:
>
> for example, "concept_user_*", and I will have maybe more than 200 users
> using this feature.
>
I've done tests with many hundreds of dynamically created fields (i.e. foo_1 through
foo_400). Generally speaking, I haven't noticed any performance
issues from having t
On Thu, Feb 11, 2010 at 15:41, gdeconto wrote:
>
>
> Xavier Schepler wrote:
>>
>> for example, "concept_user_*", and I will have maybe more than 200 users
>> using this feature.
>>
>
> I've done tests with many hundred dynamically created fields (ie foo_1 thru
> f_400). generally speaking, I have
Janne,
The answers to your last 2 questions are both yes. I've seen that done a few
times and it works. I don't have the answer to the always-hot cache question.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
- O
I don't know, but the other day I did see a NPE related to fields with '-'. In
Distributed Search context at least, fields with '-' were causing a NPE.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
- Original Mess
Ok,
Thanks Yonik and Otis.
I already had static warming queries with facets turned on and autowarming
at zero.
There were a lot of other optimizations after that however, so I'll try with
zero autowarming and static warming queries again.
If that doesn't work, I'll go with 3 instances on the same
Claudio,
If I understand correctly, the problem is that you are trying to sort on a
tokenized text field. That won't work and for something like "content" field
that corresponds to the content of a web page, it doesn't even make much sense.
What you may want to do is create another *string* field a
Good afternoon!
I was in the IRC room earlier this morning with a problem, and I'm still having
difficulty with it. I'm trying to do a site search upsell so that sponsored
results can be highlighted and boosted to the top of the results. I need to
have my default operator set to AND, because i
Claudio,
Ah, through multilingual indexing/search work (with
http://www.sematext.com/products/multilingual-indexer/index.html ) I learned
that cross-language search often doesn't really make sense, unless the search
involves "universal terms" (e.g. Fiat, BMW, Mercedes, Olivetti, Tomi de Paola,
Gerald,
Your suggestion will likely get lost in the piles of solr-user email. You
should add your comments to JIRA-236 directly.
Otis
-
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
- Original Message
> From: gdecon
Mark,
Yes, facets will give you that information. Min/max StatsComponent? See
http://www.search-lucene.com/?q=StatsComponent
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
- Original Message
> From: Mark
>
> Hello,
>
> I have a question related to local solr. For certain locations (latitude,
> longitude), the spatial search does not work. Here is the query I try to
> make which gives me no results:
>
> q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450
>
> However if I make th
Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I
*think* they are trying to patent this.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
- Original Message
> From: Jan Høydahl / Comi
Note that UIMA doesn't do NER itself (as far as I know), but instead relies on
GATE or OpenNLP or OpenCalais, AFAIK :)
Those interested in UIMA and living close to New York should go to
http://www.meetup.com/NYC-Search-and-Discovery/calendar/12384559/
Otis
Sematext :: http://sematext.com
Hi,
this is correct. Usually one does not know how a stemmer - or
other language-specific filters - behaves in the context of a
foreign language.
But there is an exception that sometimes comes to the rescue:
If one has a stable dictionary of terms in all the languages
of interest, then one migh
Hi,
I would like to know if there is a way of reindexing data without restarting
the server. Let's say I make a change in the schema file. That would require
me to reindex the data. Is there a solution to this?
--
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/
Hi,
restarting the Solr server wouldn't help. If you want to re-index
your data you have to pipe it through the whole process again.
In your case it might be a good idea to consider having several
cores holding the different schema definitions. This will not save
you from getting the original da
Thanks for responding to my question.
Let me just put a situation that might arise in future. I decide to add a
new field to the schema. So if I have understood you correctly, "piping it
through the whole process" would mean that I delete records one by one and
add the same records again. Basica
Hi,
thanks for your answer. This is driving me crazy.
No. I did not define any sorting or scoring explicitly. But solr isn't
working with my requestHandler. It complains about sorting on the
content field.
I agree with you; sorting on content wouldn't make much sense. In my
first post I quoted
I'd like to boost an exact phrase match such as q="video poker" over
q=video poker. How would I do this using dismax?
I tried pre-processing video poker into, video poker "video poker"
however that just gets munged by dismax into "video poker video
poker"... Which is wrong.
Cheers!
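One common approach (a sketch, not an answer from this thread) is dismax's pf (phrase fields) parameter, which adds a boost when all the query terms appear together as a phrase, without changing which documents match; the field name and boost here are hypothetical:

```xml
<lst name="defaults">
  <str name="defType">dismax</str>
  <str name="qf">text</str>
  <!-- documents where the query occurs as a phrase in "text" score higher -->
  <str name="pf">text^10</str>
</lst>
```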
if you use the core model via solr.xml you can reload a core without
having to restart the servlet container:
http://wiki.apache.org/solr/CoreAdmin
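For example, with a core named core0 (the host and core name are placeholders), a reload is a single HTTP call:

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```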
On 02/11/2010 02:40 PM, Emad Mushtaq wrote:
Hi,
I would like to know if there is a way of reindexing data without restarting
the server. Lets sa
I think I am making some progress - the key suggestion was to look at the
analysis.jsp which I foolishly had forgotten =(.
I think it is actually a bug in the ShingleFilterFactory when it is used
subsequent to another filter which removes tokens, e.g. StopFilterFactory or
WordDelimiterFilterFactory.
That's a bug, IMO...
On Thu, Feb 11, 2010 at 1:30 PM, Otis Gospodnetic
wrote:
> I don't know, but the other day I did see a NPE related to fields with '-'.
> In Distributed Search context at least, fields with '-' were causing a NPE.
>
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr
I gave you bad advice about qt=. Erik Hatcher kindly corrected me:
>> Actually qt selects the request handler. defType selects the query parser.
>> qt may implicitly select a query parser of course, but that would depend on
>> the request handler definition.
On Wed, Feb 10, 2010 at 1:10 PM, S
Hi Christopher,
ShingleFilter(Factory), by design, inserts underscores for empty positions, so
that you don't get shingles created from non-contiguous tokens.
It would probably be better to treat empty positions as edges, like an
end-of-stream followed by a beginning-of-stream, and only output
I agree. I just didn't have the chance to look at it closely to get enough
details for filing in JIRA.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
- Original Message
> From: Jason Rutherglen
> To: solr-user