Re: Posting pdf file and posting from remote

2010-02-11 Thread alendo
Thanks a lot: this tip was very important for me. I tried PHP cURL to send from Windows to Mac OS; after one day I discovered that the @filename syntax doesn't work on Windows. The error was "26 failed creating formpost data", and the reason is that Windows PHP cURL (I don't know whe

Re: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Ahmet Arslan
> 1) How can I get rid of underscores('_') without using the > wordDelimiter > Filter (which gets rid of other syntax I need)? Before the TokenizerFactory you can apply a mapping char filter that will replace "_" with " " or "", depending on your needs. mapping.txt will contain: "_" => "" or "_" => " "
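The effect of the two mapping variants can be sketched outside Solr. This is only a minimal Python illustration, not Solr code: `apply_mapping` and `whitespace_tokenize` are hypothetical helpers, and whitespace splitting merely stands in for whatever tokenizer the field type actually uses.

```python
def apply_mapping(text, mapping):
    """Replace each mapped character sequence before any tokenizing happens."""
    for source, target in mapping.items():
        text = text.replace(source, target)
    return text


def whitespace_tokenize(text):
    """Hypothetical stand-in for a tokenizer: split on runs of whitespace."""
    return text.split()


raw = "user_name and order_id"

# Variant 1: "_" => "" joins the parts into a single token.
joined = whitespace_tokenize(apply_mapping(raw, {"_": ""}))

# Variant 2: "_" => " " splits the parts into separate tokens.
split = whitespace_tokenize(apply_mapping(raw, {"_": " "}))

print(joined)
print(split)
```

Which variant fits depends on whether "user_name" should be searchable as one term or as two.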

continuously creating index packages for katta with solr

2010-02-11 Thread Thomas Koch
Hi, I'd like to use SOLR to create indices for deployment with katta. I'd like to install a SOLR server on each crawler. The crawling script then sends the content directly to the local SOLR server. Every 5-10 minutes I'd like to take the current SOLR index, add it to katta and let SOLR start w

Re: hl.maxAlternateFieldLength defaults in solrconfig.xml

2010-02-11 Thread Ahmet Arslan
> It appears the hl.maxAlternateFieldLength parameter default > setting in > solrconfig.xml does not take effect. Where did you put it? In ... section? You need to put it into : 10

Re: dismax and multi-language corpus

2010-02-11 Thread Claudio Martella
I'll try removing the '-'. I do need now to search it. The other option would be to ask the user what language to query, but in my region we use Italian and German in equal measure, so it would end up querying both languages all the time. Or did you mean a more performant solution of

Re: Need a bit of help, Solr 1.4: type "text".

2010-02-11 Thread Sven Maurmann
Hi, the parameter for WordDelimiterFilterFactory is catenateAll; you should set it to 1. Cheers, Sven --On Wednesday, 10 February 2010 16:37 -0800 Yu-Shan Fung wrote: Check out the configuration of WordDelimiterFilterFactory in your schema.xml. Depending on your settings, it's probably

Re: Cannot get like exact searching to work

2010-02-11 Thread Ahmet Arslan
> I am using SOLR 1.3 and my server is > embedded and accessed using SOLRJ. > I would like to setup my searches so that exact matches are > the first > results returned, followed by near matches, and finally > token based > matches. > For example, if I have a summary field in schema which is > crea

Re: implementing profanity detector

2010-02-11 Thread Alexey Serba
> - A TokenFilter would allow me to tap into the existing analysis pipeline so > I get the tokens for free but I can't access the document. https://issues.apache.org/jira/browse/SOLR-1536 On Fri, Jan 29, 2010 at 12:46 AM, Mike Perham wrote: > We'd like to implement a profanity detector for docume

sorting

2010-02-11 Thread Claudio Martella
Hi, i defined a requestHandler like this: dismax title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8 title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8 title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8 0.1 content* fields are tokenized. The content comes from nutch. As it

Re: Cannot get like exact searching to work

2010-02-11 Thread Aaron Zeckoski
On Thu, Feb 11, 2010 at 8:39 AM, Ahmet Arslan wrote: >> I am using SOLR 1.3 and my server is >> embedded and accessed using SOLRJ. >> I would like to setup my searches so that exact matches are >> the first >> results returned, followed by near matches, and finally >> token based >> matches. >> Fo

Posting Concurrently to Solr

2010-02-11 Thread abhishes
Hello Everyone, If I have a large data set which needs to be indexed, what strategy can I take to build the index fast? 1. Split the input into multiple XML files, then open different shells and post each of the split XML files? Will this work and help me build the index faster than 1 large xml fi

ExternalFileField

2010-02-11 Thread Julian Hille
Hi, we're trying to implement another sort-by algorithm which is calculated outside of our Solr server. Is there a limit on the number of lines in that external file? We sometimes have 1.5 million lines. Also, is this a performance killer for 1.5 million rows? Most of the other files

Re: Posting Concurrently to Solr

2010-02-11 Thread Vijayant Kumar
Why don't you approach for DIH http://wiki.apache.org/solr/DataImportHandler Thank you, Vijayant Kumar Software Engineer Website Toolbox Inc. http://www.websitetoolbox.com 1-800-921-7803 x211 > > Hello Everyone, > > If I have a large data set which needs to be indexed, what strategy I can > take

Re: Question on Solr Scalability

2010-02-11 Thread abhishes
Thanks really useful article. I am wondering about this statement in the article "Keep in mind that Solr does not calculate universal term/doc frequencies. At a large scale, its not likely to matter that tf/idf is calculated at the shard level - however, if your collection is heavily skewed in

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-11 Thread Jan Høydahl / Cominvent
This sounds like an ideal use case for payloads. You could attach a boost value to each term in your "keywords" field. See http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ Another common workaround is to create, say, 8 multi-valued fields with boosts 0.5, 1.0, 1.5,

Re: Question on Tokenizing email address

2010-02-11 Thread Jan Høydahl / Cominvent
My point is that I WANT the AT, DOT to be indexed, to avoid these being treated the same: foo-...@brown.fox and foo-bar.brown.fox By using the LowerCaseFilterFactory before the replacements, you actually ensure that a search for email:at will not give a match because the query will be lower-case

Re: Replication and querying

2010-02-11 Thread Jan Høydahl / Cominvent
Hi again, I would still keep all fields in the original schema of the global Solr, just for the sake of simplicity. For custom sort order, you can look at ExternalFileField which is a text file that you can add to your local Solr index independently of the pre-built index. However, this only s

Re: spellcheck

2010-02-11 Thread Jan Høydahl / Cominvent
Can you show us how you configured spell check? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 10. feb. 2010, at 11.48, michaelnazaruk wrote: > > Hello,all! > I have some problem with spellcheck! I download,build and connect > dictionary(~500 000 words)!It work fine! But

Re: hl.maxAlternateFieldLength defaults in solrconfig.xml

2010-02-11 Thread Mark Miller
Yao Ge wrote: > It appears the hl.maxAlternateFieldLength parameter default setting in > solrconfig.xml does not take effect. I can only get it to work by explicitly > sending the parameter via the client request. It is not big deal but it > appears to be a bug. > Are you sure? Its handled the s

Re: Getting max/min dates from solr index

2010-02-11 Thread Jan Høydahl / Cominvent
How about a field indextime_dt filled with "NOW"? Then do a facet query to get the monthly stats for the last 12 months: http://localhost:8983/solr/select/?q=*:*&rows=0&facet=true&facet.date=indextime_dt&facet.date.start=NOW/MONTH-12MONTHS&facet.date.end=NOW/MONTH%2B1MONTH&facet.date.gap=%2B1MONTH To get
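The "+" signs in the date math have to be sent as %2B, which is easy to get wrong when the URL is typed by hand. A small Python sketch that builds the same request from plain parameters; the parameter names and values come from the URL quoted above, while `monthly_facet_url` and its arguments are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE_URL = "http://localhost:8983/solr/select/"


def monthly_facet_url(date_field="indextime_dt", months_back=12):
    """Build the date-facet request; urlencode escapes '+' as %2B for us."""
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.date": date_field,
        "facet.date.start": f"NOW/MONTH-{months_back}MONTHS",
        "facet.date.end": "NOW/MONTH+1MONTH",
        "facet.date.gap": "+1MONTH",
    }
    return BASE_URL + "?" + urlencode(params)


url = monthly_facet_url()
print(url)
```

Note that urlencode also escapes characters such as "/" and ":", so the string differs from the hand-written URL while decoding to the same parameters.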

Re: Question on Solr Scalability

2010-02-11 Thread Erik Hatcher
There is already a patch available to address that short-coming in distributed search: http://issues.apache.org/jira/browse/SOLR-1632 On Feb 11, 2010, at 6:56 AM, abhishes wrote: Thanks really useful article. I am wondering about this statement in the article "Keep in mind that Solr

Re: spellcheck

2010-02-11 Thread Markus Jelsma
Hi, Did you add spellcheck.extendedResults=true to your query? This will a.o. tell you if Solr thinks it has been spelled correctly or not. However, if you have specified spellcheck.onlyMorePopular=true, you may get suggestions even if it has been spelled correctly. Don't let the onlyMorePopu

Re: Faceting

2010-02-11 Thread Jan Høydahl / Cominvent
Regarding hi-jacking, that was a false alarm. Apple Mail fooled me into believing it was part of another thread. Sorry, Jose. I think the "properties" field approach is clean. It relies on index-time classification, which is where such heavy lifting should preferably be done. Faceting on a multi-val

Re: Cannot get like exact searching to work

2010-02-11 Thread Ahmet Arslan
> I might be able to try this out though in general the > project has a > policy about only using released code (no trunk/unstable). > https://issues.apache.org/jira/browse/SOLR-1604 > It looks like the kind of searching I want to do is not > really > supported in SOLR by default though. Is that co

Re: How to add SpellCheckResponse to Solritas?

2010-02-11 Thread Erik Hatcher
Let me understand the issue... Have you added spellchecking parameters to the /itas mapping in solrconfig.xml? If so, you should be able to do /itas?q=mispeled&wt=xml and see the suggestions in the response. If you've gotten that far you'll be able to navigate to them using the object na

Re: Question on Solr Scalability

2010-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2010 at 6:56 AM, abhishes wrote: > > Thanks really useful article. > > I am wondering about this statement in the article > > "Keep in mind that Solr does not calculate universal term/doc frequencies. > At a large scale, its not likely  to matter that tf/idf is calculated at the >

Re: Solr and UIMA

2010-02-11 Thread JCodina
Things are done :-) now we already have done the UIMA CAS consumer for Solr, we are making it public, more news soon. We have also been developing some filters based on payloads One of the filters is to remove words with the payloads in the list the other one maintains only these tokens

RE: Need a bit of help, Solr 1.4: type "text".

2010-02-11 Thread Dickey, Dan
Sven & Yu-Shan - thank you for your advice. It doesn't seem to work for me for some reason, however; this is what I was trying to get working last night before sending my message out. I'll try to explain in more detail what my setup is like. I use a multiValued text field as a sort of holder for

help with facets and searchable fields

2010-02-11 Thread adeelmahmood
Hi there, I am trying to get familiar with Solr while setting it up on my local PC and indexing and retrieving some sample data. A couple of things I am having trouble with: 1 - in my schema, if I don't use copyField to copy data from some fields to the text field, they are not searchable, so

RE: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Christopher Ball
Unfortunately, the underscore is being quite resilient =( I tried the solr.MappingCharFilterFactory and know the mapping is working as I am changing "c" => "q" just fine. But the underscore refuses to go! I am baffled . . . -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.co

Re: spellcheck

2010-02-11 Thread michaelnazaruk
here simple query: http://estyledesign:8983/request/select?q=popular&spellcheck=true&qt=keyrequest&spellcheck.extendedResults=true result: populars! but popular is correct word! Maybe i must change some properties in solrconfig! Here my configs for keyrequest: dismax true

RE: Need a bit of help, Solr 1.4: type "text".

2010-02-11 Thread Dickey, Dan
Hmm... I think I'm onto something. It may be the stop word removal of "the". When I changed my query analyzer for "text" to set enablePositionIncrements="false" instead of true, the query seems to find what I'm expecting. I'll keep looking into this. Is there any information available on what Pos

term frequency vector access?

2010-02-11 Thread Mike Perham
In an UpdateRequestProcessor (processing an AddUpdateCommand), I have a SolrInputDocument with a field 'content' that has termVectors="true" in schema.xml. Is it possible to get access to that field's term vector in the URP?

Re: Cannot get like exact searching to work

2010-02-11 Thread Aaron Zeckoski
On Thu, Feb 11, 2010 at 1:52 PM, Ahmet Arslan wrote: >> What I really want is the equivalent of a match like this >> along with >> the normal tokenized matching (where the query has been >> lowercased and >> trimmed as well): >> select * from blah where lowercase(column) like '%query%'; >> I think

Re: "after flush: fdx size mismatch" on query durring writes

2010-02-11 Thread Acadaca
Thanks for the help! Yes, we are doing a commit following the update. We will try IndexWriter.setInfoStream. Below are the environments we are testing on: Ubuntu Hardy, Kernel 2.6.16-xenU i386 Amazon EC2, US East Region Embedded Jetty Java 1.6.0_16 Solr 1.4 Server B Ubuntu Hardy, Kernel 2.

Re: spellcheck

2010-02-11 Thread Markus Jelsma
Hi, Check my earlier reply. You have explicitly set onlyMorePopular to true, so you will most likely always get suggestions even if the term was spelled correctly. You'll only get no suggestions if the term is spelled correctly and it is the most `popular` term. You can opt for keeping onlyM

Re: term frequency vector access?

2010-02-11 Thread Koji Sekiguchi
Mike Perham wrote: In an UpdateRequestProcessor (processing an AddUpdateCommand), I have a SolrInputDocument with a field 'content' that has termVectors="true" in schema.xml. Is it possible to get access to that field's term vector in the URP? You cannot get term vector info of a document b

Re: term frequency vector access?

2010-02-11 Thread Andrzej Bialecki
On 2010-02-11 17:04, Mike Perham wrote: In an UpdateRequestProcessor (processing an AddUpdateCommand), I have a SolrInputDocument with a field 'content' that has termVectors="true" in schema.xml. Is it possible to get access to that field's term vector in the URP? No, term vectors are created

RE: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Vauthrin, Laurent
We use the PatternTokenizerFactory. We have the following in our schema: And to get rid of '_' we just remove it from the pattern. -Original Message- From: solr-user-return-32434-laurent.vauthrin=disney@lucene.apache.org [mailto:solr-user-return-32434-laurent.vauthrin=disney@l

Re: Posting Concurrently to Solr

2010-02-11 Thread Jan Høydahl / Cominvent
You did not say how frequently you need to update the index, whether this is a batch type of operation, or whether you also have some real-time requirements after the initial load. Your ETL could use SolrJ and the StreamingUpdateSolrServer for high throughput. You could try multiple threads pushing in parallel
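The "multiple threads pushing in parallel" idea can be sketched independently of SolrJ. The following Python outline is an assumption-laden illustration, not the StreamingUpdateSolrServer API: `send_batch` is a hypothetical callable that would post one batch to the update handler and return the number of documents it sent, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_in_parallel(docs, send_batch, batch_size=1000, workers=4):
    """Send batches concurrently; return the total number of documents sent."""
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() re-raises the first exception raised by any worker.
        sent = list(pool.map(send_batch, batches))
    return sum(sent)


def fake_send_batch(batch):
    # Hypothetical stand-in for an HTTP POST of one batch to Solr.
    return len(batch)


docs = [{"id": str(i)} for i in range(2500)]
total = index_in_parallel(docs, fake_send_batch)
print(total)
```

A single commit after all batches have been sent is usually preferable to one commit per batch, though that choice depends on the freshness requirements mentioned above.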

Re: Posting Concurrently to Solr

2010-02-11 Thread abhishes
I will run update index once a day. Regards, Abhishek --Original Message-- From: Jan Høydahl / Cominvent To: solr-user@lucene.apache.org ReplyTo: solr-user@lucene.apache.org Subject: Re: Posting Concurrently to Solr Sent: Feb 11, 2010 22:17 You did not say how frequent you need to update

Re: Distributed search and haproxy and connection build up

2010-02-11 Thread Tim Underwood
Have you played around with the "option httpclose" or the "option forceclose" configuration options in HAProxy (both documented here: http://haproxy.1wt.eu/download/1.3/doc/configuration.txt)? -Tim On Wed, Feb 10, 2010 at 10:05 AM, Ian Connor wrote: > Thanks, > > I bypassed haproxy as a test and

Re: implementing profanity detector

2010-02-11 Thread Grant Ingersoll
On Jan 28, 2010, at 4:46 PM, Mike Perham wrote: > We'd like to implement a profanity detector for documents during indexing. > That is, given a file of profane words, we'd like to be able to mark a > document as safe or not safe if it contains any of those words so that we > can have something si

Re: Distributed search and haproxy and connection build up

2010-02-11 Thread Ian Connor
Not yet - but thanks for the link. I think that the OS also has a timeout that keeps it around even after this event and with heavy traffic I have seen this build up. Having said all this, the performance impact after testing was negligible for us but I thought I would post that haproxy can cause

Re: spellcheck

2010-02-11 Thread michaelnazaruk
I change config, but i get the same result! dismax false false true external query spellcheck mlt -- View this message in context: http://old.nabble.com/spellcheck-tp27527425p27550755.html Sent from the Solr - U

parabolic type function centered on a date

2010-02-11 Thread Nagelberg, Kallin
Hi everyone, I'm trying to enhance a more like this search I'm conducting by boosting the documents that have a date close to the original. I would like to do something like a parabolic function centered on the date (would make tuning a little more effective), though a linear function would pro
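To make the shape of the two candidate curves concrete, here is a small Python sketch of a parabolic and a linear boost centered on a date. It only illustrates the arithmetic and is not a Solr function query; `width_days` (the distance at which the boost reaches zero) and the function names are assumptions introduced for the example.

```python
from datetime import date


def parabolic_boost(doc_date, center_date, width_days=30.0):
    """1.0 at the center date, falling quadratically to 0.0 at width_days away."""
    offset = (doc_date - center_date).days / width_days
    return max(0.0, 1.0 - offset * offset)


def linear_boost(doc_date, center_date, width_days=30.0):
    """1.0 at the center date, falling linearly to 0.0 at width_days away."""
    offset = abs((doc_date - center_date).days) / width_days
    return max(0.0, 1.0 - offset)


center = date(2010, 2, 11)
for d in (date(2010, 2, 11), date(2010, 2, 26), date(2010, 4, 1)):
    print(d, parabolic_boost(d, center), linear_boost(d, center))
```

The parabola stays close to 1.0 near the center and drops off faster toward the edges, which is what makes it easier to tune than the linear form.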

Re: How to add SpellCheckResponse to Solritas?

2010-02-11 Thread Jan Høydahl / Cominvent
My problem was that spellcheck component was missing from /itas handler. With that in place, I could use $response.response.spellcheck.suggestions.collation (no idea why I needed $response.response?) to pick up the spellcheck. Now it works quite well: http://ec2-79-125-69-12.eu-west-1.compute.

How to setup solr with Struts framework (oc4j servlet)?

2010-02-11 Thread Ching Zheng
thanks.

Re: spellcheck

2010-02-11 Thread Markus Jelsma
Hi, I see you use an `external` dictionary. I've no idea what that is and how it works, but it looks like the dictionary believes `populars!` is a term, which obviously is not equal to `popular`. If this is an external index under your manual control, how about adding `popular` to the dictionary?

Re: Cannot get like exact searching to work

2010-02-11 Thread Ahmet Arslan
> > If you use string type for summaryExact you can run > this query summaryExact:my\ item* It will bring you all > documents begins with my item. > > Actually it won't. The data I am indexing has extra spaces > in front > and is capitalized. I really need to be able to filter it > through the >

Realtime search and facets with very frequent commits

2010-02-11 Thread Janne Majaranta
Hello, I have a log search like application which requires indexed log events to be searchable within a minute and uses facets and the statscomponent. Some stats: - The log events are indexed every 10 seconds with a "commitWithin" of 60 seconds. - 1M events / day (~75% are updates to previous eve

Re: help with facets and searchable fields

2010-02-11 Thread Jan Høydahl / Cominvent
Can you show us your field definitions and the exact query string you are using, and what you expect to see? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 11. feb. 2010, at 15.31, adeelmahmood wrote: > > hi there > i am trying to get familiar with solr while setting it

RE: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Ahmet Arslan
> Unfortunately, the underscore is > being quite resilient =( > > I tried the solr.MappingCharFilterFactory and know the > mapping is working as > I am changing "c" => "q" just fine. But the underscore > refuses to go! > > I am baffled . . . I just activated name="textCharNorm" in example schem

Has anyone done request logging with Solr-Ruby for use in Rails?

2010-02-11 Thread Ian Connor
The idea is that in the log is currently like: Completed in 1290ms (View: 152, DB: 75) | 200 OK [ http://localhost:3000/search?q=nik+gene+cluster&view=2] I want to extend it to also track the Solr query times and time spent in solr-ruby like: Completed in 1290ms (View: 152, DB: 75, Solr: 334) |

Re: Has anyone done request logging with Solr-Ruby for use in Rails?

2010-02-11 Thread Mat Brown
On Thu, Feb 11, 2010 at 13:07, Ian Connor wrote: > The idea is that in the log is currently like: > > Completed in 1290ms (View: 152, DB: 75) | 200 OK [ > http://localhost:3000/search?q=nik+gene+cluster&view=2] > > I want to extend it to also track the Solr query times and time spent in > solr-rub

Re: How to add SpellCheckResponse to Solritas?

2010-02-11 Thread Erik Hatcher
On Feb 11, 2010, at 12:21 PM, Jan Høydahl / Cominvent wrote: With that in place, I could use $response.response.spellcheck.suggestions.collation (no idea why I needed $response.response?) to pick up the spellcheck. $response.response is needed in the Velocity templates because $response i

"Overwriting" cores with the same core name

2010-02-11 Thread Thomas Koch
Hi, I'm currently evaluating the following solution: My crawler sends all docs to a SOLR core named "WHATEVER". Every 5 minutes a new SOLR core with the same name WHATEVER is created, but with a new datadir. The datadir contains a timestamp in its name. Now I can check for datadirs that are ol

Re: Has anyone done request logging with Solr-Ruby for use in Rails?

2010-02-11 Thread Ian Connor
This seems to allow you to log each query - which is a good start. I was thinking of something that would add all the ms together and report it in the "completed at" line so you can get a higher level view of which requests take the time and where. Ian. On Thu, Feb 11, 2010 at 1:13 PM, Mat Brown

Re: Has anyone done request logging with Solr-Ruby for use in Rails?

2010-02-11 Thread Mat Brown
Oh - indeed - sorry, didn't read your email closely enough : ) Yeah that would probably involve some pretty crufty monkey patching / use of globals... On Thu, Feb 11, 2010 at 13:22, Ian Connor wrote: > This seems to allow you to log each query - which is a good start. > > I was thinking of somet

Re: Has anyone done request logging with Solr-Ruby for use in Rails?

2010-02-11 Thread Ian Connor
...and probably break stuff - that might be why it hasn't been done. On Thu, Feb 11, 2010 at 1:28 PM, Mat Brown wrote: > Oh - indeed - sorry, didn't read your email closely enough : ) > > Yeah that would probably involve some pretty crufty monkey patching / > use of globals... > > On Thu, Feb 11

Re: delete via DIH

2010-02-11 Thread Lukas Kahwe Smith
On 10.02.2010, at 16:41, Lukas Kahwe Smith wrote: > There is a solution to update via DIH, but is there also a way to define a > query that fetches id's for documents that should be removed? Or to phrase the question a bit more open. I have a file with id's of documents to delete (one per lin

Re: Realtime search and facets with very frequent commits

2010-02-11 Thread Jason Rutherglen
Janne, I usually just turn the caches nearly off for frequent commits. Jason On Thu, Feb 11, 2010 at 9:35 AM, Janne Majaranta wrote: > Hello, > > I have a log search like application which requires indexed log events to be > searchable within a minute > and uses facets and the statsc

Re: Solr/Drupal Integration - Query Question

2010-02-11 Thread jaybytez
So I got it to work by running the Drupal cron.php. I was originally trying to use the exampledocs, indexing that content, and making that index available to the Drupal Solr. But it might just be that they are different indexes? And that's why I wasn't getting responses. One quick question, the Dru

Re: Realtime search and facets with very frequent commits

2010-02-11 Thread Janne Majaranta
Hey Jason, Do you use faceting with frequent commits ? And by turning off the caches you mean setting autowarmcount to zero ? I did try to turn off autowarming with a 36M documents instance but getting facets over those documents takes over 10 seconds. With a warm cache it takes 200ms ... -Janne

Re: Realtime search and facets with very frequent commits

2010-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2010 at 3:21 PM, Janne Majaranta wrote: > Hey Jason, > > Do you use faceting with frequent commits ? > And by turning off the caches you mean setting autowarmcount to zero ? > > I did try to turn off autowarming with a 36M documents instance but getting > facets over those document

Re: Dynamic fields with more than 100 fields inside

2010-02-11 Thread gdeconto
Xavier Schepler wrote: > > for example, "concept_user_*", and I will have maybe more than 200 users > using this feature. > I've done tests with many hundred dynamically created fields (i.e. foo_1 thru f_400). Generally speaking, I haven't noticed any performance issues from having t

Re: Dynamic fields with more than 100 fields inside

2010-02-11 Thread Mat Brown
On Thu, Feb 11, 2010 at 15:41, gdeconto wrote: > > > Xavier Schepler wrote: >> >> for example, "concept_user_*", and I will have maybe more than 200 users >> using this feature. >> > > I've done tests with many hundred dynamically created fields (ie foo_1 thru > f_400).  generally speaking, I have

Re: Realtime search and facets with very frequent commits

2010-02-11 Thread Otis Gospodnetic
Janne, The answers to your last 2 questions are both yes. I've seen that done a few times and it works. I don't have the answer to the always-hot cache question. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - O

Re: dismax and multi-language corpus

2010-02-11 Thread Otis Gospodnetic
I don't know, but the other day I did see a NPE related to fields with '-'. In Distributed Search context at least, fields with '-' were causing a NPE. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Mess

Re: Realtime search and facets with very frequent commits

2010-02-11 Thread Janne Majaranta
Ok, Thanks Yonik and Otis. I already had static warming queries with facets turned on and autowarming at zero. There were a lot of other optimizations after that however, so I'll try with zero autowarming and static warming queries again. If that doesn't work, I'll go with 3 instances on the same

Re: sorting

2010-02-11 Thread Otis Gospodnetic
Claudio, If I understand correctly, the problem is that you are trying to sort on a tokenized text field. That won't work, and for something like a "content" field that corresponds to the content of a web page, it doesn't even make much sense. What you may want to do is create another *string* field a

Site search upsells & boosting by content type

2010-02-11 Thread Brandon Konkle
Good afternoon! I was in the IRC room earlier this morning with a problem, and I'm still having difficulty with it. I'm trying to do a site search upsell so that sponsored results can be highlighted and boosted to the top of the results. I need to have my default operator set to AND, because i

Re: dismax and multi-language corpus

2010-02-11 Thread Otis Gospodnetic
Claudio, Ah, through multilingual indexing/search work (with http://www.sematext.com/products/multilingual-indexer/index.html ) I learned that cross-language search often doesn't really make sense, unless the search involves "universal terms" (e.g. Fiat, BMW, Mercedes, Olivetti, Tomi de Paola,

Re: question/suggestion for Solr-236 patch

2010-02-11 Thread Otis Gospodnetic
Gerald, Your suggestion will likely get lost in the piles of solr-user email. You should add your comments to SOLR-236 in JIRA directly. Otis - Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: gdecon

Re: Getting max/min dates from solr index

2010-02-11 Thread Otis Gospodnetic
Mark, Yes, facets will give you that information. Min/max StatsComponent? See http://www.search-lucene.com/?q=StatsComponent Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: Mark

Problem with Spatial Search

2010-02-11 Thread Emad Mushtaq
> > Hello, > > I have a question related to local solr. For certain locations (latitude, > longitude), the spatial search does not work. Here is the query I try to > make which gives me no results: > > q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450 > > However if I make th

Re: Collating results from multiple indexes

2010-02-11 Thread Otis Gospodnetic
Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I *think* they are trying to patent this. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: Jan Høydahl / Comi

Re: Faceting

2010-02-11 Thread Otis Gospodnetic
Note that UIMA doesn't do NER itself (as far as I know), but instead relies on GATE or OpenNLP or OpenCalais, AFAIK :) Those interested in UIMA and living close to New York should go to http://www.meetup.com/NYC-Search-and-Discovery/calendar/12384559/ Otis Sematext :: http://sematext.com

Re: dismax and multi-language corpus

2010-02-11 Thread Sven Maurmann
Hi, this is correct. Usually one does not know, how a stemmer - or other language specific filters - behaves in the context of a foreign language. But there is an exception that sometimes comes to the rescue: If one has a stable dictionary of terms in all the languages of interest, then one migh

How to reindex data without restarting server

2010-02-11 Thread Emad Mushtaq
Hi, I would like to know if there is a way of reindexing data without restarting the server. Lets say I make a change in the schema file. That would require me to reindex data. Is there a solution to this ? -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/

Re: How to reindex data without restarting server

2010-02-11 Thread Sven Maurmann
Hi, restarting the Solr server wouldn't help. If you want to re-index your data you have to pipe it through the whole process again. In your case it might be a good idea to consider having several cores holding the different schema definitions. This will not save you from getting the original da

Re: How to reindex data without restarting server

2010-02-11 Thread Emad Mushtaq
Thanks for responding to my question. Let me just put a situation that might arise in future. I decide to add a new field to the schema. So if I have understood you correctly, "piping it through the whole process" would mean, that I delete records one by one, and add the same records again. Basica

Re: sorting

2010-02-11 Thread Claudio Martella
Hi, thanks for your answer. I'm going crazy over this. No, I did not define any sorting or scoring explicitly. But Solr isn't working with my requestHandler. It complains about sorting on the content field. I agree with you: sorting on content wouldn't make much sense. In my first post I quoted

Dismax phrase queries

2010-02-11 Thread Jason Rutherglen
I'd like to boost an exact phrase match such as q="video poker" over q=video poker. How would I do this using dismax? I tried pre-processing video poker into, video poker "video poker" however that just gets munged by dismax into "video poker video poker"... Which is wrong. Cheers!

Re: How to reindex data without restarting server

2010-02-11 Thread Joe Calderon
If you use the core model via solr.xml you can reload a core without having to restart the servlet container, http://wiki.apache.org/solr/CoreAdmin On 02/11/2010 02:40 PM, Emad Mushtaq wrote: Hi, I would like to know if there is a way of reindexing data without restarting the server. Lets sa

RE: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Christopher Ball
I think I am making some progress - the key suggestion was to look at the analysis.jsp, which I foolishly had forgotten =(. I think it is actually a bug in the ShingleFilterFactory when it is used subsequent to another filter which removes tokens, e.g. StopFilterFactory or WordDelimiterFilterFactory.

Re: dismax and multi-language corpus

2010-02-11 Thread Jason Rutherglen
That's a bug, IMO... On Thu, Feb 11, 2010 at 1:30 PM, Otis Gospodnetic wrote: > I don't know, but the other day I did see a NPE related to fields with '-'.   > In Distributed Search context at least, fields with '-' were causing a NPE. > > > Otis > > Sematext :: http://sematext.com/ :: Solr

Re: Indexing / querying multiple data types

2010-02-11 Thread Lance Norskog
I gave you bad advice about qt=. Erik Hatcher kindly corrected me: >> Actually qt selects the request handler. defType selects the query parser. >> qt may implicitly select a query parser of course, but that would depend on >> the request handler definition. On Wed, Feb 10, 2010 at 1:10 PM, S

RE: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Steven A Rowe
Hi Christopher, ShingleFilter(Factory), by design, inserts underscores for empty positions, so that you don't get shingles created from non-contiguous tokens. It would probably be better to treat empty positions as edges, like an end-of-stream followed by a beginning-of-stream, and only output
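The two behaviours discussed here can be modelled in a few lines. This Python sketch is a simplified model and not Lucene's ShingleFilter implementation: it only handles two-token shingles, uses `None` to mark a position whose token was removed by an earlier filter, and the function names are hypothetical.

```python
FILLER = "_"


def bigram_shingles(tokens):
    """Two-token shingles where a removed position (None) becomes a filler."""
    padded = [FILLER if tok is None else tok for tok in tokens]
    shingles = []
    for left, right in zip(padded, padded[1:]):
        if left == FILLER and right == FILLER:
            continue  # nothing real to emit for two adjacent gaps
        shingles.append(f"{left} {right}")
    return shingles


def bigram_shingles_with_edges(tokens):
    """Alternative: treat each removed position as a stream edge."""
    shingles = []
    for left, right in zip(tokens, tokens[1:]):
        if left is not None and right is not None:
            shingles.append(f"{left} {right}")
    return shingles


# "jump the fence" after a stop filter has removed "the".
tokens = ["jump", None, "fence"]
with_filler = bigram_shingles(tokens)
with_edges = bigram_shingles_with_edges(["a", "b", None, "c", "d"])
print(with_filler)
print(with_edges)
```

In both models no shingle joins "jump" and "fence" directly, which is the point of not shingling across non-contiguous tokens; the difference is only whether the gap shows up as an underscore in the output.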

Re: dismax and multi-language corpus

2010-02-11 Thread Otis Gospodnetic
I agree. I just didn't have the chance to look at it closely to get enough details for filing in JIRA. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: Jason Rutherglen > To: solr-user