Re: How can I do this in Solr?

2010-03-25 Thread Lance Norskog
You can create a field 'staff' with field values AAA_manager and BBB_coordinator. This preserves the database relationship. In general, think of a Solr index as one database table: you have to flatten (denormalize) a standard database schema. 2010/3/25 scott chu : > I have a input xml data file &
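A minimal sketch of the flattening described above, assuming the name/title pairs have already been parsed out of the input XML. The function name `flatten_staff`, the `_` separator argument and the `id` value are illustrative choices, not part of Solr or of the original message.

```python
def flatten_staff(pairs, sep="_"):
    """Join each (name, title) pair into one string for a multivalued field."""
    return [f"{name}{sep}{title}" for name, title in pairs]


# Hypothetical input: the pairs from the 'Reporters' example in the thread.
reporters = [("AAA", "manager"), ("BBB", "coordinator")]

# One flat document; 'staff' holds one plain string per pair.
doc = {"id": "doc-1", "staff": flatten_staff(reporters)}
print(doc["staff"])
```

Keeping name and title in a single value is what preserves the pairing; two separate multivalued fields would lose which title belongs to which name.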

Re: How to add a new field to existing document (append a new field to already existing document)

2010-03-25 Thread Lance Norskog
There is no feature in Lucene that allows it to append new fields to an existing document. You have to re-index the entire thing. On Thu, Mar 25, 2010 at 6:56 PM, bbarani wrote: > > Hi, > > I have a peculiar situation, > > I am having my DIH dataconfig file as below > > ---> object > cachekey=y

Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-03-25 Thread Lance Norskog
Do you want to index the text in the attachments? If so, you probably are better off creating a unique document for the mail body and each attachment. A field in the document could give the id of the main email document. The main email document could contain a multivalued field giving all of the a
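One possible shape for the documents suggested above, sketched with plain dictionaries. The field names (`type`, `parent_id`, `attachment_ids`, `text`) and the id scheme are assumptions made for illustration; the original message is truncated and does not name them.

```python
def build_mail_docs(mail_id, body_text, attachment_texts):
    """Return one document for the mail body plus one per attachment."""
    main = {"id": mail_id, "type": "mail", "text": body_text, "attachment_ids": []}
    docs = [main]
    for n, text in enumerate(attachment_texts, start=1):
        att_id = f"{mail_id}-att{n}"  # hypothetical id scheme
        main["attachment_ids"].append(att_id)  # multivalued field on the main doc
        docs.append(
            {"id": att_id, "type": "attachment", "parent_id": mail_id, "text": text}
        )
    return docs


docs = build_mail_docs("mail-42", "see attached", ["report text", "invoice text"])
print([d["id"] for d in docs])
```

A hit on an attachment can then be traced back to its mail through `parent_id`, and the mail document lists its attachments.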

Re: Threads blocking on solr slave servers

2010-03-25 Thread Lance Norskog
The Solr instance can only handle so many queries per second. It should be possible to configure the web app container to limit the total number of active threads. Also, have you watched the garbage collection statistics with 'jconsole' or another interactive tool? You might discover that it is sp

Re: expungeDeletes on commit in Dataimport

2010-03-25 Thread Lance Norskog
Oops- solrconfig.xml does not include an option for autocommit to use expungeDeletes. You will have to do a operation directly. On Thu, Mar 25, 2010 at 8:18 PM, Lance Norskog wrote: > You can do autoCommit in solrconfig.xml. This runs regular commits > independently of the DataImportHandler. > >
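Assuming the operation meant here is an explicit commit sent to the update handler, the XML update message could be built as below. The `<commit expungeDeletes="true"/>` form is the one documented for Solr's XML update messages; the helper name and the URL in the comment are placeholders.

```python
from xml.sax.saxutils import quoteattr


def commit_message(expunge_deletes=True):
    """Build the XML body for an explicit commit request."""
    value = "true" if expunge_deletes else "false"
    return f"<commit expungeDeletes={quoteattr(value)}/>"


body = commit_message()
# This body would be POSTed with Content-Type text/xml to the core's update
# handler, e.g. http://localhost:8983/solr/update (placeholder URL).
print(body)
```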

Re: keyword query tokenizer

2010-03-25 Thread Tommy Chheng
Multi-field search is one reason for doing the tokenizing in the parser. Imagine your query was "name:bob content:climate". The parser can tokenize the query into "name:bob" and "content:climate" and pass each to its own analyzer. Tommy Chheng Programmer and UC Irvine Graduate Student Tw
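The splitting step described above can be illustrated with a toy function. This is only a sketch of the idea, not how Solr's query parser is implemented: it splits on whitespace, ignores quoted phrases, and `default_field` is a made-up parameter.

```python
def split_fielded_query(query, default_field="text"):
    """Split a query into (field, term) clauses on whitespace."""
    clauses = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            clauses.append((field, term))
        else:
            # No usable field prefix: send the whole token to the default field.
            clauses.append((default_field, token))
    return clauses


print(split_fielded_query("name:bob content:climate"))
```

Each (field, term) clause could then be handed to the analyzer configured for that field.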

Re: expungeDeletes on commit in Dataimport

2010-03-25 Thread Lance Norskog
You can do autoCommit in solrconfig.xml. This runs regular commits independently of the DataImportHandler. On Thu, Mar 25, 2010 at 9:44 AM, Ruben Chadien wrote: > Hi > > I know this has been discussed before, but is there any way do > expungeDeletes=true when the DataImportHandler does the commi

Re: solr highlighting

2010-03-25 Thread Lance Norskog
To display html-markup in an html page, it has to be in entity-encoded form. So, encode the <> as entities in your input application, and have it indexed and stored in this format. Then, the are inserted as normal. This gives you the html text displayable in an html page, with all words highlighta
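A small sketch of the encode-then-highlight idea, using Python's standard `html.escape`. The `highlight` helper only imitates what a highlighter does to the stored text, and the `<em>` markers are an assumption, since the markers in the original question were lost.

```python
import html


def encode_for_index(raw_text):
    """Entity-encode markup so it displays literally in an HTML page."""
    return html.escape(raw_text, quote=False)


def highlight(encoded_text, word, pre="<em>", post="</em>"):
    """Imitate a highlighter: wrap every occurrence of word in real markup."""
    return encoded_text.replace(word, pre + word + post)


stored = encode_for_index("<b>solr</b> rocks")
print(highlight(stored, "solr"))
```

The stored markup stays visible as text, while the inserted markers are the only live tags in the output.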

Re: RejectedExecutionException when searching with DirectSolrConnection

2010-03-25 Thread Don Werve
Nope; in fact, I thought that might be the problem, so I spent some time to make sure that I was (a) only loading a given core once, and (b) that all of my setup/teardown was threadsafe. So it's just one DirectSolrConnection for the lifetime of the application, and it only gets closed when the JVM

Re: Solrj doesn't tell if PDF was actually parsed by Tika

2010-03-25 Thread Lance Norskog
Please file a bug for this on the JIRA. https://issues.apache.org/jira/secure/Dashboard.jspa On Thu, Mar 25, 2010 at 7:21 AM, Abdelhamid ABID wrote: > Hi, > When posting pdf files using solrj the only response we get from Solr is > only server response status, but never know whether > pdf was a

Re: RejectedExecutionException when searching with DirectSolrConnection

2010-03-25 Thread Mark Miller
This type of thing has come up in the past - check the searchable lists (eg search.lucidimagination.com). Not sure there is much helpful there though. To be honest, this should *really* not be possible unless the executor is in the SHUTDOWN state - which should really not be possible unless c

Re: keyword query tokenizer

2010-03-25 Thread Jason Chaffee
I am curious as to why the query parser does any tokenizing. I would think you would want to control/configure this with your analyzers. Does anyone know the answer to this? Is there a performance gain or something? Thanks, Jason On Mar 25, 2010, at 4:04 PM, "Ahmet Arslan" wrote: > I hav

How can I do this in Solr?

2010-03-25 Thread scott chu
I have an input XML data file & it has a 'Reporters' tag that looks like this: AAA manager BBB coordinator You see name & title are paired. As far as I know, Solr only supports a field with multiple values of a primitive type, e.g. string. But in my case, it

Re: multicore embedded swap / reload etc.

2010-03-25 Thread Lance Norskog
All operations through the SolrJ work exactly the same against the Solr web app and embedded Solr. You code the calls to update cores with the same SolrJ APIs either way. On Wed, Mar 24, 2010 at 2:19 PM, Nagelberg, Kallin wrote: > Hi, > > I've got a situation where I need to reindex a core once a

Re: document categorization using solr?

2010-03-25 Thread Tommy Chheng
Hi Joel, Do you need supervised or unsupervised classification? Supervised: you have examples of your classes. Unsupervised: you don't know your classes in advance. In the contribs, there is a Solr clustering component which will handle unsupervised classification: http://wiki.apache.org/solr/Clus

RejectedExecutionException when searching with DirectSolrConnection

2010-03-25 Thread Don Werve
This is one of those fantastically irritating bugs that only pops up... sometimes. I'm using DirectSolrConnection to provide search for a JRuby application, and everything works great, except every now and then, I get a java.util.concurrent.RejectedExecutionException in the error log. I wrote a te

How to add a new field to existing document (append a new field to already existing document)

2010-03-25 Thread bbarani
Hi, I have a peculiar situation. I have my DIH dataconfig file as below ---> object --> object properties Now, since I am using CachedSqlEntityProcessor, I run into out-of-memory exceptions very often, so to tackle the issue I thought of adding a filter criteria to my entity Y in

document categorization using solr?

2010-03-25 Thread Joel Nylund
Hi, Does solr have something built in, or recommended add-on that does document categorization? ( I found a thread about a year ago, but not exact same topic) For example, here is a commercial categorization product that will take a website and categorize it http://grapeshot.co.uk/onlin

Re: wikipedia and teaching kids search engines

2010-03-25 Thread Mark Miller
On 03/24/2010 10:40 AM, Erik Hatcher wrote: I've got a couple of questions for the community... * what's the simplest way to get Solr up and running with a relatively richly schema'd index of a Wikipedia dump? What I'm looking for is something as easy as something along these lines: java

Re: Impossible Boost Query?

2010-03-25 Thread Geert-Jan Brits
Have a look at functionqueries. http://wiki.apache.org/solr/FunctionQuery You could for instance use your regular score and multiply it with RandomValueSource bound between 1.0 and 1.1 for example. This would at least break ties in a possibly natural loo
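The tie-breaking arithmetic suggested above, sketched on the client side for illustration. This is not a Solr function query; it only shows the effect of multiplying each score by a random factor between 1.0 and 1.1. The function name and the `seed` parameter are made up for the example.

```python
import random


def jitter_scores(scored_docs, low=1.0, high=1.1, seed=None):
    """Multiply each score by a random factor in [low, high] and re-sort."""
    rng = random.Random(seed)
    jittered = [
        (doc_id, score * rng.uniform(low, high)) for doc_id, score in scored_docs
    ]
    # Highest jittered score first.
    return sorted(jittered, key=lambda pair: pair[1], reverse=True)


results = [("a", 2.0), ("b", 1.0), ("c", 1.0)]
print(jitter_scores(results, seed=7))
```

With a factor capped at 1.1, documents whose scores differ by more than 10% keep their relative order; only ties and near-ties get shuffled.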

Re: Impossible Boost Query?

2010-03-25 Thread Blargy
OK, so this is basically just a random sort. Any way I can get this to randomly sort documents that are closely related and not the rest of the results? -- View this message in context: http://n3.nabble.com/Impossible-Boost-Query-tp472080p580214.html Sent from the Solr - User mailing list archive at

Re: Impossible Boost Query?

2010-03-25 Thread Lance Norskog
The RandomValueSource class is available as a sort value, but it is not available as a function. If it was, you could include the function as part of the relevance but not all of it. On Wed, Mar 24, 2010 at 9:41 AM, blargy wrote: > > This sound a little closer to what I want but I don't want full

Re: keyword query tokenizer

2010-03-25 Thread Jason Chaffee
Thanks, didn't realize that. On Mar 25, 2010, at 4:04 PM, "Ahmet Arslan" wrote: > I have the following configured for a > particular field: > > > > > > class="solr.KeywordTokenizerFactory" /> > > class="solr.LowerCaseFilterFactory" /> > > > > > > > > I am using

Re: keyword query tokenizer

2010-03-25 Thread Ahmet Arslan
> I have the following configured for a > particular field: > > > >       > >         class="solr.KeywordTokenizerFactory" /> > >         class="solr.LowerCaseFilterFactory" /> > >       > > > > > > I am using dismax and querying multiple fields and I expect > the query to > be pa

keyword query tokenizer

2010-03-25 Thread Jason Chaffee
I have the following configured for a particular field: I am using dismax and querying multiple fields and I expect the query to be parsed different for each field. For some reason, it is not kept as single token for this field's query. For example, t

RE: Field Collapsing SOLR-236

2010-03-25 Thread Rob Z
What do you mean you had to revert to Trunk 1.5. Do you mean upgrade? Which version were you using before hand? Can you please list the exact version of 1.5 and the patch # you used. I downloaded the latest nightly build and tried patching using the 2/1 patch. Everything went ok but I am gett

RE: Field Collapsing SOLR-236

2010-03-25 Thread Rob Z
Thanks, I'll give field-collapse-5 a try, although I heard it has some bad memory bugs in it. > From: martijn.is.h...@gmail.com > Date: Thu, 25 Mar 2010 13:29:30 +0100 > Subject: Re: Field Collapsing SOLR-236 > To: solr-user@lucene.apache.org > > Hi Blargy, > > The latest patch is not compatible w

Re: How do I create a solr core with the data from an existing one?

2010-03-25 Thread Chris Hostetter
: *Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy : of the core, and then swapping it in for the main core. I tried following ... : The problem I am having is, the core created in step 1 doesn't have any data : in it. If I am going to do a full index of everyth

Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-03-25 Thread Chris Hostetter
: > I tried calling the addFile() twice (one call for each file) and no : > error but nothing getting indexed as well. ... : Write your own RequestHandler that uses the existing ExtractingRequestHandler : to actually parse the streams, and then you combine the results arbitrarily in : your

Threads blocking on solr slave servers

2010-03-25 Thread dipti khullar
Hi, For the last 2 days we have been facing extremely slow behaviour with our Solr slave servers. The threads are either in blocking or waiting state, and the servers need a restart to keep production running. Following are the relevant details: There are 2 search servers in a virtualized VMware environ

expungeDeletes on commit in Dataimport

2010-03-25 Thread Ruben Chadien
Hi, I know this has been discussed before, but is there any way to do expungeDeletes=true when the DataImportHandler does the commit? I am using deleteDocByQuery in a Transformer when doing a delta-import and, as discussed before, the documents are not deleted until restart. Also, how do I know i

solr highlighting

2010-03-25 Thread Niraj Aswani
Hi, I am using the following two parameters to highlight the hits. "hl.simple.pre=" + URLEncoder.encode("") "hl.simple.post=" + URLEncoder.encode("") This seems to work. However, there is a bit of trouble when the text itself contains html markup. For example, I have indexed a document with

RE: Field Collapsing SOLR-236

2010-03-25 Thread Mark Roberts
Yeah got it working fine - but I needed to revert to Trunk (1.5) to get the patch to apply. It does certainly have some performance implications, but tweaking configuration can help here. Overall the benefits very much outweigh the costs for us :) Mark. -Original Message- From: Denn

Re: wikipedia and teaching kids search engines

2010-03-25 Thread Jon Baer
Just throwing this out there ... I recently saw something I found pretty interesting from CMU ... http://csunplugged.org/activities The search algorithm exercise was focused on a Battleship lookup I think. - Jon On Mar 24, 2010, at 10:40 AM, Erik Hatcher wrote: > I've got a couple of quest

Solrj doesn't tell if PDF was actually parsed by Tika

2010-03-25 Thread Abdelhamid ABID
Hi, When posting PDF files using SolrJ, the only response we get from Solr is the server response status; we never know whether the PDF was actually parsed or not. Checking the log, I found that Tika wasn't able to succeed with some PDF files because of the nature of their content (text in images only) or are

no of cfs files are more that the mergeFactor

2010-03-25 Thread kishan
Hi, I had my mergeFactor set to 5, but when I load a data set of some 100,000 I get some 12 .cfs files in my data/index folder. How is this possible? In what circumstances can there be more .cfs files than the mergeFactor? Please tell me. Thanks, Prasad -- View this message in context: http://n3.nabble.com/n

dismax multi search?

2010-03-25 Thread solruser2010
Hi, I am trying to see if something like this is possible using a dismax query I want to be able to direct some search terms to specific fields I want to do something like this keyword1 should search against book titles / authors keyword2 should search against book contents / book info / user

Re: If you could have one feature in Solr...

2010-03-25 Thread Jacob Elder
1. Real time or near-real time updates. 2. First-class spatial search. On Wed, Feb 24, 2010 at 9:42 AM, Grant Ingersoll wrote: > What would it be? > -- Jacob Elder

Re: Field Collapsing SOLR-236

2010-03-25 Thread Martijn v Groningen
Hi Blargy, The latest patch is not compatible with 1.4; I believe that the latest field-collapse-5.patch file is compatible with 1.4. The file should at least compile with 1.4 trunk. I'm not sure how the performance is. Martijn On 25 March 2010 01:49, Dennis Gearon wrote: > Boy, I hope that fiel

Re: How to compose a query from multiple HTTP URL parameters?

2010-03-25 Thread Abdelhamid ABID
Hi, You may implement this alternative with a URL rewrite mechanism, either by coding your own filter or by placing your servlet engine behind Apache httpd in order to benefit from mod_rewrite. On 3/25/10, Conal Tuohy wrote: > > I would like to be able to specify a query over multipl