Re: faceting question

2009-01-23 Thread Shalin Shekhar Mangar
On Sat, Jan 24, 2009 at 6:56 AM, Cam Bazz wrote: > Hello; > > I got a multiField named tagList which may contain multiple tags. I am > making a query like: > > tagList:a AND tagList:b AND tagList:c > > and I am also getting a tagList facet returning me some values. > > What I would like is Solr t

Re: DataImport TXT file entity processor

2009-01-23 Thread Shalin Shekhar Mangar
On Sat, Jan 24, 2009 at 5:56 AM, Nathan Adams wrote: > Is there a way to use Data Import Handler to index non-XML (i.e. simple > text) files (either via HTTP or FileSystem)? I need to put the entire > contents of a text file into a single field of a document and the other > fields are being pulle

Re: Should I extend DIH to handle POST too?

2009-01-23 Thread Shalin Shekhar Mangar
There's another option. Using DIH with Solrj. Take a look at: https://issues.apache.org/jira/browse/SOLR-853 There's a patch there but it hasn't been updated to trunk. A contribution would be most welcome. On Sat, Jan 24, 2009 at 3:11 AM, Gunaranjan Chandraraju < chandrar...@apple.com> wrote: >

Re: Results not appearing

2009-01-23 Thread Chris Harris
These might be obvious, but: * I assume you did a Solr commit command after indexing, right? * If you are using the fieldtype definitions from the default schema.xml, then your "string" fields are not being analyzed, which means you should expect search results only if you enter the entire, exact

Results not appearing

2009-01-23 Thread Johnny X
I've indexed my XML using the below in the schema: Message-ID However searching via the Message-ID or Content fields returns 0. Using Luke I can still see these fields are stored however. Out of interest

faceting question

2009-01-23 Thread Cam Bazz
Hello; I got a multiField named tagList which may contain multiple tags. I am making a query like: tagList:a AND tagList:b AND tagList:c and I am also getting a tagList facet returning me some values. What I would like is Solr to return me facets as if the query was: tagList:a AND tagList:b is

DataImport TXT file entity processor

2009-01-23 Thread Nathan Adams
Is there a way to use Data Import Handler to index non-XML (i.e. simple text) files (either via HTTP or FileSystem)? I need to put the entire contents of a text file into a single field of a document and the other fields are being pulled out of Oracle... -Nathan

Re: Solr stemming -> preserve original words

2009-01-23 Thread Thushara Wijeratna
Chris, Ahmet - thanks for the responses. Ahmet - yes, i want to see "run" as a top term + the original words that formed that term The reason is that due to mis-stemming, the terms could become non-english. ex: "permanent" would stem to "perm", "archive" would become "archiv". I need to extract

Re: Solr stemming -> preserve original words

2009-01-23 Thread AHMET ARSLAN
I didn't understand what exactly you want. if a document has run(10), running(20), runner(2), runners(8): (assuming stemmer reduces all those words to run) with non-stemmed you will see: running(20) run(10) runners(8) runner(2) with stemmed you will see: run(40) You want to see run as a top te
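The counts in the example above can be reproduced with a small sketch. It is illustrative only: the `STEMS` table is a hand-written stand-in for a real stemmer (an assumption, not Solr's Porter filter), and `group_by_stem` is a hypothetical helper name. It sums the per-word counts under each stem while keeping the original words, which appears to be the view being asked for in this thread.

```python
from collections import defaultdict

# Hand-written stand-in for a stemmer (assumption for illustration only;
# this is not Solr's Porter stemmer).
STEMS = {"run": "run", "running": "run", "runner": "run", "runners": "run"}


def group_by_stem(term_counts):
    """Sum counts per stem while remembering the original words."""
    grouped = defaultdict(lambda: {"total": 0, "words": {}})
    for word, count in term_counts.items():
        stem = STEMS.get(word, word)  # unknown words act as their own stem
        grouped[stem]["total"] += count
        grouped[stem]["words"][word] = count
    return dict(grouped)


counts = {"run": 10, "running": 20, "runner": 2, "runners": 8}
result = group_by_stem(counts)
print(result["run"]["total"])  # 40, matching run(40) above
# Original words behind the stem, most frequent first
print(sorted(result["run"]["words"], key=counts.get, reverse=True))
```

Keeping the original words next to the stem total is what would let a caller display a readable word (for example the most frequent original) instead of a truncated stem.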

Re: Solr stemming -> preserve original words

2009-01-23 Thread Chris Harris
It seems like what's desired is not so much a stemmer as what you might call a "canonicalizer", which would translate each source word not into its "stem" but into its "most canonical form". Critically, the latter, by definition, is always a legitimate word, e.g. "run". What's more, it's always the

Should I extend DIH to handle POST too?

2009-01-23 Thread Gunaranjan Chandraraju
Hi I had earlier described my requirement of needing to 'post XMLs as-is' to SOLR and have it handled just as the DIH would do on import using the mapping in data-config.xml. I got multiple answers for the 'post approach' - the top two being - Use SOLR CELL - Use SOLRJ In general I would

Re: Issue indexing in Solr

2009-01-23 Thread Jeff Newburn
The best way to find out what was wrong with the request is going to be the web server logs. It should throw an exception that usually complains about fields missing or incorrect. As for committing, Solr has an autocommit option that will fire after a designated number of changes have been ente

Issue indexing in Solr

2009-01-23 Thread Johnny X
I keep getting the error "FATAL: Solr returned an error: Bad Request" Solr is running on a different port (8080) so I changed the command line request to "java -Durl=http://localhost:8080/solr/update -jar post.jar *.xml" which seems to at least initiate. "WARNING: Make sure your XML documents a

Re: Solr stemming -> preserve original words

2009-01-23 Thread Thushara Wijeratna
hi Ahmet, thanks. when i look at the non_stemmed_text field to get the top terms, i will not be getting the useful feature of aggregating many related words into one (which is done by stemming). for ex: if a document has run(10), running(20), runner(2), runners(8) - i would like to see a "top t

Re: Solr stemming -> preserve original words

2009-01-23 Thread AHMET ARSLAN
I think the best way to get non-stemmed top terms is to index the field using a fieldType that does not employ any stem filter. For example: By using copyField you can store two (or more) versions of a field: stemmed and non-stemmed. Just a new field: And a copy field: Schema Brow

Re: Solr schema causing an error

2009-01-23 Thread Johnny X
Wicked...you fixed it! Thanks very much. Pretty simple in the end I guess...but I thought it might be. Cheers. Jeff Newburn wrote: > > The important info you are looking for is "undefined field sku at". It > looks like there may be a copyfield in the schema looking for a field > named > s

Re: Solr schema causing an error

2009-01-23 Thread Johnny X
Well here are the first 10/15 lines: HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: false in null -

Re: Solr schema causing an error

2009-01-23 Thread Jeff Newburn
The important info you are looking for is "undefined field sku at". It looks like there may be a copyfield in the schema looking for a field named sku which does not exist. Just search "sku" in the file and see what comes up. On 1/23/09 11:15 AM, "Johnny X" wrote: > > Well here are the first

Re: Solr schema causing an error

2009-01-23 Thread Jeff Newburn
The first 10-15 lines of the jargon might help. Additionally, the full exceptions will be in the webserver logs (ie tomcat or jetty logs). On 1/23/09 10:40 AM, "Johnny X" wrote: > > Ah, gotcha. > > Where do I go to find the log messages? Obviously it prints a lot of jargon > on the admin pag

Solr stemming -> preserve original words

2009-01-23 Thread Thushara Wijeratna
hello, Is it possible to retrieve the original words once solr (Porter algorithm) stems them? I need to index a bunch of data, store it in solr, and get back a list of most frequent terms out of solr. and i want to see the non-stemmed version of this data. so basically, i want to enhance this: ht

Re: Solr schema causing an error

2009-01-23 Thread Johnny X
Ah, gotcha. Where do I go to find the log messages? Obviously it prints a lot of jargon on the admin page reporting the error, but is that what you want? Jeff Newburn wrote: > > Are there any error log messages? > > The difference between a string and text is that string is basically > store

Re: stats.jsp - maxDoc and numDoc-help

2009-01-23 Thread S.Selvam Siva
On Fri, Jan 23, 2009 at 10:54 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Hello, > > Those two numbers won't necessarily give you the number of duplicates, as > they reflect the number of deletes in the index, and those deletes were not > necessarily caused by Solr detecting a dupl

Re: Solr schema causing an error

2009-01-23 Thread Jeff Newburn
Are there any error log messages? The difference between a string and text is that string is basically stored with no modification (it is the solr.StrField). The text type is actually defined in the fieldtype section and usually contains a tokenizer and some analyzers (usually stemming, lowercasi

Solr schema causing an error

2009-01-23 Thread Johnny X
Hi there, I just configured my Solr schema file to support the data types I wish to submit for indexing. However, as soon as I try and start the Solr server I get an error trying to reach the admin page. I know this only has something to do with my definitions in the schema, because when I tried

Re: stats.jsp - maxDoc and numDoc-help

2009-01-23 Thread Otis Gospodnetic
Hello, Those two numbers won't necessarily give you the number of duplicates, as they reflect the number of deletes in the index, and those deletes were not necessarily caused by Solr detecting a duplicate insert. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Origi
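As a rough illustration of the point above: the difference between the two values counts documents that are marked deleted but not yet purged from the index, whatever caused the delete. The function name and the numbers below are hypothetical, chosen only to show the arithmetic.

```python
def deleted_doc_count(max_doc, num_docs):
    """Documents marked deleted but not yet purged from the index.

    This covers every kind of delete (explicit deletes as well as
    overwrites of an existing document), so it is not a duplicate count.
    """
    if num_docs > max_doc:
        raise ValueError("numDocs cannot exceed maxDoc")
    return max_doc - num_docs


# Hypothetical values, as they might be read off stats.jsp
print(deleted_doc_count(1_000_000, 950_000))  # 50000
```

Because deleted documents are dropped when segments are merged, the difference can also shrink over time, which is another reason it should not be read as a running total of duplicates.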

Re: Intermittent high response times

2009-01-23 Thread wojtekpia
The type of garbage collector definitely affects performance, but there are other settings as well. There's a related thread currently discussing this: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-td21588427.html hbi dev wrote: > > Hi wojtekpia, > > That's inter

RE: QTime in microsecond

2009-01-23 Thread Feak, Todd
The easiest way is to run maybe 100,000 or more queries and take an average. A single microsecond value for a query would be incredibly inaccurate. -ToddFeak -Original Message- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Friday, January 23, 2009 1:33 AM To: solr-user@lucene.apa
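A minimal sketch of the averaging approach suggested above, measured on the client side. `average_latency_us` and `fake_query` are hypothetical names; `fake_query` is a placeholder that does a little local work, and a real benchmark would call the search client there instead.

```python
import time


def average_latency_us(run_query, iterations=100_000):
    """Average wall-clock time per call, in microseconds."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter()
    for _ in range(iterations):
        run_query()
    elapsed_seconds = time.perf_counter() - start
    return elapsed_seconds / iterations * 1_000_000


def fake_query():
    """Placeholder for a real query call (assumption for illustration)."""
    sum(range(100))


print(f"{average_latency_us(fake_query, iterations=10_000):.2f} us per call")
```

Note that this measures the whole round trip as seen by the caller, not the server-side QTime, so the two numbers are not directly comparable.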

RE: Performance "dead-zone" due to garbage collection

2009-01-23 Thread Feak, Todd
Can you share your experience with the IBM JDK once you've evaluated it? You are working with a heavy load, I think many would benefit from the feedback. -Todd Feak -Original Message- From: wojtekpia [mailto:wojte...@hotmail.com] Sent: Thursday, January 22, 2009 3:46 PM To: solr-user@luc

Method toMultiMap(NamedList params) in SolrParams

2009-01-23 Thread Hana
Hi, I'm getting confused about the method Map toMultiMap(NamedList params) in SolrParams class. When one of your parameters is an instance of String[], it's converted to a String using the toString() method, which seems wrong to me. It is probably assuming that the values in NamedList are all

Fwd: [Travel Assistance] Applications for ApacheCon EU 2009 - Now Open

2009-01-23 Thread Erik Hatcher
Begin forwarded message: From: Tony Stevenson Date: January 23, 2009 8:28:19 AM EST To: travel-assista...@apache.org Subject: [Travel Assistance] Applications for ApacheCon EU 2009 - Now Open The Travel Assistance Committee is now accepting applications for those wanting to attend Apa

Re: I get SEVERE: Lock obtain timed out

2009-01-23 Thread Jerome L Quinn
Julian Davchev wrote on 01/20/2009 10:07:48 AM: > Julian Davchev > 01/20/2009 10:07 AM > > I get SEVERE: Lock obtain timed out > > Hi, > Any documents or something I can read on how locks work and how I can > control it. When do they occur etc. > Cause only way I got out of this mess was rest

Re: how can solr search against group of field

2009-01-23 Thread Marc Sturlese
I think you could use dismax and restrict the results with a filter query. Supposing you're using the dismax query parser, it should look like: http://localhost:8080/solr/select?q=whatever&fq=category:3 I think this would sort out your case. surfer10 wrote: > > definitely disMax does the thing by searching one
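The request URL in the reply above can also be assembled programmatically, which takes care of escaping the parameter values. This is a small sketch using only the Python standard library; `build_select_url` is a hypothetical helper, and the host, port and field name are the ones from the example URL.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filter_query):
    """Build a select URL with a main query (q) and a filter query (fq)."""
    params = urlencode({"q": query, "fq": filter_query})
    return f"{base_url}/select?{params}"


url = build_select_url("http://localhost:8080/solr", "whatever", "category:3")
print(url)  # http://localhost:8080/solr/select?q=whatever&fq=category%3A3
```

The colon in `category:3` is percent-encoded as `%3A`, which is equivalent to the unescaped form once the server decodes the request.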

Re: Master failover - seeking comments

2009-01-23 Thread edre...@ha
Thanks for the response. Let me clarify things a bit. Regarding the Slaves: Our project is a web application. It is our desire to embed Solr into the web application. The web applications are configured with a local embedded Solr instance configured as a slave, and a remote Solr instance confi

Re: Maximum size of document indexed

2009-01-23 Thread Erick Erickson
Try: http://wiki.apache.org/solr/SolrConfigXml?highlight=(maxfieldlength) Best Erick On Fri, Jan 23, 2009 at 7:29 AM, Gargate, Siddharth wrote: > Hi, > I am trying to index a 25 MB word document. I am not able to search all > the keywords. Looks like only certain number of initial words are > ge

search/query issue. sorting, match exact, match first etc

2009-01-23 Thread Julian Davchev
Hi, I am trying to utilize solr into an autocomplete thingy. Let's assume I query for 'foo'. Assuming we work with case insensitive here. I would like to have records returned in specific order. First all that have exact match, then all that start with Foo in alphabetical order, then all that con
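The ordering described above can be sketched as a client-side sort. This is not a Solr query: `rank_suggestions` is a hypothetical helper, and the third tier ("contains the query") is an assumption about where the truncated sentence was heading.

```python
def rank_suggestions(candidates, query):
    """Order matches: exact first, then prefix matches, then the rest.

    Matching is case-insensitive; ties within a tier are alphabetical.
    """
    q = query.lower()

    def sort_key(text):
        lowered = text.lower()
        if lowered == q:
            tier = 0  # exact match
        elif lowered.startswith(q):
            tier = 1  # starts with the query
        else:
            tier = 2  # assumed third tier: contains the query elsewhere
        return (tier, lowered)

    matches = [c for c in candidates if q in c.lower()]
    return sorted(matches, key=sort_key)


names = ["Barfoo", "Food", "foo", "Foobar", "Zebra"]
print(rank_suggestions(names, "foo"))  # ['foo', 'Foobar', 'Food', 'Barfoo']
```

Sorting on a `(tier, text)` tuple keeps the tiers separate while still giving alphabetical order inside each one.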

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-23 Thread Jaco
Hi, I have tested this as well, looking fine! Both issues are indeed fixed, and the index directory of the slaves gets cleaned up nicely. I will apply the changes to all systems I've got running and report back in this thread in case any issues are found. Thanks for the very fast help! I usually

Maximum size of document indexed

2009-01-23 Thread Gargate, Siddharth
Hi, I am trying to index a 25 MB word document. I am not able to search all the keywords. Looks like only certain number of initial words are getting indexed. Is there any limit to the size of document getting indexed? Or is there any word count limit per field? Thanks, Siddharth

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
I have opened an issue to track this https://issues.apache.org/jira/browse/SOLR-978 On Fri, Jan 23, 2009 at 5:22 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote: > I tested with the patch > it has solved both the issues > > On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar > wrote: >> >> >> On Fri, Ja

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
I tested with the patch it has solved both the issues On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar wrote: > > > On Fri, Jan 23, 2009 at 2:12 PM, Jaco wrote: >> >> Hi, >> >> I applied the patch and did some more tests - also adding some LOG.info() >> calls in delTree to see if it actual

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-23 Thread Shalin Shekhar Mangar
On Fri, Jan 23, 2009 at 2:12 PM, Jaco wrote: > Hi, > > I applied the patch and did some more tests - also adding some LOG.info() > calls in delTree to see if it actually gets invoked (LOG.info("START: > delTree: "+dir.getName()); at the start of that method). I don't see any > entries of this sho

Re: URL-import field type?

2009-01-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Jan 23, 2009 at 2:55 PM, Paul Libbrecht wrote: > > Le 23-janv.-09 à 10:10, Noble Paul നോബിള്‍ नोब्ळ् a écrit : >> >> if the response is not XML ,then there is no EntityProcessor that can >> consume this. We may need to add one. > > well, even binary data such as word documents (base64-enc

Re: Any advice for facet.prefix for suggestions

2009-01-23 Thread Erik Hatcher
Ian, A new field is indeed needed and warranted for this case. Facets only work off indexed terms, not stored. Erik On Jan 22, 2009, at 11:48 PM, Ian Connor wrote: The facet prefix method to get suggestions for search terms really helps. However, it seems to show the indexed rat

Re: Intermittent high response times

2009-01-23 Thread hbi dev
Hi wojtekpia, That's interesting, I shall be looking into this over the weekend so I shall look at the GC also. I was briefly reading about GC last night, am I right in thinking it could be affected by what version of the jvm I'm using (1.5.0.8), and also what type of Collector is set? What collec

facet dates and distributed search

2009-01-23 Thread Marc Sturlese
Hey there, I would like to understand why distributed search doesn't support facet dates. As I understand it, there would be problems because if the servers' clocks are not synchronized, the results would not be exact, but... In case I don't mind whether the results are completely exact... would it be possible

Re: What can be the reason for stopping solr work after some time?

2009-01-23 Thread an...@iguanait.com
Hi, thanks for your reply. Sorry for the little information I gave in my first post; I just didn't know what to share. Yes, the java process is still working, but search on the site does not work and I cannot see any http requests at that time in the logs. I have not tested the admin page, this is s

QTime in microsecond

2009-01-23 Thread AHMET ARSLAN
Is there a way to get QTime in microseconds from solr? I have a small collection and my response time (QTime) is 0 or 1 milliseconds. I am running benchmark tests and I need more precise running times for comparison. Thanks for your help.

Re: URL-import field type?

2009-01-23 Thread Paul Libbrecht
Le 23-janv.-09 à 10:10, Noble Paul നോബിള്‍ नोब्ळ् a écrit : if the response is not XML ,then there is no EntityProcessor that can consume this. We may need to add one. well, even binary data such as word documents (base64-encoded for example) run the risk of appearing here. They sure need

Re: DIH XPathEntityProcessor fails with docs containing

2009-01-23 Thread Fergus McMenemie
Seems to work fine on this morning's 23-jan-2009 nightly. Thanks very much. >On Wed, Jan 21, 2009 at 6:05 PM, Fergus McMenemie wrote: > >> >> After looking at http://issues.apache.org/jira/browse/SOLR-964, >> where >> it seems this issue has been addressed, I had another go at indexing >

Re: URL-import field type?

2009-01-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Jan 23, 2009 at 2:28 PM, Paul Libbrecht wrote: > Well, > > the idea is that the solr engine indexes the contents of a web platform. > > Each document is a user-side-URL out of which several fields would be > fetched through various URL-get-documents (e.g. the full-text-view, e.g. the > fut

Re: URL-import field type?

2009-01-23 Thread Paul Libbrecht
Well, the idea is that the solr engine indexes the contents of a web platform. Each document is a user-side-URL out of which several fields would be fetched through various URL-get-documents (e.g. the full-text-view, e.g. the future openmath representation, e.g. the topics (URIs in an onto

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-23 Thread Jaco
Hi, I applied the patch and did some more tests - also adding some LOG.info() calls in delTree to see if it actually gets invoked (LOG.info("START: delTree: "+dir.getName()); at the start of that method). I don't see any entries of this showing up in the log file at all, so it looks like delTree d

stats.jsp - maxDoc and numDoc-help

2009-01-23 Thread S.Selvam Siva
Hi all, I am new to solr. I have posted nearly 10 lakh (1 million) xml docs over the last few months. Now I want to find out the total number of duplicate posts until now. Are stats.jsp's numDocs and maxDocs the appropriate values for finding the total number of duplicate posts (maxDocs-numDocs) so far? please

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-23 Thread Shalin Shekhar Mangar
Yes Solr does. But DataImportHandler with the 1.3 release does not support it. However, you can use the trunk data import handler jar with Solr 1.3 if you do not feel comfortable using Solr 1.4 trunk. On Fri, Jan 23, 2009 at 1:36 PM, Gunaranjan Chandraraju < chandrar...@apple.com> wrote: > > I t

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-23 Thread Gunaranjan Chandraraju
I thought 1.3 supported dynamic fields in schema.xml? Guna On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote: Oops, one more gotcha. The dynamic field support is only in 1.4 trunk. On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: On Fri, Jan 2
