Re: Filtering results

2010-02-04 Thread Ahmet Arslan
> Hi, I want to add a filter to my query which takes documents whose "city" field has either Bangalore or Cochin or Bombay. How do I do this?
> fq=city:bangalore&fq=city:bombay&fq=city:cochin will take the intersection. I need the union.
fq=city:(bangalore OR cochin OR bombay) sam
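A minimal sketch of the two request forms, assuming a plain HTTP client that appends these parameters to Solr's /select URL. The helper any_of_filter is hypothetical, and whether the lowercase values match depends on how the city field is analyzed.

```python
from urllib.parse import urlencode, parse_qs

def any_of_filter(field, values):
    """Build one fq clause that matches documents having ANY of the values."""
    return f"{field}:({' OR '.join(values)})"

cities = ["bangalore", "cochin", "bombay"]

# Repeating fq applies each clause as its own filter, so a document
# must pass every one of them (intersection).
intersection = urlencode([("q", "*:*")] + [("fq", f"city:{c}") for c in cities])

# A single fq with OR-ed values matches documents in any of the cities (union).
union = urlencode([("q", "*:*"), ("fq", any_of_filter("city", cities))])

print(union)  # query string to append to .../select?
```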

Filtering results

2010-02-04 Thread Abin Mathew
Hi, I want to add a filter to my query which takes documents whose "city" field has either Bangalore or Cochin or Bombay. How do I do this? fq=city:bangalore&fq=city:bombay&fq=city:cochin will take the intersection. I need the union. Please help. Thanks

Re: How to return filtered tokens as query results?

2010-02-04 Thread Ahmet Arslan
> Is there a way to return Solr's analyzed/filtered tokens from a query, rather than the original indexed data? (Ideally at a fairly high level like SolrJ.)
TermVectorComponent [1] can do that.
[1] http://wiki.apache.org/solr/TermVectorComponent
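A sketch of such a request, assuming the Solr 1.4 example configuration (a request handler named tvrh that runs the TermVectorComponent, selected with qt=tvrh) on localhost:8983, and a field indexed with termVectors="true". The query id:123 is a placeholder. From SolrJ the same parameters would be set on a SolrQuery.

```python
from urllib.parse import urlencode

# Assumed base URL of the Solr 1.4 example server.
BASE_URL = "http://localhost:8983/solr/select"

def term_vector_url(query, fl="id"):
    """Build a request that asks for the analyzed terms of matching documents."""
    params = {
        "qt": "tvrh",     # route to the handler that includes the TermVectorComponent
        "q": query,
        "fl": fl,
        "tv": "true",     # enable the TermVectorComponent
        "tv.tf": "true",  # also return the term frequency of each analyzed term
    }
    return BASE_URL + "?" + urlencode(params)

print(term_vector_url("id:123"))
```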

How to return filtered tokens as query results?

2010-02-04 Thread Gregg Horan
Is there a way to return Solr's analyzed/filtered tokens from a query, rather than the original indexed data? (Ideally at a fairly high level like SolrJ.) Thanks

Re: Thanks Robert!

2010-02-04 Thread Robert Muir
Thanks. Well, "all" would be a little inaccurate... we still have one huge monster (Synonyms) remaining and some other smaller stuff: SOLR-1657 has a list with the finished stuff crossed out. I think WDF took a year off my life, but I will take a second look now and see if I can resolve some more of thes

source tree for lucene

2010-02-04 Thread Joe Calderon
I want to recompile Lucene with http://issues.apache.org/jira/browse/LUCENE-2230, but I'm not sure which source tree to use. I tried using the implied trunk revision from the admin/system page, but Solr fails to build with the generated jars, even if I exclude the patches from 2230... I'm wondering i

Re: weird behaviour when setting negative boost with bq using dismax

2010-02-04 Thread Marc Sturlese
> : bq=(*:* -field_a:54^1)
> I think what you want there is bq=(*:* -field_a:54)^1 ...you are "boosting" things that don't match "field_a:54"
Thanks Hoss. I've updated the wiki; the content of the bq param was wrong: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_very_low

Re: ExtractingRequestHandler "multiple values encountered for non multiValued field last_modified"

2010-02-04 Thread Lance Norskog
The Tika integration with the DataImportHandler allows you to control many aspects of what goes into the index, including solving this problem: http://wiki.apache.org/solr/TikaEntityProcessor (Tika is the extraction library, and ExtractingRequestHandler and the TikaEntityProcessor both use it.)

Re: Using solr to store data

2010-02-04 Thread Tim Underwood
We just switched over to storing our data directly in Solr as compressed JSON fields at http://frugalmechanic.com. So far it's working out great. Our detail pages (e.g.: http://frugalmechanic.com/auto-part/817453-33-2084-kn-high-performance-air-filter) now make a single Solr request to grab the p

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Answering my own question... PatternReplaceFilter doesn't output multiple tokens... Which means messing with capture state... On Thu, Feb 4, 2010 at 2:16 PM, Jason Rutherglen wrote: > Transferred partially to solr-user... > > Steven, thanks for the reply! > > I wonder if PatternReplaceFilter can

Thanks Robert!

2010-02-04 Thread Jason Rutherglen
Robert, thanks for redoing all the Solr analyzers to the new API! It helps to have many examples to work from, best practices so to speak.

Re: Solr not starting JMX

2010-02-04 Thread Walter Underwood
I remember that I had to have a JMX password file with the right permissions, or it wouldn't start. --wunder On Feb 4, 2010, at 2:27 PM, Chris Hostetter wrote: > > : My parameters look like this (running the Solr example): > : > : java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxrem

Re: Solr not starting JMX

2010-02-04 Thread Chris Hostetter
: My parameters look like this (running the Solr example): : : java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060 : -Dcom.sun.management.jmxremote.authenticate=false : -Dcom.sun.management.jmxremote.ssl=false -jar start.jar What implementation/version of java are you ru

Re: Solr Index size : Java out of memory

2010-02-04 Thread Lance Norskog
Solr needs memory for the operations it performs, not for the index size. It needs X amount of memory for a query, Y amount of memory for each document found by a query, and other things. Sorting needs memory for the number of documents. Faceting needs memory for the number of unique values in a fi
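As a rough illustration of Lance's point that sorting cost follows the document count rather than the index size, here is a back-of-envelope sketch. It assumes Lucene's FieldCache keeps one primitive value per document when sorting on a numeric field. String sorts, faceting and the other caches need additional memory that this does not model.

```python
# Bytes per document for the primitive array the FieldCache allocates when
# sorting on a numeric field (array header overhead ignored).
BYTES_PER_DOC = {"int": 4, "float": 4, "long": 8, "double": 8}

def sort_cache_bytes(max_doc, field_type="int"):
    """Rough heap cost of sorting on one numeric field across max_doc documents."""
    return max_doc * BYTES_PER_DOC[field_type]

# e.g. 10 million documents sorted on one long field
print(sort_cache_bytes(10_000_000, "long") / 2**20, "MiB")
```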

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Transferred partially to solr-user... Steven, thanks for the reply! I wonder if PatternReplaceFilter can output multiple tokens? I'd like to progressively strip the non-alphanumeric characters, for example output: apple!&*, apple!&, apple!, apple
On Thu, Feb 4, 2010 at 12:18 PM, Steven A Rowe wrote: > Hi Jaso
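Here is a plain-Python sketch of the token sequence Jason asks for. It assumes only trailing non-alphanumeric characters are stripped, one at a time, as in his example. Inside Solr this would need a custom TokenFilter that emits several tokens per input token, which is the capture-state work he mentions in his follow-up. The name progressive_strip is hypothetical.

```python
def progressive_strip(token):
    """Return the token followed by successively shorter variants, each with one
    more trailing non-alphanumeric character removed; empty strings are skipped."""
    variants = [token]
    while token and not token[-1].isalnum():
        token = token[:-1]
        if token:
            variants.append(token)
    return variants

print(progressive_strip("apple!&*"))
```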

Re: Running Solr (LucidWorks) as a Windows Server

2010-02-04 Thread Yonik Seeley
On Thu, Feb 4, 2010 at 4:42 PM, Erik Hatcher wrote: > What about using Tomcat instead? Tomcat has Windows service capability already, right?
Another part of the problem is telling the Solr webapp where its Solr home is. Options: - use a Tomcat context fragment (described in http://wiki.apac

Re: Running Solr (LucidWorks) as a Windows Server

2010-02-04 Thread Erik Hatcher
What about using Tomcat instead? Tomcat has Windows service capability already, right? Erik On Feb 4, 2010, at 2:18 PM, Roland Villemoes wrote: Hi, I need to have Solr/Jetty running as a Windows Service. I am using the Lucid distribution. Does anyone have a running example and to

Re: Indexing CSV without HTTP

2010-02-04 Thread Rohit Gandhe
Thanks Yonik! We want to move to index replication soon (in a couple of months), which will also help with incremental updates. But for now we want a quick and dirty solution without running two servers. Does the utility look OK for indexing a CSV file? Is it safe to do in a production environment? I know maint

Re: Indexing CSV without HTTP

2010-02-04 Thread Yonik Seeley
On Thu, Feb 4, 2010 at 3:03 PM, Rohit Gandhe wrote: > We are indexing quite a lot of data using update/csv handler. For > reasons I can't get into right now, I can't implement a DIH since I > can only access the DB using Stored Procs and stored proc support in > DIH is not yet available. Indexing

Re: C++ being filtered (please help)

2010-02-04 Thread Chris Hostetter
: > now I want to tokenize it based on comma or white space and other word delimiting characters only. Not on the plus sign. So that the result after tokenization should be ...
: > But the result I am getting is
...you haven't told us what type of analyzer settings you are curr
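To pin down the behaviour being asked for, here is a plain-Python sketch that splits only on commas and whitespace. The other delimiters the poster mentions are not covered. In Solr, something similar might be configured with solr.PatternTokenizerFactory and a pattern such as [,\s]+, but that is an assumption to verify against the actual field type.

```python
import re

def split_keep_plus(text):
    """Split on commas and whitespace only, so 'C++' survives as one token."""
    return [t for t in re.split(r"[,\s]+", text) if t]

print(split_keep_plus("C++, Java developer"))
```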

Re: HTTP caching and distributed search

2010-02-04 Thread Chris Hostetter
: > http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2
: You are right, the ETag is calculated using the searcher on core1 only and it does not take other shards into account. Can you open a Jira issue?
...as a possible

Re: weird behaviour when setting negative boost with bq using dismax

2010-02-04 Thread Chris Hostetter
: bq=(*:* -field_a:54^1)
I think what you want there is bq=(*:* -field_a:54)^1 ...you are "boosting" things that don't match "field_a:54". Adding a boost value "^1" to a negated clause doesn't do much (except maybe make the queryNorm really wacky). -Hoss
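This shows the corrected parameter as it might be sent, assuming a hand-built dismax request. The q value is a placeholder and demote_bq is a hypothetical helper. The boost applies to the whole parenthesized clause, so every document except the field_a:54 matches gets the bq boost, which pushes field_a:54 matches down relative to the rest.

```python
from urllib.parse import urlencode

def demote_bq(field, value, boost=1):
    """bq clause that boosts every document EXCEPT those matching field:value.
    The boost is placed outside the parentheses, on the whole clause."""
    return f"(*:* -{field}:{value})^{boost}"

params = urlencode({"defType": "dismax", "q": "ipod", "bq": demote_bq("field_a", 54)})
print(params)
```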

Indexing CSV without HTTP

2010-02-04 Thread Rohit Gandhe
Hi Everyone, We are indexing quite a lot of data using the update/csv handler. For reasons I can't get into right now, I can't implement a DIH since I can only access the DB using stored procs, and stored proc support in DIH is not yet available. Indexing takes about 3 hours and I don't want to tax the
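One option avoids uploading the CSV in the request body. This is not necessarily what Yonik's truncated reply proposes: the CSV handler can read a file that already sits on the Solr server's disk via stream.file. This sketch assumes enableRemoteStreaming="true" on requestParsers in solrconfig.xml, the /update/csv handler from the example config, and a placeholder base URL and path.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL

def csv_stream_file_url(path, commit=True):
    """URL asking /update/csv to load a CSV file from the Solr server's own disk."""
    params = {
        "stream.file": path,  # read by Solr itself; requires remote streaming enabled
        "stream.contentType": "text/plain;charset=utf-8",
        "commit": "true" if commit else "false",
    }
    return f"{SOLR}/update/csv?{urlencode(params)}"

print(csv_stream_file_url("/data/export/products.csv"))  # hypothetical path
```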

RE: fuzzy matching / configurable distance function?

2010-02-04 Thread Fuad Efendi
The Levenshtein algorithm is currently hardcoded (FuzzyTermEnum class) in Lucene 2.9.1 and 3.0... There are samples of other distances in the "contrib" folder. If you want to play with distance, check http://issues.apache.org/jira/browse/LUCENE-2230. It works if the distance is an integer and follows "metric space axioms
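For reference, here is a plain dynamic-programming implementation of the Levenshtein edit distance that Fuad says is hardcoded. It is an illustration only, not Lucene's FuzzyTermEnum code, which also converts the distance into a similarity score relative to term length.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions
    needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))
```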

Running Solr (LucidWorks) as a Windows Server

2010-02-04 Thread Roland Villemoes
Hi, I need to have Solr/Jetty running as a Windows Service. I am using the Lucid distribution. Does anyone have a running example and tool for this? med venlig hilsen/best regards Roland Villemoes Tel: (+45) 22 69 59 62 E-Mail: mailto:r...@alpha-solutions.dk Alpha Solutions A/S Borgergade 2,

RE: query all filled field?

2010-02-04 Thread Ahmet Arslan
> I've analyzed my index application and checked the XML before executing the http request and the field is empty: It should be empty on SOLR. Probably something in the way between my application (.NET) and the SOLR (Jetty on Ubuntu) adds the whitespace. Anyway, I'll t

fuzzy matching / configurable distance function?

2010-02-04 Thread Joe Calderon
Is it possible to configure the distance formula used by fuzzy matching? I see there are others on the function query page under strdist, but I'm wondering if they are applicable to fuzzy matching. Thx much --joe

Re: Is it possible to exclude results from other languages?

2010-02-04 Thread Raimon Bosch
Yes, it's true that we could do it at index time if we had a way to know. I was thinking of some solution at search time, maybe measuring the % of stopwords in each document. Normally, a document in another language won't have any stopwords of the index's main language. If you know some external software
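Here is a sketch of the stopword heuristic Raimon describes. It is computed per document before indexing, so the result could feed the index-time boost suggested in the replies. The stopword list and the 0.05 threshold are illustrative assumptions, not tuned values.

```python
import re

# Hypothetical, tiny stopword list for the index's main language (English here).
ENGLISH_STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for"}

def stopword_ratio(text, stopwords=ENGLISH_STOPWORDS):
    """Fraction of word tokens in text that are stopwords of the main language."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return sum(w in stopwords for w in words) / len(words)

def likely_other_language(text, threshold=0.05):
    """Flag a document whose stopword share falls below the assumed threshold."""
    return stopword_ratio(text) < threshold

print(likely_other_language("El gato está en la alfombra"))
```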

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
I've analyzed my index application and checked the XML before executing the http request and the field is empty: It should be empty on SOLR. Probably something in the way between my application (.NET) and the SOLR (Jetty on Ubuntu) adds the whitespace. Anyway, I'll try to remove the field

Re: query all filled field?

2010-02-04 Thread Erik Hatcher
On Feb 4, 2010, at 12:38 AM, Lance Norskog wrote: > Queries that start with minus or NOT don't work. You have to do this: *:* AND -fieldX:[* TO *]
That's only true for subqueries. A purely negative single top-level clause works fine with Solr. Erik On Wed, Feb 3, 2010 at 5:

RE: Guidance on Solr errors

2010-02-04 Thread Vauthrin, Laurent
Thank you for the responses! -Original Message- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Wednesday, February 03, 2010 1:56 PM To: solr-user@lucene.apache.org Subject: Re: Guidance on Solr errors Inline below. On Feb 2, 2010, at 8:40 PM, Vauthrin,

RE: query all filled field?

2010-02-04 Thread Ahmet Arslan
> XML update. I'm serializing the doc in .NET, and then using solsharp to insert/update the doc to SOLR. The result is: Does this mean I'm adding whitespace on XML update?
Yes, exactly. You can remove from your ... if the value of fieldX.trim() is equal to ""
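Ahmet's suggestion amounts to not sending blank values at all. Frederico's client is .NET, so this Python sketch only mirrors the logic, and drop_blank_fields is a hypothetical helper. Once blank fields are omitted, fieldX:[* TO *] should match only documents with a real value, and -fieldX:[* TO *] should match the rest.

```python
def drop_blank_fields(doc):
    """Return a copy of doc without string fields that are empty or whitespace-only,
    so such fields are simply absent from the document sent to Solr."""
    return {k: v for k, v in doc.items()
            if not (isinstance(v, str) and v.strip() == "")}

print(drop_blank_fields({"id": "1", "fieldX": "  ", "title": "x"}))
```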

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
XML update. I'm serializing the doc in .NET, and then using solsharp to insert/update the doc to SOLR. The result is: Does this mean I'm adding whitespace on XML update? Frederico -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, 4

Re: Using + with Stopwords

2010-02-04 Thread Ahmet Arslan
> > Does anybody know how to provide this ability to search for stopwords
> CommonGramsFilterFactory [1] may help.
Sorry, Solr 1.4 has this filter.
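To show why CommonGrams helps with queries like "Bank of America", here is a rough emulation of the tokens it produces at index time. Every token is kept, and each adjacent pair involving a common word is also emitted as a joined bigram. This ignores token positions and the query-side variant, and common_grams is a hypothetical name.

```python
def common_grams(tokens, common_words):
    """Emit every token, plus a "left_right" bigram whenever either token of an
    adjacent pair is a common word (roughly what CommonGramsFilter indexes)."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out

print(common_grams(["bank", "of", "america"], {"a", "the", "of"}))
```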

Re: Is it possible to exclude results from other languages?

2010-02-04 Thread Ahmet Arslan
> In our indexes, sometimes we have some documents written in other languages different from the index's most common language. Is there any way to give less boosting to these documents?
If you are aware of those documents, at index time you can boost those documents with a value less than 1
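Here is a sketch of Ahmet's index-time approach. It assumes documents are sent with Solr's XML update format, where a <doc> element accepts a boost attribute. The field names and the 0.5 value are placeholders. Index-time boosts are folded into field norms, so they have no effect on fields declared with omitNorms="true".

```python
import xml.etree.ElementTree as ET

def add_xml(docs):
    """Build a Solr XML <add> message from (fields, boost) pairs; boost may be None."""
    add = ET.Element("add")
    for fields, boost in docs:
        doc = ET.SubElement(add, "doc")
        if boost is not None:
            doc.set("boost", str(boost))  # values below 1.0 lower the document's scores
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

print(add_xml([({"id": "en-1", "text": "main language"}, None),
               ({"id": "de-1", "text": "andere Sprache"}, 0.5)]))
```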

Re: Gathering metrics on 1.4 (was Re: Solr 1.4 - stats page slow)

2010-02-04 Thread Mark Miller
john allspaw wrote: > Heya - > > So we just upgraded our Solr install to 1.4, and there's a great CPU drop > and query response time drop. Good! > But we're seeing the slowdown in the collection of statistics (stats.jsp) > mentioned here: > > http://www.mail-archive.com/solr-user@lucene.apache.org/

Re: Some questions on solr replication backup feature

2010-02-04 Thread Licinio Fernández Maurelo
I've made a backup request to my local Solr server, and it works, but... can I set the snapshot dir path? On 4 February 2010 16:54, Licinio Fernández Maurelo < licinio.fernan...@gmail.com> wrote: > Hi folks, > > as we're moving to solr 1.4 replication, i want to know about backups. > > Question

Gathering metrics on 1.4 (was Re: Solr 1.4 - stats page slow)

2010-02-04 Thread john allspaw
Heya - So we just upgraded our Solr install to 1.4, and there's a great CPU drop and query response time drop. Good! But we're seeing the slowdown in the collection of statistics (stats.jsp) mentioned here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg30224.html to the tune of taki

Re: Using + with Stopwords

2010-02-04 Thread Ahmet Arslan
> Hi, I have some common stopwords defined like [a,the,of] etc. Our users need the ability to include stopwords in their search. I tried using the + sign, like [Bank +of America], to get accurate results, but it does not work. Does anybody know how to provide this ability to search

Is it possible to exclude results from other languages?

2010-02-04 Thread Raimon Bosch
Hi, In our indexes, sometimes we have some documents written in other languages different from the index's most common language. Is there any way to give less boosting to these documents? Thanks in advance, Raimon Bosch. -- View this message in context: http://old.nabble.com/Is-it-posible-to-exc

Using + with Stopwords

2010-02-04 Thread Asim Rahman
Hi, I have some common stopwords defined like [a,the,of] etc. Our users need the ability to include stopwords in their search. I tried using the + sign, like [Bank +of America], to get accurate results, but it does not work. Does anybody know how to provide this ability to search for stopwords - we

RE: query all filled field?

2010-02-04 Thread Ahmet Arslan
> Theoretically yes, it's correct, but I have about 1/10 of the docs with this field not empty and the rest empty. Most of the articles have the field empty, as I can see when I query *:*.
How are you adding documents to Solr? XML update, DIH? Probably you are adding whitespace value t

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Christoph Brill
Looks like it works. No crashes and the log states it was added. I didn't test against actual data, though. 04.02.2010 17:14:13 org.apache.solr.handler.extraction.ExtractingRequestHandler inform INFO: Adding Date Format: -MM-dd HH:mm:ss 04.02.2010 17:14:13 org.apache.solr.handler.extraction.E

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
Theoretically yes, it's correct, but I have about 1/10 of the docs with this field not empty and the rest empty. Most of the articles have the field empty, as I can see when I query *:*. So the queries don't make sense... -Original Message- From: Ankit Bhatnagar [mailto:abhatna...@vantage

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Mark Miller
Mark Miller wrote: > Christoph Brill wrote: > >> Cool, this way it's no longer crashing. >> >> Thanks and Regards, >> Chris >> >> On 04.02.2010 14:29, Mark Miller wrote: >> >> >>> Before you file a JIRA issue: >>> >>> I don't believe this is a bug, so there is likely no need for JIRA.

Re: ContentStreamUpdateRequest addFile fails to close Stream

2010-02-04 Thread Christoph Brill
Good job Mark, works fine and does not keep my files open. Thanks, Chris On 03.02.2010 15:24, Mark Miller wrote: > Hey Christoph, > > Could you give the patch at > https://issues.apache.org/jira/browse/SOLR-1744 a try and let me know > how it works out for you?

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Mark Miller
Christoph Brill wrote: > Cool, this way it's no longer crashing. > > Thanks and Regards, > Chris > > On 04.02.2010 14:29, Mark Miller wrote: > >> Before you file a JIRA issue: >> >> I don't believe this is a bug, so there is likely no need for JIRA. Try >> putting the date.formats snippet in

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-02-04 Thread Jorg Heymans
Hi, I'm having some trouble getting this to work on a snapshot from 3rd Feb. My config looks as follows and I get this stacktrace org.apache.solr.handler.dataimport.DataImportHandlerExcep

Re: Best OCR API for solr

2010-02-04 Thread Kranti™ K K Parisa
Yes, Tika indexes all formats, but I am specifically looking for OCR (through Java), at least for PDF or JPEG images. Any clues? Best Regards, Kranti K K Parisa On Thu, Feb 4, 2010 at 8:29 PM, mike anderson wrote: > There might be an OCR plugin for Apache Tika (which does exactly this out > of > th

Some questions on solr replication backup feature

2010-02-04 Thread Licinio Fernández Maurelo
Hi folks, as we're moving to Solr 1.4 replication, I want to know about backups. Questions:
1. Which properties can be set to configure this feature? (I only know backupAfter.)
2. Is it an incremental backup or a full index snapshot?
Thx -- Lici ~Java Developer~

Storing values in addition to "last_index_time"

2010-02-04 Thread cjkadakia
I understand that upon performing an index (full-import or delta-import), the dataimport.properties file is written to with a last_index_time which can then be accessed by the data-config.xml for delta-import queries with ${dataimporter.last_index_time}. I was curious if another key could be adde

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Christoph Brill
Cool, this way it's no longer crashing. Thanks and Regards, Chris On 04.02.2010 14:29, Mark Miller wrote: > Before you file a JIRA issue: > > I don't believe this is a bug, so there is likely no need for JIRA. Try > putting the date.formats snippet in the defaults section rather than > simply

Solr not starting JMX

2010-02-04 Thread Jan-Simon Winkelmann
Hi everyone, I am currently trying to set up JMX support for Solr, but somehow the listening socket is not even created on my specified port. My parameters look like this (running the Solr example): java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060 -Dcom.sun.management.

ExtractingRequestHandler "multiple values encountered for non multiValued field last_modified"

2010-02-04 Thread Christoph Brill
Hi list, I'm using the ExtractingRequestHandler to extract content from documents. It's extracting the "last_modified" field quite fine, but of course only for documents where this field is set. If this field is not set I want to pass the file system timestamp of the file. I'm doing: final Conte
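Christoph's Java code is truncated, so this is only a Python sketch of the idea. It sends the file system timestamp through the ExtractingRequestHandler's literal.<field> parameter, formatted as a UTC date the way Solr date fields expect. The helper names are hypothetical. If Tika also extracts a last_modified value for the same document, both end up in the field, which may be what triggers the "multiple values" error in the subject. For that reason, the client would send the literal only when the document has no timestamp of its own.

```python
import os
from datetime import datetime, timezone

def solr_date(epoch_seconds):
    """Format a POSIX timestamp as a Solr date string (UTC, trailing Z)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def last_modified_literal(path):
    """Request parameter carrying the file system timestamp of path."""
    return {"literal.last_modified": solr_date(os.path.getmtime(path))}

print(solr_date(1265300000))
```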

Re: Best OCR API for solr

2010-02-04 Thread mike anderson
There might be an OCR plugin for Apache Tika (which does exactly this out of the box except for OCR capability, I believe). http://lucene.apache.org/tika/ -mike 2010/2/4 Kranti™ K K Parisa > Hi, > > Can anyone list the best OCR APIs available to use in combination with > SOLR. > > The idea is

RE: query all filled field?

2010-02-04 Thread Ankit Bhatnagar
That's correct. If you want to find "missing values", i.e. fields for which no value is present, then you will use - Ankit -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, February 04, 2010 9:41 AM To: solr-user@lucene.apache.org Subject: RE: query all filled f

solr multicore and nfs

2010-02-04 Thread Valérie TAESCH
Hello, We are using Solr (v 1.3.0 694707 with Lucene version 2.4-dev 691741) in multicore mode with an average of 400 indexes (all indexes have the same structure). These indexes are stored on an NFS disk. A Java process writes continuously to these indexes while Solr is only used to read th

Solr Index size : Java out of memory

2010-02-04 Thread Smith G
Hello All, I am trying to start the Solr server using Jetty (same as in the Solr tutorial on their website). As the index size is around 3.5 GB, it's returning OutOfMemoryError. Is it mandatory to satisfy the condition Java heap size > index size? If yes, is there any solution to run Solr s

RE: query all filled field?

2010-02-04 Thread Ahmet Arslan
> *:* AND -fieldX:[* TO *] - returns 0 docs
> fieldX:(a*) - returns docs, so I'm sure that there are docs with this field filled.
> Any other ideas what could be wrong?
There is nothing wrong in this scenario. If -fieldX:[* TO *] returns 0 docs, it means that all of your documents have that f

Re: query all filled field?

2010-02-04 Thread Mark Miller
Not entirely true - that's the case in Lucene, but in Solr, top-level queries *can* start with minus or NOT. They cannot if they are nested. Both *:* AND -fieldX:[* TO *] and -fieldX:[* TO *] are the same in Solr. -- - Mark http://www.lucidimagination.com Lance Norskog wrote: > Queries th

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Mark Miller
Before you file a JIRA issue: I don't believe this is a bug, so there is likely no need for JIRA. Try putting the date.formats snippet in the defaults section rather than simply within the RequestHandler tags. Then you should be good to go. -- - Mark http://www.lucidimagination.com Lance No

RE: Solr response extremely slow

2010-02-04 Thread Fuad Efendi
'!' :))) Plus, FastLRUCache (the previous one was synchronized) (and of course warming-up time) := start complaining after ensuring there are no complaints :) (and of course the OS needs time to cache filesystem blocks, and Java HotSpot, ... - a few minutes at least...) > On Feb 3, 2010, at 1:38 PM, Rajat Gar

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
I tried another one: fieldX:["" TO *] and it returns articles with the field filled :), so I guess I'm getting there. But I also tried fieldX:[" " TO *] and got a few more results than the first one... Is there a real difference between these, and also if the results are really all docs with

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
Thanks, but still no luck with that: *:* AND -fieldX:[* TO *] - returns 0 docs fieldX:(a*) - returns docs, so I'm sure that there are docs with this field filled. Any other ideas what could be wrong? Frederico -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: quint

Re: HTTP caching and distributed search

2010-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 3, 2010 at 12:21 AM, Charlie Jackson wrote: > Currently, I've got a Solr setup in which we're distributing searches > across two cores on a machine, say core1 and core2. I'm toying with the > notion of enabling Solr's HTTP caching on our system, but I noticed an > oddity when using it

How to send web pages(urls) to solr cell via solrj?

2010-02-04 Thread dhamu
Hi, I am a newbie to Solr and have been exploring Solr for the last few days. I am using Solr Cell with Tika for parsing, indexing and searching, posting the rich text documents via SolrJ. My actual requirement is that instead of using local documents (pdf, doc & docx), I want to use web pages (URLs, e.g. (http://www.apach
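One way to index a page without downloading it client-side is to pass the page address as stream.url, so that Solr Cell fetches it itself. This assumes remote streaming is enabled (enableRemoteStreaming="true" on requestParsers in solrconfig.xml) and the /update/extract handler from the example config. The base URL and document id are placeholders. In SolrJ, the same parameters would be set on the update request.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL

def extract_url_request(page_url, doc_id):
    """URL asking Solr Cell to fetch and index a web page itself via stream.url."""
    params = {
        "stream.url": page_url,  # Solr downloads the page; needs remote streaming enabled
        "literal.id": doc_id,    # unique key for the new document
        "commit": "true",
    }
    return f"{SOLR}/update/extract?{urlencode(params)}"

print(extract_url_request("http://lucene.apache.org/solr/", "page-1"))
```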

Re: weird behaviour when setting negative boost with bq using dismax

2010-02-04 Thread Marc Sturlese
> Generally speaking, by convention boosts in Lucene have unity at 1.0, not 0.0. So, a "negative boost" is usually done with boosts between 0 and 1. For this case, maybe a boost of 0.1 is what you want?
I forgot to say I tried what you say as well, but it didn't work.
> In the standard query parse