> Hi
> I want to add a filter to my query which takes documents
> whose "city"
> field has either Bangalore or cochin or Bombay. how do i do
> this?
>
> fq=city:bangalore&fq=city:bombay&fq=city:cochin
> will take the
> intersection. I need the union.
fq=city:(bangalore OR cochin OR bombay)
sam
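To see why the three separate fq parameters intersect while the single OR'ed fq unions, here is a sketch of the two request strings (parameter encoding only; host and handler omitted):

```python
from urllib.parse import urlencode, unquote_plus

# Each fq parameter is a separate filter; Solr intersects them all,
# so three single-city filters can never match the same document.
intersection = urlencode([("q", "*:*"),
                          ("fq", "city:bangalore"),
                          ("fq", "city:bombay"),
                          ("fq", "city:cochin")])

# One fq whose clause ORs the cities gives the union inside a single filter.
union = urlencode([("q", "*:*"),
                   ("fq", "city:(bangalore OR cochin OR bombay)")])

print(unquote_plus(union))
```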
Hi
I want to add a filter to my query which takes documents whose "city"
field has either Bangalore or cochin or Bombay. how do i do this?
fq=city:bangalore&fq=city:bombay&fq=city:cochin will take the
intersection. I need the union.
Please help
Thanks
> Is there a way to return Solr's
> analyzed/filtered tokens from a query,
> rather than the original indexed data? (Ideally at a
> fairly high level like
> solrj).
TermVectorComponent [1] can do that.
[1]http://wiki.apache.org/solr/TermVectorComponent
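As a sketch, the request parameters look roughly like this, assuming the core's handler has the TermVectorComponent enabled and the field is indexed with termVectors="true" (parameter names from memory; double-check against the wiki page):

```python
from urllib.parse import urlencode

# Ask the TermVectorComponent for the analyzed terms of the matched docs.
params = urlencode({
    "q": "title:solr",      # hypothetical query
    "tv": "true",           # turn the component on for this request
    "tv.tf": "true",        # include term frequencies
    "tv.positions": "true", # include token positions
    "tv.fl": "title",       # restrict term vectors to this field
})
print(params)
```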
Is there a way to return Solr's analyzed/filtered tokens from a query,
rather than the original indexed data? (Ideally at a fairly high level like
solrj).
Thanks
thanks, well "All" would be a little inaccurate... we still have one huge
monster (Synonyms) remaining and some other smaller stuff: SOLR-1657 has a
list with the finished stuff crossed-out.
I think WDF took a year off my life, but will take a second look now and see
if i can resolve some more of thes
i want to recompile lucene with
http://issues.apache.org/jira/browse/LUCENE-2230, but im not sure
which source tree to use, i tried using the implied trunk revision
from the admin/system page but solr fails to build with the generated
jars, even if i exclude the patches from 2230...
im wondering i
>: bq=(*:* -field_a:54^1)
>I think what you want there is bq=(*:* -field_a:54)^1
>...you are "boosting" things that don't match "field_a:54"
Thanks Hoss. I've updated the Wiki, the content of the bq param was wrong:
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_very_low
The Tika integration with the DataImportHandler allows you to control
many aspects of what goes into the index, including solving this
problem:
http://wiki.apache.org/solr/TikaEntityProcessor
(Tika is the extraction library, and ExtractingRequestHandler and the
TikaEntityProcessor both use it.)
We just switched over to storing our data directly in Solr as
compressed JSON fields at http://frugalmechanic.com. So far it's
working out great. Our detail pages (e.g.:
http://frugalmechanic.com/auto-part/817453-33-2084-kn-high-performance-air-filter)
now make a single Solr request to grab the p
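A minimal sketch of the "compressed JSON stored field" idea (document contents and field names here are hypothetical): compress the JSON blob before storing it in a plain stored string field, and reverse the steps when rendering the detail page:

```python
import base64, gzip, json

doc = {"id": "817453", "name": "K&N High Performance Air Filter"}

# Compress and base64 the JSON so it survives as a plain stored string.
blob = base64.b64encode(
    gzip.compress(json.dumps(doc).encode("utf-8"))).decode("ascii")

# Reading it back at page-render time reverses the pipeline.
restored = json.loads(gzip.decompress(base64.b64decode(blob)))
print(restored == doc)
```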
Answering my own question... PatternReplaceFilter doesn't output
multiple tokens...
Which means messing with capture state...
On Thu, Feb 4, 2010 at 2:16 PM, Jason Rutherglen
wrote:
> Transferred partially to solr-user...
>
> Steven, thanks for the reply!
>
> I wonder if PatternReplaceFilter can
Robert, thanks for redoing all the Solr analyzers to the new API! It
helps to have many examples to work from, best practices so to speak.
I remember that I had to have a JMX password file with the right permissions,
or it wouldn't start. --wunder
On Feb 4, 2010, at 2:27 PM, Chris Hostetter wrote:
>
> : My parameters look like this (running the Solr example):
> :
> : java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxrem
: My parameters look like this (running the Solr example):
:
: java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060
: -Dcom.sun.management.jmxremote.authenticate=false
: -Dcom.sun.management.jmxremote.ssl=false -jar start.jar
What implementation/version of java are you ru
Solr needs memory allocation for different operations, not for the
index size. It needs X amount of memory for a query, Y amount of
memory for document found by a query, and other things. Sorting needs
memory for the number of documents. Faceting needs memory for the
number of unique values in a fi
Transferred partially to solr-user...
Steven, thanks for the reply!
I wonder if PatternReplaceFilter can output multiple tokens? I'd like
to progressively strip the non-alphanums, for example output:
apple!&*
apple!&
apple!
apple
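The desired progressive stripping can be sketched outside of Solr's filter chain like this (plain Python, just to show the token expansion the poster wants a TokenFilter to emit):

```python
import re

def progressive_strip(token):
    # Emit the token, then repeatedly drop one trailing
    # non-alphanumeric character until none remain.
    out = [token]
    while re.search(r'[^A-Za-z0-9]$', token):
        token = token[:-1]
        out.append(token)
    return out

print(progressive_strip("apple!&*"))  # ['apple!&*', 'apple!&', 'apple!', 'apple']
```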
On Thu, Feb 4, 2010 at 12:18 PM, Steven A Rowe wrote:
> Hi Jaso
On Thu, Feb 4, 2010 at 4:42 PM, Erik Hatcher wrote:
> What about using Tomcat instead? Tomcat has Windows service capability
> already, right?
Another part of the problem is telling the solr webapp where its solr home is.
Options:
- use a tomcat context fragment (described in
http://wiki.apac
What about using Tomcat instead? Tomcat has Windows service
capability already, right?
Erik
On Feb 4, 2010, at 2:18 PM, Roland Villemoes wrote:
Hi,
I need to have Solr/Jetty running as a Windows Service.
I am using the Lucid distribution.
Does anyone have a running example and to
Thanks Yonik! We want to go to index replication soon (a couple of
months), which will also help with incremental updates. But for now we
want a quick and dirty solution without running two servers. Does the
utility look ok to index a CSV file? Is it safe to do in a production
environment? I know maint
On Thu, Feb 4, 2010 at 3:03 PM, Rohit Gandhe wrote:
> We are indexing quite a lot of data using update/csv handler. For
> reasons I can't get into right now, I can't implement a DIH since I
> can only access the DB using Stored Procs and stored proc support in
> DIH is not yet available. Indexing
: > now i want to tokenize it based on comma or white space and
: > other word
: > delimiting characters only. Not on the plus sign. so that
: > result after
: > tokenization should be
...
: > But the result I am getting is
...you haven't told us what type of analyzer settings you are curr
: > http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards
: > =localhost:8080/solr/core1,localhost:8080/solr/core2
: You are right, etag is calculated using the searcher on core1 only and it
: does not take other shards into account. Can you open a Jira issue?
...as a possible
: bq=(*:* -field_a:54^1)
I think what you want there is bq=(*:* -field_a:54)^1
...you are "boosting" things that don't match "field_a:54"
adding a boost value "^1" to a negated clause doesn't do much (except
maybe make the queryNorm really wacky)
-Hoss
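Spelling out the difference as request parameters (a sketch; the decoding is only for readability):

```python
from urllib.parse import urlencode, unquote_plus

# The boost belongs on the whole parenthesized clause: documents that
# do NOT match field_a:54 get the boost.
right = urlencode({"bq": "(*:* -field_a:54)^1"})

# Here the ^1 sits on the negated term itself, where it has no useful
# effect (and can skew the queryNorm).
wrong = urlencode({"bq": "(*:* -field_a:54^1)"})

print(unquote_plus(right))
```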
Hi Everyone,
We are indexing quite a lot of data using update/csv handler. For
reasons I can't get into right now, I can't implement a DIH since I
can only access the DB using Stored Procs and stored proc support in
DIH is not yet available. Indexing takes about 3 hours and I don't
want to tax the
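For reference, a hedged sketch of invoking the CSV handler over HTTP (host, path, and the stream.file value are hypothetical, and remote streaming must be enabled in solrconfig.xml for stream.file to work):

```python
from urllib.parse import urlencode

# Build the update/csv request; the file path is read server-side.
params = urlencode({
    "stream.file": "/data/export/products.csv",  # hypothetical CSV on the server
    "separator": ",",
    "commit": "true",
})
url = "http://localhost:8983/solr/update/csv?" + params
print(url)
```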
Levenshtein algo is currently hardcoded (FuzzyTermEnum class) in Lucene 2.9.1
and 3.0...
There are samples of other distances in the "contrib" folder.
If you want to play with distance, check
http://issues.apache.org/jira/browse/LUCENE-2230
It works if the distance is an integer and follows "metric space axioms".
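For reference, the metric FuzzyTermEnum hardcodes is the classic dynamic-programming edit distance, which can be sketched in a few lines:

```python
def levenshtein(a, b):
    # Classic DP edit distance: cost 1 per insert/delete/substitute.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete
                           cur[j - 1] + 1,               # insert
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```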
Hi,
I need to have Solr/Jetty running as a Windows Service.
I am using the Lucid distribution.
Does anyone have a running example and tool for this?
med venlig hilsen/best regards
Roland Villemoes
Tel: (+45) 22 69 59 62
E-Mail: mailto:r...@alpha-solutions.dk
Alpha Solutions A/S
Borgergade 2,
> I've analyzed my index application
> and checked the XML before executing the http request and
> the field is empty:
>
>
>
> It should be empty on SOLR.
>
> Probably something in the way between my application (.NET)
> and the SOLR (Jetty on Ubuntu) adds the whitespace.
>
> Anyway, I'll t
is it possible to configure the distance formula used by fuzzy
matching? i see there are other under the function query page under
strdist but im wondering if they are applicable to fuzzy matching
thx much
--joe
Yes, it's true that we could do it at index time if we had a way to know. I
was thinking of some solution at search time, maybe measuring the % of
stopwords in each document. Normally, a document in another language won't
have any stopwords of its main language.
If you know some external software
I've analyzed my index application and checked the XML before executing the
http request and the field is empty:
It should be empty on SOLR.
Probably something in the way between my application (.NET) and the SOLR (Jetty
on Ubuntu) adds the whitespace.
Anyway, I'll try to remove the field
On Feb 4, 2010, at 12:38 AM, Lance Norskog wrote:
Queries that start with minus or NOT don't work. You have to do this:
*:* AND -fieldX:[* TO *]
That's only true for subqueries. A purely negative single top-level
clause works fine with Solr.
Erik
On Wed, Feb 3, 2010 at 5:
Thank you for the responses!
-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
Ingersoll
Sent: Wednesday, February 03, 2010 1:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Guidance on Solr errors
Inline below.
On Feb 2, 2010, at 8:40 PM, Vauthrin,
> XML update. I'm serializing the doc
> in .NET, and then using solsharp to
> insert/update the doc to SOLR.
>
> The result is:
>
>
>
>
>
> Does this mean I'm adding whitespace on XML update?
Yes exactly. You can remove from your
...
if value of fieldX.trim() is equal to ""
XML update. I'm serializing the doc in .NET, and then using solsharp to
insert/update the doc to SOLR.
The result is:
Does this mean I'm adding whitespace on XML update?
Frederico
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, February 4,
> > Does anybody know how to provide this ability to
> search for stopwords
>
> CommonGramsFilterFactory [1] may help.
>
Sorry, Solr 1.4 has this filter.
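A hedged sketch of wiring that filter into a field type in schema.xml (field type and file names here are hypothetical; check the filter's wiki page for the matching query-side CommonGramsQueryFilterFactory):

```xml
<fieldType name="text_cg" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Glues stopwords to neighbors: "bank of america" also yields
         "bank_of" and "of_america", so "+of" stays searchable. -->
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```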
> In our indexes, sometimes we have some documents written in
> other languages
> different from the most common index language. Is there any
> way to give less
> boosting to these documents?
If you are aware of those documents, at index time you can boost those
documents with a value less than 1
john allspaw wrote:
> Heya -
>
> So we just upgraded our Solr install to 1.4, and there's a great CPU drop
> and query response time drop. Good!
> But we're seeing the slowdown in the collection of statistics (stats.jsp)
> mentioned here:
>
> http://www.mail-archive.com/solr-user@lucene.apache.org/
I've made a backup request to my local solr server, it works but .. can i
set the snapshots dir path?
On 4 February 2010 at 16:54, Licinio Fernández Maurelo <
licinio.fernan...@gmail.com> wrote:
> Hi folks,
>
> as we're moving to solr 1.4 replication, i want to know about backups.
>
> Question
Heya -
So we just upgraded our Solr install to 1.4, and there's a great CPU drop
and query response time drop. Good!
But we're seeing the slowdown in the collection of statistics (stats.jsp)
mentioned here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg30224.html
to the tune of taki
> Hi,
>
> I have some common stopwords defined like [a,the,of] etc.
> Our users need the
> ability to include stopwords in their search. I tried using
> + sign like,
> [Bank +of America] to get accurate results, but it does not
> work.
>
> Does anybody know how to provide this ability to search
Hi,
In our indexes, sometimes we have some documents written in other languages
different from the most common index language. Is there any way to give less
boosting to these documents?
Thanks in advance,
Raimon Bosch.
--
View this message in context:
http://old.nabble.com/Is-it-posible-to-exc
Hi,
I have some common stopwords defined like [a,the,of] etc. Our users need the
ability to include stopwords in their search. I tried using + sign like,
[Bank +of America] to get accurate results, but it does not work.
Does anybody know how to provide this ability to search for stopwords - we
> Theoretically yes, it's correct, but i
> have about 1/10 of the docs with
> this field not empty and the rest is empty.
>
> Most of the articles have the field empty as I can see when
> query *:*.
How are you adding documents to solr? xml update, DIH?
Probably you are adding whitespace value t
Looks like it works. No crashes and the logs state it was added. I
didn't test against actual data, though.
04.02.2010 17:14:13
org.apache.solr.handler.extraction.ExtractingRequestHandler inform
INFO: Adding Date Format: yyyy-MM-dd HH:mm:ss
04.02.2010 17:14:13
org.apache.solr.handler.extraction.E
Theoretically yes, it's correct, but i have about 1/10 of the docs with
this field not empty and the rest is empty.
Most of the articles have the field empty as I can see when query *:*.
So the queries don't make sense...
-Original Message-
From: Ankit Bhatnagar [mailto:abhatna...@vantage
Mark Miller wrote:
> Christoph Brill wrote:
>
>> Cool, this way it's no longer crashing.
>>
>> Thanks and Regards,
>> Chris
>>
>> Am 04.02.2010 14:29, schrieb Mark Miller:
>>
>>
>>> Before you file a JIRA issue:
>>>
>>> I don't believe this is a bug, so there is likely no need for JIRA.
Good job Mark, works fine and does not keep my files open.
Thanks,
Chris
Am 03.02.2010 15:24, schrieb Mark Miller:
> Hey Christoph,
>
> Could you give the patch at
> https://issues.apache.org/jira/browse/SOLR-1744 a try and let me know
> how it works out for you?
Christoph Brill wrote:
> Cool, this way it's no longer crashing.
>
> Thanks and Regards,
> Chris
>
> Am 04.02.2010 14:29, schrieb Mark Miller:
>
>> Before you file a JIRA issue:
>>
>> I don't believe this is a bug, so there is likely no need for JIRA. Try
>> putting the date.formats snipped in
Hi,
I'm having some troubles getting this to work on a snapshot from 3rd feb My
config looks as follows
and i get this stacktrace
org.apache.solr.handler.dataimport.DataImportHandlerExcep
yes tika indexes all formats.
but i am specifically looking for OCR (thru java) at least for PDF or JPEG
images
any clues?
Best Regards,
Kranti K K Parisa
On Thu, Feb 4, 2010 at 8:29 PM, mike anderson wrote:
> There might be an OCR plugin for Apache Tika (which does exactly this out
> of
> th
Hi folks,
as we're moving to solr 1.4 replication, i want to know about backups.
Questions
-
1. Properties that can be set to configure this feature (only know
backupAfter)
2. Is it an incremental backup or a full index snapshot?
Thx
--
Lici
~Java Developer~
I understand that upon performing an index (full-import or delta-import), the
dataimport.properties file is written to with a last_index_time which can
then be accessed by the data-config.xml for delta-import queries with
${dataimporter.last_index_time}.
I was curious if another key could be adde
Cool, this way it's no longer crashing.
Thanks and Regards,
Chris
Am 04.02.2010 14:29, schrieb Mark Miller:
> Before you file a JIRA issue:
>
> I don't believe this is a bug, so there is likely no need for JIRA. Try
> putting the date.formats snipped in the defaults section rather than
> simply
Hi everyone,
I am currently trying to set up JMX support for Solr, but somehow the
listening socket is not even created on my specified port.
My parameters look like this (running the Solr example):
java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060
-Dcom.sun.management.
Hi list,
I'm using the ExtractingRequestHandler to extract content from
documents. It's extracting the "last_modified" field quite fine, but of
course only for documents where this field is set. If this field is not
set I want to pass the file system timestamp of the file.
I'm doing:
final Conte
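One way to compute that fallback before sending the literal (sketched in Python rather than the poster's Java, just to show the shape; demonstrated on a throwaway temp file, and the formatted string matches the "yyyy-MM-dd HH:mm:ss" date format):

```python
import datetime, os, tempfile

# When the extracted document carries no last_modified metadata,
# fall back to the file-system mtime.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
mtime = datetime.datetime.fromtimestamp(os.path.getmtime(path))
literal = mtime.strftime("%Y-%m-%d %H:%M:%S")  # "yyyy-MM-dd HH:mm:ss" shape
print(literal)
os.remove(path)
```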
There might be an OCR plugin for Apache Tika (which does exactly this out of
the box except for OCR capability, i believe).
http://lucene.apache.org/tika/
-mike
2010/2/4 Kranti™ K K Parisa
> Hi,
>
> Can anyone list the best OCR APIs available to use in combination with
> SOLR.
>
> The idea is
That's correct.
If you want to find "missing values",
i.e. fields for which a value is not present, then you will use -
Ankit
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, February 04, 2010 9:41 AM
To: solr-user@lucene.apache.org
Subject: RE: query all filled f
Hello,
We are using Solr(v 1.3.0 694707 with Lucene version 2.4-dev 691741)
in multicore mode with an average of 400 indexes (all indexes have the
same structure).
These indexes are stored on a nfs disk.
A java process writes continuously in these indexes while solr is only
used to read th
Hello All,
I am trying to start Solr server using Jetty ( same as in
Solr tutorial in their website ). As the index size is around 3.5gb
it's returning OutOfMemoryError. Is it mandatory to satisfy the
condition java heap size > index size? If yes, is there any
solution to run Solr s
> *:* AND -fieldX:[* TO *] - returns 0 docs
>
> fieldX:(a*) - return docs, so I'm sure that there's docs
> with this field filled.
>
> Any other ideias what could be wrong?
There is nothing wrong in this scenario.
If -fieldX:[* TO *] returns 0 docs, it means that all of your documents have
that f
Not entirely true - that's the case in Lucene, but in Solr, top level
queries *can* start with minus or NOT. They cannot if they are nested.
Both
*:* AND -fieldX:[* TO *]
and
-fieldX:[* TO *]
are the same in Solr.
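The two directions, written out as query parameters (a sketch; decoding shown only for readability):

```python
from urllib.parse import urlencode, unquote_plus

# Documents where fieldX holds some value:
filled = urlencode({"q": "fieldX:[* TO *]"})

# Documents where fieldX is missing; prefixing *:* keeps the negative
# clause valid even where pure negatives are not supported (e.g. nested).
missing = urlencode({"q": "*:* AND -fieldX:[* TO *]"})

print(unquote_plus(missing))
```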
--
- Mark
http://www.lucidimagination.com
Lance Norskog wrote:
> Queries th
Before you file a JIRA issue:
I don't believe this is a bug, so there is likely no need for JIRA. Try
putting the date.formats snipped in the defaults section rather than
simply within the RequestHandler tags. Then you should be good to go.
--
- Mark
http://www.lucidimagination.com
Lance No
'!'
:)))
Plus, FastLRUCache (previous one was synchronized)
(and of course warming-up time) := start complaining after ensuring there are
no complaints :)
(and of course OS needs time to cache filesystem blocks, and Java HotSpot,
... - few minutes at least...)
> On Feb 3, 2010, at 1:38 PM, Rajat Gar
I tried another one:
fieldX:["" TO *] and it returns articles with the field filled :), so I guess
I'm getting there.
But I tried also fieldX:[" " TO *] and get a few more results than the first
one...
Is there a real difference between these, and also if the results are really
all docs with
Thanks, but still no luck with that:
*:* AND -fieldX:[* TO *] - returns 0 docs
fieldX:(a*) - return docs, so I'm sure that there's docs with this field filled.
Any other ideias what could be wrong?
Frederico
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: quint
On Wed, Feb 3, 2010 at 12:21 AM, Charlie Jackson wrote:
> Currently, I've got a Solr setup in which we're distributing searches
> across two cores on a machine, say core1 and core2. I'm toying with the
> notion of enabling Solr's HTTP caching on our system, but I noticed an
> oddity when using it
Hi,
I am newbie to solr and exploring solr last few days.
I am using solr cell with tika for parsing, indexing and searching
Posting the rich text documents via Solrj.
My actual requirement is instead of using local documents (pdf, doc & docx),
i want to use webpages (urls, e.g., http://www.apach
>Generally speaking, by convention boosts in Lucene have unity at 1.0,
>not 0.0. So, a "negative boost" is usually done with boosts between 0
>and 1. For this case, maybe a boost of 0.1 is what you want?
I forgot to say I tried what you say as well but it didn't work.
>In the standard query parse