Hi Eric,
Thanks for the response.
I am already using termVectors with offsets & positions enabled as shown below.
I am indexing FAQ content, and some of these FAQs have attachments linked to them;
the attachments include files such as PDF, DOC, *.TAR, and *.GZIP that contain
additional information
Don't know if it's this particular issue, but have you seen:
https://issues.apache.org/jira/browse/LUCENE-3588
Best
Erick
On Fri, Nov 25, 2011 at 4:59 PM, Justin Caratzas
wrote:
> Lasse Aagren writes:
>
>> Hi,
>>
>> We are running Solr-Lucene 4.0-SNAPSHOT (1199777M - hudson - 2011-11-09
>> 14:5
Have you considered removing them at index time? See:
http://wiki.apache.org/solr/Deduplication
Best
Erick
On Fri, Nov 25, 2011 at 3:13 PM, Ted Dunning wrote:
> See http://en.wikipedia.org/wiki/Locality-sensitive_hashing
>
> The obvious thought that I had just after hitting send was that you cou
Are you committing after the run?
Best
Erick
On Fri, Nov 25, 2011 at 1:32 PM, Young, Cody wrote:
> I don't see anything wrong so far other than a typo here (missing a p in
> the second price):
>
>
> Can you see if there are any warnings in the log about documents not
> being able to be created
Please review: http://wiki.apache.org/solr/UsingMailingLists
You're asking us to figure out what you've done. In particular,
are you using either dismax or edismax? They don't respect
the defaultOperator; use the mm param to get this kind
of behavior.
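As a rough sketch of what that request looks like (the field names in qf are made up, and this only builds the parameter string, it doesn't talk to Solr):

```python
from urllib.parse import urlencode

# Illustrative sketch: with (e)dismax, the defaultOperator from schema.xml is
# ignored; the mm (minimum-should-match) parameter controls how many optional
# clauses must match. mm=100% approximates AND, mm=1 approximates OR.
params = {
    "q": "solar panel installation",
    "defType": "edismax",
    "qf": "title body",   # hypothetical field names
    "mm": "100%",         # require all terms, i.e. AND-like behavior
}
print(urlencode(params))
```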
Best
Erick
On Thu, Nov 24, 2011 at 6:33 PM,
Have you looked at the admin/analysis page? That's invaluable
for answering this kind of question.
Best
Erick
On Thu, Nov 24, 2011 at 2:30 PM, Uomesh wrote:
> Hi,
>
> I tried with preserveOriginal="1" and reindex too but still no result.
>
> Thanks,
> Umesh
>
> On Wed, Nov 23, 2011 at 5:33 PM, S
Please review: http://wiki.apache.org/solr/UsingMailingLists
You have given us virtually no information that would allow
us to help...
Best
Erick
On Thu, Nov 24, 2011 at 1:57 PM, GAURAV PAREEK
wrote:
> I am searching for some keywords but I am not getting the correct results.
>
> According my
Highlighting is dependent on the size of the
data being fed through the highlighter. Unless you have
termVectors & offsets & positions enabled, the text
must be re-analyzed, see:
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=%28termvector%29%7C%28retrieve%29%7C%28contents%29
But high
You just need two clauses, something like
q=field:yes (field:* -field:[* TO *])
fq could work here too.
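To see why the two clauses cover both cases, here's the same logic simulated over plain dicts (illustrative only, not Solr code):

```python
# The clause pair  q=field:yes (field:* -field:[* TO *])  matches docs where
# field == "yes" OR the field is absent. Simulated over toy documents:
docs = [
    {"id": 1, "field": "yes"},
    {"id": 2, "field": "no"},
    {"id": 3},                 # no value for "field" at all
]
hits = [d["id"] for d in docs
        if d.get("field") == "yes" or "field" not in d]
print(hits)  # -> [1, 3]
```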
Best
Erick
On Fri, Nov 25, 2011 at 10:06 AM, Phil Hoy wrote:
> Hi,
>
> Thanks for getting back to me, and sorry the default q value was *:* so I
> omitted it from the example.
>
> I do not
It's checked in, SOLR-2438. Although it's getting some surgery so you
can expect it to morph a bit.
Erick
On Wed, Nov 23, 2011 at 11:22 PM, Michael Sokolov wrote:
> Thanks for confirming that, and laying out the options, Robert.
>
> -Mike
>
> On 11/23/2011 9:03 PM, Robert Muir wrote:
>>
>> hi,
>
Lasse Aagren writes:
> Hi,
>
> We are running Solr-Lucene 4.0-SNAPSHOT (1199777M - hudson - 2011-11-09
> 14:58:50) on several servers running:
>
> 64bit Debian Squeeze (6.0.3)
> OpenJDK6 (b18-1.8.9-0.1~squeeze1)
> Tomcat 6.0.28 (6.0.28-9+squeeze1)
>
> Some of the servers have 48G RAM and in that
See http://en.wikipedia.org/wiki/Locality-sensitive_hashing
The obvious thought that I had just after hitting send was that you could
put the LSH signatures on the documents. That would let you do the scan at
low volume and using LSH would make the duplicate scan almost as fast as
your score scan
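A minimal minhash sketch (one common LSH family) shows the idea: near-duplicate documents end up with signatures that agree in most positions, so candidate pairs can be found without comparing every pair. The function and parameter names here are our own invention, not anything in Solr:

```python
import hashlib

# Toy minhash: hash each 4-char shingle under several seeds and keep the
# minimum per seed. Similar texts share shingles, hence share minima.
def minhash_signature(text, num_hashes=16):
    shingles = {text[i:i + 4] for i in range(max(1, len(text) - 3))}
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles))
    return tuple(sig)

a = minhash_signature("the quick brown fox jumps over the lazy dog")
b = minhash_signature("the quick brown fox jumps over the lazy dogs")
c = minhash_signature("completely different text about solr indexing")
overlap_ab = sum(x == y for x, y in zip(a, b)) / len(a)
overlap_ac = sum(x == y for x, y in zip(a, c)) / len(a)
print(overlap_ab, overlap_ac)  # near-duplicates share far more slots
```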
Only one field can be the default. Use copyField to copy the fields
you need to search into a single field, and set that field as
the default. That might be OK depending upon your circumstances.
On 25 November 2011 12:46, kiran.bodigam wrote:
> In my schema i have defined below tag for index
In general terms, when your Java heap is so large, it is beneficial to
set -Xmx and -Xms to the same size.
On Wed, Nov 23, 2011 at 5:12 AM, Artem Lokotosh wrote:
> Hi!
>
> * Data:
> - Solr 3.4;
> - 30 shards ~ 13GB, 27-29M docs each shard.
>
> * Machine parameters (Ubuntu 10.04 LTS):
> user@Solr:~$ u
I don't see anything wrong so far other than a typo here (missing a p in
the second price):
Can you see if there are any warnings in the log about documents not
being able to be created?
Also, you should have a field type definition for text in your schema.
It will look something like
On Wed, Nov 23, 2011 at 11:22 PM, Michael Sokolov wrote:
> Thanks for confirming that, and laying out the options, Robert.
>
FYI: Erick committed the multiterm stuff, so I opened an issue for
this: https://issues.apache.org/jira/browse/SOLR-2919
--
lucidimagination.com
I don't think there is a way of seeing the "boosts" from the index, as
those are encoded as "norms" (together with length normalization). You can
see the norms with Luke if you want to and in the debugQuery output the
index-time boost should be represented in the "fieldNorm" section. (if you
click
Update on this: I've established:
* It's not a problem in the DB (I can index from this DB into a Solr
instance on another server)
* It's not Tomcat (I get the same problem in Jetty)
* It's not the schema (I have simplified it to one field)
That leaves SolrConfig.xml and data-config.
Only thing c
Hi Stephane,
Do you know about Solr's DataImportHandler, aka DIH?:
http://wiki.apache.org/solr/DataImportHandler
Steve
> -Original Message-
> From: KabooHahahein [mailto:stele...@hotmail.com]
> Sent: Friday, November 25, 2011 10:33 AM
> To: solr-user@lucene.apache.org
> Subject: XML Man
Hi,
I am new to Solr, and from what I understand, Solr indexes an XML database
into its own format in order to enter the data into the search engine.
I am currently trying to find an XML solution for management of these XML
files. My database will include multiple XML files, and I'd like to be ab
On 11/25/2011 3:13 AM, Mark Miller wrote:
When you search each shard, are you positive that you are using all of the
same parameters? You are sure you are hitting request handlers that are
configured exactly the same and sending exactly the same queries?
In my experience, the overhead for dist
Hi all,
I have 4 products, let's call them p1,p2, p3 and p4, at the point of indexing
I'm boosting each document as follows (using ):
p1 = 2.3434156476491901
p2 = 2.1894875146124502
p3 = 2.51677824126855
p4 = 2.2773491010634999
(Note: scores may not be identical to what is currently indexed, be
You might be able to sort by the map function q=*:*&sort=map(price,0,100,
10) asc, price asc.
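A toy Python sketch of what Solr's map(x,min,max,target) function query does per document (illustrative only): values inside [min,max] are replaced by target, everything else passes through, so sorting on the mapped value groups the whole 0-100 band together before the secondary price sort.

```python
# Toy sketch of map(price,0,100,10) semantics per document:
# values in [0,100] become 10; values outside the range pass through.
def solr_map(x, lo, hi, target):
    return target if lo <= x <= hi else x

prices = [250, 50, 99, 120, 5]
mapped = [solr_map(p, 0, 100, 10) for p in prices]
print(mapped)  # -> [250, 10, 10, 120, 10]
```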
Phil
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 25 November 2011 13:49
To: solr-user@lucene.apache.org
Subject: Re: Sort question
Not that I know of. Yo
Hi,
Thanks for getting back to me, and sorry the default q value was *:* so I
omitted it from the example.
I do not have a problem getting the null values so q=*:*&fq=-field:[* TO *]
indeed works but I also need the docs with a specific value e.g. fq=field:yes.
Is this possible?
Phil
-Or
There's another approach that *may* help, see:
https://issues.apache.org/jira/browse/SOLR-2429
This is probably suitable if you don't have a zillion results
to sort through. The idea here is that you can specify a
filter query that only executes after all the other parts
of a query are done, i.e.
Please review:
http://wiki.apache.org/solr/UsingMailingLists
You haven't shown the relevant parts of your configs.
You haven't shown the queries you're using, with &debugQuery=on
You haven't shown the input
You haven't explained why you think synonyms have anything
to do with the problem.
So it'
You might try with a less "fraught" search phrase,
"to be or not to be" is a classic query that may be all
stop words.
Otherwise, I'm clueless.
On Wed, Nov 23, 2011 at 3:15 PM, Ariel Zerbib wrote:
> I tested with the version 4.0-2011-11-04_09-29-42.
>
> Ariel
>
>
> 2011/11/17 Erick Erickson
>
No, I mean the number that's used to hold the length of the field is a byte,
but that it's not just a simple byte. It's encoded to handle very long
fields in that byte, but there's some loss of precision. For instance,
and I'm pulling numbers out of thin air here, fields of 1-25 terms may
collapse
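A deliberately simplified illustration of that precision loss (this is NOT Lucene's actual SmallFloat encoding, just a crude 8-bit quantization to show the effect): once the length norm 1/sqrt(num_terms) is squeezed into one byte, distinct field lengths can collapse to the same stored value.

```python
import math

# Crude stand-in for the one-byte norm: quantize 1/sqrt(num_terms) into
# 256 levels. Nearby lengths map to the same byte and become
# indistinguishable after decoding.
def norm_byte(num_terms, levels=256):
    norm = 1.0 / math.sqrt(num_terms)
    return round(norm * (levels - 1))

print(norm_byte(1000), norm_byte(1010))  # both collapse to the same byte
```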
Did you turn it on? In the defaults section, something like:
on
BTW, I would NOT do the spellcheck.build=true on every
request, this will rebuild your dictionary every time which
is a definite performance problem!
Best
Erick
On Wed, Nov 23, 2011 at 7:32 AM, meghana wrote:
>
> I have configured
I think you're asking for something like:
fq=date:[NOW/DAY-5DAYS TO NOW/DAY+1DAY]?
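To make the date math concrete, here is a sketch (in Python, not Solr) of what that filter resolves to: NOW/DAY rounds the current time down to midnight UTC, then the day offsets are applied. The example "NOW" is an arbitrary timestamp:

```python
from datetime import datetime, timedelta, timezone

# What fq=date:[NOW/DAY-5DAYS TO NOW/DAY+1DAY] resolves to for a given "NOW".
now = datetime(2011, 11, 23, 6, 29, tzinfo=timezone.utc)  # example "NOW"
day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)  # NOW/DAY
lower = day_start - timedelta(days=5)   # NOW/DAY-5DAYS
upper = day_start + timedelta(days=1)   # NOW/DAY+1DAY
print(lower.isoformat(), upper.isoformat())
```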
Best
Erick
On Wed, Nov 23, 2011 at 6:29 AM, do3do3 wrote:
> what i got is the number of this period but i want to get this result only,
> what is the query to can get that like
> fq=source:"news"
>
>
> --
> View t
You haven't specified any "q" clause, just an "fq" clause. Try
q=*:* -field:[* TO *]
or
q=*:*&fq=-field:[* TO *]
BTW, the logic of field:yes -field:[* TO *] makes no sense.
You're saying "find me all the fields containing the value "yes" and
remove from that set all the fields containing any value".
Not that I know of. You could conceivably do some
work at index time to create a field that would sort
in that order by doing some sort of mapping from
these values into a field that sorts the way you
want, or you might be able to do a plugin
Best
Erick
On Wed, Nov 23, 2011 at 3:29 AM, vraa wrot
Well, you can try adding a directive to put it into
a numeric field
But you need to provide significantly more details. From what
you've said there's not enough information to say much besides
"it should work".
Perhaps you should review:
http://wiki.apache.org/solr/UsingMailingLists
Best
E
In addition to Samuel's comment, the filterCache is also used under
certain circumstances
Best
Erick
2011/11/22 Samuel García Martínez :
> AFAIK, FieldValueCache is only used for faceting on tokenized fields.
> Maybe, are you getting confused with FieldCache (
> http://lucene.apache.org/java/
Thanks. I did consider postprocessing and may wind up doing that; I was
hoping there was a way to have Solr do it for me! That I have to ask this
question is probably not a good sign, but what is LSH clustering?
On Fri, Nov 25, 2011 at 4:34 AM, Ted Dunning wrote:
> You can do that pretty easily
In my schema I have defined the below tag for indexing the fields because in my
use case, except for the uniqueKey, the remaining fields need to be indexed as-is
(with the same datatype).
Here I would like to search all of them without a field name; unfortunately I
can't put all of them using the option because it's dyna
Hi, I have copied my Solr config from a working Windows server to a new
one, and it can't seem to run an import.
They're both using win server 2008 and SQL 2008R2. This is the data
importer config
I can use MS SQL Prof
Hi,
Is it possible to constrain the results of a query to return docs where a field
contains no value or a particular value?
I tried ?fq=(field:yes OR -field:[* TO *]) but I get no results, even though
queries with either ?fq=field:yes or ?fq=-field:[* TO *] do return results.
Phil
On 21 Nov 2011, at 23:17, Chris Hostetter wrote:
>
> : The way that I've solved this in the past is to make a field
> : specifically for sorting and then truncate the string to a small number
> : of characters and sort on that. You have to accept that in some cases
>
> Something to consider is
Hi,
You're right -- currently Carrot2 clustering ignores the Solr analysis
chain and uses its own pipeline. It is possible to integrate with Solr's
analysis components to some extent, see the discussion here:
https://issues.apache.org/jira/browse/SOLR-2917.
Staszek
> > Hi
> > Trying to use carr
You can do that pretty easily by just retrieving extra documents and post
processing the results list.
You are likely to have a significant number of apparent duplicates this
way.
To really get rid of duplicates in results, it might be better to remove
them from the corpus by deploying something
45 000 000 per shard approx, Tomcat, caching was tweaked in solrconfig and
shard given 12GB of RAM max.
<filterCache class="solr.FastLRUCache" size="1200" initialSize="1200"
autowarmCount="128"/>
true
50
200
In your case I would first check if the network throu