During data import, could you update each record with min and max fields? These
would be equal in the case of a single, non-range value.
I know this is not a Solr solution but a data pre-processing one, but would it
work?
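A minimal pre-processing sketch of that idea in Python -- the record shape and the field names (`value`, `value_min`, `value_max`) are hypothetical:

```python
# Pre-processing sketch: normalize a raw value into explicit min/max
# fields before sending the record to Solr. A single (non-range) value
# simply becomes min == max. Field names here are hypothetical.

def add_min_max(record, source_field="value"):
    raw = record[source_field]
    if isinstance(raw, (tuple, list)):   # a (low, high) range
        record["value_min"] = min(raw)
        record["value_max"] = max(raw)
    else:                                # a single value: min == max
        record["value_min"] = record["value_max"] = raw
    return record
```

At query time a range field pair like this can then be matched with an ordinary `value_min:[* TO x] AND value_max:[y TO *]` style query.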
Failing the above, I've seen in the docs a reference to a compound value field
(in the
How long does it take to reach this OOM situation? Is it possible for
you to try a merge with each setting in turn, and evaluate what impact
they each have? That is, indexing speed and memory consumption? It might
be interesting to watch garbage collection too while it is running with
jstat, as
Check out
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
This approach of not using sub entities really improved our load time.
Ephraim Ofir
-Original Message-
From: Robert Gründler
First of all, it's a very nice piece of work.
I am just getting my feet wet with Solr in general, so I'm not even sure how a
document is NORMALLY deleted.
The library PHPDocs say 'add', 'get', 'delete', but does anyone know about
'update'?
(Obviously one can read-delete-modify-create.)
I feel the same way about this group and the Postgres group.
VERY helpful people. All of us helping each other.
Dennis Gearon
Signature Warning
- Original Message
From: Adam Estrada estrada.a...@gmail.com
Subject: Thank you!
I just want to say that this list
Hi Dennis,
This isn't particular to the client you use (solr-php-client) for sending
documents: think of update as an overwrite.
This means that if you update a particular document, the previous
version indexed is lost.
Therefore, when updating a document, make sure that all the fields to
be indexed
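A toy sketch of that overwrite behaviour, modelling the index as a Python dict keyed by uniqueKey -- the `add` helper is a stand-in for illustration, not the real client API:

```python
# Toy model of Solr's update-as-overwrite semantics: adding a document
# with an existing uniqueKey replaces the previous version entirely,
# so any field omitted from the new document is lost.

index = {}  # uniqueKey -> document

def add(doc):
    index[doc["id"]] = doc  # overwrite: no field-level merge happens

add({"id": "1", "title": "first", "body": "hello"})
add({"id": "1", "title": "second"})  # "body" is gone after this update
```

This is why an update must resend every field, not just the changed ones.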
Hi users, I'm searching for a way to index a lot of XML documents as fast as
possible.
I have more than 1 million docs on Server 1 and a Solr multicore on Server 2
with Tomcat.
I don't know how I can do it easily and fast.
I can't find an idea in the wiki; maybe you have some ideas?
King
RAM usage for merging is tricky.
First off, merging must hold open a SegmentReader for each segment
being merged. However, it's not necessarily a full segment reader;
for example, merging doesn't need the terms index nor norms. But it
will load deleted docs.
But, if you are doing deletions (or
OK, it works great at the beginning, but now I get a big error :-(
HTTP Status 500 - null java.lang.NullPointerException at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:462)
at
Hi all,
I've been bashing my head against the wall for a few hours now, trying
to get mlt (more-like-this) queries working across multiple cores.
I've since seen a JIRA issue and documentation saying that multicore
doesn't yet support mlt queries. Oops!
Anyway, to get around this, I was
Hi All,
Thanks for your suggestions. I got the result I expected.
Cheers,
Satya
As Tanguy says, simply re-adding a document with the same
uniqueKey will automatically delete/readd the doc.
But I wanted to add a caution about your phrase read-delete-modify-create
You only get back what you #stored#. So generally the update is done
from the original source rather than the
How are you querying the core to begin with?
On Dec 16, 2010, at 6:46 AM, Mark Allan wrote:
Hi all,
I've been bashing my head against the wall for a few hours now, trying to get
mlt (more-like-this) queries working across multiple cores. I've since seen a
JIRA issue and documentation
Hear hear! In the beginning of my journey with Solr/Lucene I couldn't have
done it without this site. Smiley and Pugh's book was useful, but this forum
was invaluable. I don't have as many questions now, but each new venture,
Geospatial searching, replication and redundancy, performance tuning,
Hi Grant,
Thanks for your reply. I'm using solrj to connect via http, which
eventually sends this query
Hello guys,
I am getting threads stuck forever at *
org.apache.lucene.document.CompressionTools.decompress*. I am using
Weblogic 10.02, with solr deployed as ear and no work manager specifically
configured for this instance.
Only doing simple queries at this node (q=itemId:9 or
Hi,
See log at [1].
We are using the latest snapshot of lucene_branch3.1. We have configured
Solr to use the ConcurrentMergeScheduler:
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
When a commit() runs, it blocks indexing (all incoming update requests
are blocked
I have been very successful in following this example
http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example
Adam
On Thu, Dec 16, 2010 at 5:44 AM, Jörg Agatz joerg.ag...@googlemail.com wrote:
hi, users, i serch e
Hello users,
I have created a multicore instance of Solr with Tomcat 6.
I created two cores, mail and index2. At first, mail and index2 had the
same config; after this, I changed the mail config and indexed 30 XML files.
Now when I search in each core:
I also have the same problem. I configured the dataimport.properties file as
shown in
http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example
but no change occurs. Can anyone help me?
What are you trying to do? It sounds like you're storing fields compressed,
is that true (i.e. defining compressed=true in your field defs)? If so, why?
It may be costing you more than it benefits you.
A quick test would be to stop returning anything except the score
by specifying fl=score. Or at
: Subject: Determining core name from a result?
FYI: some people may be confused because of terminology -- I think what
you are asking is how to know which *shard* a document came from when
doing a distributed search.
This isn't currently supported, there is an open issue tracking it...
A couple of observations:
1) Your regex at query time is interesting. You're using KeywordTokenizer,
so input of "search me" becomes "searchme" before it goes through the parser.
Is this your intent?
2) Why are you using EdgeNGrams for auto-suggest? The TermsComponent is
an easier, more
Oops! Sorry, I thought shard and core were one and the same and the
terms could be used interchangeably - I've got a multicore setup which
I'm able to search across by using the shards parameter. I think
you're right, that *is* the question I was asking.
Thanks for letting me know it's not
Hi,
I am trying to do a facet search and sort the facet values too.
First I tried with 'solr.TextField' as field type. But this does not return
sorted facet values.
After referring to
FAQ(http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F),
I changed it to
2010/12/16 Erick Erickson erickerick...@gmail.com
What are you trying to do? It sounds like you're storing fields compressed,
is
that true (i.e. defining compressed=true in your field defs)? If so, why?
It
may be
costing you more than you benefit.
No compressed fields in my schema
A
I have tried some things; now I have new news.
When I search in:
Unfortunately, (I think?) Solr currently commits by closing the
IndexWriter, which must wait for any running merges to complete, and
then opening a new one.
This is really rather silly because IndexWriter has had its own commit
method (which does not block ongoing indexing nor merging) for quite
We have ~50 long-running SQL queries that need to be joined and denormalized.
Not all of the queries are to the same db, and some data comes from fixed-width
data feeds. Our current search engine (that we are converting to SOLR) has a
fast disk-caching mechanism that lets you cache all of
So just use add and overwrite. OK, thanks
Dennis Gearon
- Original Message
From: Tanguy Moal tanguy.m...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, December 16, 2010 1:33:36 AM
Subject: Re: PHPSolrClient
Hi Dennis,
Not particular to the client you use
Hello, we occasionally bump into the OOM issue during merging after propagation
too, and from the discussion below I guess we are doing thousands of 'false
deletions' by unique id to make sure certain documents are *not* in the index.
Could anyone explain why that is bad? I didn't really
If I ever make it, wikipedia, stackoverflow, PHP, Symfony, Doctrine, Apache are
all going to get donations.
I already sent $20 to Wikipedia; they're hurting now.
Dennis Gearon
It's not that it's bad, it's just that Lucene must do extra work to
check if these deletes are real or not, and that extra work requires
loading the terms index which will consume additional RAM.
For most apps, though, the terms index is relatively small and so this
isn't really an issue. But if
what is it that you are trying to commit?
a
On Thu, Dec 16, 2010 at 1:03 PM, Dennis Gearon gear...@sbcglobal.net wrote:
What have people found as the best way to do bulk commits either from the
web or
from a file on the system?
Dennis Gearon
This is how I import a lot of data from a CSV file. There are close to 100k
records in there. Note that you can either pre-define the column names using
the fieldnames param like I did here *or* include header=true which will
automatically pick up the column header if your file has it.
curl
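The two column-naming modes can be sketched with Python's csv module -- an analogue of the fieldnames/header parameters described on the wiki page above, not the update handler itself:

```python
import csv
import io

# Mode 1, header=true analogue: the first row of the file supplies
# the column names automatically.
data_with_header = "id,name\n1,apple\n2,pear\n"
rows = list(csv.DictReader(io.StringIO(data_with_header)))

# Mode 2, fieldnames=... analogue: names are given explicitly and the
# file contains only data rows, no header line.
data_no_header = "1,apple\n2,pear\n"
rows2 = list(csv.DictReader(io.StringIO(data_no_header),
                            fieldnames=["id", "name"]))

# Both produce the same field-name -> value mapping per row.
```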
Hi all, I have the following problems.
I have this set of data (View data (Pastebin) http://pastebin.com/jKbUhjVS ).
If I do a search for *SectionName:Programas_Home* I have no results: Returned
Data (Pastebin) http://pastebin.com/wnPdHqBm
If I do a search for *Programas_Home* I have only 1
Thanks Mike,
But, if you are doing deletions (or updateDocument, which is just a
delete + add under-the-hood), then this will force the terms index of
the segment readers to be loaded, thus consuming more RAM.
Out of 700,000 docs, by the time we get to doc 600,000, there is a good chance
a few
Thanks Mike! When you say 'term index of the segment readers', are you
referring to the term vectors?
In our case our index of 8 million docs holds pretty 'skinny' docs containing
searchable product titles and keywords, with the rest of the doc only holding
Ids for faceting upon. Docs
On Thu, Dec 16, 2010 at 2:09 PM, Burton-West, Tom tburt...@umich.edu wrote:
Thanks Mike,
But, if you are doing deletions (or updateDocument, which is just a
delete + add under-the-hood), then this will force the terms index of
the segment readers to be loaded, thus consuming more RAM.
Out of
Actually terms index is something different.
If you don't use CFS, go and look at the size of *.tii in your index
directory -- those are the terms index. The terms index picks a
subset of the terms (by default, every 128th) to hold in RAM (plus some
metadata) in order to make seeking to a specific term
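A rough sketch of that sampling scheme in Python -- a stand-in for the real on-disk format, keeping every 128th term of a sorted list in "RAM" and scanning at most one interval after a binary search over the sample:

```python
import bisect

INTERVAL = 128  # sampling interval, matching the default described above

def build_index(terms):
    """Return the in-RAM sample: every 128th term of the sorted list."""
    return terms[::INTERVAL]

def seek(terms, sample, target):
    """Find the first position in terms whose term is >= target.

    Binary-search the small in-RAM sample, then scan at most one
    interval of the full (on-"disk") term list.
    """
    i = bisect.bisect_right(sample, target) - 1
    start = max(i, 0) * INTERVAL  # nearest indexed term at/below target
    for j in range(start, min(start + INTERVAL + 1, len(terms))):
        if terms[j] >= target:
            return j
    return len(terms)  # target is beyond the last term
```

The point of the trade-off: RAM cost is 1/128th of the term count, while a seek touches at most 128 terms sequentially.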
On Thu, Dec 16, 2010 at 2:09 PM, Burton-West, Tom tburt...@umich.edu wrote:
I always get confused about the two different divisors and their names in the
solrconfig.xml file
This one (for the writer) isn't configurable by Solr. Want to open an issue?
We are setting termInfosIndexDivisor,
Hi,
LuSqlv2 beta comes out in the next few weeks, and is designed to
address this issue (among others).
LuSql original
(http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
now moved to: https://code.google.com/p/lusql/) is a high-performance
JDBC-to-Lucene loader.
You may have
That easy, huh? Heck, this gets better and better.
BTW, how about escaping?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from
On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon gear...@sbcglobal.net wrote:
That easy, huh? Heck, this gets better and better.
BTW, how about escaping?
The CSV escaping? It's configurable to allow for loading different
CSV dialects.
http://wiki.apache.org/solr/UpdateCSV
By default it uses
Ezequiel:
Nice job of including relevant details, by the way. Unfortunately I'm
puzzled too. Your SectionName is a string type, so it should
be placed in the index as-is. Be a bit cautious about looking at
returned results (as I see in one of your xml files) because the returned
values are the
Your setting isn't being applied to the reader IW uses during
merging... its only for readers Solr opens from directories
explicitly.
I think you should open a jira issue!
Do I understand correctly that this setting in theory could be applied to the
reader IW uses during merging but is not
On Thu, Dec 16, 2010 at 4:03 PM, Burton-West, Tom tburt...@umich.edu wrote:
Your setting isn't being applied to the reader IW uses during
merging... its only for readers Solr opens from directories
explicitly.
I think you should open a jira issue!
Do I understand correctly that this setting in
On Thu, Dec 16, 2010 at 5:51 AM, Michael McCandless
luc...@mikemccandless.com wrote:
If you are doing false deletions (calling .updateDocument when in fact
the Term you are replacing cannot exist) it'd be best if possible to
change the app to not call .updateDocument if you know the Term
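One way to sketch that suggestion: track which ids might already be in the index and route brand-new documents through a plain add, reserving the delete+add path for ids that could actually exist. The `Writer` class here is a hypothetical stand-in, not the Lucene/Solr API:

```python
# Sketch of avoiding "false deletions": only take the delete+add path
# (the updateDocument analogue) when the id might already be indexed;
# otherwise take the pure add path, which never needs a delete lookup
# and so never forces the terms index to be loaded.

class Writer:
    def __init__(self):
        self.adds = 0
        self.updates = 0
        self.seen = set()  # ids known to be (possibly) in the index

    def index(self, doc_id, doc):
        if doc_id in self.seen:
            self.updates += 1  # delete + add: may load the terms index
        else:
            self.adds += 1     # pure add: no delete lookup needed
        self.seen.add(doc_id)
```

In a real system the `seen` set could be replaced by whatever cheap external knowledge tells you an id cannot yet exist (e.g. a fresh-load flag).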
I'll check the Tokenizer to see if that's the problem.
The results of Analysis Page for SectionName:Programas_Home
Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer {}
  term position: 1
  term text: Programas_Home
  term type: word
  source start,end: 0,14
  payload:
So it's not having problems
OK, what version of Solr are you using? I can take a quick check to see
what behavior I get
Erick
On Thu, Dec 16, 2010 at 4:44 PM, Ezequiel Calderara ezech...@gmail.com wrote:
I'll check the Tokenizer to see if that's the problem.
The results of Analysis Page for SectionName:Programas_Home
The jars are named like *1.4.1*, so I suppose it's version 1.4.1.
Thanks!
On Thu, Dec 16, 2010 at 6:54 PM, Erick Erickson erickerick...@gmail.com wrote:
OK, what version of Solr are you using? I can take a quick check to see
what behavior I get
Erick
On Thu, Dec 16, 2010 at 4:44 PM,
Installed Firebug.
Now getting the following error:
4139 matches.call( document.documentElement, "[test!='']:sizzle" );
Though my Solr server is running on port 8983, I am not using any server to
run this jQuery; it's just an HTML file in my home folder that I am opening
in my Firefox browser.
I am sorry for raising this thread after 6 months,
but we still have problems with faceted search on full-text fields.
We are trying to get the most frequent words in a text field across documents
created within 1 hour.
The faceted search takes too much time even when the matching number of
documents (created_at within 1
I also have the same problem, i configure
dataimport.properties file as shown
in
http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example
but no change occur, can any one help me
What version of Solr are you using? This seems to be a new feature, so it won't
work on Solr
I think this could be down to the same-origin rule applied to Ajax requests.
You're not allowed to display content from two different servers :-(
The good news: Solr supports JSONP, which is a neat trick around this. Try this
(pasted from another thread):
queryString = *:*
$.getJSON(
OK, it works perfectly for me on a 1.4.1 instance. I've looked over your
files a couple of times and see nothing obvious (but you'll never find
anyone better at overlooking the obvious than me!).
Tokenizing and stemming are irrelevant in this case because your
type is string, which is an
Another thing you can try is trunk. This specific case has been
improved by an order of magnitude recently.
The case that has been sped up is initial population of the
filterCache, or when the filterCache can't hold all of the unique
values, or when faceting is configured to not use the
One very important thing I forgot to mention is that you will have to
increase the Java heap size for larger data sets.
Set JAVA_OPT to something acceptable.
Adam
On Thu, Dec 16, 2010 at 3:27 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon
I guess one last call for help. I am assuming, for people who wrote or have
used pivot faceting, this should be a yes/no question: are date
fields supported?
On Wed, Dec 15, 2010 at 12:58 PM, Adeel Qureshi adeelmahm...@gmail.com wrote:
Thanks Pankaj - that was useful to know. I haven't
Thanks Adam!
Dennis Gearon
Hello all,
I got an error as follows when I do a range query search ([1 TO *])
on a numeric field while highlighting is set on another text field.
2010/12/15 10:58:55 org.apache.solr.common.SolrException log
Fatal: org.apache.lucene.search.BooleanQuery$TooManyClauses:
maxClauseCount is set to 1024
I got an error as follows when I do a range query search ([1 TO *])
on a numeric field while highlighting is set on another text field.
Are you using hl.highlightMultiTerm=true? Pasting your search URL can give more
hints.
Adding hl.requireFieldMatch=true should probably solve your problem.
Thank you for the reply.
Are you using hl.highlightMultiTerm=true? Pasting your search URL can give
more hints.
Yes, I used hl.highlightMultiTerm=true; my search query is as follows:
Adding hl.requireFieldMatch=true should probably solve your problem.
Yes, adding hl.requireFieldMatch=true can solve my problem, but in my
solution I have a content field indexing all fields' contents to
support full-text search, but I also have another 2 fields, title and
body, which
I've inferred from a bunch of posts that Solr 1.4 is actually the upcoming 4.x
release?
And the numbering systems on other Java products don't seem to match what's
really out there, i.e. Eclipse and Sun Java.
So what IS the Solr versioning number system? Can anyone give a (maybe
possible)
Is it possible to put name/value pairs of any type in a native Solr index
field type, like JSON/XML/YAML?
The reason I ask, since you asked, is that I want my main index schema to be a
base object, and another multivalued column to be the attributes of the base
object's inherited descendants.
Is
I think it will not, because the default configuration can only have 2
newSearcher threads, but the delay will get longer and longer. The
newer newSearcher will wait for these 2 earlier ones to finish.
2010/12/1 Jonathan Rochkind rochk...@jhu.edu:
If your index warmings take longer than two minutes, but
Hi All,
I built Solr successfully and I am thinking of testing it with nearly
300 PDF files, 300 docs, 300 Excel files, and so on, with nearly 300
files of each type.
Is there any dummy data available to test Solr with? Otherwise I need to
download each and every file individually..?
We now face the same situation and want to implement it like this:
we add new documents to a RAMDirectory and search two indices -- the
index on disk and the RAM index.
Regularly (e.g. every hour) we flush the RAMDirectory to disk and make
a new segment.
To prevent errors, before adding to the RAMDirectory, we
BTW, what is a Delta (in this context, not an equipment line or a rocket,
please :-)
Dennis Gearon
There are websites with data sets out there. 'Data sets' may not be the right
search term, but it's something like that.
Exactly what you want, I couldn't guess.
Dennis Gearon