Use TemplateTransformer

<dataConfig>
  <dataSource name="wld"
              type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/wld"
              user="root"
              password="pass"/>
  <document name="variants">
Hi Erik,
yes I'm sorting and faceting.
1) Fields for sorting:
sort=f_dccreator_sort, sort=f_dctitle, sort=f_dcyear
The facet.sort= parameter is empty; I am only using the sort= parameter.
2) Fields for faceting:
f_dcperson, f_dcsubject, f_dcyear, f_dccollection, f_dclang, f_dctypenorm,
Hi Omri,
there are two limitations:
1. You can't sort on a multiValued field. (Anyway, on which of the
copied fields would you want to sort first?)
2. You can't make the multiValued field the unique key.
Neither is a real limitation:
1. Better to sort on at_country, at_state, at_city instead.
2.
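To make 1. concrete, a schema sketch (field names taken from above, types illustrative) with one single-valued field per sortable attribute:

```xml
<!-- single-valued fields, safe to sort on (string type is just illustrative) -->
<field name="at_country" type="string" indexed="true" stored="true"/>
<field name="at_state"   type="string" indexed="true" stored="true"/>
<field name="at_city"    type="string" indexed="true" stored="true"/>
```

and then e.g. sort=at_country asc,at_state asc,at_city asc in the query.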
On Wed, Jun 15, 2011 at 8:10 PM, Frank Wesemann
f.wesem...@fotofinder.net wrote:
Hi,
I just came across this:
If I abort an import via /dataimport/?command=abort the connections to the
(in my case) database stay open.
Shouldn't DocBuilder#rollback() call something like cleanup() which in turn
Are there any plans to support a kind of federated search
in a future solr version?
I think there are reasons to use separate indexes for each document type
but do combined searches on these indexes
(for example if you need separate TFs for each document type).
I am aware of
Hello.
Does anybody know if Field Collapsing and Grouping is available in Solr 3.2?
I mean directly available, not as a patch.
I have read conflicting statements about it...
Thanks a lot!
http://www.playence.com/
Sergio Martín Cantero
playence KG
Shalin,
thank you for the answer.
I indeed didn't look into clearCache().
I thought it would just do that ( clear caches ). :)
Shalin Shekhar Mangar schrieb:
The abort command just sets an atomic boolean flag which is checked
frequently by the import threads to see if they should stop. If you
Hi all,
Do you know if it is possible to show the facets for a particular field
related only to the first N docs of the total number of results?
It seems facet.limit doesn't help with it as it defines a window in the
facet constraints returned.
Thanks in advance,
Tommaso
Alas, no, not yet.. grouping/field collapse has had a long history
with Solr.
There were many iterations on SOLR-236, but that impl was never
committed. Instead, SOLR-1682 was committed, but committed only to
trunk (never backported to 3.x despite requests).
Then, a new grouping module was
Mike, thanks a lot for your quick and precise answer!
Sergio Martín Cantero
playence KG
Penthouse office Soho II - Top 1
Grabenweg 68
6020 Innsbruck
Austria
Mobile: (+34)654464222
eMail: sergio.mar...@playence.com
Web:www.playence.com
Stay up to date on the latest developments of
On Thu, Jun 16, 2011 at 3:46 PM, Frank Wesemann
f.wesem...@fotofinder.net wrote:
Shalin,
thank you for the answer.
I indeed didn't look into clearCache().
I thought it would just do that ( clear caches ). :)
Yeah, it is not the most aptly named method :)
Thanks for reviewing the code
You're right...It would be nice to be able to see the cluster results coming
from Solr though...
Adam
On Thu, Jun 16, 2011 at 3:21 AM, Andrew Clegg andrew.clegg+mah...@gmail.com
wrote:
Well, it does have the ability to pull TermVectors from an index:
Hello,
First i will try to explain the situation:
I have some companies with opening hours. Some companies have multiple seasons
with different opening hours. I will show some example data:
Companyid Startdate(d-m) Enddate(d-m) Openinghours_end
101-01
Hi,
I set up a Solr instance with 512 cores. Each core has 100k documents and 15
fields. Solr is running on a CPU with 4 cores (2.7Ghz) and 16GB RAM.
Now I've done some benchmarks with JMeter. On each thread iteration, JMeter
queries another core at random. Here are the results (Duration: each
Hi Otis,
I followed your recommendation and decided to implement the
SearchComponent::modifyRequest(ResponseBuilder rb, SearchComponent who,
ShardRequest sreq) method, where the query routing happens. So far it is
working OK for non-facet searches, which is good news. The bad news is that
it
fascinating
Thank you so much Erik, I'm slowly beginning to understand.
So I've discovered that by defining 'splitOnNumerics=0' on the filter
class 'solr.WordDelimiterFilterFactory' (for ONLY the query analyzer) I
can get *closer* to my required goal!
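For reference, the query-analyzer chain I'm describing looks roughly like this (the attributes other than splitOnNumerics are just what I happen to have, not required):

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- splitOnNumerics=0 keeps e.g. "abc123" as one token -->
  <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0"
          generateWordParts="1" generateNumberParts="1" catenateAll="0"/>
</analyzer>
```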
Now something else odd is occurring.
http://wiki.apache.org/solr/SimpleFacetParameters
facet.offset
This param indicates an offset into the list of constraints to allow paging.
The default value is 0.
This parameter can be specified on a per field basis.
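For example, to page the constraints of a single field (hypothetical field name):

```
...&facet=true&facet.field=lemmas&f.lemmas.facet.offset=10&f.lemmas.facet.limit=10
```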
Dmitry
On Thu, Jun 16, 2011 at 1:39 PM, Tommaso Teofili
On 6/16/11 3:22 PM, Mark Schoy wrote:
Hi,
I set up a Solr instance with 512 cores. Each core has 100k documents and 15
fields. Solr is running on a CPU with 4 cores (2.7Ghz) and 16GB RAM.
Now I've done some benchmarks with JMeter. On each thread iteration, JMeter
queries another core at
Interesting. You guessed right. I changed multivalued to multiValued and
all of a sudden I get Strings. But, doesn't multivalued default to false? In my
schema, I originally did not set multivalued. I only put in multivalued=false
after I experienced this issue.
-Rich
For the record, I had a
I am assuming that you are running on linux here, I have found atop to be very
useful to see what is going on.
http://freshmeat.net/projects/atop/
dstat is also very useful too but needs a little more work to 'decode'.
Obviously there is contention going on, you just need to figure out
FYI: Using multiValued=false for all string fields results in the following
output:
### Field uri is an instance of String.
### Field entity_label is an instance of String.
### Field institution_uri is an instance of String.
### Field asserted_type_uri is an instance of String.
Thanks Dmitry, but maybe I didn't explain correctly as I am not sure
facet.offset is the right solution, I'd like not to page but to filter
facets.
I'll try to explain better with an example.
Imagine I make a query and first 2 docs in results have both 'xyz' and 'abc'
as values for field 'lemmas'
I have the following problem: I am using the Spanish analyzer to index and
query, but because I am using TinyMCE some characters of the text get
encoded as HTML entities. For example, the text "En españa" is changed to
"En espa&ntilde;a", so I need a way to decode that text before making queries
Hi Otis,
I have fixed it by assigning the value to rb same as assigned to sreq:
rb.shards = shards.toString().split(",");
not tested that fully yet, but distributed faceting works at least on my pc
_3 shards 1 router_ setup.
Dmitry
On Thu, Jun 16, 2011 at 4:53 PM, Dmitry Kan
I have an index with various fields and I want to highlight query
matchings on title and content fields.
These fields could contain html tags so I've configured HtmlFormatter
for highlighting. The problem is that if the query doesn't match the
text of the field, solr returns the value of
Am I right that you are only interested in results / facets for the
current season? If so, then you can index the start/end dates as
separate number fields and build your search filters like this:
fq=+start_date_month:[* TO 6] +start_date_day:[* TO 17]
+end_date_month:[* TO 6] +end_date_day:[16 TO *]
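The corresponding schema fields could look like this (names from the fq above, the trie-int type is just illustrative):

```xml
<field name="start_date_month" type="tint" indexed="true" stored="false"/>
<field name="start_date_day"   type="tint" indexed="true" stored="false"/>
<field name="end_date_month"   type="tint" indexed="true" stored="false"/>
<field name="end_date_day"     type="tint" indexed="true" stored="false"/>
```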
Hi Tommaso,
the FacetComponent works with the DocListAndSet#docSet.
It should be easy to switch to DocListAndSet#docList, which contains only
the documents of the result list (by default the top 10, but e.g. docs 15-25
if start=15 and rows=11). That would mean changing the source code, though.
Instead of changing the
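As a client-side workaround instead of patching Solr (a sketch, not Solr code): request the top N docs with the facet field returned, then count the values yourself:

```python
from collections import Counter

def facets_over_top_n(docs, field, n):
    """Count facet values over only the first n docs of a result list.
    Multi-valued fields are assumed to come back as lists."""
    counts = Counter()
    for doc in docs[:n]:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(values)
    return counts.most_common()

# Example with made-up docs: both top-2 docs carry 'xyz' and 'abc'
docs = [
    {"id": 1, "lemmas": ["xyz", "abc"]},
    {"id": 2, "lemmas": ["xyz", "abc"]},
    {"id": 3, "lemmas": ["def"]},
]
```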
Thanks for your answers.
Andrzej was right with his assumption. Solr only needs about 9GB memory but
the system needs the rest of it for disc IO:
64 Cores: 64*100MB index size = 6,4GB + 9 GB Solr Cache + about 600 MB OS =
16GB
Conclusion: My system can exactly buffer the data of 64 Cores.
Hi,
I am designing my indexes to have 1 write-only master core, 2 read-only
slave cores. That means the read-only cores will only have snapshots pulled
from the master and will not have near real time changes. I was thinking
about adding a hybrid read and write master core that will have the
So once the user logs in and searches, a search over only the products
that he has access to will translate to something like this (the
product ids are obtained from the DB for a particular user and can run
into n values):
search term with fq=product_id:(100 10001 ... n)
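Spelled out as a full request (host/core and ids are hypothetical; bare terms inside the filter default to OR under the standard q.op):

```
http://localhost:8983/solr/select?q=search+term&fq=product_id:(100 OR 10001 OR 98765)
```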
Hi Ariel,
On 6/16/2011 at 10:45 AM, Ariel wrote:
I have the following problem: I am using the Spanish analyzer to index
and query, but because I am using TinyMCE some characters of the text
get encoded as HTML entities. For example, the text "En españa" is
changed to "En espa&ntilde;a", so I
Thanks for your answer. I have just put the filter in my schema.xml but it
doesn't work. I am using Solr 1.4 and my conf is:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter
On 6/16/2011 11:12 AM, Ariel wrote:
Thanks for your answer. I have just put the filter in my schema.xml but it
doesn't work. I am using Solr 1.4 and my conf is:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
: and all of a sudden I get Strings. But, doesn't multivalued default to
: false? In my schema, I originally did not set multivalued. I only put in
: multivalued=false after I experienced this issue.
That depends on the version of Solr, and that's where the
version property of the schema
We haven't changed Solr versions. We've been using 3.1.0 all along.
Plus, I have some code that runs during indexing and retrieves the fields from
a SolrInputDocument, rather than a SolrDocument. That code gets Strings without
any problem, and always has, even without saying multiValued=false.
Have you stopped Solr before manually copying the data? This way you
can be sure that index is the same and you didn't have any new docs on
the fly.
2011/6/14 Denis Kuzmenok forward...@ukr.net:
What should I provide? OS is the same, environment is the same, solr
is completely copied,
Peter ,
Thanks for the clarification.
The reason I specifically asked is that we have many search instances
(200+) on a single JVM.
Each of these instances could have n users, and each user can subscribe to
n products. Now, according to your suggestion, I need to maintain an
in-memory list of
Ah! That was the problem. The version was 1.0. I'll change it to 1.2. Thanks!
-Rich
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Thursday, June 16, 2011 2:33 PM
To: Simon, Richard T
Cc: solr-user@lucene.apache.org
Subject: RE: getFieldValue always
with the integer field. If you just want to influence the
score, then plain external file fields should work for
you.
Is this an appropriate solution, give our use case?
Yes, check out ExternalFileField
* http://search.lucidimagination.com/search/document/CDRG_ch04_4.4.4
*
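For reference, a declaration could look like this (names illustrative; keyField must be the unique key, and the values live in an external_<fieldname> file in the index data directory):

```xml
<fieldType name="fileFloat" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="popularity" type="fileFloat"/>
```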
Hello,
I'm testing out different Similarity implementations. To try a different
similarity class, I restart Solr each time after changing the class
attribute of the similarity element in schema.xml. Besides running
multiple cores, each with its own schema, is there a way to tell the
: Seem to have a solution but I am still trying to figure out how/why it works.
:
: Addition of defType=edismax in the boost query seem to honor MM and
: correct boosting based on external file source.
You didn't post enough details in your original question to be 100%
certain (would have
FYI: There's a new patch specifically for dealing with XML tags and entities
that handles the CDATA case...
https://issues.apache.org/jira/browse/SOLR-2597
: Date: Fri, 27 May 2011 17:01:26 +0800
: From: Ellery Leung elleryle...@be-o.com
: Reply-To: solr-user@lucene.apache.org,
No, there's not a way to control Similarity on a per-request basis.
Some factors from Similarity are computed at index-time though.
What factors are you trying to tweak that way and why? Maybe doing boosting
using some other mechanism (boosting functions, boosting clauses) would be a
better
On 6/16/11 5:31 PM, Mark Schoy wrote:
Thanks for your answers.
Andrzej was right with his assumption. Solr only needs about 9GB memory but
the system needs the rest of it for disc IO:
64 Cores: 64*100MB index size = 6,4GB + 9 GB Solr Cache + about 600 MB OS =
16GB
Conclusion: My system can
On Thu, Jun 16, 2011 at 9:14 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
No, there's not a way to control Similarity on a per-request basis.
Some factors from Similarity are computed at index-time though.
You got me on this.
What factors are you trying to tweak that way and why? Maybe
Hi Ariel,
As Shawn says, char filters come before tokenizers.
You need to use a charFilter tag instead of filter tag.
I've updated the HTMLStripCharFilter documentation on the Solr wiki to include
this information:
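With a charFilter, the index analyzer from earlier in the thread would then look roughly like this:

```xml
<analyzer type="index">
  <!-- char filters run before the tokenizer -->
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
</analyzer>
```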
Hello,
I am new to Solr and am in the beginning planning stage of a large project and
could use some advice so as not to make a huge design blunder that I will
regret down the road.
Currently I have about 10 MySQL databases that store information about
different archival collections. For
On Thu, Jun 16, 2011 at 3:23 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
I'm trying to assess the impact of coord (search-time) on Qtime. In one
implementation coord returns 1, while in another it's actually computed.
On query time?
coord should be really cheap (unless your impl does
On 6/16/2011 4:41 PM, Mari Masuda wrote:
One reservation I have is that eventually we would like to be able to type in Iraq and
find records across all of the collections at once instead of having to search each collection
separately. Although I don't know anything about it at this stage, I
Hi Mari,
it depends ...
* How many records are stored in your MySQL databases?
* How often will updates occur?
* How many db records / index documents are changed per update?
I would suggest starting with a single Solr core first. That way, you can
concentrate on the basics and do not need to
I am not sure if I can use function queries this way. I have a query like
this: attributeX:[* TO ?] in my DB. I replace the ? with input from the front
end. Obviously, this works fine. However, what I really want to do is
attributeX:[* TO (3 * ?)]. Is there any way to embed the results of a
(11/06/17 0:15), Massimo Schiavon wrote:
I have an index with various fields and I want to highlight query matchings on title
and content
fields.
These fields could contain html tags so I've configured HtmlFormatter for
highlighting. The problem
is that if the query doesn't match the text of
We just started using SOLR. I am trying to load a single file with 20 million
records into SOLR using the CSV uploader. I keep getting an Out of Memory
error after loading 7 million records. Here is the config:

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>6</maxTime>

I also
Well, if my theory is right, you should be able to generate OOMs at will by
sorting and faceting on all your fields in one query.
But Lucene's cache should be garbage collected, can you take some memory
snapshots during the week? It should hit a point and stay steady there.
How much memory are
Right, if you've only changed WordDelimiterFilterFactory in the query
analyzer, then the tokens you're analyzing may be split up. Try running
some of the terms through the admin/analysis page. Unless you have
catenateAll=1 in the definition, the whole term won't be there.
It becomes a question of
I really wouldn't go there, it sounds like there are endless
opportunities for errors!
How real-time is real-time? Could you fix this entirely by
1) adjusting expectations to, say, 5 minutes, and
2) adjusting your commit (on the master) and poll (on the slave) intervals appropriately?
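For 2, the slave's poll interval lives in its ReplicationHandler config (URL and interval values are illustrative):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core0/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```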
Best
Erick
On Thu, Jun
Hmmm, are you still getting your OOM after 7M records? Or some larger
number? And how are you using the CSV uploader?
Best
Erick
On Thu, Jun 16, 2011 at 9:14 PM, jyn7 jyotsna.namb...@gmail.com wrote:
We just started using SOLR. I am trying to load a single file with 20 million
records into
Yes Eric, after changing the lock type to Single, I got an OOM after loading
5.5 million records. I am using the curl command to upload the csv.
--
View this message in context:
http://lucene.472066.n3.nabble.com/SOlR-Out-of-Memory-exception-tp3074636p3074765.html
Sent from the Solr - User
Is it possible to use omitTermFreqAndPositions=true in a fieldType
declaration that uses class=solr.TextField? I've tried doing this and it does
not seem to work (i.e., the prx file size does not change). Using it in a
field declaration does work, but I'd rather set it in the fieldType so I
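For reference, the field-level form that does work looks like this (field and type names illustrative):

```xml
<field name="body" type="text" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
```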
Alexey,
Do you mean that we keep the current index as it is and have a separate core
which has only the user-id / product-id relation, and while querying do a
join between the two cores based on the user-id?
This would involve us indexing/deleting the product as and when the user
subscription
If you are sending the whole CSV in a single HTTP request using curl, why not
consider sending it in smaller chunks?
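A sketch of that idea (a hypothetical helper, not an existing Solr tool): split the CSV into header-prefixed chunks and post each one separately:

```python
import itertools

def csv_chunks(lines, chunk_size):
    """Yield [header] + up to chunk_size data lines per chunk, so every
    chunk is a valid standalone CSV upload."""
    it = iter(lines)
    header = next(it)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield [header] + chunk

# Each chunk could then be written to a file and posted, e.g.:
#   curl 'http://localhost:8983/solr/update/csv' --data-binary @part.csv \
#        -H 'Content-type: text/plain; charset=utf-8'
```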