Re: Need help with DIH dataconfig.xml

2011-06-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
Use TemplateTransformer <dataConfig> <dataSource name="wld" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/wld" user="root" password="pass"/> <document name="variants">

Re: fieldCache problem OOM exception

2011-06-16 Thread Bernd Fehling
Hi Erik, yes I'm sorting and faceting. 1) Fields for sorting: sort=f_dccreator_sort, sort=f_dctitle, sort=f_dcyear The parameter facet.sort= is empty, only using parameter sort=. 2) Fields for faceting: f_dcperson, f_dcsubject, f_dcyear, f_dccollection, f_dclang, f_dctypenorm,

Re: Copying few field using copyField to non multiValued field

2011-06-16 Thread Michael Kuhlmann
Hi Omri, there are two limitations: 1. You can't sort on a multiValued field. (Anyway, on which of the copied fields would you want to sort first?) 2. You can't make the multiValued field the unique key. Neither is a real limitation: 1. Better sort on at_country, at_state, at_city instead. 2.

Re: DIH abort doesn't close datasources

2011-06-16 Thread Shalin Shekhar Mangar
On Wed, Jun 15, 2011 at 8:10 PM, Frank Wesemann f.wesem...@fotofinder.net wrote: Hi, I just came across this: If I abort an import via /dataimport/?command=abort the connections to the (in my case) database stay open. Shouldn't DocBuilder#rollback() call something like cleanup() which in turn

RE: Multiple indexes

2011-06-16 Thread Kai Gülzau
Are there any plans to support a kind of federated search in a future Solr version? I think there are reasons to use separate indexes for each document type but do combined searches on these indexes (for example if you need separate TFs for each document type). I am aware of

Field Collapsing and Grouping in Solr 3.2

2011-06-16 Thread Sergio Martín
Hello. Does anybody know if Field Collapsing and Grouping is available in Solr 3.2? I mean directly available, not as a patch. I have read conflicting statements about it... Thanks a lot! Sergio Martín Cantero, playence KG

Re: DIH abort doesn't close datasources

2011-06-16 Thread Frank Wesemann
Shalin, thank you for the answer. I indeed didn't look into clearCache(). I thought it would just do that (clear caches). :) Shalin Shekhar Mangar wrote: The abort command just sets an atomic boolean flag which is checked frequently by the import threads to see if they should stop. If you

Showing facet of first N docs

2011-06-16 Thread Tommaso Teofili
Hi all, Do you know if it is possible to show the facets for a particular field related only to the first N docs of the total number of results? It seems facet.limit doesn't help with it as it defines a window in the facet constraints returned. Thanks in advance, Tommaso

Re: Field Collapsing and Grouping in Solr 3.2

2011-06-16 Thread Michael McCandless
Alas, no, not yet.. grouping/field collapse has had a long history with Solr. There were many iterations on SOLR-236, but that impl was never committed. Instead, SOLR-1682 was committed, but committed only to trunk (never backported to 3.x despite requests). Then, a new grouping module was

RE: Field Collapsing and Grouping in Solr 3.2

2011-06-16 Thread Sergio Martín
Mike, thanks a lot for your quick and precise answer! Sergio Martín Cantero, playence KG

Re: DIH abort doesn't close datasources

2011-06-16 Thread Shalin Shekhar Mangar
On Thu, Jun 16, 2011 at 3:46 PM, Frank Wesemann f.wesem...@fotofinder.net wrote: Shalin, thank you for the answer. I indeed didn't look into clearCache(). I thought it would just do that (clear caches). :) Yeah, it is not the most aptly named method :) Thanks for reviewing the code

Re: Mahout Solr

2011-06-16 Thread Adam Estrada
You're right...It would be nice to be able to see the cluster results coming from Solr though... Adam On Thu, Jun 16, 2011 at 3:21 AM, Andrew Clegg andrew.clegg+mah...@gmail.com wrote: Well, it does have the ability to pull TermVectors from an index:

Complex situation

2011-06-16 Thread roySolr
Hello, First I will try to explain the situation: I have some companies with opening hours. Some companies have multiple seasons with different opening hours. I will show some example data: Companyid Startdate(d-m) Enddate(d-m) Openinghours_end 101-01

Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread Mark Schoy
Hi, I set up a Solr instance with 512 cores. Each core has 100k documents and 15 fields. Solr is running on a CPU with 4 cores (2.7 GHz) and 16 GB RAM. Now I've done some benchmarks with JMeter. On each thread iteration JMeter queries another core at random. Here are the results (Duration: each

Re: query routing with shards

2011-06-16 Thread Dmitry Kan
Hi Otis, I followed your recommendation and decided to implement the SearchComponent::modifyRequest(ResponseBuilder rb, SearchComponent who, ShardRequest sreq) method, where the query routing happens. So far it is working OK for the non-facet search, this is good news. The bad news is that it

Re: Boost Strangeness

2011-06-16 Thread Judioo
Fascinating. Thank you so much Erik, I'm slowly beginning to understand. So I've discovered that by defining 'splitOnNumerics=0' on the filter class 'solr.WordDelimiterFilterFactory' (for ONLY the query analyzer) I can get *closer* to my required goal! Now something else odd is occurring.

Re: Showing facet of first N docs

2011-06-16 Thread Dmitry Kan
http://wiki.apache.org/solr/SimpleFacetParameters facet.offset This param indicates an offset into the list of constraints to allow paging. The default value is 0. This parameter can be specified on a per field basis. Dmitry On Thu, Jun 16, 2011 at 1:39 PM, Tommaso Teofili

Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread Andrzej Bialecki
On 6/16/11 3:22 PM, Mark Schoy wrote: Hi, I set up a Solr instance with 512 cores. Each core has 100k documents and 15 fields. Solr is running on a CPU with 4 cores (2.7 GHz) and 16 GB RAM. Now I've done some benchmarks with JMeter. On each thread iteration JMeter queries another core at

RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
Interesting. You guessed right. I changed multivalued to multiValued and all of a sudden I get Strings. But, doesn't multivalued default to false? In my schema, I originally did not set multivalued. I only put in multivalued=false after I experienced this issue. -Rich For the record, I had a

Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread François Schiettecatte
I am assuming that you are running on Linux here. I have found atop to be very useful to see what is going on. http://freshmeat.net/projects/atop/ dstat is also very useful but needs a little more work to 'decode'. Obviously there is contention going on, you just need to figure out

RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
FYI: Using multiValued=false for all string fields results in the following output: ### Field uri is an instance of String. ### Field entity_label is an instance of String. ### Field institution_uri is an instance of String. ### Field asserted_type_uri is an instance of String.

Re: Showing facet of first N docs

2011-06-16 Thread Tommaso Teofili
Thanks Dmitry, but maybe I didn't explain correctly as I am not sure facet.offset is the right solution, I'd like not to page but to filter facets. I'll try to explain better with an example. Imagine I make a query and first 2 docs in results have both 'xyz' and 'abc' as values for field 'lemmas'

Re: How to index correctly a text save with tinyMCE

2011-06-16 Thread Ariel
I have the following problem: I am using the Spanish analyzer to index and query, but because I am using TinyMCE, some characters of the text are encoded as HTML entities. For example, the text En españa ... is changed to En espa&ntilde;a, so I need a way to decode that text to make queries
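
One client-side workaround, sketched here as an assumption and not as the fix the list goes on to recommend, is to decode HTML entities before the text is sent to Solr. The sketch uses Python's standard `html.unescape`; the helper name `decode_entities` is hypothetical.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML character references (e.g. &ntilde;) back into plain characters."""
    return html.unescape(text)


print(decode_entities("En espa&ntilde;a"))  # prints: En españa
```

Decoding before indexing keeps the stored text and the query text in the same form, at the cost of an extra step in the indexing client.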

Re: query routing with shards

2011-06-16 Thread Dmitry Kan
Hi Otis, I have fixed it by assigning the same value to rb as is assigned to sreq: rb.shards = shards.toString().split(","); I have not tested that fully yet, but distributed faceting works at least on my PC's _3 shards 1 router_ setup. Dmitry On Thu, Jun 16, 2011 at 4:53 PM, Dmitry Kan

Encoding of alternate fields in highlighting

2011-06-16 Thread Massimo Schiavon
I have an index with various fields and I want to highlight query matches on title and content fields. These fields could contain HTML tags so I've configured HtmlFormatter for highlighting. The problem is that if the query doesn't match the text of the field, Solr returns the value of

Re: Complex situation

2011-06-16 Thread Alexey Serba
Am I right that you are only interested in results / facets for the current season? If so, you can index start/end dates as separate numeric fields and build your search filters like this: fq=+start_date_month:[* TO 6] +start_date_day:[* TO 17] +end_date_month:[* TO 6] +end_date_day:[16 TO *]

Re: Showing facet of first N docs

2011-06-16 Thread karsten-solr
Hi Tommaso, the FacetComponent works with the DocListAndSet#docSet. It should be easy to switch to DocListAndSet#docList, which contains all documents for the result list (by default the top 10, but possibly 15-25 if start=15, rows=11). That means changing the source code. Instead of changing the

Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread Mark Schoy
Thanks for your answers. Andrzej was right with his assumption. Solr only needs about 9 GB memory but the system needs the rest of it for disk I/O: 64 cores: 64 * 100 MB index size = 6.4 GB, plus 9 GB Solr cache, plus about 600 MB OS = 16 GB. Conclusion: My system can buffer the data of exactly 64 cores.
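
The arithmetic in this message can be checked with a short sketch. The figures are the ones quoted above; decimal units (1 GB = 1000 MB) and the names `memory_budget_gb`, `SOLR_GB`, and `OS_GB` are assumptions made for illustration.

```python
# Figures quoted in the message; decimal units (1 GB = 1000 MB) are assumed.
INDEX_MB_PER_CORE = 100
SOLR_GB = 9.0   # memory Solr itself needs
OS_GB = 0.6     # rough operating system overhead


def memory_budget_gb(cores: int) -> float:
    """Index data for `cores` cores plus the fixed Solr and OS share, in GB."""
    index_gb = cores * INDEX_MB_PER_CORE / 1000
    return index_gb + SOLR_GB + OS_GB


print(f"{memory_budget_gb(64):.1f} GB")  # prints: 16.0 GB
```

With 64 cores the budget matches the 16 GB of RAM on the machine. The index data of all 512 cores alone would come to 512 * 100 MB = 51.2 GB, which is consistent with the slowdown once queries touch more than 64 cores.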

Document Scoring

2011-06-16 Thread zarni aung
Hi, I am designing my indexes to have 1 write-only master core, 2 read-only slave cores. That means the read-only cores will only have snapshots pulled from the master and will not have near real time changes. I was thinking about adding a hybrid read and write master core that will have the

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

2011-06-16 Thread Alexey Serba
So a search for a product, once the user logs in and searches only the products that he has access to, will translate to something like this. The product ids are obtained from the db for a particular user and can run into n number. search term fq=product_id(100 10001 ..n number)

RE: How to index correctly a text save with tinyMCE

2011-06-16 Thread Steven A Rowe
Hi Ariel, On 6/16/2011 at 10:45 AM, Ariel wrote: I have the following problem: I am using the Spanish analyzer to index and query, but because I am using TinyMCE, some characters of the text are encoded as HTML entities. For example, the text En españa ... is changed to En espa&ntilde;a, so I

Re: How to index correctly a text save with tinyMCE

2011-06-16 Thread Ariel
Thanks for your answer. I have just put the filter in my schema.xml but it doesn't work. I am using Solr 1.4 and my conf is: <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter

Re: How to index correctly a text save with tinyMCE

2011-06-16 Thread Shawn Heisey
On 6/16/2011 11:12 AM, Ariel wrote: Thanks for your answer. I have just put the filter in my schema.xml but it doesn't work. I am using Solr 1.4 and my conf is: <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true"

RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Chris Hostetter
: and all of a sudden I get Strings. But, doesn't multivalued default to : false? In my schema, I originally did not set multivalued. I only put in : multivalued=false after I experienced this issue. That's dependent on the version of Solr, and it is where the version property of the schema

RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
We haven't changed Solr versions. We've been using 3.1.0 all along. Plus, I have some code that runs during indexing and retrieves the fields from a SolrInputDocument, rather than a SolrDocument. That code gets Strings without any problem, and always has, even without saying multiValued=false.

Re: Strange behavior

2011-06-16 Thread Alexey Serba
Have you stopped Solr before manually copying the data? This way you can be sure that the index is the same and you didn't have any new docs on the fly. 2011/6/14 Denis Kuzmenok forward...@ukr.net: What should I provide, OS is the same, environment is the same, Solr is completely copied,

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

2011-06-16 Thread Sujatha Arun
Peter, thanks for the clarification. Why I specifically asked was because we have many search instances (200+) on a single JVM. Each of these instances could have n users and each user can subscribe to n products. Now according to your suggestion, I need to maintain an in-memory list of

RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
Ah! That was the problem. The version was 1.0. I'll change it to 1.2. Thanks! -Rich -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, June 16, 2011 2:33 PM To: Simon, Richard T Cc: solr-user@lucene.apache.org Subject: RE: getFieldValue always

Re: Updating only one indexed field for all documents quickly.

2011-06-16 Thread Alexey Serba
with the integer field. If you just want to influence the score, then just plain external field fields should work for you. Is this an appropriate solution, given our use case? Yes, check out ExternalFileField * http://search.lucidimagination.com/search/document/CDRG_ch04_4.4.4 *

It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Gabriele Kahlout
Hello, I'm testing out different Similarity implementations. To do that, each time I want to try a different similarity class I change the class attribute of the similarity element in schema.xml and restart Solr. Besides running multiple cores, each with its own schema, is there a way to tell the

Re: Minimum Should Match + External Field + Function Query with boost

2011-06-16 Thread Chris Hostetter
: Seem to have a solution but I am still trying to figure out how/why it works. : : Addition of defType=edismax in the boost query seems to honor MM and : correct boosting based on external file source. You didn't post enough details in your original question to be 100% certain (would have

RE: HTMLStripTransformer will remove the content in XML??

2011-06-16 Thread Chris Hostetter
FYI: There's a new patch specifically for dealing with XML tags and entities that handles the CDATA case... https://issues.apache.org/jira/browse/SOLR-2597 : Date: Fri, 27 May 2011 17:01:26 +0800 : From: Ellery Leung elleryle...@be-o.com : Reply-To: solr-user@lucene.apache.org,

Re: It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Erik Hatcher
No, there's not a way to control Similarity on a per-request basis. Some factors from Similarity are computed at index-time though. What factors are you trying to tweak that way and why? Maybe doing boosting using some other mechanism (boosting functions, boosting clauses) would be a better

Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread Andrzej Bialecki
On 6/16/11 5:31 PM, Mark Schoy wrote: Thanks for your answers. Andrzej was right with his assumption. Solr only needs about 9 GB memory but the system needs the rest of it for disk I/O: 64 cores: 64 * 100 MB index size = 6.4 GB, plus 9 GB Solr cache, plus about 600 MB OS = 16 GB. Conclusion: My system can

Re: It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Gabriele Kahlout
On Thu, Jun 16, 2011 at 9:14 PM, Erik Hatcher erik.hatc...@gmail.com wrote: No, there's not a way to control Similarity on a per-request basis. Some factors from Similarity are computed at index-time though. You got me on this. What factors are you trying to tweak that way and why? Maybe

RE: How to index correctly a text save with tinyMCE

2011-06-16 Thread Steven A Rowe
Hi Ariel, As Shawn says, char filters come before tokenizers. You need to use a charFilter tag instead of filter tag. I've updated the HTMLStripCharFilter documentation on the Solr wiki to include this information:

getting started

2011-06-16 Thread Mari Masuda
Hello, I am new to Solr and am in the beginning planning stage of a large project and could use some advice so as not to make a huge design blunder that I will regret down the road. Currently I have about 10 MySQL databases that store information about different archival collections. For

Re: It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Robert Muir
On Thu, Jun 16, 2011 at 3:23 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I'm trying to assess the impact of coord (search-time) on Qtime. In one implementation coord returns 1, while in another it's actually computed. On query time? coord should be really cheap (unless your impl does

Re: getting started

2011-06-16 Thread Jonathan Rochkind
On 6/16/2011 4:41 PM, Mari Masuda wrote: One reservation I have is that eventually we would like to be able to type in Iraq and find records across all of the collections at once instead of having to search each collection separately. Although I don't know anything about it at this stage, I

Re: getting started

2011-06-16 Thread Sascha SZOTT
Hi Mari, it depends ... * How many records are stored in your MySQL databases? * How often will updates occur? * How many db records / index documents are changed per update? I would suggest starting with a single Solr core first. That way, you can concentrate on the basics and do not need to

sending results of function query to range query

2011-06-16 Thread Kevin Osborn
I am not sure if I can use function queries this way. I have a query like this: attributeX:[* TO ?] in my DB. I replace the ? with input from the front end. Obviously, this works fine. However, what I really want to do is attributeX:[* TO (3 * ?)] Is there any way to embed the results of a
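
If the multiplier is known to the client, one workaround is to compute the bound before the query string is built. This is a minimal sketch under that assumption; `range_query` is a hypothetical helper, and the approach bypasses function queries entirely instead of answering whether Solr can embed one in a range.

```python
def range_query(field: str, user_value: float, factor: float = 3) -> str:
    """Build `field:[* TO factor*user_value]` with the product computed client-side."""
    upper = factor * user_value
    # Render whole numbers without a trailing ".0" to keep the query tidy.
    upper_text = str(int(upper)) if float(upper).is_integer() else str(upper)
    return f"{field}:[* TO {upper_text}]"


print(range_query("attributeX", 5))  # prints: attributeX:[* TO 15]
```

The stored query template then only needs the single `?` placeholder replaced by the computed bound.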

Re: Encoding of alternate fields in highlighting

2011-06-16 Thread Koji Sekiguchi
(11/06/17 0:15), Massimo Schiavon wrote: I have an index with various fields and I want to highlight query matches on title and content fields. These fields could contain HTML tags so I've configured HtmlFormatter for highlighting. The problem is that if the query doesn't match the text of

SOlR -- Out of Memory exception

2011-06-16 Thread jyn7
We just started using SOLR. I am trying to load a single file with 20 million records into SOLR using the CSV uploader. I keep getting an out-of-memory error after loading 7 million records. Here is the config: autoCommit maxDocs1/maxDocs maxTime6/maxTime I also

Re: fieldCache problem OOM exception

2011-06-16 Thread Erick Erickson
Well, if my theory is right, you should be able to generate OOMs at will by sorting and faceting on all your fields in one query. But Lucene's cache should be garbage collected, can you take some memory snapshots during the week? It should hit a point and stay steady there. How much memory are

Re: Boost Strangeness

2011-06-16 Thread Erick Erickson
Right, if you've only changed WordDelimiterFilterFactory in the query, then the tokens you're analyzing may be split up. Try running some of the terms through the admin/analysis page. Unless you have catenateAll=1 in the definition, the whole term won't be there. It becomes a question of

Re: Document Scoring

2011-06-16 Thread Erick Erickson
I really wouldn't go there, it sounds like there are endless opportunities for errors! How real-time is real-time? Could you fix this entirely by 1) adjusting expectations for, say, 5 minutes, and 2) adjusting your commit (on the master) and poll (on the slave) appropriately? Best Erick On Thu, Jun

Re: SOlR -- Out of Memory exception

2011-06-16 Thread Erick Erickson
H, are you still getting your OOM after 7M records? Or some larger number? And how are you using the CSV uploader? Best Erick On Thu, Jun 16, 2011 at 9:14 PM, jyn7 jyotsna.namb...@gmail.com wrote: We just started using SOLR. I am trying to load a single file with 20 million records into

Re: SOlR -- Out of Memory exception

2011-06-16 Thread jyn7
Yes Erick, after changing the lock type to Single, I got an OOM after loading 5.5 million records. I am using the curl command to upload the CSV. -- View this message in context: http://lucene.472066.n3.nabble.com/SOlR-Out-of-Memory-exception-tp3074636p3074765.html Sent from the Solr - User

omitTermFreqAndPositions in a TextField fieldType

2011-06-16 Thread Michael Ryan
Is it possible to use omitTermFreqAndPositions=true in a fieldType declaration that uses class=solr.TextField? I've tried doing this and it does not seem to work (i.e., the prx file size does not change). Using it in a field declaration does work, but I'd rather set it in the fieldType so I

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

2011-06-16 Thread Sujatha Arun
Alexey, do you mean that we keep the current index as it is and have a separate core which holds only the user-id, product-id relation, and while querying, do a join between the two cores based on the user-id? This would require us to index/delete the product as and when the user subscription

Re: SOlR -- Out of Memory exception

2011-06-16 Thread pravesh
If you are sending the whole CSV in a single HTTP request using curl, why not consider sending it in smaller chunks?
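
The chunking idea can be sketched as follows. This is an illustration under stated assumptions, not a tested Solr loader: `chunk_csv` is a hypothetical helper, it assumes the first line is a header that every chunk should repeat, and it assumes no field contains an embedded newline. Each yielded chunk would then be posted as its own request.

```python
from typing import Iterable, Iterator, List


def chunk_csv(lines: Iterable[str], rows_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of CSV lines, each starting with the header row."""
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input: nothing to yield
    chunk = [header]
    for line in it:
        chunk.append(line)
        # The header does not count towards the row limit.
        if len(chunk) - 1 == rows_per_chunk:
            yield chunk
            chunk = [header]
    if len(chunk) > 1:
        yield chunk  # final, partially filled chunk


sample = ["id,name", "1,a", "2,b", "3,c"]
for part in chunk_csv(sample, 2):
    print(part)
```

Because the function is a generator over an iterable, a 20-million-row file can be streamed line by line without holding the whole file in memory on the client.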