Autocommit Index Size

2011-12-06 Thread Husain, Yavar
In solrconfig.xml I was experimenting with Indexing Performance. When I set the maxDocs (in autoCommit) to say 1 documents, the index size is double what it is when I don't use autoCommit at all (i.e. keep it commented out, i.e. commit only at the end after adding all documents). Does autoCommit affect the index

Re: [Announce] Solr-RA, Solr with RankingAlgorithm

2011-12-06 Thread yu shen
Hi Nagendra, I tried to use solr-nrt-ra-3.4, but the dataimporthandler does not work. The error message is: INFO: created /dataimport: org.apache.solr.handler.dataimport.DataImportHandler Dec 6, 2011 1:16:18 AM org.apache.solr.common.SolrException log SEVERE:

LineEntityProcessor

2011-12-06 Thread Oleg Tikhonov
Hello everybody, I'm trying to use LineEntityProcessor of DIH but somehow without success. I've created data-lep-config.xml and added a request handler in solrconfig.xml. During full-import I get a response saying that x rows were fetched, 0 docs added/updated. I also defined a very basic regex for

lower score for synonyms

2011-12-06 Thread Robert Brown
Is it possible to lower the score for synonym matches? We set up... admin = administration, but if someone searches specifically for admin, we want those specific matches to rank higher than matches for administration. -- IntelCompute Web Design Local Online Marketing

highlight 1 field twice

2011-12-06 Thread Robert Brown
When searching against 1 field, is it possible to have highlighting returned 2 different ways? We'd like the full field returned with keywords highlighted, but then also returned as snippets. Any possible approaches?

Testing a custom implementation of CommonsHttpSolrServer

2011-12-06 Thread Mark Swinson
Hi, I want to test a custom implementation of CommonsHttpSolrServer, which is required so that we can enable it to use SSL certificates and proxies when accessing the Solr REST API. One thing I want to avoid is having to have a Solr instance set up on every developer's sandbox in order for the tests

Re: lower score for synonyms

2011-12-06 Thread Marc SCHNEIDER
Hello, You could create another field and attach the synonym analyzer to it. When querying, set a lower boost for this field. Marc. On Tue, Dec 6, 2011 at 11:31 AM, Robert Brown r...@intelcompute.com wrote: is it possible to lower the score for synonym matches? we setup... admin =
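A minimal sketch of Marc's suggestion at query time. It assumes two hypothetical fields: `name`, indexed without synonyms, and `name_syn`, a copyField target whose analyzer includes the synonym filter. The dismax `qf` parameter then weights exact matches above synonym-only matches; the boost values are illustrative, not tuned.

```python
from urllib.parse import urlencode

def synonym_demoted_params(user_query, exact_field="name", synonym_field="name_syn",
                           exact_boost=2.0, synonym_boost=0.5):
    """Dismax params that rank exact-field matches above synonym-field matches.

    Field names are hypothetical: exact_field has no SynonymFilter,
    synonym_field is a copyField target whose analyzer expands synonyms.
    """
    return {
        "q": user_query,
        "defType": "dismax",
        # A document containing "admin" matches the heavily boosted exact field;
        # one containing only "administration" can match only via synonym_field.
        "qf": f"{exact_field}^{exact_boost} {synonym_field}^{synonym_boost}",
    }

print(urlencode(synonym_demoted_params("admin")))
```

Keeping the boosts as parameters makes it easy to adjust the gap between exact and synonym matches without touching the schema.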

Re: highlight 1 field twice

2011-12-06 Thread Erik Hatcher
Within one request, it isn't possible to highlight the same field twice differently (what's the use case here?), but you could either make multiple requests or copyField to have two stored copies that could be highlighted separately in a single request. Erik On Dec 6, 2011, at 06:01 ,
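A sketch of Erik's copyField approach in a single request. It assumes a hypothetical stored field `body` plus a stored copy `body_snippets` (declared with something like `<copyField source="body" dest="body_snippets"/>`). Per-field overrides of the form `f.<field>.hl.fragsize` then highlight one copy whole (a fragsize of 0 disables fragmenting) and the other as short snippets. The fragment size and snippet count are illustrative.

```python
def two_way_highlight_params(query, full_field="body", snippet_field="body_snippets"):
    """Highlight the same content two ways via two stored copies (hypothetical field names)."""
    return {
        "q": query,
        "hl": "true",
        "hl.fl": f"{full_field},{snippet_field}",
        # fragsize 0: return the whole field value with matching terms highlighted
        f"f.{full_field}.hl.fragsize": "0",
        # the copy is broken into short fragments, up to 3 per document
        f"f.{snippet_field}.hl.fragsize": "100",
        f"f.{snippet_field}.hl.snippets": "3",
    }

for name, value in two_way_highlight_params("solr").items():
    print(f"{name}={value}")
```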

Re: Testing a custom implementation of CommonsHttpSolrServer

2011-12-06 Thread Erik Hatcher
Mark - So you want the *server* to be started programmatically? You could use Jetty's API to do this... or fork a JVM. As for client-side SolrJ, you can pass an HttpClient to CommonsHttpSolrServer's constructor to customize how the HTTP connection is configured. EmbeddedSolrServer - no, it

Solr request handler queries in fiddler

2011-12-06 Thread Kashif Khan
Hi all, I have developed a Solr request handler in which I am querying shards and merging the results, but I do not see any of these queries in Fiddler. How can I track or capture the queries from the request handler in Fiddler, and what settings do I need for that? Please

Solr sorting issue : can not sort on multivalued field

2011-12-06 Thread Ramesh kumar Velusamy
Hi, I am getting this weird error message `can not sort on multivalued field: fieldname` on all the indexed fields. This is the full error message from Solr: HTTP Status 400 - can not sort on multivalued field: price; type: Status report; message: can not sort on

Re: Document Processing

2011-12-06 Thread Erik Hatcher
As for XML overloading Solr... certainly it will add processing time to the situation as well as additional memory requirements. At worst it'd require more RAM and slow things down, but all depends on scale of ingestion rate and size of the documents whether it'd be prohibitive. Erik

Re: Replication downtime?? - master slave

2011-12-06 Thread Erick Erickson
Replication is basically a background file transfer; your slave shouldn't notice. But what your slave will notice is two things: 1) after replication, if your first few queries are slow, you need to autowarm your caches. 2) you will see some memory footprint increase while autowarming is

Re: Grouping or Facet ?

2011-12-06 Thread Erick Erickson
OK, I'm not understanding here. You get the counts and the results if you facet on a single category field. The facet counts are the counts of the *values* in that field. So it would help me if you showed the output of faceting on a single category field and why that didn't work for you But

Re: Out of memory during the indexing

2011-12-06 Thread Erick Erickson
I'm going to defer to the folks who actually know the guts here. If you've turned down the cache entries for your Solr caches, you're pretty much left with Lucene caching which is a mystery... Best Erick On Mon, Dec 5, 2011 at 9:23 AM, Jeff Crump jeffrey.cr...@gmail.com wrote: Yes, and without

UUID field changed when document is updated

2011-12-06 Thread blaise thomson
Hi, I've been trying to use the UUIDField in solr to maintain ids of the pages I've crawled with nutch (as per http://wiki.apache.org/solr/UniqueKey). The use case is that I want to have the server able to use these ids in another database for various statistics gathering. So I want the link

Delays when deleting by query

2011-12-06 Thread Mike Gallan
Hello, We're encountering delays of 10+ minutes when trying to delete from our Solr 3.4 instance. We have 335k documents indexed and interface using SolrJ. Our schema basically consists of a parent object with multiple child objects. Every object is indexed as a separate document

Re: sub query parsing bug???

2011-12-06 Thread Erick Erickson
Hmmm, does this help? In Solr 1.4 and prior, you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND. In 3.x and trunk the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the
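A minimal sketch of the mapping Erick describes, assuming you prefer to pass `mm` explicitly so that the same request behaves alike on 1.4 and 3.x. The helper names are hypothetical.

```python
def mm_for_q_op(q_op):
    """Return the mm value equivalent to q.op, per the 3.x/trunk default."""
    op = q_op.strip().upper()
    if op == "AND":
        return "100%"   # every optional clause must match
    if op == "OR":
        return "0%"     # any single clause may match
    raise ValueError(f"q.op must be AND or OR, got {q_op!r}")

def dismax_params(query, q_op="OR"):
    """Dismax params with mm set explicitly instead of relying on the version default."""
    return {"q": query, "defType": "dismax", "mm": mm_for_q_op(q_op)}

print(dismax_params("ipod charger", "AND"))
```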

Re: Solr Version Upgrade issue

2011-12-06 Thread Mark Miller
Looks like you must have a mix of old and new jars. On Tuesday, December 6, 2011, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am trying to upgrade my SOLR version from 1.4 to 3.2. but it's giving me below exception. I have checked solr home path it is correct.. Please help SEVERE:

Re: synonyms with dashes '-'

2011-12-06 Thread Erick Erickson
Details matter. Your analysis chain on the field may well be the issue. Look at the terms in the field (admin/schema browser). Look at debugQuery=on to see how the query is parsed. Look at the admin/analysis page to see the effects of the analysis chain. You might review:

Alternate score-based sorting for Solr Grouping

2011-12-06 Thread George Stathis
My previous subject line was not very scannable. Apologies for the re-post, I'm just hoping to get more eye-balls and hopefully some insights. Thank you in advance for your time. See below. -GS On Mon, Dec 5, 2011 at 1:37 PM, George Stathis gstat...@gmail.com wrote: Currently, solr grouping

Best practice schema.xml when importing rich documents?

2011-12-06 Thread Pål Brattberg
I'm working with SOLR on documents that are mainly MS Word, PowerPoint, Excel and PDF files. Is there a best practice schema.xml and/or solrconfig.xml to use in SOLR when using the ExtractingRequestHandler? I have been doing tweaks to the default schema to attempt to get facets working on date modification times, but

Re: [Announce] Solr-RA, Solr with RankingAlgorithm

2011-12-06 Thread Nagendra Nagarajayya
Spark: The code is compiled to be compliant with JDK 1.5 and above. So you will need to use at least JDK 1.5 for this to work. BTW, make sure you add the lib path to the dataimporthandler-3.4.0.jar in your solrconfig.xml. If you want your data import to be searchable in real time, please make

Re: Grouping or Facet ?

2011-12-06 Thread darren
Sorry to jump into this thread, but are you saying that the facet count is not # of result hits? So if I have 1 document with field CAT that has 10 values and I do a query that returns this 1 document with faceting, that the CAT facet count will be 10 not 1? I don't seem to be seeing that
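To illustrate the distinction darren asks about, here is a toy model (not Solr's implementation) of field-faceting counts: each facet value is counted once per matching document that contains it.

```python
from collections import Counter

def facet_counts(result_docs, field):
    """Count, for each value of `field`, how many result documents contain it."""
    counts = Counter()
    for doc in result_docs:
        raw = doc.get(field, [])
        values = raw if isinstance(raw, list) else [raw]
        # set() ensures a document contributes at most 1 to each distinct value
        counts.update(set(values))
    return counts

# One matching document whose multiValued CAT field holds 10 distinct values.
docs = [{"id": "1", "CAT": [f"cat{i}" for i in range(10)]}]
counts = facet_counts(docs, "CAT")
print(len(counts), counts["cat0"])  # prints: 10 1
```

So in that scenario the response lists ten CAT values, each with a count of 1: the number next to a value is how many hits have that value, not how many values a hit has.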

Solr Trunk Changes requires a reindex

2011-12-06 Thread Jamie Johnson
Are there any migration utilities to move from an index built by a Solr 4.0 snapshot to Solr Trunk? The issue is referenced here http://markmail.org/thread/4ruznwzofyrh776j https://issues.apache.org/jira/browse/LUCENE-3490

Re: Question on DIH delta imports

2011-12-06 Thread Mark
Anyone? On 12/5/11 11:04 AM, Mark wrote: *pk*: The primary key for the entity. It is *optional* and only needed when using delta-imports. It has no relation to the uniqueKey defined in schema.xml but they both can be the same. When using in a nested entity is the PK the primary key column of

Use solr to search in a document repository

2011-12-06 Thread marotosg
Hi. I'm just considering the option of using Solr to search a huge document repository. My idea is to read documents (pdf, html, outlook, excel, doc, openoffice, powerpoint...), extract the information from them and index it in Solr. Basically I'm looking for a solution to search in my documents.

Lucene 4.0 Index Format

2011-12-06 Thread Jamie Johnson
Does anyone know if this has been finalized yet?

Solr Join with Dismax

2011-12-06 Thread Pascal Dimassimo
Hi, I was trying Solr Join across 2 cores on the same Solr installation. For example: /solr/index1/select?q={!join fromIndex=index2 from=tag to=tag}restaurant My understanding is that the restaurant query will be executed on index2 and the results of this query will be joined with the documents
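A small sketch of how the join request in Pascal's message is assembled and URL-encoded. `index1`, `index2` and `tag` come from his example; the helper name and the `as_filter` switch (the filter-query form mentioned in Jeff Schmidt's reply further down) are illustrative assumptions.

```python
from urllib.parse import urlencode

def join_params(inner_query, from_index="index2", from_field="tag",
                to_field="tag", as_filter=False):
    """Build request params for Solr's {!join} query parser (hypothetical helper)."""
    # Local-params prefix: inner_query runs against from_index, and documents in the
    # receiving core whose to_field matches a from_field value of those hits are returned.
    join = f"{{!join fromIndex={from_index} from={from_field} to={to_field}}}{inner_query}"
    if as_filter:
        # Filter form: the join restricts the result set without contributing to scores.
        return {"q": "*:*", "fq": join}
    return {"q": join}

print("/solr/index1/select?" + urlencode(join_params("restaurant")))
```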

Re: Lucene 4.0 Index Format

2011-12-06 Thread Mark Miller
On Tue, Dec 6, 2011 at 12:51 PM, Jamie Johnson jej2...@gmail.com wrote: Does anyone know if this has been finalized yet? It's subject to change up till release. -- - Mark http://www.lucidimagination.com

Re: Autocommit Index Size

2011-12-06 Thread Shawn Heisey
On 12/6/2011 1:01 AM, Husain, Yavar wrote: In solrconfig.xml I was experimenting with Indexing Performance. When I set the maxDocs (in autoCommit) to say 1 documents, the index size is double what it is if I just don't use autoCommit (i.e. keep it commented out, i.e. commit at the end only after adding

Solr tf-idf

2011-12-06 Thread Nejla Karacan
Hello, I need the tf-idf values from texts and now I'm using Apache Solr. I am a novice and have some problems. My question is, how can I extract the tf-idf values? There are many files in the folder apache-solr-3.5.0\example\solr\data\index but I can't use them. Is the output only as a

Facet values that should always appear

2011-12-06 Thread Jamie Johnson
Is there a way within Solr to instruct the system that a certain set of values should always appear regardless of their counts when faceting?

Re: Lucene/Solr

2011-12-06 Thread Erick Erickson
If you're not using Drupal, understand that Solr is an *engine*, not a full application. You download Solr from the website and install it, which basically means unpacking it and executing java -jar start.jar. From there you send documents to Solr (there are a number of ways to accomplish this).
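As a minimal sketch of one of those ways, Solr's XML update format wraps documents in `<add><doc>` elements that can be POSTed to the update handler. The URL `http://localhost:8983/solr/update` (the 3.x example server's default) and the fields `id` and `title` are assumptions; use fields that exist in your schema. `post_to_solr` needs a running Solr instance and is not called here.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Serialize a list of dicts into a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes one <field> element per value (multiValued fields).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")

def post_to_solr(xml_body, url="http://localhost:8983/solr/update?commit=true"):
    """POST the update message (assumed example-server URL); returns the HTTP status."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

print(build_add_xml([{"id": "doc1", "title": "Hello Solr"}]))
```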

RE: Autocommit Index Size

2011-12-06 Thread Husain, Yavar
Hi Shawn Absolutely perfect. It is always great reading your answers again and again as you explain the concepts so very well. Three cheers and thanks for your reply. Regards, Yavar From: Shawn Heisey [s...@elyograg.org] Sent: Wednesday, December 07,

Re: Multivalued field

2011-12-06 Thread Erick Erickson
<field name="id" type="string" stored="true" indexed="true" required="true" /> <field name="data" type="text_en" stored="true" indexed="false" /> Then sometime later <uniqueKey>id</uniqueKey> (all this in your schema.xml file). That's it. The data field isn't analyzed at all, so the type is largely irrelevant. what you

Re: Use solr to search in a document repository

2011-12-06 Thread Pål Brattberg
Go for it, it's perfect for that! Here's a good starting point for you: http://lucene.apache.org/solr/tutorial.html / pål On Dec 6, 2011, at 6:31 PM, marotosg wrote: Hi. I'm just thinking in the option of using solr to search in a huge document repository. My idea is reading

Re: Solr Join with Dismax

2011-12-06 Thread Jeff Schmidt
Hi Pascal: I have an issue similar to yours, but also need to facet the joined documents... I've been playing with various things. There's not much documentation I can find. Looking at http://wiki.apache.org/solr/Join, in the fourth example you can see the join being relegated to a filter

Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
Is there a way to specify the index version Solr uses? We're currently using SolrCloud, but with the index format changing it'd be preferable to be able to specify a particular index format to avoid having to do a complete reindex. Is this possible?

Re: Solr Join with Dismax

2011-12-06 Thread Pascal Dimassimo
Hi, Thanks for this! But your partner-tmo request handler is probably configured with your ing-content index, no? In my case, I'd like to execute a dismax query on the fromIndex. On Tue, Dec 6, 2011 at 2:57 PM, Jeff Schmidt j...@535consulting.com wrote: Hi Pascal: I have an issue similar to

Re: Continuous update on progress of New SolrCloud Design work

2011-12-06 Thread Per Steffensen
Yonik Seeley wrote: On Mon, Dec 5, 2011 at 6:23 AM, Per Steffensen st...@designware.dk wrote: Will it be possible to maintain a how-to-use section on http://wiki.apache.org/solr/NewSolrCloudDesign with examples, e.g. like the ones on http://wiki.apache.org/solr/SolrCloud, Yep, it was

RE: Sharing dih dictionaries

2011-12-06 Thread Brent Mills
You're totally correct. There's actually a link on the DIH page now which wasn't there when I had read it a long time ago. I'm really looking forward to 4.0, it's got a ton of great new features. Thanks for the links!! -Original Message- From: Mikhail Khludnev

Re: Solr Lucene Index Version

2011-12-06 Thread Alireza Salimi
Hi, I'm not sure if it would help. In solrconfig.xml: <!-- Controls what version of Lucene various components of Solr adhere to. Generally, you want to use the latest version to get all bug fixes and improvements. It is highly recommended that you fully re-index after

Re: Continuous update on progress of New SolrCloud Design work

2011-12-06 Thread Per Steffensen
Andy wrote: Hi, add features corresponding to stuff that we used to use in ElasticSearch Does that mean you have used ElasticSearch but decided to try SolrCloud instead? Yes, or at least we are looking for alternatives right now. Considering Solandra, SolrCloud, Katta, Riak Search,

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
Thanks, but I don't believe that will do it. From my understanding that does not control the index version written, it's used to control the behavior of some analyzers (taken from some googling). I'd love if someone told me otherwise though. On Tue, Dec 6, 2011 at 3:48 PM, Alireza Salimi

Re: UUID field changed when document is updated

2011-12-06 Thread Chris Hostetter
: I've been trying to use the UUIDField in solr to maintain ids of the : pages I've crawled with nutch (as per : http://wiki.apache.org/solr/UniqueKey). The use case is that I want to : have the server able to use these ids in another database for various : statistics gathering. So I want the

Re: Solr Lucene Index Version

2011-12-06 Thread Erik Hatcher
Jamie - I think the best thing that you could do here would be to lock in a version of Lucene (all the Lucene libraries) that you use with SolrCloud. Certainly not out of the realm of possibilities of some upcoming SolrCloud capability that requires some upgrading of Lucene though, but you

RE: Sharing dih dictionaries

2011-12-06 Thread Dyer, James
Just FYI that the final piece of SOLR-2382 has not been committed, and instead has been spun off to SOLR-2943. So if you're using Trunk and you need the ability to persist a cache on disk and then read it back again later as a DIH entity, you'll need both SOLR-2943 and also a cache

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
So if I wanted to use Lucene index 3.5 with SolrCloud, I should be able to just move the 3.5 jars in and remove any of the snapshot jars that are present when I build locally? On Tue, Dec 6, 2011 at 4:06 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Jamie - I think the best thing that you

debugging failed documents

2011-12-06 Thread Alan Miller
Just getting started with DIH and I have a very simple setup. My dih-config.xml queries my Postgres db and does a select on a crosstab() table that returns just 100 rows. When I do a full-import I see that 22 docs fail, but what debug settings do I have to tweak to see why the docs failed?

Re: Solr Lucene Index Version

2011-12-06 Thread Erik Hatcher
Oh geez... no... I didn't mean 3.x JARs... I meant the trunk/4.0 ones that are there now. Erik On Dec 6, 2011, at 16:22 , Jamie Johnson wrote: So if I wanted to use Lucene index 3.5 with SolrCloud I should be able to just move the 3.5 jars in and remove any of the snapshot jars

Re: Document Processing

2011-12-06 Thread Tommaso Teofili
Hello Michael, I can help you with using the UIMA UpdateRequestProcessor [1]; the current implementation uses in-memory execution of UIMA pipelines but since I was planning to add the support for higher scalability (with UIMA-AS [2]) that may help you as well. Tommaso [1] :

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
Problem is that really doesn't help me. We still have the same issue that when the 4.0 becomes final there is no migration utility from this pre 4.0 version to 4.0, right? On Tue, Dec 6, 2011 at 4:36 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Oh geez... no... I didn't mean 3.x JARs... I

To optimize or not - Solr vs Lucene

2011-12-06 Thread Scott Smith
Wasn't sure which mailing list to send this to. I'm writing an application that can be configured to run directly with lucene or with solr and I'm trying to figure out whether optimization of the index should be totally eliminated, eliminated in the lucene case only or what. If I read the 3.5

Re: Solr Lucene Index Version

2011-12-06 Thread Erik Hatcher
Right. Not sure what to advise you. We have worked on this problem with our LucidWorks platform and have some tools available to do this sort of thing, I think, but it's not generally something that you can do with Lucene going from a snapshot to a released version. Perhaps others with

Re: Solr Lucene Index Version

2011-12-06 Thread Jamie Johnson
What about modifying something like SolrIndexConfig.java to change the Lucene version that is used when creating the index? (may not be the right place, but is something like this possible?) On Tue, Dec 6, 2011 at 5:13 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Right. Not sure what to

Re: To optimize or not - Solr vs Lucene

2011-12-06 Thread Yonik Seeley
On Tue, Dec 6, 2011 at 5:04 PM, Scott Smith ssm...@mainstreamdata.com wrote: If I read the 3.5 lucene javadocs, optimize() has been deprecated because it is rarely justified with the current lucene index implementation. Its functionality is not being deprecated... it's just that the method is

Re: Lucene 4.0 Index Format

2011-12-06 Thread Jamie Johnson
Thanks for the response Mark. Are there any details on the expected freeze date (not looking for exacts) for this? I'm thinking I'm going to catch hell if I tell our team we need to reindex the entire data set. On Tue, Dec 6, 2011 at 1:25 PM, Mark Miller markrmil...@gmail.com wrote: On Tue, Dec

RE: debugging failed documents

2011-12-06 Thread Young, Cody
In my experience with DIH, the errors for failed documents end up in the log files (catalina.out for Tomcat). Can you check your log files? Cody -Original Message- From: Alan Miller [mailto:alan.mill...@gmail.com] Sent: Tuesday, December 06, 2011 1:25 PM To: Solr Subject: debugging

Re: Attempting to achieve something similar to PostgreSQL's pg_trgm / K-NN combo with Solr

2011-12-06 Thread Chris Hostetter
: I'm working on using trigrams for similarity matching on some data, : where there's a canonical name and lots of personalised variants, e.g.: : : canonical: My Wonderful Thing : variant: My Wonderful Thing (for Matt Patterson) I'm really not sure why you would need trigrams for something
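For readers unfamiliar with the PostgreSQL side of this comparison, here is a rough Python approximation of pg_trgm's similarity(): each word is lowercased and padded with two leading spaces and one trailing space, and similarity is shared trigrams divided by distinct trigrams across both strings. It is an illustration only (ASCII letters and digits only, not PostgreSQL's exact code, and not Solr functionality).

```python
import re

def trigrams(text):
    """pg_trgm-style trigrams: lowercase, split on non-alphanumerics (ASCII only here),
    pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def similarity(a, b):
    """Shared trigrams divided by distinct trigrams in either string (Jaccard ratio)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

canonical = "My Wonderful Thing"
variant = "My Wonderful Thing (for Matt Patterson)"
print(round(similarity(canonical, variant), 3))
```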

Re: two word phrase search using dismax

2011-12-06 Thread Erick Erickson
OK, why not just bump the boost on the site field way higher than you already have? A note of caution. You'll drive yourself crazy trying to get *exact* ordering based on some arbitrary (and usually changing) set of requirements. Put what you have working in front of product management and see if

Re: Solr's FieldValueCache and Lucene's FieldCache

2011-12-06 Thread Erick Erickson
Cool! Thanks, Hoss. On Mon, Dec 5, 2011 at 6:40 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Have you looked at: : http://wiki.apache.org/solr/SolrCaching this page was actually a little light on details about fieldValueCache, so I tried to fill in some of the blanks in the latest

Re: Solr tf-idf

2011-12-06 Thread Koji Sekiguchi
(11/12/07 3:42), Nejla Karacan wrote: Hello, I need the tf-idf values from texts and now I'm using Apache Solr. I am a novice and have some problems. My question is, how can I extract the tf-idf values? Nejla, you can use TermVectorComponent on your field, which needs to be set
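For reference, here is a sketch of the classic tf and idf factors used by Lucene 3.x's DefaultSimilarity, so that raw numbers from TermVectorComponent (term frequency via `tv.tf`, document frequency via `tv.df`, with the field declared `termVectors="true"`) can be combined into a tf-idf weight. It ignores the other scoring factors (field norms, boosts, queryNorm, coord), so it is not a full Lucene score, and the example numbers are made up.

```python
import math

def lucene_tf(freq):
    """DefaultSimilarity.tf: square root of the raw term frequency in the document."""
    return math.sqrt(freq)

def lucene_idf(doc_freq, num_docs):
    """DefaultSimilarity.idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def lucene_tf_idf(freq, doc_freq, num_docs):
    """tf * idf for one term in one document (other scoring factors omitted)."""
    return lucene_tf(freq) * lucene_idf(doc_freq, num_docs)

# A term occurring 4 times in a document, present in 9 of 100 documents.
print(round(lucene_tf_idf(4, 9, 100), 4))  # prints: 6.6052
```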

Invoking an updateRequestProcessorChain from updateHandler

2011-12-06 Thread Jan
Hi all, I'm wondering if it's possible to configure solrconfig.xml so that the updateHandler invokes an updateRequestProcessorChain? At the moment I have modified the /update requestHandler to invoke an updateRequestProcessorChain, which is working nicely. The catch is that I have to POST

Re: Invoking an updateRequestProcessorChain from updateHandler

2011-12-06 Thread Mark Miller
You should use the LWE forums for questions about it. The crawlers are hard coded to use the lucid-update-chain currently. If you want them to use the UIMA processor you will have to modify that chain definition to include it. On Dec 6, 2011, at 8:16 PM, Jan wrote: Hi all, I'm wondering

Re: [Announce] Solr-RA, Solr with RankingAlgorithm

2011-12-06 Thread yu shen
thanks for the information 2011/12/6 Nagendra Nagarajayya nnagaraja...@transaxtions.com Spark: The code is compiled to be compliant with JDK 1.5 and above. So you will need to use at least JDK 1.5 for this to work. BTW, make sure you add the lib path to the dataimporthandler-3.4.0.jar in

Re: Solr Version Upgrade issue

2011-12-06 Thread Pawan Darira
I checked that. There are only the latest jars. I am not able to figure out the issue. On Tue, Dec 6, 2011 at 6:57 PM, Mark Miller markrmil...@gmail.com wrote: Looks like you must have a mix of old and new jars. On Tuesday, December 6, 2011, Pawan Darira pawan.dar...@gmail.com wrote: Hi I

Re: Memory Leak in Solr?

2011-12-06 Thread Samarendra Pratap
Hi, one of the problems is now alleviated. The number of lines with "can't identify protocol" in lsof output is now greatly reduced. Earlier it kept increasing up to ulimit -n, thus causing the "Too many open files" error, but now it is contained to a much smaller number. This happened after I changed

cache monitoring tools?

2011-12-06 Thread Dmitry Kan
Hello list, We've noticed quite a huge strain on the filterCache in facet queries against trigram fields (see schema at the end of this e-mail). The typical query contains some keywords in the q parameter and a boolean filter query on other Solr fields. It is also a facet query; the facet field is of

Re: Sharing dih dictionaries

2011-12-06 Thread Mikhail Khludnev
AFAIK DIH jar is separated from Solr war. Isn't there a chance to use DIH from 4.0 in Solr 3.4? James, Sorry for hijacking the thread. But, do you have a chance to review https://issues.apache.org/jira/browse/SOLR-2947 I want to provide a patch for fixing multi-threading in DIH. But formally

Solr or SQL fulltext search

2011-12-06 Thread Mersad
Hi everyone, I am wondering how much benefit I would get if I moved from SQL Server to Solr in my text-based search project. Any help is appreciated! Best, Mersad

Re: Solr request handler queries in fiddler

2011-12-06 Thread Dmitry Kan
If you mean debugging the queries, you can use eclipse+jetty plugin setup ( http://code.google.com/p/run-jetty-run/) with solr web app ( http://hokiesuns.blogspot.com/2010/01/setting-up-apache-solr-in-eclipse.html ) On Tue, Dec 6, 2011 at 2:57 PM, Kashif Khan uplink2...@gmail.com wrote: Hi all,