Re: Solr index searcher to lucene index searcher

2013-04-26 Thread parnab kumar
Hi, thanks Chris. For every document that matches the query, I want to be able to compute the following set of features for a query-document pair: LuceneScore (the vector space score that Lucene gives to each doc), LinkScore (computed from Nutch), OpicScore (computed from

Re: Weird query issues

2013-04-26 Thread Ravi Solr
Thanks Shawn, We are using 3.6.2 client and server. I cleared my browser cache several times while querying (is that similar to clear cache in solrconfig.xml ?). The query is logged in the solrj based client's application container however I see it empty in the solr's application container...so som

Re: Solr index searcher to lucene index searcher

2013-04-26 Thread Chris Hostetter
: used to call the lucene IndexSearcher . As the documents are collected in : TopDocs in Lucene , before that is passed back to Nutch , i used to look : into the top K matching documents , consult some external repository : and further score the Top K documents and reorder them in the TopDocs array

Re: How to get/set customized Solr data source properties?

2013-04-26 Thread Chris Hostetter
: : I am working on a DataSource implementation. I want to get some customized : properties when the *DataSource.init* method is called. I tried to add the ... : : My understanding from looking at other DataSources is that should work. : But initProps.getProperty("my") == nul

Re: Not In query

2013-04-26 Thread Jan Høydahl
I would start with the way you propose, a negative filter q=foo bar&fq=-id:(123 729 640 112...) This will effectively hide those doc ids, and a benefit is that it is cached so if the list of ids is long, you'll only take the performance hit the first time. I don't know your application, but if
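The negative filter described in this message can be sketched as follows. This is only a minimal illustration: the helper name build_exclusion_params and the default field name "id" are assumptions, and only the q and fq parameters come from the message itself.

```python
from urllib.parse import urlencode


def build_exclusion_params(query, excluded_ids, id_field="id"):
    """Return Solr request parameters that hide the given document ids.

    Hypothetical helper: the function name and the default field name
    are assumptions made for illustration.
    """
    params = {"q": query}
    if excluded_ids:
        # A purely negative filter query of the form -id:(123 729 640 ...)
        id_list = " ".join(str(doc_id) for doc_id in excluded_ids)
        params["fq"] = f"-{id_field}:({id_list})"
    return params


params = build_exclusion_params("foo bar", [123, 729, 640, 112])
query_string = urlencode(params)  # URL-encoded form, ready to append after "?"
```

Keeping the excluded ids in fq rather than in q means the user's main query stays unchanged, and the filter can be cached separately, as the message points out.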

IOexception, when using Solr 4.2.1 for indexing

2013-04-26 Thread Sarita Nair
Hi All, I get the error below on trying to index using Solr 4.2.1.  I have a single core setup and use HttpSolrServer with DefaultHttpClient to talk to Solr. #Here is how HttpSolrServer is instantiated: solrServer = new HttpSolrServer( baseURL,         configurator.createHttpClient( new BasicHttp

Re: How to define a generic field to hold all undefined fields

2013-04-26 Thread Jan Høydahl
I can highly recommend reading the documentation before asking questions :) You are using the ExtractingRequestHandler, which is documented on the WIKI like most other stuff is. The fastest way to search for Solr stuff would be using search-lucene.com http://search-lucene.com/?q=extracting+requ

Re: Weird query issues

2013-04-26 Thread Shawn Heisey
On 4/26/2013 1:01 PM, Ravi Solr wrote: > Hello Shawn, > We found that it is unrelated to the group queries instead more > related to the empty queries. Do you happen to know what could cause empty > queries like the following from SOLRJ ? I can generate similar query via > curl hitting the

facet.offset issue (previously: [solr 3.4] anomaly during distributed facet query with 102 shards)

2013-04-26 Thread Dmitry Kan
Hi list, We have encountered a weird bug related to the facet.offset parameter. In short: the more general the query is, and the more hits it generates, the higher the risk that the facet.offset parameter stops working. In more detail: 1. Since getting all facets we need (facet.limit=1000) from around

Not In query

2013-04-26 Thread André Maldonado
Hi all. We have an index with 300.000 documents and a lot, a lot of fields. We're planning a module where users will choose some documents to exclude from their search results. So, these documents will be excluded for UserA and visible for UserB. So, we have some options to do this. The simplest

Re: Need to log query request before it is processed

2013-04-26 Thread Sudhakar Maddineni
I see. Thanks for sharing. -Sudhakar. On Friday, April 26, 2013, Timothy Potter wrote: > Solved this using a custom SearchHandler and some Log4J goodness. > Posting here in case anyone has need for logging query request before > they are executed, which in my case is useful for tracking any queri

Re: Need to log query request before it is processed

2013-04-26 Thread Timothy Potter
Solved this using a custom SearchHandler and some Log4J goodness. Posting here in case anyone has need for logging query request before they are executed, which in my case is useful for tracking any queries that cause OOMs My solution uses Log4J's NDC support to log each query request before it is

Re: Weird query issues

2013-04-26 Thread Ravi Solr
Hello Shawn, We found that it is unrelated to the group queries instead more related to the empty queries. Do you happen to know what could cause empty queries like the following from SOLRJ ? I can generate similar query via curl hitting the select handler like - http://server:port/solr/sel

Re: Prons an Cons of Startup Lazy a Handler?

2013-04-26 Thread Chris Hostetter
: In short, whether you want to keep the handler is completely independent of : the lazy startup option. I think Jack misread your question -- my interpretation is that you are asking about the pros/cons of removing 'startup="lazy"' ... : : : it startups it lazy. So what is pros and cons for

Re: Exclude Pattern at Dynamic Field

2013-04-26 Thread Jack Krupansky
No, other than to be explicit about individual patterns, which is better anyway. Generally, "*" is a crutch or experimental tool ("Let's just see what all the data and metadata is and then decide what to keep"). It is better to use explicit patterns or static schema for production use. -- Ja

relevance when merging results

2013-04-26 Thread eShard
Hi, I'm currently using Solr 4.0 final on tomcat v7.0.3x I have 2 cores (let's call them A and B) and I need to combine them as one for the UI. However we're having trouble on how to best merge these two result sets. Currently, I'm using relevancy to do the merge. For example, I search for "red"

Re: Customizing Solr GUI

2013-04-26 Thread Alexandre Rafalovitch
So, building on this: 1) Velocity is an option for internal admin interface because it is collocated with Solr and therefore does not 'hide' it 2) Blacklight is the (Rails-based) application layer and the Solr is internal behind it, so it does provide the security. Hope this helps to understand th

Re: DataImportHandler - Indexing xml content

2013-04-26 Thread Alexandre Rafalovitch
Have you looked at: http://wiki.apache.org/solr/DataImportHandler#FieldReaderDataSource ? Regards, Alex. On Fri, Apr 26, 2013 at 12:29 PM, Peri Subrahmanya wrote: > I have a column in my database that is of type long text and holds xml > content. I was wondering when I define the entity recor

Re: Customizing Solr GUI

2013-04-26 Thread Jack Krupansky
Generally, your UI web pages should communicate with your own application layer, which in turn communicates with Solr, but you should try to avoid having Solr itself visible to the outside world. -- Jack Krupansky -Original Message- From: kneerosh Sent: Friday, April 26, 2013 12:46 P

Re: excluding something from copyfield source?

2013-04-26 Thread Gora Mohanty
On 26 April 2013 20:51, Furkan KAMACI wrote: > Hi; > > I use that: > > > > however I want to exclude something i.e. author field. How can I do that? Instead of using *, use separate copyField directives for the fields that you want copied. You can use more restrictive globs also, e.g., Regards

Customizing Solr GUI

2013-04-26 Thread kneerosh
Hi, I want to customize Solr gui, and I learnt that the most popular options are 1. Velocity- which is integrated with Solr. The format and options can be customized 2. Project Blacklight Pros and cons? Secondly I read that one can delete data by just running a delete query in the URL. Does e

DataImportHandler - Indexing xml content

2013-04-26 Thread Peri Subrahmanya
I have a column in my database that is of type long text and holds xml content. I was wondering when I define the entity record is there a way to provide a custom extractor that will take in the xml and return rows with appropriate fields to be indexed. Thank you, Peri Subrahmanya On 4/26/13

Re: Using another way instead of DIH

2013-04-26 Thread Shawn Heisey
On 4/25/2013 9:00 AM, xiaoqi wrote: > i using DIH to build index is slow , when it fetch 2 million rows , it will > spend 20 minutes , very slow. If it takes 20 minutes for two million records, I'd say it's working very well. I do six simultaneous MySQL imports of 13 million records each. It ta
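The throughput behind the figures quoted here (two million rows in 20 minutes) can be worked out with a short calculation. The function name rows_per_second is an assumption used only for this illustration.

```python
def rows_per_second(rows, minutes):
    """Average import throughput for a run of the given length."""
    return rows / (minutes * 60)


# Figures quoted in the message: 2 million rows fetched in 20 minutes.
rate = rows_per_second(2_000_000, 20)
print(f"{rate:.0f} rows/second")
```

That works out to roughly 1,667 rows per second on average.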

Exclude Pattern at Dynamic Field

2013-04-26 Thread Furkan KAMACI
I use that in my Solr 4.2.1: however, can I exclude some patterns from it?

Re: Solr Indexing Rich Documents

2013-04-26 Thread Furkan KAMACI
Is there any example on the wiki for Manifold? 2013/4/26 Ahmet Arslan > Hi Furkan, > > post.jar meant to be used as example, quick start etc. For production > (incremental updates, deletes) consider using http://manifoldcf.apache.org for > indexing rich documents. It utilises ExtractingRequestHandle

excluding something from copyfield source?

2013-04-26 Thread Furkan KAMACI
Hi; I use that: however I want to exclude something i.e. author field. How can I do that?

Re: SolrDocument getFieldNames() exclude dynamic fields?

2013-04-26 Thread Luis Lebolo
Apologies, I wasn't storing these dynamic fields. On Fri, Apr 26, 2013 at 11:01 AM, Luis Lebolo wrote: > Hi All, > > I'm using SolrJ's QueryResponse to retrieve all SolrDocuments from a > query. When I use SolrDocument's getFieldNames(), I get back a list of > fields that excludes dynamic field

Re: Solr Indexing Rich Documents

2013-04-26 Thread Ahmet Arslan
Hi Furkan, post.jar meant to be used as example, quick start etc. For production (incremental updates, deletes) consider using http://manifoldcf.apache.org for indexing rich documents. It utilises ExtractingRequestHandler feature of solr. --- On Fri, 4/26/13, Furkan KAMACI wrote: > From: Fur

SolrDocument getFieldNames() exclude dynamic fields?

2013-04-26 Thread Luis Lebolo
Hi All, I'm using SolrJ's QueryResponse to retrieve all SolrDocuments from a query. When I use SolrDocument's getFieldNames(), I get back a list of fields that excludes dynamic fields (even though I know they are not empty). Is there a way to get a list of all fields for a given SolrDocument? Th

RE: Using another way instead of DIH

2013-04-26 Thread Dyer, James
Here are some things I would try: 1. Make sure the parent entity is only returning 1 row per Solr document. If not, move the problem joins into their own queries as child entities. 2. For the child entities, use caching. This prevents the "n+1" select problem. The changes a
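The "n+1" select problem mentioned in this message can be illustrated with a small simulation. This is a sketch under stated assumptions, not DataImportHandler code: the in-memory PARENTS and CHILDREN lists stand in for database tables, and every function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for a parent table and a child table.
PARENTS = [{"id": 1}, {"id": 2}, {"id": 3}]
CHILDREN = [
    {"parent_id": 1, "tag": "a"},
    {"parent_id": 1, "tag": "b"},
    {"parent_id": 3, "tag": "c"},
]


def fetch_children_for(parent_id, counter):
    """Simulate one child query per parent row (the "n+1" pattern)."""
    counter["queries"] += 1
    return [c["tag"] for c in CHILDREN if c["parent_id"] == parent_id]


def fetch_all_children(counter):
    """Simulate a single child query whose rows are cached by parent id."""
    counter["queries"] += 1
    cache = defaultdict(list)
    for child in CHILDREN:
        cache[child["parent_id"]].append(child["tag"])
    return cache


def build_docs_uncached():
    counter = {"queries": 1}  # one query for the parent rows
    docs = [
        {"id": p["id"], "tags": fetch_children_for(p["id"], counter)}
        for p in PARENTS
    ]
    return docs, counter["queries"]


def build_docs_cached():
    counter = {"queries": 1}  # one query for the parent rows
    cache = fetch_all_children(counter)
    docs = [{"id": p["id"], "tags": cache.get(p["id"], [])} for p in PARENTS]
    return docs, counter["queries"]
```

With three parent rows the uncached version issues four simulated queries and the cached version two, while both build the same documents. The gap grows linearly with the number of parent rows.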

Re: SOLR Install

2013-04-26 Thread jnduan
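If you unpack the solr.war file, you'll find some configuration in web.xml like: <filter> <filter-name>SolrRequestFilter</filter-name> <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class> </filter> <filter-mapping> <filter-name>SolrRequestFilter</filter-name> <url-pattern>/*</url-pattern> </filter-mapping> <servlet> <servlet-name>Zookeeper</servlet-name> <servlet-class>org.apache.solr.servlet.ZookeeperInfoServlet</servlet-class>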

RE: Using another way instead of DIH

2013-04-26 Thread Dyer, James
yes, I misspoke. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: xiaoqi [mailto:belivexia...@gmail.com] Sent: Thursday, April 25, 2013 8:37 PM To: solr-user@lucene.apache.org Subject: RE: Using another way instead of DIH Thanks for help . "data-config.xml" ? i

AutoSuggest+Grouping in one request

2013-04-26 Thread Rounak Jain
Hi everyone, Search dropdowns on popular sites like Amazon (example image) use autosuggested words along with grouping (Field Collapsing in Solr). While I can replicate the same functionality in Solr using two requests (first to obtain suggestions, second for the a

Re: Log Monitor System for SolrCloud and Logging to log4j at SolrCloud?

2013-04-26 Thread Mark Miller
Slf4j is meant to work with existing frameworks - you can set it up to work with log4j, and Solr will use log4j by default in the about to be released 4.3. http://wiki.apache.org/solr/SolrLogging - Mark On Apr 26, 2013, at 7:19 AM, Furkan KAMACI wrote: > I want to use GrayLog2 to monitor my l

Re: SOLR Install

2013-04-26 Thread jnduan
Hi Peri, I think that document means you can deploy your own web app and Solr in one container like Tomcat, but with different context paths. If you want to bring Solr into your project, you just need to add some Maven dependencies like: <dependency> <groupId>org.apache.solr</groupId> <artifactId>solr-core</artifactId> <version>4.2

Re: How to define a generic field to hold all undefined fields

2013-04-26 Thread Jack Krupansky
A dynamic field with the name pattern "*" and a type of "string", stored="true", indexed="true" and multiValued="true" should be good enough for a "generic" field. Generally, only use this in test/experiment/development. It's not recommended as an approach for production apps. There is a co

Re: SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException

2013-04-26 Thread Furkan KAMACI
I did not specify a URL and that solved it, as you mentioned, because the default URL does not include /extract. 2013/4/26 Furkan KAMACI > Ok, solved > > > 2013/4/26 Raymond Wiker > >> On Fri, Apr 26, 2013 at 2:45 PM, Furkan KAMACI > >wrote: >> >> > >> > I use that command to post: >> > java -Durl=http:/

Re: SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException

2013-04-26 Thread Furkan KAMACI
Ok, solved 2013/4/26 Raymond Wiker > On Fri, Apr 26, 2013 at 2:45 PM, Furkan KAMACI >wrote: > > > > > I use that command to post: > > java -Durl=http://localhost:8983/solr/update/extract -Dauto -jar > post.jar > > 523387.pdf > > > > I think you need to have the collection name in the url... som

Re: uniqueKey required false for multivalued id when indexing rich documents

2013-04-26 Thread Gora Mohanty
On 26 April 2013 18:38, Furkan KAMACI wrote: > I am new to Solr and try to index rich files. I have defined that at my > schema: [...] > This will not work: Please see http://wiki.apache.org/solr/UniqueKey for different use cases for the uniqueKey. For documents, I usually use the document name

Re: SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException

2013-04-26 Thread Gora Mohanty
On 26 April 2013 18:15, Furkan KAMACI wrote: > Could anybody help me for my error. When I try to post documents with > post.jar I get that error: [...] > I use that command to post: > java -Durl=http://localhost:8983/solr/update/extract -Dauto -jar post.jar > 523387.pdf The URL should be http://

Re: SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException

2013-04-26 Thread Raymond Wiker
On Fri, Apr 26, 2013 at 2:45 PM, Furkan KAMACI wrote: > > I use that command to post: > java -Durl=http://localhost:8983/solr/update/extract -Dauto -jar post.jar > 523387.pdf > I think you need to have the collection name in the url... something like http://localhost:8983/solr/mycollection/update

How to define a generic field to hold all undefined fields

2013-04-26 Thread Furkan KAMACI
I sent some documents to my Solr to be indexed. However, I get errors like this: ERROR: [doc=0579B002] unknown field 'name' I know that I should define a field named 'name' in my schema. However, there may be many fields like that. How can I define a generic field that holds all non defined val

uniqueKey required false for multivalued id when indexing rich documents

2013-04-26 Thread Furkan KAMACI
I am new to Solr and try to index rich files. I have defined that at my schema: and there is a line at my schema: id should I make it like that: for my purpose?

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI
Jack, thanks for your answers. Ok, when I remove -Durl parameter I think it works, thanks. However I think that I have a problem with my schema. I get that error: Apr 26, 2013 3:52:21 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=/home/ll/Des

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Jack Krupansky
Maybe you are confusing things by mixing instructions - there are SEPARATE instructions for directly using SolrCell and implicitly using it via post.jar. Pick which you want and stick with it. DO NOT MIX the instructions. You wrote: " I run that command: java -Durl=http://localhost:8983/solr/

SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException

2013-04-26 Thread Furkan KAMACI
Could anybody help me for my error. When I try to post documents with post.jar I get that error: SimplePostTool version 1.5 Posting files to base url http://localhost:8983/solr/update/extract.. Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI
I think that I should start a new thread for my question to help people who searches for same situation. 2013/4/26 Furkan KAMACI > If you can help me it would be nice. I get that error: > > SimplePostTool version 1.5 > Posting files to base url http://localhost:8983/solr/update/extract.. > Enter

Re: Solr Indexing Rich Documents

2013-04-26 Thread Furkan KAMACI
Thanks for the answer, I get an error now: FileNotFound Exception as I mentioned in the other thread. Now I'm trying to solve it. 2013/4/26 Jack Krupansky > It's called SolrCell or the ExtractingRequestHandler (/update/extract), > which the newer post.jar knows to use for some file types: > http://wi

Re: Prons an Cons of Startup Lazy a Handler?

2013-04-26 Thread Jack Krupansky
Lazy startup simply means that you are willing to tolerate a slight delay on the first request to that request handler. It also has the side effect that if there are any problems with starting up the handler, they won't be seen until that first request. In short, whether you want to keep the
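The trade-off described in this message (a delay on the first request, and startup problems that surface only then) can be shown with a generic lazy-initialization sketch. This illustrates the idea only and is not Solr's implementation; the LazyHandler class and the factory functions are hypothetical.

```python
class LazyHandler:
    """Wrap a handler factory and defer construction until first use."""

    def __init__(self, factory):
        self._factory = factory
        self._handler = None

    @property
    def initialized(self):
        return self._handler is not None

    def handle(self, request):
        if self._handler is None:
            # The first request pays the startup cost; a failing factory
            # raises here rather than at application startup.
            self._handler = self._factory()
        return self._handler(request)


def make_upper_handler():
    """Hypothetical factory that builds a trivial request handler."""
    return lambda request: request.upper()


def make_broken_handler():
    """Hypothetical factory that fails, standing in for a bad configuration."""
    raise RuntimeError("handler failed to start")


handler = LazyHandler(make_upper_handler)
broken = LazyHandler(make_broken_handler)  # no error is raised yet
```

Constructing the broken wrapper succeeds silently. The error appears only when its handle method is first called, which is the side effect the message warns about.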

Re: Lucene native facets

2013-04-26 Thread Jack Krupansky
Sure, but they are completely different conceptual models of faceting - Solr is dynamic, based on the actual data for the hierarchy, while Lucene is static, based on a predefined taxonomy that must be meticulously created before any data is added. Solr answers the question: what structure does

Re: Solr Indexing Rich Documents

2013-04-26 Thread Jack Krupansky
It's called SolrCell or the ExtractingRequestHandler (/update/extract), which the newer post.jar knows to use for some file types: http://wiki.apache.org/solr/ExtractingRequestHandler -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Friday, April 26, 2013 4:48 AM To: sol

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI
If you can help me it would be nice. I get that error: SimplePostTool version 1.5 Posting files to base url http://localhost:8983/solr/update/extract.. Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log POSTing f

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Jan Høydahl
http://wiki.apache.org/solr/post.jar -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 26. apr. 2013 kl. 13:28 skrev Furkan KAMACI : > Hi Raymond; > > Now I get that error: SimplePostTool: WARNING: IOException while reading > respons

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI
Hi Raymond; Now I get that error: SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: 2013/4/26 Raymond Wiker > You could start by doing > > java post.jar -help > > --- the 7th example shows exactly what you need to do to add a document id. > > On Fri, Ap

Log Monitor System for SolrCloud and Logging to log4j at SolrCloud?

2013-04-26 Thread Furkan KAMACI
I want to use GrayLog2 to monitor my log files for SolrCloud. However, I think that GrayLog2 works with log4j and logback, while Solr uses slf4j. How can I solve this problem, and what log monitoring system do folks use?

Atomic Update and stored copy-fields

2013-04-26 Thread raulgrande83
Hello everybody, We are using the latest version of Solr (4.2.1) and running some tests on Atomic Updates. The Solr wiki says that: /(...) requires that all fields in your SchemaXml must be configured as stored="true" except for fields which are destinations -- which must be configured as stored="false

Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-26 Thread Dmitry Kan
Hi, 1. Ruled out possibility to test 4.2.1 router against 3.4 shard farm for obvious reasons (java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format). 2. Tried jetty, but same result. On Thu, Apr 25, 2013 at 5:16 PM, Dmitry Kan wrote: > Thanks,

Re: Using another way instead of DIH

2013-04-26 Thread xiaoqi
below is my data-import.xml any suggestion ?

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Raymond Wiker
You could start by doing java post.jar -help --- the 7th example shows exactly what you need to do to add a document id. On Fri, Apr 26, 2013 at 11:30 AM, Furkan KAMACI wrote: > I use Solr 4.2.1 and these are my fields: > > multiValued="false" /> > > > > > multiValued="true"/> > > stored=

Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI
I use Solr 4.2.1 and these are my fields: I run that command: java -Durl=http://localhost:8983/solr/update/extract -jar post.jar 523387.pdf However I get that error, any ideas? Apr 26, 2013 12:26:51 PM org.apache.solr.common.SolrException log SEVERE: org.apache

Solr Indexing Rich Documents

2013-04-26 Thread Furkan KAMACI
I have a large corpus of rich documents i.e. pdf and doc files. I think that I can use directly the example jar of Solr. However for a real time environment what should I care? Also how do you send such kind of documents into Solr to index, I think post.jar does not handle that file type? I should

Re: what is the maximum XML file size to import?

2013-04-26 Thread Sharmila Thapa
Thanks to all for your suggestions.

Lucene native facets

2013-04-26 Thread William Bell
Since facets are now included in Lucene, why don't we add a pass through from Solr? The current facet code can live on but we could create new param like facet.lucene=true? Seems like a great enhancement ! -- Bill Bell billnb...@gmail.com cell 720-256-8076

Prons an Cons of Startup Lazy a Handler?

2013-04-26 Thread Furkan KAMACI
I will use SolrCloud and its main purpose will be rich document indexing. Solr example includes that definition: it starts up lazily. So what are the pros and cons of removing it for my situation?

Re: How do set compression for compression on stored fields in SOLR 4.2.1

2013-04-26 Thread William Bell
Why don't we add a parameter to allow non programmers to change it? Compression=FAST|etc On Thursday, April 25, 2013, Chris Hostetter wrote: > : Subject: How do set compression for compression on stored fields in SOLR > 4.2.1 > : > : https://issues.apache.org/jira/browse/LUCENE-4226 > : It menti

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-26 Thread Dmitry Kan
Alan, Shawn, If backporting to 3.x is hard, no worries, we don't necessarily require the patch as we are heading to 4.x eventually. It is just much easier within our organization to test on the existing solr 3.4 as there are a few of internal dependencies and custom code on top of solr. Also solr

Re: Using another way instead of DIH

2013-04-26 Thread Majirus FANSI
Hi, It simply means the configuration file of your DIH. Cheers On 26 April 2013 03:37, xiaoqi wrote: > Thanks for help . > > "data-config.xml" ? i can not find this file , u mean data-import.xml or > solrconfig.xml ? > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabb