Unknown field

2012-05-17 Thread Tolga
Hi, Is there a way to know which fields to add to schema.xml before crawling with Nutch, rather than crawling over and over again and fixing the fields one by one? Regards,

Re: Question about sampling

2012-05-17 Thread Lance Norskog
Yes. The trick is to use a hash value on each document. The SignatureUpdateProcessor provides a tool for this. Store the hash value in a hex string field. Now, do wildcard queries on the hash string: hash:a* will randomly choose 1/16 of the documents. hash:00* will pick 1/256 of the documents. On
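
The sampling arithmetic above can be sketched outside Solr. This is a minimal illustration, assuming an MD5 hex digest of a made-up document id stands in for the stored hash field; the real SignatureUpdateProcessor computes its signature from configured fields, and the names `hex_signature` and `sample` are hypothetical.

```python
import hashlib

def hex_signature(doc_id: str) -> str:
    """Stand-in for the stored hash field: MD5 of the document id, as hex."""
    return hashlib.md5(doc_id.encode("utf-8")).hexdigest()

def sample(doc_ids, prefix):
    """Mimic the wildcard query hash:<prefix>* by keeping matching ids."""
    return [d for d in doc_ids if hex_signature(d).startswith(prefix)]

# Hypothetical corpus of 16000 document ids.
doc_ids = [f"doc-{i}" for i in range(16000)]

one_sixteenth = sample(doc_ids, "a")   # one hex digit fixed: about 1/16
one_256th = sample(doc_ids, "00")      # two hex digits fixed: about 1/256

print(len(one_sixteenth) / len(doc_ids))
print(len(one_256th) / len(doc_ids))
```

Each fixed hex digit divides the sample by 16, so the prefix length controls the sampling rate.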

Re: fq syntax question

2012-05-17 Thread Chris Hostetter
: No. fq queries are standard syntax queries. But they can be arbitrarily : complex, i.e. fq=model:(member OR new_member) using param references, you can also do some interesting things like... fq={!term f=model v=$model}&model=member ...which can come in handy for hardcoding certain rules
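
A small sketch of how the parameter-reference form might be assembled on the client side. The `q=*:*` parameter and the function name `build_filter_query` are assumptions for illustration; only the `fq` and `model` parameters come from the message above.

```python
from urllib.parse import urlencode, parse_qs

def build_filter_query(model: str) -> str:
    """Build a query string where fq dereferences the separate 'model' param."""
    params = [
        ("q", "*:*"),                          # assumed main query
        ("fq", "{!term f=model v=$model}"),    # filter refers to $model
        ("model", model),                      # value supplied separately
    ]
    return urlencode(params)

query_string = build_filter_query("member")
print(query_string)
```

Because the value travels in its own parameter, the `fq` string itself can stay fixed (for example in request handler defaults) while callers vary only `model`.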

Re: Posting JSON Data to Solr using XHR?

2012-05-17 Thread Chris Hostetter
: I am trying to post JSON Data to Solr using XHR / JQuery and it doesn't seem You are not POSTing any JSON data. In this method... : var jqxhr = $.post(url, { "id" : "978-0545139700", : "cat" : "book", : "name" : "Harry Potter and the Deathly Hallows", :
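
The difference between a form-encoded body and a JSON body can be shown without a browser. This is a sketch only: the update URL, the list-of-documents payload shape, and the helper name `build_json_update` are assumptions, so check the JSON update documentation for your Solr version. The request is built but never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

doc = {
    "id": "978-0545139700",
    "cat": "book",
    "name": "Harry Potter and the Deathly Hallows",
}

# What a plain key/value post produces: form fields, not JSON.
form_body = urlencode(doc).encode("utf-8")

def build_json_update(url, docs):
    """Build (but do not send) a POST whose body really is JSON."""
    return Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical endpoint; adjust host, port and handler path to your setup.
request = build_json_update("http://localhost:8983/solr/update/json", [doc])

print(form_body[:20])
print(request.data[:20])
```

The first body starts with `id=...`, the second with `[{"id": ...`; only the second is JSON, and it also carries a matching content type.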

Re: Urgent! Highlighting not working as expected

2012-05-17 Thread Chris Hostetter
copyField is a literal operation that happens at index time -- but it really has no bearing whatsoever on highlighting done at query time. there is no "memory" of what source fields any values came from, so it doesn't affect things in any way. You haven't provided any details about your sch

Re: highlighter not respecting sentence boundry

2012-05-17 Thread abhayd
hi It did work in many cases but now I see many cases where it is not working. Is this something to do with analysis? I'm using word delimiter factory on the field which is being used as hl.field. Should this field be not tokenized? use one field for search and copy of it for hl.field? -- V

Re: Exception in DataImportHandler (stack overflow)

2012-05-17 Thread Shawn Heisey
On 5/17/2012 3:01 PM, Dyer, James wrote: Do you think this behavior is because, while the indexing is paused, you reach some type of timeout so either your db or the jdbc cuts the connection? Or, are you thinking something in the DIH/JDBCDataSource code is causing the connection to drop under

Re: boost not showing up in Solr 3.6 debugQueries?

2012-05-17 Thread Robert Muir
On Thu, May 17, 2012 at 4:51 PM, Tom Burton-West wrote: > But in Solr 3.6 I am not seeing the boost factor called out. > > On the other hand it looks like it may now be incorporated in the > queryNorm (Please see example below). > > Is there a bug in Solr 3.6 debugQueries? Is there some new be

RE: Exception in DataImportHandler (stack overflow)

2012-05-17 Thread Dyer, James
Shawn, Do you think this behavior is because, while the indexing is paused, you reach some type of timeout so either your db or the jdbc cuts the connection? Or, are you thinking something in the DIH/JDBCDataSource code is causing the connection to drop under these circumstances? James Dyer E-

Re: Exception in DataImportHandler (stack overflow)

2012-05-17 Thread Shawn Heisey
On 5/15/2012 3:42 PM, Jon Drukman wrote: I fixed it for now by upping the wait_timeout on the mysql server. Apparently Solr doesn't like having its connection yanked out from under it and/or isn't smart enough to reconnect if the server goes away. I'll set it back the way it was and try yo

boost not showing up in Solr 3.6 debugQueries?

2012-05-17 Thread Tom Burton-West
Hello all, In Solr 3.4, the boost factor is explicitly shown in debugQueries: 0.37087926 = (MATCH) sum of: 0.3708323 = (MATCH) weight(ocr:dog^1000.0 in 215624), product of: 0.995 = queryWeight(ocr:dog^1000.0), product of: 1000.0 = boost 2.32497 = idf(docFreq=237626, maxDocs

Re: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse

2012-05-17 Thread Jack Krupansky
SKU should be type "string" and then SKU_text would be your text type. Or, you can do it the opposite: SKU would be text and SKU_string for the raw string value for precise wildcards and faceting. The Solr example does have "sku" as a text field. You can do it that way or the opposite. Whiche

RE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse

2012-05-17 Thread Prachi Phatak
So do you mean I should change class="solr.TextField" to class="solr.StrField"? -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, May 17, 2012 3:00 PM To: solr-user@lucene.apache.org Subject: Re: org.apache.solr.common.SolrException: org.apa

Re: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse

2012-05-17 Thread Jack Krupansky
Sorry, my suggestion for the escaped left parenthesis is if you change SKU to be a string field. And then have SKU_text as a copy of that field (add a copyField to your schema.xml for SKU to SKU_text) but with some "text" type - then you could simply say SKU_text:soft . -- Jack Krupansky

Re: using Tika (ExtractingRequestHandler)

2012-05-17 Thread Ahmet Arslan
> i'm looking at using Tika to index a > bunch of documents. the wiki page seems to be a little bit > out of date ("// TODO: this is out of date as of Solr 1.4 - > dist/apache-solr-cell-1.4.jar and all of > contrib/extraction/lib are needed") and it also looks a > little incomplete. > > is there a

Re: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse

2012-05-17 Thread Jack Krupansky
Code? I'm not sure what you're referring to. These changes are in schema.xml and solrconfig.xml. In your query, you need to change: SKU:soft(*^1.0 to SKU:soft\(*^1.0 -- Jack Krupansky -Original Message- From: Prachi Phatak Sent: Thursday, May 17, 2012 3:25 PM To: solr-user@lucene.
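
If the query string is assembled in application code, the escaping can be done programmatically. The sketch below assumes the usual set of Lucene query parser special characters; the helper name `escape_term` is hypothetical, and the wildcard and boost are appended after escaping so they keep their meaning.

```python
import re

# Characters the Lucene query parser treats as syntax (assumed set).
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&])')

def escape_term(text: str) -> str:
    """Prefix every query-syntax character in a user term with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)

user_input = "soft("
query = "SKU:" + escape_term(user_input) + "*^1.0"
print(query)
```

Escaping only the user-supplied part avoids turning the trailing `*` and `^1.0` into literal characters.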

RE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse

2012-05-17 Thread Prachi Phatak
Can I do this in the configuration, or do I have to change my code? -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, May 17, 2012 2:23 PM To: solr-user@lucene.apache.org; Prachi Phatak Subject: Re: org.apache.solr.common.SolrException: org.apache.lucen

Re: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse

2012-05-17 Thread Jack Krupansky
Okay, first, now that we can see your data, it looks to me like you should keep it in two fields: 1) a "string" field for exact match, faceting, and precise wildcarding, and 2) copy to a "text" field for searching by keyword. For the latter, use a field type/analyzer comparable to "text_en_split

using Tika (ExtractingRequestHandler)

2012-05-17 Thread Welty, Richard
i'm looking at using Tika to index a bunch of documents. the wiki page seems to be a little bit out of date ("// TODO: this is out of date as of Solr 1.4 - dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib are needed") and it also looks a little incomplete. is there an actual list

org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse

2012-05-17 Thread Prachi Phatak
 My configuration > >      >        >        words="stopwords.txt" enablePositionIncrements="true" /> >         maxGramSize="15" side="front"/> >        >         >      >      >        >        ignoreCase="true" expand="true"/> >         maxGramSize="15" side="front"/> >                 

RE: Use DIH with more than one entity at the same time

2012-05-17 Thread Dyer, James
The wiki here indicates that you can specify "entity" more than once on the request and it will run multiple entities at the same time, in the same handler: http://wiki.apache.org/solr/DataImportHandler#Commands But I can't say for sure that this actually works! Having been in the DIH code, I

Re: Use DIH with more than one entity at the same time

2012-05-17 Thread Jack Krupansky
Okay, the answer is “Yes, sort of, but...” “One annoyance is because of how DIH is designed, you need a separate handler set up in solrconfig.xml for each DIH you plan to run. So you have to plan in advance how many DIH instances you want to run, which config files they'll use, etc.” See: htt

Issue with DIH when database is down

2012-05-17 Thread Rahul Warawdekar
Hi, I am using Solr 3.4 on Tomcat 6 and using DIH to index data from a MS SQL Server 2008 database. In case my database is down, or is refusing connections due to any reason, DIH throws an exception as mentioned below "org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to exec

Re: Use DIH with more than one entity at the same time

2012-05-17 Thread Sergio Martín Cantero
Thanks Jack, but that's not what I want. I don't want multiple entities in one invocation, but two simultaneous invocations of the DIH with different entities. Thanks.

Re: Use DIH with more than one entity at the same time

2012-05-17 Thread Jack Krupansky
Yes. From the doc: "Multiple 'entity' parameters can be passed on to run multiple entities at once. If nothing is passed, all entities are executed." See: http://wiki.apache.org/solr/DataImportHandler But that is one invocation of DIH, not two separate updates as you tried. -- Jack Krupansky
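
A single invocation with several 'entity' parameters might be built as follows. This is a sketch: the base URL follows the example used elsewhere in this thread, while the entity names `products` and `customers` and the helper `dih_url` are made up for illustration.

```python
from urllib.parse import urlencode

def dih_url(base: str, entities, clean: bool = False) -> str:
    """Build one full-import request that names several entities."""
    params = [("command", "full-import"), ("clean", str(clean).lower())]
    # Repeat the 'entity' parameter once per entity to run.
    params += [("entity", name) for name in entities]
    return f"{base}?{urlencode(params)}"

url = dih_url("http://localhost:8080/solr/dataimport", ["products", "customers"])
print(url)
```

This remains one DIH run; it does not give two independent imports running side by side.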

Re: highlighter not respecting sentence boundry

2012-05-17 Thread abhayd
hi Added hl.useFastVectorHighlighter=true to query. I was already doing term vectors. This worked like a charm. -- View this message in context: http://lucene.472066.n3.nabble.com/highlighter-not-respecting-sentence-boundry-tp3984327p3984416.html Sent from the Solr - User mailing list archive at

Use DIH with more than one entity at the same time

2012-05-17 Thread Sergio Martín Cantero
I'm new to this list, so... Hello everybody. I'm trying to run the DIH with more than one entity at the same time, but only the first entity I call is being indexed. The other doesn't get any response. For example: First call: http://localhost:8080/solr/dataimport?command=full-import&clean=fal

RE: Issue in Applying patch file

2012-05-17 Thread Dyer, James
Recently Lucene/Solr went to a new build process using Ivy. Simply put, dependent .jar files are no longer checked in with Lucene/Solr sources. Instead while building, Ivy now downloads them from 'repo1.maven.org' From the error you sent it seems like you do not have access to the Maven repos

Re: Solr string field stripping new lines & line breaks

2012-05-17 Thread jacousteau
Thank you, but I actually just forgot to reload the core0 when I changed the field type. oops. On Thu, May 17, 2012 at 3:52 PM, iorixxx [via Lucene] < ml-node+s472066n3984405...@n3.nabble.com> wrote: > > Hi, is there any way to preserve > > newlines or line breaks when submitting > > content to a

Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-17 Thread Ahmet Arslan
> A search with keyword in Hindi retrieves an empty result > set. Also a > retrieved Hindi record displays junk characters. Could it be the URIEncoding setting of your servlet container? http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Re: Solr string field stripping new lines & line breaks

2012-05-17 Thread Ahmet Arslan
> Hi, is there any way to preserve > newlines or line breaks when submitting > content to a Solr string field? String is indexed verbatim. Are you using wt=xml in a browser? Try using wt=php

Solr string field stripping new lines & line breaks

2012-05-17 Thread jacousteau
Hi, is there any way to preserve newlines or line breaks when submitting content to a Solr string field? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-string-field-stripping-new-lines-line-breaks-tp3984384.html Sent from the Solr - User mailing list archive at Nabble.co

Re: indexing Dublin core xml files

2012-05-17 Thread Jack Krupansky
You will first have to map your xml files into Solr xml format. You will have to do that yourself outside of Solr. At the same time, you should map any DCMI metadata field names to the corresponding field names, such as "dc:title" to "title". A number of the DC field names are already in the So

RE: Issue in Applying patch file

2012-05-17 Thread mechravi25
Hi James, Thank you for your reply. That issue got resolved; but now, when I'm trying to build Solr using the "ant dist" command, it's resulting in the following error. [ivy:retrieve] :: resolving dependencies :: org.apache.lucene#analyzers-phonetic;working@XXXYYN [ivy:retrieve] confs: [default] [

RE: Issue in Applying patch file

2012-05-17 Thread mechravi25
Hi, Thank you for your reply. That error was resolved, but now I'm not able to build the Solr project using "ant dist" to generate the war file. It is resulting in the following error.

Spellcheck suggestions unavailable during rebuild

2012-05-17 Thread Andrei Amariei
Hello, I am using Solr 3.5.0 with an IndexBasedSpellChecker configured, and I noticed that during rebuild, suggestions are not available. After looking at the source code, I saw that IndexBasedSpellChecker.build(...) calls spellchecker.clearIndex() before spellchecker.indexDirectory(...) and I t

RE: Workaround needed to sort on Multivalued fields indexed in SOLR

2012-05-17 Thread Bob Sandiford
How are you hoping that Sort will work on a multivalued field? Normally, trying to do this makes no sense. For example, if you have two authors for a document: Smith, John Jones, Joe Then would you expect the document to sort under 'S' for Smith, or 'J' for Jones? There's prob
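
The ambiguity in the Smith/Jones example can be made concrete. The sketch below is an illustration only: deriving one sort key per document (first value or alphabetical minimum) is an assumption about a possible workaround, not something stated in the message, and the second document is invented.

```python
def sort_key_first(authors):
    """Sort under the first listed author."""
    return authors[0].lower()

def sort_key_min(authors):
    """Sort under the alphabetically smallest author."""
    return min(a.lower() for a in authors)

docs = [
    {"id": 1, "authors": ["Smith, John", "Jones, Joe"]},
    {"id": 2, "authors": ["Miller, Ann"]},   # hypothetical second document
]

by_first = [d["id"] for d in sorted(docs, key=lambda d: sort_key_first(d["authors"]))]
by_min = [d["id"] for d in sorted(docs, key=lambda d: sort_key_min(d["authors"]))]

print(by_first)  # document 1 sorts under 'S'
print(by_min)    # document 1 sorts under 'J'
```

The two rules order the same documents differently, which is why a multivalued field has no single natural sort order until one rule is chosen.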

Re: Sorting fields of text_general fieldType

2012-05-17 Thread Ahmet Arslan
> The title sort works in a strange manner because the SOLR > server treats > title string based on Upper Case or Lower Case String. Thus > if we sort in > ascending order, first the title with numeric shows up then > the titles in > alphabetical order which starts with Upper Case & after > that th
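
The ordering described in the quoted text (digits, then upper case, then lower case) is what plain code-point comparison produces. A short sketch with made-up titles shows this and how a lowercased sort key changes it; whether a lowercasing field type fits your schema is an assumption to verify.

```python
titles = ["apple", "Banana", "3 Idiots", "cherry", "Apple"]

# Plain string comparison: digits < upper case < lower case.
code_point_order = sorted(titles)

# Case-insensitive comparison via a lowercased sort key.
case_insensitive_order = sorted(titles, key=str.lower)

print(code_point_order)
print(case_insensitive_order)
```

The second ordering is what a sort field analysed to lower case would be expected to give.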

Re: highlighter not respecting sentence boundry

2012-05-17 Thread Ahmet Arslan
> I also tried boundary scanner > &q=iphone&hl.boundaryScanner=simple&hl.fragsize=200&hl.fragmenter=regex&hl.fl=body hl.boundaryScanner parameter makes sense for FastVectorHighlighter only. To activate it you need to use &hl.useFastVectorHighlighter=true "FastVectorHighlighter requires the fiel