Re: DIH transformer script size limitations with Jetty?

2010-08-12 Thread Shalin Shekhar Mangar
On Thu, Aug 12, 2010 at 5:42 AM, harrysmith harrysmith...@gmail.com wrote: To follow up on my own question, it appears this is only an issue when using the DataImport console debugging tools. It looks like when submitting the debugging request, the data-config.xml is sent via a GET request,

Indexing Hanging during GC?

2010-08-12 Thread Rebecca Watson
Hi, When indexing large amounts of data I hit a problem whereby Solr becomes unresponsive and doesn't recover (even when left overnight!). I think I've hit some GC problems and that GC tuning is required, and I wanted to know if anyone has ever hit this problem. I can replicate this error (albeit taking

Re: Analysing SOLR logfiles

2010-08-12 Thread Jay Flattery
Thanks - splunk looks overkill. We're extremely small scale - we were hoping for something open source :-) ----- Original Message ----- From: Jan Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Sent: Wed, August 11, 2010 11:14:37 PM Subject: Re: Analysing SOLR logfiles

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
Hi Robert! Since the example given was http being slow, it's worth mentioning that if queries are one-word URLs [for example http://lucene.apache.org] these will actually form slow phrase queries by default. Do you mean that http://lucene.apache.org will be split up into http lucene
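To illustrate the point in this exchange, here is a rough sketch of what happens when an analyzer splits a URL into several tokens. The `UrlTokens` class and its split-on-non-alphanumeric rule are hypothetical stand-ins; the real behavior depends on the tokenizer and filters configured for the field in schema.xml. The idea is that a single query term with no spaces can still yield several tokens, which the query parser then turns into a phrase query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a field analyzer that lowercases and splits on
// any character that is not a letter or digit. Real Solr analysis depends
// on the field type configured in schema.xml.
final class UrlTokens {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Mimics the parser's behavior when one query term produces several
    // tokens: they are combined into a single phrase query.
    static String asQuery(String field, String term) {
        List<String> tokens = tokenize(term);
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(tokenize("http://lucene.apache.org")); // [http, lucene, apache, org]
        System.out.println(asQuery("text", "http://lucene.apache.org")); // text:"http lucene apache org"
    }
}
```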

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
Hi Tom, I tried again with: <queryResultCache class="solr.LRUCache" size="1" initialSize="1" autowarmCount="1"/> and even now the hitratio is still 0. What could be wrong with my setup? ('free -m' shows that the cache has over 2 GB free.) Regards, Peter. Hi Peter, Can you give
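One way to reason about a hit ratio of 0 is to model what an LRU cache of a given size does with a stream of queries. The sketch below is not Solr's `solr.LRUCache`; it is a hypothetical `LruModel` built on `java.util.LinkedHashMap` in access order. With a capacity of 1, two alternating queries keep evicting each other and never hit, while the same query repeated back-to-back does hit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache (not solr.LRUCache) that counts lookups and hits.
final class LruModel<K, V> {
    private final int capacity;
    private long lookups;
    private long hits;
    private final LinkedHashMap<K, V> map;

    LruModel(int capacity) {
        this.capacity = capacity;
        // accessOrder=true keeps the least recently used entry first,
        // and removeEldestEntry evicts it once the capacity is exceeded.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LruModel.this.capacity;
            }
        };
    }

    V get(K key) {
        lookups++;
        V value = map.get(key);
        if (value != null) {
            hits++;
        }
        return value;
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    // Runs a sequence of queries: look each one up, cache it on a miss.
    static double simulate(int capacity, String... queries) {
        LruModel<String, String> cache = new LruModel<>(capacity);
        for (String q : queries) {
            if (cache.get(q) == null) {
                cache.put(q, "results for " + q);
            }
        }
        return cache.hitRatio();
    }

    public static void main(String[] args) {
        System.out.println(simulate(1, "a", "b", "a", "b")); // 0.0 (alternating queries)
        System.out.println(simulate(1, "a", "a", "a", "a")); // 0.75 (repeated query)
    }
}
```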

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
Hi Tom! Hi Peter, Can you give a few more examples of slow queries? Are they phrase queries? Boolean queries? Prefix or wildcard queries? I am experimenting with one-word queries only at the moment. If one-word queries are your slow queries, then CommonGrams won't help.

Re: Analysing SOLR logfiles

2010-08-12 Thread Rebecca Watson
We've just started using AWStats - as suggested by the Solr 1.4 book. It's open source! http://awstats.sourceforge.net/ On 12 August 2010 18:18, Jay Flattery jayc...@rocketmail.com wrote: Thanks - splunk looks overkill. We're extremely small scale - we were hoping for something open source :-)

Re: Improve Query Time For Large Index

2010-08-12 Thread Robert Muir
Exactly! On Thu, Aug 12, 2010 at 5:26 AM, Peter Karich peat...@yahoo.de wrote: Hi Robert! Since the example given was http being slow, it's worth mentioning that if queries are one-word URLs [for example http://lucene.apache.org] these will actually form slow phrase queries by default.

Re: Multiple Facet Dates

2010-08-12 Thread Raphaël Droz
On 05/08/2010 09:59, Raphaël Droz wrote: Hi, I saw this post: http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html I didn't see any work in progress or plans for this feature on the list or the bug tracker. Has someone already created a patch, pof, ... I wouldn't have been able

Solr branches

2010-08-12 Thread Tomasz Wegrzanowski
Hi, I'm having OOME problems with Solr. From random browsing I'm getting the impression that a lot of memory fixes happened recently in Solr and Lucene. Could you give me a quick summary of how (un)stable the different Lucene / Solr branches are and how much improvement I can expect?

Re: Analysing SOLR logfiles

2010-08-12 Thread Peter Karich
I wonder too why there isn't a special tool which analyzes Solr logfiles (e.g. parses qtime, the parameters q, fq, ...), because there are some other open source log analyzers out there: http://yaala.org/ http://www.mrunix.net/webalizer/ Another free tool is newrelic.com (you will
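A minimal sketch of the kind of log parsing described here, pulling QTime and the q/fq parameters out of one request log line. It assumes a line shaped like `... path=/select params={q=...&fq=...} hits=... status=0 QTime=...`, which is roughly how Solr 1.4 logs requests; the exact format depends on the version and logging setup, so the regular expressions (and the hypothetical `SolrLogLine` class) are assumptions to adjust.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal parser for request log lines of the assumed form:
//   ... path=/select params={q=ipod+nano&fq=cat:electronics} hits=3 status=0 QTime=12
final class SolrLogLine {
    // Naive: stops at the first '}' after "params={".
    private static final Pattern PARAMS = Pattern.compile("params=\\{([^}]*)\\}");
    private static final Pattern QTIME = Pattern.compile("QTime=(\\d+)");

    /** Returns QTime in milliseconds, or -1 if the line has none. */
    static int qtime(String line) {
        Matcher m = QTIME.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Collects every value of one parameter (fq may repeat), URL-decoded. */
    static List<String> param(String line, String name) {
        List<String> values = new ArrayList<>();
        Matcher m = PARAMS.matcher(line);
        if (!m.find()) {
            return values;
        }
        for (String pair : m.group(1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                values.add(URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "INFO: [] webapp=/solr path=/select "
                + "params={q=ipod+nano&fq=cat:electronics&fq=inStock:true} hits=3 status=0 QTime=12";
        System.out.println(qtime(line));        // 12
        System.out.println(param(line, "q"));   // [ipod nano]
        System.out.println(param(line, "fq"));  // [cat:electronics, inStock:true]
    }
}
```

Because the params pattern stops at the first `}`, queries that use local params such as `{!dismax}` would need a more careful parser.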

indexing???

2010-08-12 Thread satya swaroop
Hi all, The indexing part of Solr is going well, but I got an error when indexing a single pdf file. When I searched for the error in the mailing list I found that the error was due to the copyright of that file. Can't we index a file which has copyrights or any digital rights??? regards, satya

Indexing large files using Solr Cell causes OutOfMemory error

2010-08-12 Thread Lannig Carina
Hi, I'm trying to index a txt-File (~150MB) using Solr Cell/Tika. The curl command aborts due to a java.lang.OutOfMemoryError. * java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOfRange(Arrays.java:3209)

Re: Solr branches

2010-08-12 Thread Koji Sekiguchi
(10/08/12 21:06), Tomasz Wegrzanowski wrote: Hi, I'm having oome problems with solr. From random browsing I'm getting an impression that a lot of memory fixes happened recently in solr and lucene. Could you give me a quick summary how (un)stable are different lucene / solr branches and how

Re: Schema Definition Question

2010-08-12 Thread kenf_nc
One way I've handled this, and it works only for some types of data, is to put the searchable part of the sub-doc in a search field (indexed=true) and put an XML or JSON representation of the sub-doc in a stored-only field. Then if the main doc is hit via search I can grab the XML or JSON,

Re: Indexing Hanging during GC?

2010-08-12 Thread dc tech
I am a little confused - how did 180k documents become 100m index documents? We have over 20 indices (for different content sets), one with 5m documents (about a couple of pages each) and another with 100k+ docs. We can index the 5m collection in a couple of days (limitation is in the source)

Re: Solr branches

2010-08-12 Thread Tomasz Wegrzanowski
On 12 August 2010 13:46, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/08/12 21:06), Tomasz Wegrzanowski wrote: Hi, I'm having oome problems with solr. From random browsing I'm getting an impression that a lot of memory fixes happened recently in solr and lucene. Could you give me a quick

Re: Indexing large files using Solr Cell causes OutOfMemory error

2010-08-12 Thread Gora Mohanty
On Thu, 12 Aug 2010 14:32:19 +0200 Lannig Carina lan...@ssi-schaefer-noell.com wrote: Hi, I'm trying to index a txt-File (~150MB) using Solr Cell/Tika. The curl command aborts due to a java.lang.OutOfMemoryError. [...] AFAIK Tika keeps the whole file in RAM and posts it as one single

Re: Indexing Hanging during GC?

2010-08-12 Thread Rebecca Watson
Sorry -- I used the term documents too loosely! 180k scientific articles with between 500 and 1000 sentences each, and we index sentence-level index documents, so I'm guessing about 100 million Lucene index documents in total. An update on my progress: I used GC settings of: -XX:+UseConcMarkSweepGC
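The "about 100 million" estimate follows from the per-article range given in the message:

```latex
180{,}000 \times 500 = 9.0 \times 10^{7}
\qquad\text{to}\qquad
180{,}000 \times 1{,}000 = 1.8 \times 10^{8}
```

So the total lies somewhere between roughly 90 and 180 million sentence-level documents, with 100 million near the low end of that range.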

Deleting with the DIH sometimes doesn't delete

2010-08-12 Thread Qwerky
I'm doing deletes with the DIH but getting mixed results. Sometimes the documents get deleted, other times I can still find them in the index. What would prevent a doc from getting deleted? For example, I delete 594039 and get this in the logs; 2010-08-12 14:41:55,625 [Thread-210] INFO

index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I wrote a simple Java program to import a pdf file. I can get a result when I search *:* from the admin page. I get nothing if I search a word. I wonder if I did something wrong or missed setting something. Here is part of the result I get when I do a *:* search: *

Re: index pdf files

2010-08-12 Thread Marco Martinez
To help you we need the description of your fields in your schema.xml and the query that you do when you search only a single word. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/8/12 Ma,

RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much. I didn't know how to make any changes in schema.xml for pdf files. I used the default Solr schema.xml. Please tell me what I need to do in schema.xml. The simple Java program I use is the following. I also attached that pdf file. I really appreciate your help!

how to update solr to older 1.5 builds instead of to trunk

2010-08-12 Thread solr-user
please excuse this newbie question, but: I want to upgrade solr to a version but not to the latest version in the trunk (because there are so many changes that I would have to test against, and modify my custom classes for, and behavior changes, and deal with the lucene index change, etc) My

Re: Indexing Hanging during GC?

2010-08-12 Thread dc tech
1) I assume you are doing batching interspersed with commits 2) Why do you need sentence-level Lucene docs? 3) Are your custom handlers/parsers a part of the Solr JVM? I would not be surprised if you have a memory/connection leak there (or it is not releasing some resource explicitly). In general, we have

Re: how to update solr to older 1.5 builds instead of to trunk

2010-08-12 Thread Yonik Seeley
Another option is the 3x branch - that should still be able to read indexes from Solr 1.4/Lucene 2.9 I personally don't expect a 1.5 release to ever materialize. There will eventually be a Lucene/Solr 3.1 release off of the 3x branch, and a Lucene/Solr 4.0 release off of trunk. -Yonik

Re: Solr Doc Lucene Doc !?

2010-08-12 Thread stockii
no help ? =( -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Doc-Lucene-Doc-tp995922p1114172.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to update solr to older 1.5 builds instead of to trunk

2010-08-12 Thread solr-user
Thanks Yonik, but http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt says that the Lucene index has changed: Upgrading from Solr 1.4 -- * The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no

Re: Indexing Hanging during GC?

2010-08-12 Thread Rebecca Watson
Hi, 1) I assume you are doing batching interspersed with commits: as each file I crawl is article-level, each add contains all the sentences for that article, so they are naturally batched into about 500 documents per post in LCF. I use auto-commit in Solr: autoCommit

Re: how to update solr to older 1.5 builds instead of to trunk

2010-08-12 Thread Yonik Seeley
On Thu, Aug 12, 2010 at 12:24 PM, solr-user solr-u...@hotmail.com wrote: Thanks Yonik but http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt says that the lucene index has changed Right - but it will be able to read your older index. Do you need Solr 1.4 to be able

edismax pf2 and ps

2010-08-12 Thread Ron Mayer
Short summary: Is there any way I can specify that I want a lot of phrase slop for the pf parameter, but none at all for the pf2 parameter? I find the 'pf' parameter with a pretty large 'ps' to do a very nice job for providing a modest boost to many documents that are quite well related

Re: how to update solr to older 1.5 builds instead of to trunk

2010-08-12 Thread solr-user
No, once upgraded I wouldn't need to have an older Solr read the indexes. I misunderstood the note. Thanks

RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Does anyone know if I need to define fields in schema.xml for indexing pdf files? If so, please tell me how I can do it. I defined fields in schema.xml and created a data-configuration file by using XPath for XML files. Would you please tell me if I need to do this for pdf files, and how I can do it?

RE: Improve Query Time For Large Index

2010-08-12 Thread Burton-West, Tom
Hi Peter, If hits aren't showing up, and you aren't getting any queryResultCache hits even with the exact query being repeated, something is very wrong. I'd suggest first getting the query result cache working, and then moving on to look at other possible bottlenecks. What are your

Re: index pdf files

2010-08-12 Thread Stefan Moises
Maybe this helps: http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 Cheers, Stefan On 12.08.2010 19:45, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Does anyone know if I need define fields in schema.xml for indexing pdf files? If I need, please tell me how I can do

Re: Solr Doc Lucene Doc !?

2010-08-12 Thread kenf_nc
Are you just trying to learn the tiny details of how Solr and DIH work? Is this just an intellectual curiosity? Or are you having some specific problem that you are trying to solve? If you have a problem, could you describe the symptoms of the problem? I am using Solr, DIH, and several other

Results from More then One Cors?

2010-08-12 Thread Jörg Agatz
Hello users... I tried to get results from more than one core, but I don't know how. Maybe you have an idea? I need it in PHP. King

RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help! I defined a dynamic field in schema.xml as follows: <dynamicField name="metadata_*" type="string" indexed="true" stored="true" multiValued="false"/> But I wonder what I should put for <uniqueKey></uniqueKey>. I really appreciate your help! -----Original Message----- From:

Re: Solr Doc Lucene Doc !?

2010-08-12 Thread stockii
I am writing a little thesis about this, and I need to know how Solr uses Lucene (in which way), for example when using DIH and when searching. So it's for my better understanding ;-)

RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much. I got it working now. I really appreciate your help! Xiaohui -----Original Message----- From: Stefan Moises [mailto:moi...@shoptimax.de] Sent: Thursday, August 12, 2010 1:58 PM To: solr-user@lucene.apache.org Subject: Re: index pdf files Maybe this helps:

possible bug in sorting by Function?

2010-08-12 Thread solr-user
I was looking at the ability to sort by Function that was added to solr. For the most part it seems to work. However solr doesn't seem to like to sort by certain functions. For example, this sum works:

Re: possible bug in sorting by Function?

2010-08-12 Thread solr-user
small typo in last email: second sum should have been hsin, but I notice that the problem also occurs when I leave it as sum

Field getting tokenized prior to charFilter on select query

2010-08-12 Thread Andrew Chalupa
I'm attempting to make use of PatternReplaceCharFilterFactory, but am running into issues on both 1.4.1 (I ported it) and on nightly (4.0-2010-07-27). It seems that on a real query the charFilter isn't executed prior to the tokenizer. I modified the example configuration included in the
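In a Solr analysis chain the order is char filters, then the tokenizer, then token filters, so a pattern-replace char filter should see the raw text before it is split. The sketch below uses `String.replaceAll` and a whitespace split as hypothetical stand-ins (it is not `PatternReplaceCharFilterFactory`) to show how the result changes if the text is split first. One possible cause of that effect at query time, assumed here rather than confirmed in this thread, is the query parser splitting the input on whitespace before analysis, as discussed in the "analysis tool vs. reality" thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins: replaceAll plays the role of a pattern-replace
// char filter, and split("\\s+") plays the role of a whitespace tokenizer.
final class CharFilterOrder {
    // Example pattern: join digit groups separated by whitespace.
    static final String PATTERN = "(\\d+)\\s+(\\d+)";
    static final String REPLACEMENT = "$1$2";

    // Intended order: filter the raw characters, then tokenize.
    static List<String> charFilterThenTokenize(String input) {
        String filtered = input.replaceAll(PATTERN, REPLACEMENT);
        return Arrays.asList(filtered.trim().split("\\s+"));
    }

    // Text split into pieces first: the pattern can no longer see text
    // that spans a piece boundary, so nothing is replaced.
    static List<String> tokenizeThenReplace(String input) {
        List<String> out = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            out.add(token.replaceAll(PATTERN, REPLACEMENT));
        }
        return out;
    }

    public static void main(String[] args) {
        String query = "call 555 1234";
        System.out.println(charFilterThenTokenize(query)); // [call, 5551234]
        System.out.println(tokenizeThenReplace(query));    // [call, 555, 1234]
    }
}
```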

XSL import/include relative to app server home directory...

2010-08-12 Thread Brian Johnson
Hello, I'm customizing my XML response with the XSLTResponseWriter using wt=xslt&tr=transform.xsl. Because I have a few use cases to support, I wanted to break up the common bits and import/include them from multiple top-level XSLT files, but it appears that the base directory of the

Require some advice

2010-08-12 Thread Pavan Gupta
Hi, I am new to text search and mining and have been researching the different available products. My application requires reading an SMS message (unstructured) and finding entities such as person name, area, zip, city and skills associated with the person. The SMS would be in the form of free

RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I got the following error when I index some pdf files. I wonder if anyone has had this issue before and how to fix it. Thanks so much in advance! *** <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 500 </title> </head> <body><h2>HTTP

Free Webinar: Findability: Designing the Search Experience

2010-08-12 Thread Erik Hatcher
Here's perhaps the coolest webinar we've done to date, IMO :) I attended Tyler's presentation at Lucene EuroCon* and thoroughly enjoyed it. Search UI/UX is a fascinating topic to me, and really important to do well for the applications most of us are building. I'm pleased to pass along

Re: possible bug in sorting by Function?

2010-08-12 Thread solr-user
problem could be related to some oddity in sum()?? some more examples: note: Latitude and Longitude are fields of type=double works: http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(1,1.0))%20asc http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(Latitude,Latitude)%20asc
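When building these URLs in code, each parameter value needs URL encoding and the parameters need to be joined with `&`. A sketch using `java.net.URLEncoder`; the host, port, and field names are taken from the message above, and `SortByFunctionUrl` is a hypothetical helper name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a Solr select URL with encoded params.
final class SortByFunctionUrl {
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encodes ':' , '(' , ')' , ',' and turns spaces into '+'.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("sort", "sum(Latitude,Latitude) asc");
        // http://10.0.11.54:8994/solr/select?q=*%3A*&sort=sum%28Latitude%2CLatitude%29+asc
        System.out.println(build("http://10.0.11.54:8994/solr/select", params));
    }
}
```

`URLEncoder` writes the space as `+`, which the servlet container decodes as a space in the query string, just like the `%20` in the hand-written URLs.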

RE: Require some advice

2010-08-12 Thread Michael Griffiths
Solr is a search engine, not an entity extraction tool. While there are some decent open source entity extraction tools, they are focused on processing sentences and paragraphs. The structural differences in text messages means you'd need to do a fair amount of work to get decent entity

SOLR-788 - disributed More Like This

2010-08-12 Thread Shawn Heisey
I tried some time ago to use SOLR-788. Ultimately I was able to get both patch versions to apply (separately), but neither worked. The suggestion I received when I commented on the issue was to download the specific release mentioned in the patch and then update, but the patch was created

Re: possible bug in sorting by Function?

2010-08-12 Thread solr-user
Issue resolved. The problem was that solr.war was silently not being overwritten by the new version. Will try to spend more time debugging before posting.

Re: General questions about distributed solr shards

2010-08-12 Thread Shawn Heisey
On 8/11/2010 3:27 PM, JohnRodey wrote: 1) Is there any information on preferred maximum sizes for a single solr index. I've read some people say 10 million, some say 80 million, etc... Is there any official recommendation or has anyone experimented with large datasets into the tens of

Re: clustering component

2010-08-12 Thread Matt Mitchell
Hey thanks Stanislaw! I'm going to try this against the current trunk tonight and see what happens. Matt On Wed, Jul 28, 2010 at 8:41 AM, Stanislaw Osinski stanislaw.osin...@carrotsearch.com wrote: The patch should also work with trunk, but I haven't verified it yet. I've just added a

Hierarchical faceting

2010-08-12 Thread Mats Bolstad
Hey all, I am doing a search on hierarchical data, and I have a hard time getting my head around the following problem. I want a result as follows, in one single query only:

USA (3)
  California (2)
  Arizona (1)
Europe (4)
  Norway (3)
    Oslo (3)
  Sweden (1)

How it looks in the XML/JSON response

Re: Phrase search

2010-08-12 Thread Chris Hostetter
: I'm trying to match Apple 2 but not Apple2 using phrase search, this is why I have it quoted. : I was under the impression --when I use phrase search-- all the : analyzer magic would not apply, but it is!!! Otherwise, how would I : search for a phrase?! well .. yes ... even with phrase
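To see why a phrase query can still match "Apple2", consider a hypothetical analyzer that splits words where letters and digits meet. The `LetterDigitSplit` class below is an illustration only; it is not Solr's WordDelimiterFilter, although that filter can behave similarly depending on its options. If the field's analyzer does something like this, "Apple2" and the phrase "Apple 2" produce the same token sequence, so the phrase matches both.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analyzer: lowercases, splits on whitespace, and then splits
// each word where letters and digits meet (e.g. "Apple2" -> apple, 2).
final class LetterDigitSplit {
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Zero-width split at letter->digit and digit->letter boundaries.
            for (String part : word.split("(?<=\\p{L})(?=\\p{Nd})|(?<=\\p{Nd})(?=\\p{L})")) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Apple2"));  // [apple, 2]
        System.out.println(analyze("Apple 2")); // [apple, 2]
    }
}
```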

Re: Solr query result cache size and expire property

2010-08-12 Thread Chris Hostetter
: please help - how can I calculate the queryResultCache size (how much RAM should : be dedicated to it)? I have a 1.5 index size and 4 million docs. : queryResultWindowSize is 20. : Could I use an expire property on the documents in this cache? There is no expire property; items are automatically removed from
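A rough back-of-envelope estimate, assuming each queryResultCache entry holds about queryResultWindowSize internal document ids (4-byte ints) plus one 4-byte float score per id when scores are cached, and ignoring the query keys and JVM object overhead. The entry count of 512 is an assumed `size` setting, not taken from the question; W is the queryResultWindowSize of 20:

```latex
\text{RAM} \approx N_{\text{entries}} \times W \times (4 + 4)\,\text{bytes}
= 512 \times 20 \times 8\,\text{bytes} = 81{,}920\,\text{bytes} \approx 80\,\text{KB}
```

Under these assumptions the number of documents in the index (4 million) does not enter the estimate; the memory is driven by the number of entries and the window size.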

Re: How to extend the BinaryResponseWriter imposed by Solrj

2010-08-12 Thread Chris Hostetter
: I'm trying to extend the writer used by SolrJ : (org.apache.solr.response.BinaryResponseWriter), I have declared it in ... : I see that it is initialized, but when I try to set the 'wt' param to : 'myWriter' : : solrQuery.setParam("wt", "myWriter"), nothing happens, it's still using the :

can searcher.getReader().getFieldNames() return only stored fields?

2010-08-12 Thread Gerald
Collection<String> myFL = searcher.getReader().getFieldNames(IndexReader.FieldOption.ALL); will return all fields in the schema (i.e. indexed, stored, and indexed+stored). Collection<String> myFL = searcher.getReader().getFieldNames(IndexReader.FieldOption.INDEXED); likely returns all fields that

Re: Data Import Handler Query

2010-08-12 Thread Manali Joshi
Thanks Alexey. That solved the issue. I am now able to get all images information in the index. On Thu, Aug 12, 2010 at 12:47 AM, Alexey Serba ase...@gmail.com wrote: Try to define image solr fields - db columns mapping explicitly in image entity, i.e. entity name=image query=select

Re: Duplicate a core

2010-08-12 Thread Chris Hostetter
: Is it possible to duplicate a core? I want to have one core contain only : documents within a certain date range (ex: 3 days old), and one core with : all documents that have ever been in the first core. The small core is then : replicated to other servers which do real-time processing on it,

SOLR Query

2010-08-12 Thread Moiz Bhukhiya
Hi there, I've a problem querying SOLR for a specific field with a query string that contains spaces. I added following lines in the schema.xml to add my own defined fields. Fields are: ap_name, ap_address, ap_dob, ap_desg, ap_sec. Since all these fields are beginning with ap_, I included the

Re: analysis tool vs. reality

2010-08-12 Thread Chris Hostetter
: Furthermore, I would like to add it's not just the highlight matches : functionality that is horribly broken here, but the output of the analysis : itself is misleading. : : let's say I take 'textTight' from the example, and add the following synonym: : : this is broken => broke : : the query

Re: analysis tool vs. reality

2010-08-12 Thread Robert Muir
On Thu, Aug 12, 2010 at 7:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote: You say it's bogus because the qp will divide on whitespace first -- but you're assuming you know what query parser will be used ... the field query parser (to name one) doesn't split on whitespace first. That's

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-12 Thread Chris Hostetter
: It returns in around a second. When I execute the attached code it takes just : over three minutes. The optimal for me would be able get closer to the : performance I'm seeing with curl using Solrj. I think your problem may be that StreamingUpdateSolrServer buffers up commands and sends

Re: analysis tool vs. reality

2010-08-12 Thread Chris Hostetter
: You say it's bogus because the qp will divide on whitespace first -- but : you're assuming you know what query parser will be used ... the field : query parser (to name one) doesn't split on whitespace first. That's my : point: analysis.jsp doesn't make any assumptions about what query

Re: Hierarchical faceting

2010-08-12 Thread Jayendra Patil
We were able to get hierarchical faceting working with a workaround approach. e.g. if you have Europe//Norway//Oslo as an entry 1. Create a new multivalued field with string type <field name="country_facet" type="string" indexed="true" stored="true" multiValued="true"/> 2. Index the field for
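Step 2 is cut off above, so the following is an assumption about one common way to fill such a field: index every ancestor path of the entry, prefixed with its depth, so that `facet.prefix` can select a single level. `HierarchyPaths` is a hypothetical indexing-side helper that expands `Europe//Norway//Oslo` into values for the multivalued `country_facet` field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical indexing helper: expands one hierarchy entry into the values
// to put in a multivalued string field such as country_facet.
final class HierarchyPaths {
    static final String SEP = "//";

    static List<String> expand(String entry) {
        String[] levels = entry.split(SEP);
        List<String> values = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (int depth = 0; depth < levels.length; depth++) {
            if (depth > 0) {
                path.append(SEP);
            }
            path.append(levels[depth]);
            // The depth prefix lets facet.prefix pick out one level.
            values.add(depth + SEP + path);
        }
        return values;
    }

    public static void main(String[] args) {
        // [0//Europe, 1//Europe//Norway, 2//Europe//Norway//Oslo]
        System.out.println(expand("Europe//Norway//Oslo"));
    }
}
```

With values like these, faceting on `country_facet` with `facet.prefix=0//` would return the top-level counts, and `facet.prefix=1//Europe//` the direct children of Europe.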

Re: Index compatibility 1.4 Vs 3.1 Trunk

2010-08-12 Thread Chris Hostetter
: : That should still be true in the official 4.0 release (I really should : have said "When 4.0 can no longer read Solr 1.4 indexes"), ... : I haven't been following the details closely, but I suspect that tool : hasn't been written yet because there isn't much point until the full :

Re: edismax pf2 and ps

2010-08-12 Thread Jayendra Patil
We pretty much had the same issue, and ended up customizing the ExtendedDismax code. In your case it's just a change of a single line addShingledPhraseQueries(query, normalClauses, phraseFields2, 2, tiebreaker, pslop); to addShingledPhraseQueries(query, normalClauses, phraseFields2,
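For context, pf2 boosts documents using phrase queries built from adjacent pairs of the user's query words; in the call quoted above, the last argument (`pslop`) appears to be the slop applied to those pairs. The sketch below only renders the bigram phrases in standard query syntax (`field:"w1 w2"~slop`) to show what changes when that slop is 0; `BigramPhrases` is a hypothetical name, and this is not the ExtendedDismax code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the adjacent-word phrases a pf2-style boost is built
// from, rendered as field:"w1 w2"~slop strings.
final class BigramPhrases {
    static List<String> build(String field, String userQuery, int slop) {
        String[] words = userQuery.trim().split("\\s+");
        List<String> phrases = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            String phrase = field + ":\"" + words[i] + " " + words[i + 1] + "\"";
            // A slop of 0 means the two words must be adjacent and in order.
            phrases.add(slop > 0 ? phrase + "~" + slop : phrase);
        }
        return phrases;
    }

    public static void main(String[] args) {
        System.out.println(build("title", "apache solr search", 0));
        System.out.println(build("title", "apache solr search", 10));
    }
}
```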

Re: DIH and multivariable fields problems

2010-08-12 Thread Lance Norskog
Please add a JIRA issue for this. https://issues.apache.org/jira/secure/BrowseProject.jspa On Tue, Aug 10, 2010 at 6:59 PM, kenf_nc ken.fos...@realestate.com wrote: Glad I could help. I also would think it was a very common issue. Personally my schema is almost all dynamic fields. I have

Re: DataImportHandler in Solr 1.4.1: exception handling in FileListEntityProcessor

2010-08-12 Thread Lance Norskog
Please add a JIRA issue for this. On Wed, Aug 11, 2010 at 6:24 AM, Sascha Szott sz...@zib.de wrote: Sorry, there was a mistake in the stack trace. The correct one is: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' value: /home/doe/foo is

Re: analysis tool vs. reality

2010-08-12 Thread Robert Muir
On Thu, Aug 12, 2010 at 8:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : You say it's bogus because the qp will divide on whitespace first -- but : you're assuming you know what query parser will be used ... the field : query parser (to name one) doesn't split on whitespace

Re: Solr 1.4.1 and 3x: Grouping of query changes results

2010-08-12 Thread Chris Hostetter
: Does not return document as expected: : id:1234 AND (-indexid:1 AND -indexid:2) AND -indexid:3 : : Has anyone else experienced this? The exact placement of the parens isn't : key, just adding a level of nesting changes the query results. ... : I could be wrong but I think this has
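The reply is cut off, but a commonly cited explanation for this kind of result (treated here as an assumption) is that a Lucene BooleanQuery containing only prohibited clauses matches nothing, since there is no positive set to subtract from, and Solr only compensates for a purely negative query at the top level. A nested group such as `(-indexid:1 AND -indexid:2)` then matches no documents, which empties the whole AND. Adding `*:*` inside the group, as in `(*:* -indexid:1 -indexid:2)`, gives it something to subtract from. The toy set model below (hypothetical data, not Lucene code) reproduces the difference.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy set-based model of Boolean matching (not Lucene code).
// Each document is {id, indexid}; the array index is the doc number.
final class PureNegativeModel {
    static final int ID = 0;
    static final int INDEXID = 1;
    static final int[][] DOCS = { {1234, 4}, {1234, 1}, {5678, 4} };

    static Set<Integer> matching(int field, int value) {
        Set<Integer> docs = new TreeSet<>();
        for (int d = 0; d < DOCS.length; d++) {
            if (DOCS[d][field] == value) {
                docs.add(d);
            }
        }
        return docs;
    }

    static Set<Integer> allDocs() {
        Set<Integer> docs = new TreeSet<>();
        for (int d = 0; d < DOCS.length; d++) {
            docs.add(d);
        }
        return docs;
    }

    // (-indexid:a AND -indexid:b): no positive clause, so the group starts
    // from the empty set and the exclusions have nothing to remove.
    static Set<Integer> pureNegativeGroup(int... excluded) {
        Set<Integer> result = new TreeSet<>();
        for (int v : excluded) {
            result.removeAll(matching(INDEXID, v));
        }
        return result;
    }

    // (*:* -indexid:a -indexid:b): start from all docs, then exclude.
    static Set<Integer> matchAllGroup(int... excluded) {
        Set<Integer> result = allDocs();
        for (int v : excluded) {
            result.removeAll(matching(INDEXID, v));
        }
        return result;
    }

    // id:1234 AND <group> AND -indexid:3
    static Set<Integer> query(Set<Integer> group) {
        Set<Integer> result = matching(ID, 1234);
        result.retainAll(group);
        result.removeAll(matching(INDEXID, 3));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(query(pureNegativeGroup(1, 2))); // []
        System.out.println(query(matchAllGroup(1, 2)));     // [0]
    }
}
```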

Re: index pdf files

2010-08-12 Thread Chris Hostetter
: Subject: index pdf files : References: aanlktim1wgref511p+unovqcu=b0usxnm8vxzn5bu...@mail.gmail.com : 4c63ed43.4030...@r.email.ne.jp : aanlkti=28tulxqjtibrwcbxtok0avwbvbrjnxpdej...@mail.gmail.com : In-Reply-To: aanlkti=28tulxqjtibrwcbxtok0avwbvbrjnxpdej...@mail.gmail.com

Re: Indexing large files using Solr Cell causes OutOfMemory error

2010-08-12 Thread Chris Hostetter
: Subject: Indexing large files using Solr Cell causes OutOfMemory error : References: aanlktinfbtudv4lpjh40vjzderto1-dn7gztnjxfv...@mail.gmail.com : In-Reply-To: aanlktinfbtudv4lpjh40vjzderto1-dn7gztnjxfv...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on

Re: Filter Performance in Solr 1.3

2010-08-12 Thread Lance Norskog
There was a major Lucene change in filter handling from Solr 1.3 to Solr 1.4. They are much much faster in 1.4. Really Lucene 2.4.1 to Lucene 2.9.2. The filter is now consulted much earlier in the search process, thus weeding out many more documents early. It sounds like in Solr 1.3, you should

Re: PDF file

2010-08-12 Thread Chris Hostetter
: Subject: PDF file : References: 20100729152139.321c4...@ibis : aanlktinhby5iasd3q9iep7dr8tymajozvk8curih1...@mail.gmail.com : In-Reply-To: aanlktinhby5iasd3q9iep7dr8tymajozvk8curih1...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When

Re: In multicore env, can I make it access core0 by default

2010-08-12 Thread Chris Hostetter
: In-Reply-To: aanlktimwvhxxdhpup5hl-2e1teh9pu6yetopgu=98...@mail.gmail.com : References: aanlktimwvhxxdhpup5hl-2e1teh9pu6yetopgu=98...@mail.gmail.com : aanlktim46b_hcfpf2r6t=b8y_weq4bbhgi=8mappz...@mail.gmail.com : Subject: In multicore env, can I make it access core0 by default

Re: hl.usePhraseHighlighter

2010-08-12 Thread Chris Hostetter
: Subject: hl.usePhraseHighlighter : References: 1281125904548-1031951.p...@n3.nabble.com : 960560.55971...@web52904.mail.re2.yahoo.com : In-Reply-To: 960560.55971...@web52904.mail.re2.yahoo.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a

Re: Indexing and ExtractingRequestHandler

2010-08-12 Thread Lance Norskog
This is probably true about Luke. The trunk has a new Lucene format and does not read any previous format. The trunk is a busy code base. The 3.1 branch is slated to be the next Solr release, and is probably a better base for your testing. Best of all is to use the Solr 1.4.1 binary release. On

Re: Deleting with the DIH sometimes doesn't delete

2010-08-12 Thread Lance Norskog
Which version of Solr is this? How many documents are there in the index? Etc. It is hard for us to help you without more details. On Thu, Aug 12, 2010 at 8:32 AM, Qwerky neil.j.tay...@hmv.co.uk wrote: I'm doing deletes with the DIH but getting mixed results. Sometimes the documents get

Re: indexing???

2010-08-12 Thread Erick Erickson
Can you provide more details? What is the error you're receiving? What do you think is going on? It might be helpful if you reviewed: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Aug 12, 2010 at 8:21 AM, satya swaroop sswaro...@gmail.com wrote: Hi all, The indexing

Re: Results from More then One Cors?

2010-08-12 Thread Erick Erickson
There is no information to go on here. Please review http://wiki.apache.org/solr/UsingMailingLists and add some more details... Best Erick On Thu, Aug 12, 2010 at 2:09 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: Hello Users... I tryed to get results from more then one Cores.. But i

Re: SOLR Query

2010-08-12 Thread Erick Erickson
You'll get a lot of insight into what's actually happening if you append debugQuery=true to your queries, or check the debug checkbox in the solr admin page. But I suspect (and it's a guess since you haven't included your schema) that your problem is that you're mixing explicit and default

Re: Index compatibility 1.4 Vs 3.1 Trunk

2010-08-12 Thread Robert Muir
On Thu, Aug 12, 2010 at 8:29 PM, Chris Hostetter hossman_luc...@fucit.org wrote: It was a big part of the proposal regarding the creation of the 3x branch ... that index format compatibility between major versions would no longer be supported by silently converting on first write -- instead

DataImportHandler and SAXParseExceptions with Jetty

2010-08-12 Thread harrysmith
Win XP, Solr 1.4.1 out-of-the-box install, using Jetty. If I add greater than or less than (i.e. > or <) in any XML field and attempt to load or run from the DataImportConsole I receive a SAXParseException. Example follows: If I don't have a 'less than' it works just fine. I know this must work,
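In XML, a literal `<` is not allowed in text or attribute values and has to be written as `&lt;` (and `&` as `&amp;`); a literal `>` is generally legal but is often escaped as `&gt;` as well. A minimal escaping sketch; `XmlEscape` is a hypothetical helper, and the `entity` element and SQL in `main` are made-up examples of where such a value might appear in data-config.xml.

```java
// Hypothetical helper: escapes the characters that are unsafe inside XML
// text or double-quoted attribute values.
final class XmlEscape {
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up SQL; prints the attribute with < and > escaped.
        String sql = "select * from item where price < 10 and qty > 0";
        System.out.println("<entity name=\"item\" query=\"" + escape(sql) + "\"/>");
    }
}
```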

Re: SOLR Query

2010-08-12 Thread Moiz Bhukhiya
I tried ap_address:(tom+cruise) and that worked. I am sure it's the same problem as you suspected! Thanks a lot Erick (& users!) for your time. Moiz On Thu, Aug 12, 2010 at 8:51 PM, Erick Erickson erickerick...@gmail.com wrote: You'll get a lot of insight into what's actually happening if you
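This matches how the standard Lucene query parser scopes a field prefix: in `ap_address:tom cruise`, only `tom` is searched in `ap_address` and `cruise` goes to the default field. Grouping with parentheses applies the field to every term (typed in a URL, the space is often written as `+`, hence `tom+cruise`), while quotes require the words to appear as a phrase. `FieldScopedQuery` is a hypothetical helper that builds both forms; it does not escape other query syntax characters.

```java
// Hypothetical helpers for scoping every word of a user string to one field.
final class FieldScopedQuery {
    // field:(w1 w2 ...): each word is searched in the field, combined with
    // the parser's default operator.
    static String grouped(String field, String text) {
        return field + ":(" + String.join(" ", text.trim().split("\\s+")) + ")";
    }

    // field:"w1 w2 ...": the words must appear as a phrase.
    static String phrase(String field, String text) {
        return field + ":\"" + text.trim().replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(grouped("ap_address", "tom cruise")); // ap_address:(tom cruise)
        System.out.println(phrase("ap_address", "tom cruise"));  // ap_address:"tom cruise"
    }
}
```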