Re: DIH: using variables in nested entities

2010-03-13 Thread Tricia Williams
For anyone interested, my issue (I think) was because I had specified the url field as a multivalued field. I wasn't able to create a test case that emulated my problem. This guess is based on gradual fiddling with my configs. My concern is no longer pressing but I do have a couple

Re: DIH template multivalued fields

2010-03-13 Thread Lukas Kahwe Smith
On 13.03.2010, at 08:01, blargy wrote: I was actually able to accomplish (although not pretty) what I wanted using a regex transformer. entity name=item transformer=RegexTransformer query=select *, 'valueA, valueB' values from items field

RE: Index an entire Phrase and not it's constituent parts?

2010-03-13 Thread Christopher Ball
Ok, let me try to explain what I am hoping to achieve at a higher level: I want to aggressively remove stop words to reduce the size of my index, but there are certain domain-specific multiword phrases which include stop words that I need to retain in the index. So I want to stop out words

RE: Index an entire Phrase and not it's constituent parts?

2010-03-13 Thread MitchK
Christopher, maybe the SynonymFilter can help you to solve your problem. Let me try to explain: If you create an extra field in the index for your use-case, you can boost matches of them in a special way. The next step is creating an extra synonym-file. as much as = SpecialPhrase1 in amount

How to Combine Dismax Query Handler and Clustering Component

2010-03-13 Thread Allahbaksh Asadullah
Hi, How do we combine clustering component and Dismax query handler? Regards, allahbaksh

DIH - Out of Memory error when using CachedsqlEntityProcessor

2010-03-13 Thread JavaGuy84
Hi, I am using CachedsqlEntityProcessor in my DIH dataconfig to reduce the number of queries executed against the database , Entity1 query=select * from x processor= CachedsqlEntityProcessor/ Entity2 query=select * from y processor= CachedsqlEntityProcessor cachekey=id cachelookup=x.id/ I

Solr Logging XML

2010-03-13 Thread blargy
How can I enable logging of all the xml posted to my Solr server? Is this possible? As of right now all I see in the logs are the request params when querying. While I am on the topic of logging I have one other question too. Is it possible to use custom variables in the logging.properties file

DataImportHandler development console

2010-03-13 Thread blargy
Is there any documentation on this screen? (and don't point me to http://wiki.apache.org/solr/DataImportHandler) When using the Full-import, Status, Reload-Config, Document-Count and Full Import With Cleaning everything works as expected but when I use any of the following I get an exception: Debug

Re: DIH - Out of Memory error when using CachedsqlEntityProcessor

2010-03-13 Thread JavaGuy84
Sorry, forgot to attach the error log. Error Log: - org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:650) at

Re: DataImportHandler development console

2010-03-13 Thread blargy
Also, how would one auto-commit after a delta-import? I click on the commit, clean and verbose checkboxes but those seem to have no effect. blargy wrote: Is there any documentation on this screen? (and don't point me to http://wiki.apache.org/solr/DataImportHandler) When using the

Re: DIH - Out of Memory error when using CachedsqlEntityProcessor

2010-03-13 Thread Erick Erickson
Have you searched the users' list? This question has come up multiple times and you'll find your question has probably already been answered. Let us know if you come up blank... Best Erick On Sat, Mar 13, 2010 at 3:56 PM, JavaGuy84 bbar...@gmail.com wrote: Sorry forgot to attach the error

Re: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-03-13 Thread Lance Norskog
HTMLStripCharFilter is only in the analyzer: it creates searchable terms from the HTML input. The raw HTML is stored and fetched. There are some bugs in term positions and highlighting. An EntityProcessor wrapping the HTMLStripCharFilter would be really useful. On Tue, Mar 9, 2010 at 5:31 AM,

Re: DIH - Out of Memory error when using CachedsqlEntityProcessor

2010-03-13 Thread JavaGuy84
Erick, I have seen many posts regarding out of memory errors but I am not sure whether they are using cachesqlEntityProcessor. I want to know if there is a way to flush out the buffer of the cache instead of storing everything in cache. I can clearly see the heap size growing like anything if I use

Re: Boundary match as part of query language?

2010-03-13 Thread Lance Norskog
One way is to add magic 'beginning' and 'end' terms, then do phrase searches with those terms. On Wed, Mar 10, 2010 at 7:51 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Hi, Sometimes you need to anchor your search to start/end of field. Example: 1. title=New York Yankees 2.
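The sentinel-term idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the sentinel strings `xxbeginxx` and `xxendxx` and the helper names are hypothetical, the field is assumed to be tokenized so that the sentinels survive analysis as ordinary terms, and the phrase is assumed to contain no characters that need escaping.

```python
# Hypothetical sentinel tokens: any strings that never occur in real data would do.
BEGIN = "xxbeginxx"
END = "xxendxx"


def add_anchors(value: str) -> str:
    """Wrap a field value in sentinel terms before it is sent for indexing."""
    return f"{BEGIN} {value} {END}"


def starts_with_query(field: str, phrase: str) -> str:
    """Phrase query that only matches when the phrase begins the field."""
    return f'{field}:"{BEGIN} {phrase}"'


def ends_with_query(field: str, phrase: str) -> str:
    """Phrase query that only matches when the phrase ends the field."""
    return f'{field}:"{phrase} {END}"'


def exact_query(field: str, phrase: str) -> str:
    """Phrase query that only matches when the phrase is the whole field."""
    return f'{field}:"{BEGIN} {phrase} {END}"'


if __name__ == "__main__":
    print(add_anchors("New York Yankees"))
    print(starts_with_query("title", "New York"))
```

The cost of this approach is two extra terms per field value, and the sentinels must be kept out of any stop word list.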

Re: SOLR Search Query : Exception : Software caused connection abort: recv failed

2010-03-13 Thread Lance Norskog
It is usually a limitation in the servlet container. You could try using embedded Solr or using an HTTP POST instead of an HTTP GET. However, in this case it is probably not possible. If these long filter queries never change, you could embed these in the solrconfig.xml declaration for a request

Re: Updating FAQ for International Characters?

2010-03-13 Thread Lance Norskog
You might also try using CDATA blocks to wrap your Unicode text. It is usually much easier to view the text while debugging these problems. On Thu, Mar 11, 2010 at 12:13 AM, Eric Pugh ep...@opensourceconnections.com wrote: So I am using Sunspot to post over, which means an extra layer of

Re: DIH - Out of Memory error when using CachedsqlEntityProcessor

2010-03-13 Thread Mark Miller
I don't really follow DataImportHandler, but it looks like it's using an unbounded cache (simple HashMap). Perhaps we should make the cache size configurable? The impl seems a little odd - the caching occurs in the base class - so caching impls that extend it don't really have full control -
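To illustrate what a size-configurable cache could look like, here is a minimal Python sketch of a bounded least-recently-used cache. It is not DataImportHandler code: the class name `BoundedCache`, the `max_entries` parameter, and the LRU eviction policy are all assumptions chosen for the example.

```python
from collections import OrderedDict


class BoundedCache:
    """Keeps at most max_entries keys, evicting the least recently used one."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._rows = OrderedDict()

    def get(self, key):
        """Return the cached rows for key, or None on a miss."""
        if key not in self._rows:
            return None
        self._rows.move_to_end(key)  # mark as most recently used
        return self._rows[key]

    def put(self, key, rows):
        """Store rows under key, evicting old entries if the cache is full."""
        self._rows[key] = rows
        self._rows.move_to_end(key)
        while len(self._rows) > self.max_entries:
            self._rows.popitem(last=False)  # drop the least recently used key

    def __len__(self):
        return len(self._rows)
```

With a bound in place, a cache miss falls back to querying the database again, so memory use is traded against the number of queries.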

Re: Cant commit on 125 GB index

2010-03-13 Thread Lance Norskog
What is timing out? The external HTTP request? Commit times are a sawtooth and slowly increase. My record is 59 minutes, but I was doing benchmarking. On Thu, Mar 11, 2010 at 1:46 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote: Hi, I'm having timeouts committing on a 125 GB index

RE: Cant commit on 125 GB index

2010-03-13 Thread Frederico Azeiteiro
Yes, the HTTP request is timing out even when using values of 10m. Normally the commit takes about 10s. I did an optimize (it took 6h) and it looks good for now... 59m? Well, I didn't wait that long; I restarted the Solr instance and tried again. I'll try to use autocommit on a near

Managing configuration files/Environment variables

2010-03-13 Thread blargy
How are you guys solving the problem of managing all of your configuration differences between development and production? For example, when deploying to production I need to change the data-config.xml (DataImportHandler) database settings. I also have some ant scripts to start/stop tomcat as

Re: DIH field options

2010-03-13 Thread Gregg Hoshovsky
You can use MySQL: select *, 'staticdata' as staticdata from table x. As long as your field name is staticdata, this should add it there. On 3/12/10 8:39 AM, Tommy Chheng tommy.chh...@gmail.com wrote: Haven't tried this myself but try adding a default value and don't specify it during the

Re: HTMLStripTransformer not working with data importer

2010-03-13 Thread Lance Norskog
DIH has special handling for upper lower case field names. It is possible your config is running afoul of this. Try using different names for the Solr fields than the database fields. On 3/11/10, James Ostheimer james.osthei...@gmail.com wrote: Hi- I can't seem to make any of the

Re: Cant commit on 125 GB index

2010-03-13 Thread Lance Norskog
Commit actions are in the jetty log. I don't have a script to pull them out in a spread-sheet-able form, but that would be useful. On 3/13/10, Frederico Azeiteiro frederico.azeite...@cision.com wrote: Yes, the http request is timing out even when using values of 10m. Normally the commit takes

Re: A bunch of questions

2010-03-13 Thread Mark Miller
On 03/12/2010 09:44 AM, Shawn Heisey wrote: Does SolrCloud's notion of a collection, which appears to use cores, override normal multi-core usage for building an offline index and quickly swapping it into production? A collection will normally be composed of multiple cores. By default

Re: Distributed search fault tolerance

2010-03-13 Thread Mark Miller
On 03/09/2010 04:28 PM, Shawn Heisey wrote: I attended the Webinar on March 4th. Many thanks to Yonik for putting that on. That has led to some questions about the best way to bring fault tolerance to our distributed search. High level question: Should I go with SolrCloud, or stick with

RE: Index an entire Phrase and not it's constituent parts?

2010-03-13 Thread Christopher Ball
Thank you for the idea Mitch, but it just doesn't seem right that I should have to revert to Scoring when what I really need seems so fundamental. Logically, what I want is a phrase filter factory that would match on phrases listed in a file, like stopwords, but in this case index the match and

Re: Index an entire Phrase and not it's constituent parts?

2010-03-13 Thread Lance Norskog
CommonGrams is a tool for this. It makes "is a" into a token, but then "is" and "a" are still removed as stopwords. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory On 3/13/10, Christopher Ball christopher.b...@metaheuristica.com wrote: Thank you for the idea
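The general idea of combining common words with their neighbours can be sketched in Python. This is a simplified illustration, not the Solr filter: the function name, the underscore separator, and the decision to drop the common words themselves (standing in for a later stop word step) are assumptions made for the example.

```python
def common_grams(tokens, common_words):
    """Emit bigrams around common words and drop the common words themselves.

    A bigram "a_b" is emitted for each adjacent pair in which at least one
    token is a common word. Non-common tokens are kept as single terms.
    """
    out = []
    for i, tok in enumerate(tokens):
        if tok not in common_words:
            out.append(tok)  # ordinary terms pass through unchanged
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")  # phrase-preserving combined term
    return out


if __name__ == "__main__":
    print(common_grams(["this", "is", "a", "test"], {"is", "a"}))
```

In this sketch a phrase containing stop words stays searchable through its combined terms even though the stop words are no longer indexed on their own.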

Re: SOLR Search Query : Exception : Software caused connection abort: recv failed

2010-03-13 Thread Mark Miller
You can usually raise the header size limit by editing the config of your servlet container. That can only get you so far though, and different browsers have their own limits. Your best bet, as Lance said, is either posting or sticking them in solrconfig. You can post by using the
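As a rough sketch of the POST option, the following Python builds a request that carries the query parameters in a form-encoded body instead of the URL. The URL `http://localhost:8983/solr/select` and the parameter values are assumptions for illustration; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(base_url, params):
    """Build a POST request whose parameters travel in the body, not the URL."""
    # doseq=True repeats the key for list values, e.g. several fq parameters.
    body = urlencode(params, doseq=True).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    return Request(base_url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Assumed endpoint and parameters, for illustration only.
    req = build_post_request(
        "http://localhost:8983/solr/select",
        {"q": "*:*", "fq": ["a:1", "b:2"]},
    )
    print(req.get_method(), req.full_url, len(req.data))
    # To send it: urllib.request.urlopen(req)
```

Because the filter queries are in the body, the request line stays short no matter how long they grow, although the servlet container may still enforce a separate limit on POST body size.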

Re: Best performance for facet dates in trunk using solr.TrieDateField

2010-03-13 Thread Yonik Seeley
On Wed, Mar 3, 2010 at 7:51 AM, Marc Sturlese marc.sturl...@gmail.com wrote: I am testing date facets in trunk with a huge index. Apparently, as the default solrconfig.xml shows, the fastest way to run date facet queries is to index the field with this data type: <!-- A Trie based date field for

Re: Distributed search fault tolerance

2010-03-13 Thread Mark Miller
My response to this was mangled by my email client - sorry - hopefully this one comes through a little easier to read ;) On 03/09/2010 04:28 PM, Shawn Heisey wrote: I attended the Webinar on March 4th. Many thanks to Yonik for putting that on. That has led to some questions about the best