RE: Facet sort numeric values

2012-08-15 Thread Aleksander Akerø
Oh brilliant, didn't think of it being possible to configure that way. Had made my own untokenized type, so I guess it would be better for me to control datatype this way. Bonus question (hehe): What if these field values also contain alphanumeric values? E.g. Alpha, Bravo, Omega, ... How would

Fwd: Solr 3.5 result grouping is failing

2012-08-15 Thread chethan
Hi, I'm trying to group (field collapse) my search results on a field called site. The schema says that it has to be indexed: <field name="site" type="string" stored="false" indexed="true"/>. But when I try to query the results with group.field=site&group.limit=100, I see only 1 group of results being

Re: offsets issues with multiword synonyms since LUCENE_33

2012-08-15 Thread Konrad Lötzsch
I don't know whether this was discussed previously, but you can tell the SynonymFilter not to break your synonyms (breaking them might be the default). In that case, the parts of the synonyms get new word positions. So you could use a KeywordTokenizer to avoid that behaviour: filter

Re: Query regarding dataimporthandler

2012-08-15 Thread Shalin Shekhar Mangar
There is no way to do it within DataImportHandler but you can configure autoCommit in solrconfig.xml to automatically commit pending updates by time or number of documents. On Tue, Aug 14, 2012 at 4:11 PM, ravicv ravichandra...@gmail.com wrote: Hi, Is there any way for intermediate commits

Re: scanned pdf with solr cell

2012-08-15 Thread Ahmet Arslan
When I send a scanned PDF to the extraction request handler, the icon below appears in my Dock. http://tinypic.com/r/2mpmo7o/6 http://tinypic.com/r/28ukxhj/6 I found that text-extractable PDF files trigger the same weird icon too. curl

Re: scanned pdf with solr cell

2012-08-15 Thread Paul Libbrecht
Ahmet, the dock icon appears when AWT starts, e.g. when a font is loaded. You can prevent it using the headless mode but this is likely to trigger an exception. Same if your user is not UI-logged-in. Hope it helps. Paul On 15 August 2012 at 01:30, Ahmet Arslan wrote: Hi All, I have set

Re: scanned pdf with solr cell

2012-08-15 Thread Ahmet Arslan
the dock icon appears when AWT starts, e.g. when a font is loaded. You can prevent it using the headless mode but this is likely to trigger an exception. Same if your user is not UI-logged-in. Hi Paul, thanks for the explanation. So is it nothing to worry about?

Re: SOLR3.6:Field Collapsing/Grouping throws OOM

2012-08-15 Thread Tirthankar Chatterjee
Hi Erick, You are so right on the memory calculations. I am happy that I know now that I was doing something wrong. Yes I am getting confused with SQL. I will back up and let you know the use case. I am tracking file versions. And I want to give an option to browse your system for the latest

Re: scanned pdf with solr cell

2012-08-15 Thread Paul Libbrecht
On 15 August 2012 at 13:03, Ahmet Arslan wrote: Hi Paul, thanks for the explanation. So is it nothing to worry about? It is nothing to worry about, except to remember that you can't run this step in a daemon-like process. (On Linux, I had to set up a VNC server for similar tasks.) Paul

Re: Switch from Sphinx to Solr - some basics please

2012-08-15 Thread Ahmet Arslan
Because I have already posted this on Stack Overflow, I don't want to create duplicate questions. Can you please read this post: http://stackoverflow.com/questions/11956608/sphinx-user-is-switching-to-solr Your questions require Sphinx knowledge. I suggest you read these book(s)

Re: Switch from Sphinx to Solr - some basics please

2012-08-15 Thread nnikolay
Hi iorixxx, thanks for the reply. Well, you don't need Sphinx knowledge to answer my questions. I have written what I want: 1. I need to have 2 separate indexes. On Stack Overflow I got the answer that I need to start 2 cores, for example. How many cores can I run for Solr? I have for example over

How to design index for related versioned database records

2012-08-15 Thread Stefan Burkard
Hi solr-users I have a case where I need to build an index from a database. ***Data structure*** The data is spread across multiple tables and in each table the records are versioned - this means that one real record can exist multiple times in a table, each with different validFrom/validUntil

Re: Switch from Sphinx to Solr - some basics please

2012-08-15 Thread Ahmet Arslan
1. I need to have 2 separate indexes. On Stack Overflow I got the answer that I need to start 2 cores, for example. How many cores can I run for Solr? Please see: http://search-lucene.com/m/6rYti2ehFZ82 I have for example jobs from country A, jobs from country B and so on until 100

Re: RAMDirectoryFactory bug

2012-08-15 Thread Michael Della Bitta
Hi, Lance, Thanks for your reply! It seems as if RAMDirectoryFactory is being passed the correct path to the index, as it's being logged correctly. It just doesn't recognize it as an index. Michael Della Bitta Appinions | 18 East 41st St., Suite

Re: How to design index for related versioned database records

2012-08-15 Thread Jack Krupansky
The date checking can be implemented using range query as a filter query, such as fq=startDate:[* TO NOW] AND endDate:[NOW TO *] (You can also use an frange query.) Then you will have to flatten the database tables. Your Solr schema would have a single merged record type. You will have to
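As a rough illustration of the filter query mentioned above, the following Python sketch only assembles the request parameters; it does not contact Solr. The field names startDate and endDate come from the message, while the function name and the match-all main query are assumptions made for the example.

```python
from urllib.parse import urlencode, parse_qs


def current_version_filter(start_field="startDate", end_field="endDate"):
    """Build a filter query matching records whose validity range contains NOW."""
    return f"{start_field}:[* TO NOW] AND {end_field}:[NOW TO *]"


# Assumption: a match-all main query, restricted by the date filter above.
params = urlencode({"q": "*:*", "fq": current_version_filter()})
print(params)
```

Because the restriction is passed as fq rather than folded into q, it narrows the result set without taking part in relevance scoring.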

Re: scanned pdf with solr cell

2012-08-15 Thread Michael Della Bitta
You can try passing -Djava.awt.headless=true as one of the arguments when you start Jetty to see if you can get this to go away with no ill effects. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017

Re: scanned pdf with solr cell

2012-08-15 Thread Ahmet Arslan
You can try passing -Djava.awt.headless=true as one of the arguments when you start Jetty to see if you can get this to go away with no ill effects. I started Jetty using: 'java -Djava.awt.headless=true -jar start.jar' and successfully indexed two PDF files. That icon didn't appear :)

Re: RAMDirectoryFactory bug

2012-08-15 Thread Mark Miller
On Aug 14, 2012, at 4:34 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi everyone, It looks like I found a bug with RAMDirectoryFactory (I know, I know...) Fair warning - RAMDir use in Solr is like a third class citizen. You probably should be using the mmap dir

Re: RAMDirectoryFactory bug

2012-08-15 Thread Michael Della Bitta
Yes, moving to mmap was on our roadmap. I'm in the middle of moving our infrastructure from 1.4 to 3.6.1, and didn't want to make too many changes at the same time. However, this bug might push us over the edge to mmap and away from ram. I'll file a bug regardless. Thanks! Michael Della Bitta

RE: Solr 4.0 - Join performance

2012-08-15 Thread David Smiley (@MITRE.org)
You would index rectangles of 0 height but that have a left edge 'x' of the start time and a right edge 'x' of your end time. You can index a variable number of these per Solr document and then query by either a point or another rectangle to find documents which intersect your query shape. It
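The idea of treating time ranges as zero-height rectangles can be sketched in plain Python. This is a toy model only, not the Solr or spatial4j API: the Rect class, the helper names, and the sample data are all hypothetical, and a real spatial index would not scan every shape as this loop does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; a time range uses min_y == max_y == 0."""
    min_x: float
    max_x: float
    min_y: float = 0.0
    max_y: float = 0.0

    def intersects(self, other):
        # Two rectangles intersect when they overlap on both axes (edges inclusive).
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def interval(start, end):
    """A time range as a zero-height rectangle from start to end."""
    return Rect(start, end)


def point(t):
    """A single instant as a degenerate rectangle."""
    return Rect(t, t)


# Hypothetical documents, each with a variable number of time ranges.
doc_intervals = {
    "doc1": [interval(9, 12), interval(14, 17)],
    "doc2": [interval(12.5, 13.5)],
}


def matching_docs(query):
    """Return ids of documents with at least one range intersecting the query shape."""
    return sorted(doc_id for doc_id, rects in doc_intervals.items()
                  if any(r.intersects(query) for r in rects))


print(matching_docs(point(10)))
print(matching_docs(interval(12, 13)))
```

Querying with a point asks "which documents are active at this instant", and querying with another rectangle asks "which documents overlap this range".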

Re: Index not loading

2012-08-15 Thread Jonatan Fournier
On Tue, Aug 14, 2012 at 5:37 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote: On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson erickerick...@gmail.com wrote: This is quite odd, it really sounds like you're not actually committing. So, some questions. 1 What happens if you search before

Re: Switch from Sphinx to Solr - some basics please

2012-08-15 Thread Walter Underwood
These do require some Sphinx knowledge. I could answer them on StackOverflow because I converted Chegg from Sphinx to Solr this year. As I said there, read about Solr cores. They are independent search configurations and indexes within one Solr server: http://wiki.apache.org/solr/CoreAdmin

Re: Duplicated facet counts in solr 4 beta: user error

2012-08-15 Thread Erick Erickson
No problem, and thanks for posting the resolution. If you have the time and energy, anyone can edit the Wiki if you create a logon, so any clarification you'd like to provide to keep others from having this problem would be most welcome! Best Erick On Tue, Aug 14, 2012 at 6:13 PM, Buttler,

Re: Facet sort numeric values

2012-08-15 Thread Erick Erickson
The problem you're running into is that lexical ordering of numeric data != numeric ordering. If you have mixed alpha and numeric data, you may not care if the alpha stuff is first, i.e. asdb456 asdf490 sorts fine. Problems happen with 9jsdf and 100ukel: the 100ukel comes first. So if you have a
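The ordering difference is easy to reproduce outside Solr. The short Python sketch below sorts the sample values from the message lexically, then with a "natural" sort key that compares digit runs as numbers. The natural key is an illustration of the ordering problem on the client side, not a description of how Solr sorts facets or of the fix proposed in the thread.

```python
import re

values = ["9jsdf", "100ukel", "asdb456", "asdf490"]

# Plain string sort compares character by character, so "1" < "9" puts 100ukel first.
lexical = sorted(values)


def natural_key(value):
    """Split into text and digit runs; digit runs (odd positions) compare as integers."""
    parts = re.split(r"(\d+)", value)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


natural = sorted(values, key=natural_key)

print(lexical)   # ['100ukel', '9jsdf', 'asdb456', 'asdf490']
print(natural)   # ['9jsdf', '100ukel', 'asdb456', 'asdf490']
```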

Re: Solr 3.5 result grouping is failing

2012-08-15 Thread Erick Erickson
Please attach the results of adding debugQuery=on to your query in both the success and failure case, there's very little information to go on here. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Wed, Aug 15, 2012 at 12:57 AM, chethan chethan.p...@gmail.com wrote:

Re: SOLR3.6:Field Collapsing/Grouping throws OOM

2012-08-15 Thread Erick Erickson
No, sharding into multiple cores on the same machine still is limited by the physical memory available. It's still lots of stuff on a limited box. But try backing up and re-thinking the problem a bit. Some possibilities off the top of my head: 1 have a new field current. when you update a

Re: question(s) re lucene spatial toolkit aka LSP aka spatial4j

2012-08-15 Thread David Smiley (@MITRE.org)
Hey solr-user, are you by chance indexing LineStrings? That is something I never tried with this spatial index. Depending on which iteration of LSP you are using, I figure you'd either end up indexing a vast number of points along the line which would be slow to index and make the index quite

Does DataImportHandler do any sanitizing?

2012-08-15 Thread Jon Drukman
I am pulling some fields from a MySQL database using DataImportHandler and some of them have invalid XML in them. Does DataImportHandler do any kind of filtering/sanitizing to ensure that it will go in OK or is it all on me? Example bad data: orphaned ampersands (Peanut Butter & Jelly), curly

Re: Does DataImportHandler do any sanitizing?

2012-08-15 Thread Michael Della Bitta
Hi, Jon, As far as I know, DataImportHandler doesn't transfer data to the rest of Solr via XML so it shouldn't be a problem... Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence

custom complex field - PolyField

2012-08-15 Thread Leonardo Souza
Hi, I have to index a tuple like ('blah', 'more blah info') in a multivalued field type. I have read about the PolyField type and it seems the best solution so far, but I can't find documentation showing how to use or implement a custom field. Any help is appreciated. -- Leonardo S Souza

solr.xml entries got deleted when powered off

2012-08-15 Thread vempap
Hello, I created an index, and all the schema.xml and solrconfig.xml files were created with content (I checked that they have contents in the xml files). But if I power off the system and restart again, the contents of the files are gone. They are like 0-byte files. Even the solr.xml file which got

Re: solr.xml entries got deleted when powered off

2012-08-15 Thread Leonardo Souza
Just guessing: disk full? -- Regards, Leonardo S Souza 2012/8/15 vempap phani.vemp...@emc.com Hello, I created an index, and all the schema.xml and solrconfig.xml files were created with content (I checked that they have contents in the xml files). But if I power off the system and restart

Re: solr.xml entries got deleted when powered off

2012-08-15 Thread vempap
nopes .. there is good amount of space left on disk -- View this message in context: http://lucene.472066.n3.nabble.com/solr-xml-entries-got-deleted-when-powered-off-tp4001496p4001502.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr.xml entries got deleted when powered off

2012-08-15 Thread vempap
It's happening when I'm not doing a clean shutdown. Are there any more scenarios it might happen ?

RE: solr.xml entries got deleted when powered off

2012-08-15 Thread Buttler, David
You are not putting these files in /tmp are you? That is sometimes wiped by different OS's on shutdown -Original Message- From: vempap [mailto:phani.vemp...@emc.com] Sent: Wednesday, August 15, 2012 3:31 PM To: solr-user@lucene.apache.org Subject: Re: solr.xml entries got deleted when

RE: solr.xml entries got deleted when powered off

2012-08-15 Thread vempap
No, I'm not keeping them in /tmp

Re: SOLR3.6:Field Collapsing/Grouping throws OOM

2012-08-15 Thread Chris Hostetter
: 2 Use external file fields (EFF) for the same purpose, that : won't require you to re-index the doc. The trick : here is you use the value in the EFF as a multiplier : for the score (that's what function queries do). So older : versions of the doc have scores of 0 and just

Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-15 Thread Nicholas Ball
Haven't managed to find a good way to do this yet. Does anyone have any ideas on how I could implement this feature? Really need to move docs across from one core to another atomically. Many thanks, Nicholas On Mon, 02 Jul 2012 04:37:12 -0600, Nicholas Ball nicholas.b...@nodelay.com wrote:

Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-15 Thread Li Li
On 2012-7-2 at 6:37 PM, Nicholas Ball nicholas.b...@nodelay.com wrote: That could work, but then how do you ensure commit is called on the two cores at the exact same time? That may need something like two-phase commit in a relational DB. Lucene has prepareCommit, but to implement 2PC, many things need
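To make the two-phase commit idea concrete, here is a toy, in-memory Python sketch of moving a document between two cores. Every name in it (Participant, two_phase_commit, move_doc) is hypothetical; it does not call Solr or Lucene, and it ignores the hard parts of the protocol, such as a coordinator crash between the two phases.

```python
class Participant:
    """Toy stand-in for one core's writer (hypothetical, not a Solr or Lucene API)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.pending = []      # staged operations, not yet visible
        self.committed = []    # operations made durable by commit()

    def stage(self, op):
        self.pending.append(op)

    def prepare(self):
        # Phase 1: promise that a later commit() will succeed, or raise.
        if self.fail_on_prepare:
            raise RuntimeError(f"{self.name} could not prepare")

    def commit(self):
        # Phase 2: make the staged operations visible.
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []


def two_phase_commit(participants):
    """Commit on all participants only if every one of them prepares successfully."""
    try:
        for p in participants:
            p.prepare()
    except RuntimeError:
        for p in participants:
            p.rollback()
        return False
    for p in participants:
        p.commit()
    return True


def move_doc(doc_id, source, target):
    """Stage a delete on the source and an add on the target, then run 2PC."""
    source.stage(("delete", doc_id))
    target.stage(("add", doc_id))
    return two_phase_commit([source, target])


core_a, core_b = Participant("coreA"), Participant("coreB")
print(move_doc("doc1", core_a, core_b))
```

If any participant fails to prepare, all staged operations are discarded, so the document is never deleted from one core without being added to the other.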

Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-15 Thread Li Li
Do you really need this? Distributed transactions are a difficult problem. In 2PC, every node could fail, including the coordinator. Something like leader election is needed to make sure it works. You may want to try ZooKeeper. But if the transaction is not very, very important, like transferring money in a bank, you

Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-15 Thread Li Li
http://zookeeper.apache.org/doc/r3.3.6/recipes.html#sc_recipes_twoPhasedCommit On Thu, Aug 16, 2012 at 7:41 AM, Nicholas Ball nicholas.b...@nodelay.com wrote: Haven't managed to find a good way to do this yet. Does anyone have any ideas on how I could implement this feature? Really need to

Re: SOLR3.6:Field Collapsing/Grouping throws OOM

2012-08-15 Thread Tirthankar Chatterjee
Awesome, thanks a lot. I am already on it with option 1. We need to track deletes to flip the previous one to current. Erick Erickson erickerick...@gmail.com wrote: No, sharding into multiple cores on the same machine still is limited by the physical memory available. It's still lots of stuff

Re: Does DataImportHandler do any sanitizing?

2012-08-15 Thread Lance Norskog
If you want to sanitize them during indexing, the regular expression tools can do this. You would create a regular expression that matches bogus elements. There is a regular expression transformer in the DIH, and a regular expression CharFilter inside the Lucene text analysis stack. On Wed, Aug
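As an example of the kind of pattern involved, the Python sketch below escapes orphaned ampersands while leaving existing entities alone. It uses Python's re module purely for illustration; the pattern and function name are assumptions, and the same idea would need to be adapted to the regular expression transformer or CharFilter configuration.

```python
import re

# An ampersand that does not start a named, decimal, or hexadecimal entity.
ORPHAN_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_orphan_ampersands(text):
    """Replace bare '&' with '&amp;' without double-escaping existing entities."""
    return ORPHAN_AMP.sub("&amp;", text)


print(escape_orphan_ampersands("Peanut Butter & Jelly"))
```

The negative lookahead is what keeps already-escaped text such as "&amp;" or "&#38;" unchanged.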