Re: Field collapsing bad performances, schema redesign

2013-02-05 Thread Mickael Magniez
I don't see how a join would help me. I don't really have a parent/child relationship; the only common field is product_id. For example, for different shoes of the same model, img and title are different for each model: [ {product_id:1, offer_id:1,title:Converse all star - red - 8, attribute_size:8,

Re: Updating data

2013-02-05 Thread Dikchant Sahi
If I understand it right, you want the JSON to contain only the new fields and not the fields that have already been indexed/stored. Check out Solr Atomic updates. Below are some links which might help. http://wiki.apache.org/solr/Atomic_Updates http://yonik.com/solr/atomic-updates/ Remember, it requires
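As a rough illustration of the atomic-update idea in the reply above, the sketch below builds a JSON payload that carries only the changed fields. The document id, the field names, and the assumption that the unique key is called "id" are all made up for the example; the set/inc/add modifiers are the ones described on the Atomic_Updates wiki page linked above. Sending the payload to the update handler is deliberately left out.

```python
import json


def atomic_update(doc_id, set_fields=None, inc_fields=None, add_fields=None):
    """Build one atomic-update document: the unique key plus per-field modifiers."""
    doc = {"id": doc_id}  # assumption: the schema's uniqueKey field is named "id"
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}   # replace the stored value
    for name, value in (inc_fields or {}).items():
        doc[name] = {"inc": value}   # increment a numeric field
    for name, value in (add_fields or {}).items():
        doc[name] = {"add": value}   # append to a multi-valued field
    return doc


# Hypothetical fields: only "price" and "popularity" are sent, nothing else.
payload = json.dumps([atomic_update("1", set_fields={"price": 99},
                                    inc_fields={"popularity": 1})])
print(payload)
```

Fields that are not mentioned in the payload are left out of the request entirely, which is the point of the approach.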

Re: Field collapsing bad performances, schema redesign

2013-02-05 Thread Mikhail Khludnev
It seems like the product-offer relation is what is usually modeled as product-SKU or product-UPC, model-item, e.g. http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html To me, the denormalized children can be indexed as the following block: {product_id:1, brand:Converse,

auto trigger the delta import to update index in solr if any update in sql database

2013-02-05 Thread Rohan Thakur
Hi everyone, is there any way in which we can auto trigger the delta import to update the index in Solr if there is any update in the SQL database? Thanks, regards, Rohan

Is Solr always trimming result objects ?

2013-02-05 Thread Marc Hermann
Hi everyone, i am new to SOLR and I have experienced a strange behavior: I have defined a field called name of type string. This field contains values with trailing spaces such as John. This is intended and the space needs to survive. The space is in the index as I can see when using Luke.

Re: auto trigger the delta import to update index in solr if any update in sql database

2013-02-05 Thread Alexandre Rafalovitch
If you have your deltaQuery set up in DIH, that should check for updates. Then you just ping the DIH URL periodically to get it to check. Regards, Alex.
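A minimal sketch of the "ping the DIH URL periodically" suggestion: the function below only builds the delta-import URL that a scheduler (cron or similar) could request. The base URL and the "/dataimport" handler path are assumptions that depend on the local solrconfig.xml; the command and commit parameters are standard DataImportHandler request parameters.

```python
from urllib.parse import urlencode


def delta_import_url(base_url, handler="/dataimport", commit=True):
    """Return the URL that asks DIH to run a delta import."""
    params = {
        "command": "delta-import",
        "commit": "true" if commit else "false",
    }
    return f"{base_url.rstrip('/')}{handler}?{urlencode(params)}"


# Assumed host and port; adjust to the actual Solr instance.
print(delta_import_url("http://localhost:8983/solr"))
```

A scheduled job would then fetch this URL at whatever interval the data can tolerate.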

Re: Is Solr always trimming result objects ?

2013-02-05 Thread Otis Gospodnetic
Looks like a bug to me. Otis Solr ElasticSearch Support http://sematext.com/ On Feb 5, 2013 5:44 AM, Marc Hermann marc.herm...@isb-ag.de wrote: Hi everyone, i am new to SOLR and I have experienced a strange behavior: I have defined a field called name of type string. This field contains

Re: How to import this Json-line by DIH?

2013-02-05 Thread Sagar Joshi1304
Hello, Have you found any way to achieve this? I also want to do the same. Thanks in Advance, Sagar -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-import-this-Json-line-by-DIH-tp3990544p4038535.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: auto trigger the delta import to update index in solr if any update in sql database

2013-02-05 Thread Rohan Thakur
That is good, but it will not take care of a field that is dynamically changing, and I want to do realtime updates for that field. It's not possible to set cron to call DIH every second... On Tue, Feb 5, 2013 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: If you have your deltaQuery

Re: auto trigger the delta import to update index in solr if any update in sql database

2013-02-05 Thread Alexandre Rafalovitch
If your field changes that often, you probably do not want to reindex Solr to match. You will lose all caches on every commit. Maybe you are looking at something like ExternalFileField but backed by a database? http://sujitpal.blogspot.ca/2011/05/custom-sorting-in-solr-using-external.html

AW: Is Solr always trimming result objects ?

2013-02-05 Thread Marc Hermann
Is there a bug entry for this ? Have you been able to reproduce this behaviour ? -Ursprüngliche Nachricht- Von: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Gesendet: Dienstag, 5. Februar 2013 12:52 An: solr-user@lucene.apache.org Betreff: Re: Is Solr always trimming result

Re: Multi-threaded post.jar?

2013-02-05 Thread Upayavira
Right, but what's the Windows equivalent? Not sure there is one. Upayavira On Tue, Feb 5, 2013, at 04:56 AM, Walter Underwood wrote: Easier than: solrpost.sh a*.xml a.log solrpost.sh b*.xml b.log solrpost.sh c*.xml c.log and so on? We have a fair selection of Solr servers where
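One platform-neutral way to get the effect of several solrpost.sh runs in parallel is a small script that splits the file list into batches and hands each batch to a worker thread. This is only a sketch: post_batch is a hypothetical callable standing in for whatever actually sends the files to Solr, and it is not part of post.jar.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batches(files, n):
    """Deal files round-robin into at most n batches, dropping empty ones."""
    batches = [files[i::n] for i in range(n)]
    return [batch for batch in batches if batch]


def post_in_parallel(files, post_batch, workers=3):
    """Run post_batch (hypothetical uploader) on each batch in its own thread."""
    batches = split_batches(sorted(files), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(post_batch, batches))


# Stand-in uploader that just reports how many files it was given.
print(post_in_parallel(["c1.xml", "a1.xml", "b1.xml", "a2.xml"], len))
```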

Re: Really bad query performance for date range queries

2013-02-05 Thread Erick Erickson
On the surface, this doesn't seem right. Your corpus isn't very large and it sounds like you have some significant hardware supporting it. I'm guessing you have some other issues here like: - inappropriate memory allocation - too-frequent commits - too many queries for your hardware - indexing to

Upgrading Tika in place

2013-02-05 Thread Tod
I'm running an older version of Solr - 3.4.0.2011.09.09.09.06.17. It seems the version of Tika that came with it has trouble with some PDF files and newer Office documents. I've checked the latest Tika release and it solves these problems. I'd like to just drop in the necessary Tika jars

RE: Upgrading Tika in place

2013-02-05 Thread Markus Jelsma
Hi, You also need pdfbox-1.7.1 and possibly also fontbox and jempbox 1.7.1. Cheers, Markus -Original message- From: Tod listac...@gmail.com Sent: Tue 05-Feb-2013 13:59 To: solr-user@lucene.apache.org Subject: Upgrading Tika "in place" I'm running an older version of Solr

Re: Is Solr always trimming result objects ?

2013-02-05 Thread Upayavira
Try the same with plain HTTP queries, and use wt=xml and wt=json, to see whether the space survives there (remember to view source). If there is a bug, it could be localised to the solrj case. Upayavira On Tue, Feb 5, 2013, at 10:44 AM, Marc Hermann wrote: Hi everyone, i am new to SOLR

Re: Boost by Nested Query / Join Needed?

2013-02-05 Thread jp
Hi, I have a similar need of boosting specific records based on the user profile. We have a master table which has details about warehouses and we have another table where user preferred warehouses exist. When the user searches for warehouses, we need to boost warehouses which are preferred to the

Re: SolrCloud: Does each node has to contain the same indexes?

2013-02-05 Thread Joey Dale
There is no reason that won't work. You may have to create the collections using the cores api rather than the collections api, but it shouldn't be too bad. -Joey On 2/5/13 6:35 AM, Mathias Hodler wrote: Hi, can I set up a SolrCloud with 3 nodes (no sharding, only replicas) like in the

Re: auto trigger the delta import to update index in solr if any update in sql database

2013-02-05 Thread jp
You could use the SQL Service Broker External Activation service to monitor the changes and post them into the Solr index using the update request handler in soft commit mode. --JP

Re: auto trigger the delta import to update index in solr if any update in sql database

2013-02-05 Thread Rohan Thakur
Hi jp, thanks. Can you provide me any good link for this? Thanks, regards, Rohan On Tue, Feb 5, 2013 at 6:52 PM, jp gj...@ramco.com wrote: You could use SQL service Broker External Activation service to monitor the changes and post the changes into the Solr Index using update request handler

which analyzer is used for facet.query?

2013-02-05 Thread Kai Gülzau
Hi all, which analyzer is used for the facet.query? This is my schema.xml: <fieldType name="uima_nouns_de" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.UIMATypeAwareAnnotationsTokenizerFactory" descriptorPath="/uima/AggregateSentenceDEAE.xml"

RE: Indexing nouns only with UIMA works - performance issue?

2013-02-05 Thread Kai Gülzau
So with https://issues.apache.org/jira/browse/LUCENE-4749 it's possible to set the ModelFile? <tokenizer class="solr.UIMAAnnotationsTokenizerFactory" descriptorPath="/uima/AggregateSentenceAE.xml" tokenType="org.apache.uima.SentenceAnnotation" ngramsize="2"

Re: Indexing nouns only with UIMA works - performance issue?

2013-02-05 Thread Tommaso Teofili
Right, that should be possible (if using trunk or branch_4x, which will be 4.2). Tommaso 2013/2/5 Kai Gülzau kguel...@novomind.com So with https://issues.apache.org/jira/browse/LUCENE-4749 it's possible to set the ModelFile? <tokenizer class="solr.UIMAAnnotationsTokenizerFactory"

Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Hi: I'm working on a search engine for several PDF documents, right now one of the requirements is that we can provide not only the documents matching the search criteria but the page that match the criteria. Normally tika only extracts the text content and does not do this distinction, but

Re: Indexing several parts of PDF file

2013-02-05 Thread Upayavira
This would involve you querying against every page in your document, which will be too many fields and will break quickly. The best way to do it is to index pages as documents. You can use field collapsing to group pages from the same document together. Upayavira On Tue, Feb 5, 2013, at 02:00
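The "index pages as documents, then collapse by source document" advice can be sketched as follows. The field names (doc_id, page, text) and the id scheme are invented for the example and would need matching schema fields; group, group.field and group.limit are Solr's result grouping parameters.

```python
def page_documents(doc_id, pages):
    """Turn one extracted PDF (a list of page texts) into one Solr document per page."""
    return [
        {
            "id": f"{doc_id}_p{number}",  # hypothetical unique key: document id plus page number
            "doc_id": doc_id,             # shared by every page of the same PDF
            "page": number,
            "text": text,
        }
        for number, text in enumerate(pages, start=1)
    ]


def grouped_query_params(user_query, pages_per_document=3):
    """Query parameters that group matching pages under their source document."""
    return {
        "q": user_query,
        "group": "true",
        "group.field": "doc_id",
        "group.limit": pages_per_document,
    }


docs = page_documents("report-2013", ["intro text", "results text"])
print(docs[1]["id"], grouped_query_params("results"))
```

Each group in the response then corresponds to one PDF, and the page field of the grouped hits says which pages matched.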

Re: Indexing several parts of PDF file

2013-02-05 Thread VIGNESH S
Yes, I think the same. Better to index each page as a document. On Tue, Feb 5, 2013 at 7:35 PM, Upayavira u...@odoko.co.uk wrote: This would involve you querying against every page in your document, which will be too many fields and will break quickly. The best way to do it is to index pages

GitHub Code Search Outages

2013-02-05 Thread Arcadius Ahouansou
Anyone seen this article from GitHub? https://github.com/blog/1397-recent-code-search-outages Not to start any war here... I just feel like the community or LucidWorks should approach them. Thanks. Arcadius.

Re: Multi-threaded post.jar?

2013-02-05 Thread Jan Høydahl
Wiki page exists already: http://wiki.apache.org/solr/post.jar I'm happy to consider a refactoring, especially if it makes it SIMPLER to read and interact with and doesn't add a ton of mandatory dependencies. It should probably still be possible to say something like javac

Re: Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Thanks for the advice. The thing with this approach is that we are using nutch as our crawler for the intranet, and right now, doing this (indexing one crawled document as several solr documents) is not possible without changing the way nutch works. Is there any other workaround for this? Thanks

Re: Need to create SolrServer objects without checking server availability

2013-02-05 Thread Shawn Heisey
On 2/4/2013 3:33 PM, Michael Della Bitta wrote: Ah, OK, sorry to be terse! 1. Create a class that implements SolrServer from the SolrJ project: http://lucene.apache.org/solr/4_1_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html 2. Make the constructor of that class take as arguments the

Controlling traffic between solr 4.1 nodes

2013-02-05 Thread Michael Tracey
Hey all, new to Solr 4.x, and am wondering if there is any way that I could have a single collection (single or multiple shards) replicated into two datacenters, where only 1 solr instance in each datacenter communicate. (for example, 4 servers in one DC, 4 servers in another datacenter and

Re: Multi-threaded post.jar?

2013-02-05 Thread Upayavira
By dependencies, do you mean other java classes? I was thinking of splitting it out into a few classes, each of which is clearer in its purpose. Upayavira On Tue, Feb 5, 2013, at 02:26 PM, Jan Høydahl wrote: Wiki page exists already: http://wiki.apache.org/solr/post.jar I'm happy to consider

Re: Configuring the jetty shipped with Solr

2013-02-05 Thread Ali, Saqib
Thanks Alex. I was able to bind jetty to 127.0.0.1 so that it only accepts connections from localhost using the following: <Set name="host"><SystemProperty name="jetty.host" default="127.0.0.1"/></Set> But how do I set it so that it can accept connections from certain non-localhost IP addresses as well?

Querying for ~2000 integers - better model?

2013-02-05 Thread Luis Lebolo
Hello! First time poster so {insert ignorance disclaimer here ;)}. I'm building a web application backed by an Oracle database and we're using Lucene Solr to index various lists of entities (via DIH). We then harness Solr's faceting to allow the user to filter through their searches. One aspect

Re: Querying for ~2000 integers - better model?

2013-02-05 Thread Mikhail Khludnev
Hello Luis, Your problem seems fairly obvious (a hard problem to solve). Where does this set of orange ids come from? Does a user enter thousands of these ids into a web form? On Tue, Feb 5, 2013 at 8:49 PM, Luis Lebolo luis.leb...@gmail.com wrote: Hello! First time poster so {insert ignorance

Re: solr atomic update

2013-02-05 Thread Marcos Mendez
Any ideas on this? Begin forwarded message: From: Marcos Mendez mar...@jitisoft.com Subject: solr atomic update Date: January 31, 2013 7:09:23 AM EST To: solr-user@lucene.apache.org Is there a way to do an atomic update (inc by 1) and retrieve the value in one operation?

Re: solr atomic update

2013-02-05 Thread Walter Underwood
You cannot do that. Solr does document-level updates with batch commits. The value will be available after the batch commit completes. With Solr4 you can do a realtime get after the commit, but it is still two operations. wunder On Feb 5, 2013, at 9:09 AM, Marcos Mendez wrote: Any ideas on
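To make the "still two operations" point concrete, here is a small sketch of the two requests involved: an atomic increment, followed by a realtime get. It assumes the unique key field is named "id", that a counter field called "views" exists, and that the realtime get handler is registered at /get; none of that comes from the thread.

```python
import json
from urllib.parse import urlencode


def increment_payload(doc_id, field, amount=1):
    """Operation 1: body of an atomic update that increments one numeric field."""
    return json.dumps([{"id": doc_id, field: {"inc": amount}}])


def realtime_get_url(base_url, doc_id):
    """Operation 2: URL that reads the document back through the realtime get handler."""
    return f"{base_url.rstrip('/')}/get?{urlencode({'id': doc_id})}"


print(increment_payload("doc42", "views"))
print(realtime_get_url("http://localhost:8983/solr", "doc42"))
```

Because these are separate requests, another client may increment the field between them, so the value read back is not guaranteed to be the one this client wrote.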

Re: Adding replacement node to 4.x cluster with down node

2013-02-05 Thread Mike Schultz
Just to clarify, I want to be able to replace the down node with a host with a different name. If I were repairing that particular machine and replacing it, there would be no problem. But I don't have control over the name of my replacement machine.

Re: Querying for ~2000 integers - better model?

2013-02-05 Thread Luis Lebolo
Hi Mikhail, Thanks for the interest! The user selects various Oranges from the website. The list of Orange IDs then gets placed into a table in our database. For example, the user may want to search oranges from Florida (a state filter) planted a week ago (a date filter). We then display 600

Re: Querying for ~2000 integers - better model?

2013-02-05 Thread Jack Krupansky
Could you describe in more detail what the user queries (not the facet/filters) would actually look like. What are they actually looking for in terms of documents? In terms of modeling, the idea behind a query is that it identifies a set of documents which will then be scored for relevancy,

Re: Adding replacement node to 4.x cluster with down node

2013-02-05 Thread Mark Miller
https://issues.apache.org/jira/browse/SOLR-4078 will be useful - that should make it in 4.3. Until then, you want to get that node out, and you need the new node to be assigned to the same shard. I guess I might try: Add your new node - explicitly tell it what shard to join by setting the

Re: Querying for ~2000 integers - better model?

2013-02-05 Thread Mikhail Khludnev
Bingo! The user doesn't choose 600 ids; they choose two filters. The Lucene/Solr way implies that a query is short; long queries always harm IR system efficiency. You should index date and state as orange attributes and then join the set of oranges to apples. There are two main approaches of joining in
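The idea of sending the two filters rather than the resulting id list can be sketched like this. The field names state and planted_date are hypothetical and would have to exist on the orange documents; fq and the NOW-7DAYS date math are standard Solr syntax. The join to apples mentioned above is not shown here.

```python
from urllib.parse import urlencode


def orange_filters(state, days_back):
    """Express the user's choices as two filter queries instead of hundreds of ids."""
    return [
        f"state:{state}",
        f"planted_date:[NOW-{days_back}DAYS TO NOW]",
    ]


def select_query_string(filters, query="*:*"):
    """Encode q plus one fq parameter per filter."""
    params = [("q", query)] + [("fq", f) for f in filters]
    return urlencode(params)


filters = orange_filters("Florida", 7)
print(filters)
print(select_query_string(filters))
```

The request stays the same size no matter how many oranges match, and each fq can be cached independently by Solr's filter cache.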

Edismax and mm per field

2013-02-05 Thread Arcadius Ahouansou
Hello. Currently, edismax applies mm to the combination of all fields listed in qf. I would like to have mm applied individually to those fields instead. I have seen this asked before. For instance, the query: 1) defType=edismax&q=leo foster&mm=2&qf=title^5

Re: Really bad query performance for date range queries

2013-02-05 Thread sausarkar
We have a 96GB RAM machine with 16 processors. The JVM is set to use 60 GB. The tests that we are running are purely queries; there is no indexing going on. I don't see garbage collection when I attach VisualVM but see frequent CPU spikes ~once every minute.

Correct way for getting SolrCore?

2013-02-05 Thread Ryan Josal
Hey guys, I am writing an UpdateRequestProcessorFactory plugin which needs to have some initialization code in the init method. I need to build some information about each SolrCore in memory so that when an update comes in for a particular SolrCore, I can use the data for the appropriate

Re: Correct way for getting SolrCore?

2013-02-05 Thread Mark Miller
The request should give you access to the core - the core to the core descriptor, the descriptor to the core container, which knows about all the cores. - Mark On Feb 5, 2013, at 4:09 PM, Ryan Josal rjo...@rim.com wrote: Hey guys, I am writing an UpdateRequestProcessorFactory plugin

ngrams or truncation for multilingual searching in Solr

2013-02-05 Thread Tom Burton-West
Hello all, We have a large number of languages which we currently index all in one index. The paper below uses ngrams as a substitute for language-specific stemming and got good results with a number of complex languages. Has anyone tried doing this with Solr? They also got fairly good

Re: Really bad query performance for date range queries

2013-02-05 Thread Shawn Heisey
On 2/5/2013 12:51 PM, sausarkar wrote: We have a 96GB ram machine with 16 processors. the JVM is set to use 60 GB. The test that we are running are purely query there is no indexing going on. I dont see garbage collection when I attach visualVM but see frequent CPU spikes ~once every minute. A

RE: Correct way for getting SolrCore?

2013-02-05 Thread Ryan Josal
Is there any way I can get the cores and do my initialization in the @Override public void init(final NamedList args) method? I could wait for the first request, but I imagine I'd have to deal with indexing requests piling up while I iterate over every document in every index. Ryan

RE: Really bad query performance for date range queries

2013-02-05 Thread Petersen, Robert
Hi Shawn, I've looked at the Zing JVM before but don't use it. jHiccup looks like a really useful tool. Can you tell us how you are starting it up? Do you start it wrapping the app container (i.e. tomcat / jetty)? Thanks Robi -Original Message- From: Shawn Heisey

RE: Correct way for getting SolrCore?

2013-02-05 Thread Ryan Josal
Using the deprecated SolrCore.getSolrCore method, i.e. SolrCore.getSolrCore().getCoreDescriptor().getCoreContainer().getCores(), Solr starts up in an infinite recursive loop of loading cores. I understand now that the UpdateProcessorFactory is initialized as part of the core initialization, so

Re: Really bad query performance for date range queries

2013-02-05 Thread Shawn Heisey
On 2/5/2013 3:19 PM, Petersen, Robert wrote: Hi Shawn, I've looked at the Zing JVM before but don't use it. jHiccup looks like a really useful tool. Can you tell us how you are starting it up? Do you start it wrapping the app container (i.e. tomcat / jetty)? Instead of just calling

Re: Configuring the jetty shipped with Solr

2013-02-05 Thread Arcadius Ahouansou
What you may want to do is open up Jetty and get a proper firewall in place to filter clients IP? Arcadius. On 5 February 2013 16:47, Ali, Saqib docbook@gmail.com wrote: Thanks Alex. I was able to bind jetty to 127.0.0.1 so that it only accepts connections from localhost using the

Re: Correct way for getting SolrCore?

2013-02-05 Thread Mark Miller
The SolrCoreAware interface? - Mark On Feb 5, 2013, at 5:42 PM, Ryan Josal rjo...@rim.com wrote: By way of the deprecated SolrCore.getSolrCore method, SolrCore.getSolrCore().getCoreDescriptor().getCoreContainer().getCores() Solr starts up in an infinite recursive loop of loading cores.

RE: Really bad query performance for date range queries

2013-02-05 Thread Petersen, Robert
Hi Shawn, I'm running solr in Tomcat on RHEL. It looks like what you're doing is making jHiccup wrap around the whole JVM by doing it that way, is that right? That's pretty cool if so. I'll see if I can set it up in my dev environment tomorrow. Thanks, Robi -Original Message- From:

Re: Multicore search with ManifoldCF security not working

2013-02-05 Thread Ahmet Arslan
Hello, Aha so you are using nabble. Please follow the instructions described here : http://manifoldcf.apache.org/en_US/mail.html And subscribe 'ManifoldCF User Mailing List' and send your question there. Ahmet --- On Mon, 1/28/13, eShard zim...@yahoo.com wrote: From: eShard zim...@yahoo.com

Re: auto trigger the delta import to update index in solr if any update in sql database

2013-02-05 Thread jp
The following link provides information on using External Activator for tracking DB changes: http://ajitananthram.wordpress.com/2012/05/26/auditing-external-activator/ --JP