Re: Complication - can block joins help?

2014-01-26 Thread William Bell
Is there an example for using payloads for 4.6? Without any custom code for this? On Sun, Jan 26, 2014 at 10:30 PM, William Bell wrote: > OK, > > In order to do boosting, we often will create a dynamic field in SOLR. For > example: > > A Professional hire out for work, I want to boost those wh
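A partial answer to the question as asked: in Solr 4.6 payloads can be *indexed* with the delimited-payload filter that ships with Lucene, but *scoring* on them still requires custom code (a similarity or query parser); a built-in payload function query did not exist yet. A hedged schema sketch, with illustrative field and type names:

```xml
<!-- schema.xml sketch (names are made up for illustration):
     terms like "woodworking|2.5" get a float payload attached at index time.
     Using the payload at query time in 4.6 still needs a custom
     similarity or query parser. -->
<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldType>
<field name="specialties" type="payloads" indexed="true" stored="true"/>
```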

Re: How to run a subsequent update query to documents indexed from a dataimport query

2014-01-26 Thread Varun Thacker
Hi Dileepa, If I understand correctly, this is what happens in your system: 1. DIH sends data to Solr 2. You have written a custom update processor ( http://wiki.apache.org/solr/UpdateRequestProcessor) which then asks your Stanbol server for metadata, adds it to the document and then in
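The setup Varun describes is wired up in solrconfig.xml. A sketch of such a configuration, assuming a hypothetical enhancer factory class and chain name (neither is from the thread):

```xml
<!-- solrconfig.xml sketch: a custom enhancement processor placed in the
     update chain used by DIH. The class and chain names are illustrative. -->
<updateRequestProcessorChain name="stanbol-enhance">
  <processor class="com.example.StanbolEnhancerProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">stanbol-enhance</str>
  </lst>
</requestHandler>
```

Every document flowing through /dataimport then passes the custom processor before being indexed, which is why a slow enhancer slows the whole import.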

Complication - can block joins help?

2014-01-26 Thread William Bell
OK, In order to do boosting, we often will create a dynamic field in SOLR. For example: A Professional hire out for work, I want to boost those who do "woodworking". George Smith - builds chairs, and builds desks. He builds the most desks in the country (350 a year). And his closest competitor d

Re: What is the last_index_time in dataimport.properties?

2014-01-26 Thread Dileepa Jayakody
Yes Ahmet. I want to use the last_index_time to find the documents imported in the last /dataimport process and send them through an update process. I have explained this requirement in my other thread. Thanks, Dileepa On Mon, Jan 27, 2014 at 3:23 AM, Ahmet Arslan wrote: > Hi, > > last_index_ti

Re: How to run a subsequent update query to documents indexed from a dataimport query

2014-01-26 Thread Dileepa Jayakody
Hi Ahmet, On Mon, Jan 27, 2014 at 3:26 AM, Ahmet Arslan wrote: > Hi, > > Here is what I understand from your Question. > > You have a custom update processor that runs with DIH. But it is slow. You > want to run that text enhancement component after DIH. How would this help > to speed up thing

Re: How to run a subsequent update query to documents indexed from a dataimport query

2014-01-26 Thread Ahmet Arslan
Hi, Here is what I understand from your Question. You have a custom update processor that runs with DIH. But it is slow. You want to run that text enhancement component after DIH. How would this help to speed up things? In this approach you will read/query/search already indexed and committed

Re: What is the last_index_time in dataimport.properties?

2014-01-26 Thread Ahmet Arslan
Hi, last_index_time is traditionally used to query the database. But it seems that you want to query Solr, right? On Sunday, January 26, 2014 11:15 PM, Dileepa Jayakody wrote: Hi Ahmet, Thanks a lot. It means I can use the last_index_time to query documents indexed during the last dataimport r

Re: What is the last_index_time in dataimport.properties?

2014-01-26 Thread Dileepa Jayakody
Hi Ahmet, Thanks a lot. It means I can use the last_index_time to query documents indexed during the last dataimport request? I need to run a subsequent update process to all documents imported from a dataimport. Thanks, Dileepa On Mon, Jan 27, 2014 at 1:33 AM, Ahmet Arslan wrote: > Hi Dileep

Re: What is the last_index_time in dataimport.properties?

2014-01-26 Thread Ahmet Arslan
Hi Dileepa, It is the time that the last dataimport process started. So it is safe to use it when considering updated documents during the import. Ahmet On Sunday, January 26, 2014 9:10 PM, Dileepa Jayakody wrote: Hi All, Can I please know what timestamp in the dataimport process is reord
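For context, last_index_time is the property DIH itself writes to dataimport.properties and substitutes into delta queries. A sketch of a data-config.xml entity that uses it (table and column names are illustrative, not from the thread):

```xml
<!-- data-config.xml sketch: ${dataimporter.last_index_time} is the start
     time of the previous import, so rows updated while that import ran
     are still picked up on the next delta run. -->
<entity name="doc"
        query="SELECT id, title FROM docs"
        deltaQuery="SELECT id FROM docs
                    WHERE updated_at &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, title FROM docs
                          WHERE id = '${dih.delta.id}'"/>
```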

What is the last_index_time in dataimport.properties?

2014-01-26 Thread Dileepa Jayakody
Hi All, Can I please know what timestamp in the dataimport process is recorded as the last_index_time in dataimport.properties? Is it the time that the last dataimport process started? Or is it the time that the last dataimport process finished? Thanks, Dileepa

Re: How to run a subsequent update query to documents indexed from a dataimport query

2014-01-26 Thread Dileepa Jayakody
Hi all, Any ideas on how to run a reindex update process for all the imported documents from a /dataimport query? Appreciate your help. Thanks, Dileepa On Thu, Jan 23, 2014 at 12:21 PM, Dileepa Jayakody < dileepajayak...@gmail.com> wrote: > Hi All, > > I did some research on this and found so
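One common approach for this (a sketch, not from the thread itself): stamp every document with an index-time date field, then select the last batch with a range query using last_index_time as the lower bound. The field name here is illustrative:

```xml
<!-- schema.xml sketch: every incoming document gets the current time as a
     default value, making "documents from the last import" queryable. -->
<field name="indexed_at" type="date" indexed="true" stored="true"
       default="NOW"/>
```

A follow-up process could then select the last batch with something like `q=indexed_at:[2014-01-26T21:08:13Z TO *]` and re-submit those documents through the update chain.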

Tie breakers when sorting equal items

2014-01-26 Thread Scott Smith
I promised to ask this on the forum just to confirm what I assume is true. Suppose you're returning results using a sort order based on some field (so, not relevancy). For example, suppose it's a date field which indicates when the document was loaded into the solr index. Suppose two items hav
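The usual answer: documents with equal sort values come back in an unspecified internal order (Lucene docid), which can change after segment merges, so a deterministic ordering needs an explicit secondary sort, typically on the uniqueKey. A stdlib-only sketch of the effect of adding that tie-breaker (data and field names are made up):

```java
import java.util.*;

public class TieBreakerDemo {
    // Two documents share the same date; without a secondary key their
    // relative order is unspecified in Solr (internal docid order).
    record Doc(String id, String date) {}

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(List.of(
            new Doc("b", "2014-01-26"),
            new Doc("a", "2014-01-26"),
            new Doc("c", "2014-01-25")));

        // Equivalent of sort=date desc, id asc -- the id tie-breaker
        // makes the order of the two 2014-01-26 docs deterministic.
        docs.sort(Comparator.comparing(Doc::date).reversed()
                            .thenComparing(Doc::id));

        for (Doc d : docs) System.out.println(d.id()); // a, b, c
    }
}
```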

Re: How to handle multiple sub second updates to same SOLR Document

2014-01-26 Thread Elisabeth Benoit
Sent from my iPhone. On 26 Jan 2014, at 06:13, Shalin Shekhar Mangar wrote: > There is no timestamp versioning as such in Solr but there is a new > document based versioning which will allow you to specify your own > (externally assigned) versions. > > See the "Document Centric Ve
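The feature Shalin mentions ships as an update processor in Solr 4.6. A minimal configuration sketch (the version field name is up to you; the one below is illustrative):

```xml
<!-- solrconfig.xml sketch: reject updates whose externally assigned
     version is not newer than the one already indexed. -->
<updateRequestProcessorChain name="external-version">
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <str name="versionField">my_version_l</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With sub-second updates to the same document, whichever update carries the higher external version wins, regardless of arrival order.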

Re: Solr server requirements for 100+ million documents

2014-01-26 Thread simon
Erick's probably too modest to say so ;-), but he wrote a great blog entry on indexing with SolrJ - http://searchhub.org/2012/02/14/indexing-with-solrj/ . I took the guts of the code in that blog and easily customized it to write a very fast indexer (content from MySQL, I excised all the Tika c
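The blog's code isn't reproduced here; as a stdlib-only sketch of the batching pattern such an indexer typically uses (the Solr call is stubbed out and only counts documents; a real version would call something like ConcurrentUpdateSolrServer.add on each batch):

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchIndexer {
    static final int BATCH_SIZE = 3;               // real indexers use 100-1000
    static final AtomicInteger sent = new AtomicInteger();

    // Stand-in for solrServer.add(batch); just counts documents here.
    static void sendBatch(List<String> batch) {
        sent.addAndGet(batch.size());
    }

    // Read rows, hand off fixed-size batches to a worker pool, return the
    // number of documents "indexed".
    public static int indexAll(List<String> rows, int threads)
            throws InterruptedException {
        sent.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> batch = new ArrayList<>();
        for (String row : rows) {
            batch.add(row);
            if (batch.size() == BATCH_SIZE) {
                final List<String> toSend = new ArrayList<>(batch);
                pool.submit(() -> sendBatch(toSend));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {                    // flush the final partial batch
            final List<String> toSend = new ArrayList<>(batch);
            pool.submit(() -> sendBatch(toSend));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return sent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add("row" + i);
        System.out.println(indexAll(rows, 4)); // prints 10
    }
}
```

Batching and parallel submission, rather than per-document adds, is what makes this kind of indexer fast.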

Re: Solr server requirements for 100+ million documents

2014-01-26 Thread Erick Erickson
1> That's what I'd do. For incremental updates you might have to create a trigger on the main table and insert rows into another table that is then used to do the incremental updates. This is particularly relevant for deletes. Consider the case where you've ingested all your data then rows are dele
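A sketch of the trigger idea (MySQL syntax; table and column names are illustrative): deleted row ids are captured in a side table that the next delta import can consume, e.g. via DIH's deletedPkQuery.

```sql
-- Side table recording deletions so the index can catch up later.
CREATE TABLE deleted_docs (
  id         BIGINT    NOT NULL,
  deleted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Trigger on the main table: remember every deleted primary key.
CREATE TRIGGER docs_after_delete
AFTER DELETE ON docs
FOR EACH ROW
  INSERT INTO deleted_docs (id) VALUES (OLD.id);
```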

Re: Fwd: Search Engine Framework decision

2014-01-26 Thread Ahmet Arslan
Rashmi, As far as I know Nutch is a web crawler. I don't think it can crawl documents from Microsoft Share Point. ManifoldCF is a better fit in your case. Regarding versioning if you don't have previous setups, then use latest versions of each. Ahmet On Sunday, January 26, 2014 5:24 PM, ra

RE: Solr server requirements for 100+ million documents

2014-01-26 Thread Susheel Kumar
Thank you Erick for your valuable inputs. Yes, we have to re-index data again & again. I'll look into the possibility of tuning db access. On SolrJ and automating the indexing (incremental as well as one time) I want to get your opinion on the two points below. We will be indexing separate sets of ta

Fwd: Search Engine Framework decision

2014-01-26 Thread rashmi maheshwari
Hi, I want to create a POC to search the INTRANET along with documents uploaded on the intranet. Documents (PDF, Excel, Word documents, text files, images, videos) also exist on SharePoint. SharePoint has authentication at module level (folder level). My intranet website is http://myintranet/

Re: What is the "right" way to bring a failed SolrCloud node back online?

2014-01-26 Thread Nathan Neulinger
Thanks, yeah, I did just that - and sent the script in on SOLR-5665 if anyone wants a copy. The script is trivial, but you're welcome to stick it in contrib or something if it's at all useful to anyone. -- Nathan On 01/26/2014 08:28 AM, Mark Miller wrote: We are working on a new mode (wh

Re: Solr server requirements for 100+ million documents

2014-01-26 Thread Erick Erickson
Dumping the raw data would probably be a good idea. I guarantee you'll be re-indexing the data several times as you change the schema to accommodate different requirements... But it may also be worth spending some time figuring out why the DB access is slow. Sometimes one can tune that. If you go

Re: What is the "right" way to bring a failed SolrCloud node back online?

2014-01-26 Thread Mark Miller
We are working on a new mode (which should become the default) where ZooKeeper will be treated as the truth for a cluster. This mode will be able to handle situations like this - if the cluster state says a core should exist on a node and it doesn’t, it will be created on startup. The way thin