Re: intro

2019-03-27 Thread Matteo Grolla
Sorry for the delay, I would have answered today. Greet Piergiorgio on my behalf. Matteo Grolla On Wed, 20 Mar 2019 at 13:54, ashish kumar singh <ashishrohitraj...@gmail.com> wrote: > Thanks for your help, I have talked to Piergiorgio. > > On Wednesday,

Re: proposals for writing manifold connectors

2014-07-02 Thread Matteo Grolla
Hi Karl, what I'm saying is that his connector doesn't need them, and if this means that it can be implemented efficiently without modifying the MCF framework, that would be great. -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com On 02/Jul/2014

proposals for writing manifold connectors

2014-07-01 Thread Matteo Grolla
Hi, I wrote a repository connector for crawling solrxml files: https://github.com/matteogrolla/mcf-filesystem-xml-connector The work is based on the filesystem connector, but I made several hopefully interesting changes which could be applied elsewhere. I also have a couple of questions

Re: proposals for writing manifold connectors

2014-07-01 Thread Matteo Grolla
, maybe empty it at the end of every crawl. How could I do that? If I were to synthesize this mail in one sentence I'd say: given simple crawling requirements, I'd like to be able to implement an MCF solution that is performant and simple to manage. Thanks -- Matteo Grolla Sourcesense - making sense

Re: proposals for writing manifold connectors

2014-07-01 Thread Matteo Grolla
there or wouldn't be disruptive to introduce. Mail exchange doesn't make this kind of discussion easy. To make precise proposals I should probably take a detailed look at the framework source code. -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com On 01

Re: Solr Extracting request handler

2014-06-18 Thread Matteo Grolla
Hi Alessandro, ideally I think that text extraction from rich documents should be Manifold's responsibility, not Solr's. So the ideal place to implement it would be in the new document processing pipeline (using Tika). -- Matteo Grolla Sourcesense - making sense of Open Source http

Re: getDocumentVersions returning null

2014-06-17 Thread Matteo Grolla
doesn't remove what it didn't index itself (not all crawlers behave this way). So I made another test indexing doc D with Manifold and everything works as expected. Hope this helps others. -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com On

document without binary

2014-06-16 Thread Matteo Grolla
) at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:945) Why this constraint? -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com

getDocumentVersions returning null

2014-06-16 Thread Matteo Grolla
settings for the crawling mode. Supposing that my source repository gives me the list of deleted documents, what should I do to handle the deletion? Cheers -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com

Re: processing document addition and delete in order

2014-06-14 Thread Matteo Grolla
You perfectly described the situation. If I could get sets of xml files, where each set represents a snapshot of the source system state, then my crawler would fit the Manifold design much better. I'll see if it's possible. For sure concurrency can be better exploited this way. -- Matteo Grolla

questions emerged designing a connector to index solrxml documents

2014-06-13 Thread Matteo Grolla
and see that both the directory and the file have state processed. The document has been ingested, so I think the ingest method caused the status change. What method caused the state change for the directory? -- Matteo Grolla Sourcesense - making sense of Open Source http

Re: questions emerged designing a connector to index solrxml documents

2014-06-13 Thread Matteo Grolla
and there are no exceptions, this could be enough to put /toIndex/hd.xml in state processed. Am I right? -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com On 13/Jun/2014, at 17:54, Karl Wright wrote: Hi Matteo, What I'd

Re: questions emerged designing a connector to index solrxml documents

2014-06-13 Thread Matteo Grolla
Thanks again, really. I'm figuring out how it works. By the way: I bought ManifoldCF in Action, great documentation!!! -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com On 13/Jun/2014, at 18:29, Karl Wright wrote: Hi Matteo

processing document addition and delete in order

2014-06-13 Thread Matteo Grolla
could embed the command in the identifier. E.g., instead of stuffing the identifier hd-samsung-500GB I could stuff add hd-samsung-500GB. The question is: am I running into huge trouble trying to implement this requirement or not? -- Matteo Grolla Sourcesense - making sense of Open

Re: processing document addition and delete in order

2014-06-13 Thread Matteo Grolla
} in parallel, then delete{doc1}, then proceed in parallel till the next delete. -- Matteo Grolla Sourcesense - making sense of Open Source http://www.sourcesense.com On 13/Jun/2014, at 19:06, Karl Wright wrote: One other point: if the reason that you would