Sorry for the delay,
I meant to answer today.
Please give Piergiorgio my regards.
Matteo Grolla
On Wed, Mar 20, 2019 at 13:54, ashish kumar singh <ashishrohitraj...@gmail.com> wrote:
> Thanks for your help, I have talked to Piergiorgio.
>
> On Wednesday,
Hi Karl,
what I'm saying is that his connector doesn't need them,
and if this means that it can be implemented efficiently without modifying the
MCF framework, that would be great.
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 02/Jul/2014
Hi,
I wrote a repository connector for crawling solrxml files
https://github.com/matteogrolla/mcf-filesystem-xml-connector
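For readers unfamiliar with the format: solrxml here refers to the Solr XML update syntax (<add><doc><field name="...">...</field></doc></add>). A minimal sketch of parsing such a file, purely illustrative and not the connector's actual code:

```python
import xml.etree.ElementTree as ET

def parse_solrxml(text):
    """Extract documents from a Solr XML <add> payload as field-name -> value dicts."""
    root = ET.fromstring(text)
    return [
        {field.get("name"): field.text for field in doc.iter("field")}
        for doc in root.iter("doc")
    ]

sample = """<add>
  <doc>
    <field name="id">hd-samsung-500GB</field>
    <field name="name">Samsung 500GB hard disk</field>
  </doc>
</add>"""
docs = parse_solrxml(sample)
```

A repository connector crawling such files would hand each extracted document, rather than the raw file, to the output connector.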
The work is based on the filesystem connector, but I made several hopefully
interesting changes that could be applied elsewhere.
I also have a couple of questions
, maybe empty it at the end of every crawl.
How could I do that?
If I were to synthesize this mail in one sentence, I'd say:
given simple crawling requirements, I'd like to be able to implement an MCF
solution that is performant and simple to manage.
thanks
--
Matteo Grolla
Sourcesense - making sense of Open Source
there or wouldn't be disruptive to
introduce.
Mail exchange doesn't make this kind of discussion easy.
To make precise proposals, I should probably take a detailed look at the
framework source code.
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 01
Hi Alessandro,
ideally, I think that text extraction from rich documents should be
Manifold's responsibility, not Solr's.
So the ideal place to implement it would be in the new document processing
pipeline (using Tika).
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
doesn't remove what it didn't index
itself (not all crawlers behave this way).
So I made another test, indexing doc D with Manifold, and everything
works as expected.
Hope this helps others.
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On
)
at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:945)
Why this constraint?
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
settings for the crawling
mode.
Supposing that my source repository gives me the list of deleted documents,
what should I do to handle the deletion?
Cheers
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
You perfectly described the situation.
If I could get sets of xml files, where each set represents a snapshot of the source
system state, then my crawler would fit Manifold's design much better.
I'll see if it's possible. For sure, concurrency can be better exploited this
way.
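The snapshot idea can be sketched as a diff between two id -> version maps; everything below (names, structure) is hypothetical illustration, not MCF or connector code:

```python
def diff_snapshots(previous, current):
    """Classify documents as added, updated, or deleted between two snapshots,
    where each snapshot maps a document id to a version marker."""
    added = [d for d in current if d not in previous]
    updated = [d for d in current if d in previous and current[d] != previous[d]]
    deleted = [d for d in previous if d not in current]
    return added, updated, deleted

prev_snapshot = {"doc1": "v1", "doc2": "v1"}
curr_snapshot = {"doc1": "v2", "doc3": "v1"}
added, updated, deleted = diff_snapshots(prev_snapshot, curr_snapshot)
```

With full snapshots, deletions fall out of the comparison instead of requiring the crawler to track what it previously indexed.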
--
Matteo Grolla
and see that both the
directory and the file have state "processed".
The document has been ingested, so I think the ingest method caused the
status change.
What method caused the state change for the directory?
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
and there are no exceptions.
This could be enough to put /toIndex/hd.xml in state "processed".
Am I right?
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 13/Jun/2014, at 17:54, Karl Wright wrote:
Hi Matteo,
What I'd
Thanks again, really.
I'm figuring out how it works.
By the way:
I bought ManifoldCF in Action,
great documentation!
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 13/Jun/2014, at 18:29, Karl Wright wrote:
Hi Matteo
could embed the command in the identifier.
E.g.,
instead of stuffing the identifier hd-samsung-500GB,
I could stuff add hd-samsung-500GB.
The question is: am I running into huge trouble trying to implement this
requirement or not?
--
Matteo Grolla
Sourcesense - making sense of Open Source
} in parallel,
then delete{doc1},
then proceed in parallel until the next delete.
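That ordering can be sketched as parallel ingest batches separated by delete barriers; the names and scheduling below are illustrative only, not how MCF actually dispatches work:

```python
from concurrent.futures import ThreadPoolExecutor

def run_ordered(operations, ingest, delete):
    """Ingest consecutive documents in parallel, but treat every delete as a
    barrier: it runs only after the preceding parallel batch has finished."""
    batch = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for op, doc in operations:
            if op == "ingest":
                batch.append(doc)
            else:
                list(pool.map(ingest, batch))  # flush the parallel batch first
                batch.clear()
                delete(doc)
        list(pool.map(ingest, batch))  # trailing batch after the last delete

indexed = set()
operations = [("ingest", "doc2"), ("ingest", "doc3"),
              ("delete", "doc1"), ("ingest", "doc4")]
run_ordered(operations, indexed.add, indexed.discard)
```

The barrier costs throughput only around deletes; between them, ingestion stays fully parallel.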
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 13/Jun/2014, at 19:06, Karl Wright wrote:
One other point: if the reason that you would