Re: Revisiting: Should Manifold include Pipelines

2012-01-12 Thread Karl Wright
Hi Mark, > > I'm not sure if this question is revisiting the motivation for preferring > this in MCF, or a technical question about how to package metadata for > different engines that might want it in a different format. > I'm looking not so much for justification, but for enough context as t

Re: Revisiting: Should Manifold include Pipelines

2012-01-11 Thread Mark Bennett
Hi Karl, On Wed, Jan 11, 2012 at 4:21 AM, Karl Wright wrote: > Hi Mark, > > I think I'd describe this simplified proposal as "pipeline" (vs. > "Pipeline". Your original description was the latter.) This proposal > is simpler but does not have the ability to amalgamate content from > multiple c

Re: Revisiting: Should Manifold include Pipelines

2012-01-11 Thread Karl Wright
Hi Mark, I think I'd describe this simplified proposal as "pipeline" (vs. "Pipeline". Your original description was the latter.) This proposal is simpler but does not have the ability to amalgamate content from multiple connectors, correct? As long as it is just modifying the content and metada

Re: Revisiting: Should Manifold include Pipelines

2012-01-11 Thread Mark Bennett
Hi Karl, Still pondering our last discussion. Wondering if I got things off track. As a start, what if I backtracked a bit, to this: What's the easiest way to do this: * A connector that tweaks metadata from a single source. * Sits between any existing MCF datasource connector and the main MCF

Re: Revisiting: Should Manifold include Pipelines

2012-01-10 Thread Mark Bennett
Hi Karl, I wanted to acknowledge and thank you for your 2 emails. I need to think a bit. I *do* have answers to some of your concerns, and hopefully reasonable-sounding ones at that. Also, maybe I should take another look at Nutch - BUT Manifold's Web UI is so much further along, and more inl

Re: Revisiting: Should Manifold include Pipelines

2012-01-10 Thread Karl Wright
As an exercise in understanding, it might be helpful to consider how exactly a document specification in today's ManifoldCF would morph if you wanted a connection to be a pipeline component rather than what it is today. Right now, the document specification for a job is an XML doc of a form that o

Re: Revisiting: Should Manifold include Pipelines

2012-01-09 Thread Karl Wright
Hi Mark, Please see below. On Mon, Jan 9, 2012 at 9:53 PM, Mark Bennett wrote: > Hi Karl, > > Thanks for the reply, most comments inline. > > General comments: > > I was wondering if you've used a custom pipeline like FAST ESP or > Ultraseek's old "patches.py", and if there were any that you lik

Re: Revisiting: Should Manifold include Pipelines

2012-01-09 Thread Mark Bennett
Hi Karl, Thanks for the reply, most comments inline. General comments: I was wondering if you've used a custom pipeline like FAST ESP or Ultraseek's old "patches.py", and if there were any that you liked or disliked? In more recent times the OpenPipeline effort has been a bit nascent, I think i

Re: Revisiting: Should Manifold include Pipelines

2012-01-09 Thread Karl Wright
Hi Mark, I have some initial impressions; please read below. On Mon, Jan 9, 2012 at 9:29 AM, Mark Bennett wrote: > We've been hoping to do some work this year to embed pipeline processing > into MCF, such as UIMA or OpenPipeline or XPump. > > But reading through some recent posts there was a dis

Revisiting: Should Manifold include Pipelines

2012-01-09 Thread Mark Bennett
We've been hoping to do some work this year to embed pipeline processing into MCF, such as UIMA or OpenPipeline or XPump. But reading through some recent posts there was a discussion about leaving this sort of thing to the Solr pipeline, and it suddenly dawned on me that maybe not everybody was on