Nils Kaiser wrote:
Hello!

Well, I am a happy Cocoon user, but I am working on a new project and not
sure if Cocoon is the right choice for it. The goal is to extract
information from existing web sites, transform it, and publish it to
different channels. I have already built a project with Cocoon, and the
basic idea matches pretty well with what Cocoon is about.

The bad thing is that we have some complex requirements for this project,
and I am not sure how they can be solved using existing Cocoon technology.
Here is what we need to be able to do:

1) Logical decisions in the pipeline depending on pipeline content... Yes, I
know this is not in the spirit of the current Cocoon pipeline, but it seems
to be pretty important in this project. At a higher level, it might mean
running transformers only if a certain condition referring to the pipeline
content applies.

2) As we are working with a lot of information coming from different pages
and 'following' links to get this information, we need a more dynamic
approach to fetching content. Example:

User requests our URL
- GET content of page 1
- Page 1 has a link to page 2 (which is not known before)
- GET content of page 2
- Transform content of page 1 and page 2

We can't be sure how many links we have to follow, and we have to do some
transformations after the request, so this can't be a single component... The
problem I see is not accessing and transforming the content; it is
"restarting" the pipeline dynamically (depending on the content of page 1).

I am not sure how this is possible with Cocoon, especially because I have
only little knowledge of flow. Is that the answer?

I also know that it is possible to address those issues with some tricks: for
example using a special transformer followed by a CIncludeTransformer that
calls the same pipeline via cocoon:// to implement a pipeline "restart",
or having a set of DOM transformers that make decisions and write things to
the session to simulate conditional behaviour of the pipeline.

I am very open to innovative solutions, such as writing our own pipeline
implementation or a DOM-based framework that can handle content-based
decisions better.

I am just afraid we might end up with a kind of monster doing some ugly
things to force Cocoon to do things it is not supposed to ;)

What you're asking for is a long-time no-no of Cocoon which we call "dynamic pipelines". Currently there is no pipeline implementation that would allow this. That's the bad news. The good news is that the ice is breaking :-) I understand that this doesn't help you very much at the moment, but Cocoon is getting massively refactored, which will make it much simpler to plug in your own processor implementations. I don't know what timeframe you're thinking of, but if you can wait a few more weeks, you should have some solid ground to start from to contribute to a dynamic pipelines implementation (which maybe doesn't even need to be XML-based ...)

Having said this, if you have the time and the need to work on Cocoon itself and want to get involved in decisions, now is the *perfect* point in time for this, as nothing is carved in stone :-)

Your second requirement sounds easier to implement on the current codebase, but I'm not sure I understand it completely. I assume that you were talking about a pipeline like this:

<map:match pattern="aggregator/**">
  <map:generate type="html" src="http://{1}"/>
  <map:transform src="filter-all-links.xslt"/>
  <map:transform type="cinclude"/>
  <map:serialize/>
</map:match>

The filter-all-links.xslt generates cinclude elements that recursively call the aggregator/** pipeline. You'd also have to implement the decision there whether to follow a link or not. If this involves some complex decisions, you would have to implement a custom transformer that can make them, as writing Java is easier than writing XSLT for some use cases - in particular if you have to call some kind of business logic. If you have to follow all links (= no decision), you will have to make sure that you don't create an infinite loop, but I guess you have already thought of this ;-)
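
Just to make the idea a bit more concrete, filter-all-links.xslt could look roughly like the untested sketch below. It assumes the html generator emits plain (non-namespaced) elements and that the links already come without the "http://" prefix - both would need adjusting in a real stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:cinclude="http://apache.org/cocoon/include/1.0">

  <!-- copy everything through unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- replace each link with a recursive call to the aggregator pipeline;
       the follow/don't-follow decision would go into this match or into
       an xsl:if around the cinclude element -->
  <xsl:template match="a[@href]">
    <cinclude:include src="{concat('cocoon:/aggregator/', @href)}"/>
  </xsl:template>

</xsl:stylesheet>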

One final thought: I don't know where all those pages that you want to aggregate are located, but you could run into some serious performance problems. It would be a good idea to think about caching right from the beginning.
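
If you go the caching route, one option is the expires pipeline, which caches the aggregated result for a fixed time even though the remote pages themselves give you no cache validity. Roughly like this (the class name and parameter name are from memory, so please check them against your Cocoon version):

<map:components>
  <!-- declare the expires pipeline implementation; class name from memory -->
  <map:pipes default="caching">
    <map:pipe name="expires"
              src="org.apache.cocoon.components.pipeline.impl.ExpiresCachingProcessingPipeline"/>
  </map:pipes>
</map:components>

<map:pipelines>
  <!-- cache the aggregated result for 10 minutes, independent of whether
       the remote pages themselves are cacheable -->
  <map:pipeline type="expires">
    <map:parameter name="cache-expires" value="600"/>
    <map:match pattern="aggregator/**">
      <!-- same generate/transform/serialize steps as in the example above -->
    </map:match>
  </map:pipeline>
</map:pipelines>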

--
Reinhard Pötz Independent Consultant, Trainer & (IT)-Coach
{Software Engineering, Open Source, Web Applications, Apache Cocoon}

                                       web(log): http://www.poetz.cc
--------------------------------------------------------------------