+-------[ Tim Nash ]----------------------
| > I would start with a list of requirements...
| >
| The requirements are to run distributed map/reduce on 'live' XML data
| that is stored by the Zope application server.
And I want a Ferrari. That's about as much of a requirement as those.

| > You'll be calling
| > out to something else to do map/reduce and return you the results.
|
| Agreed, but what is the storage mechanism for the files used in that process?
| If it is the Hadoop file system then you can't use live data, you
| would have to copy the files to the Hadoop file system, correct?

Well, if you're ONLY storing them in Hadoop via some mechanism, then no.
And if your data is large enough to warrant using Hadoop, you're never
going to store it in Zope.

| It still looks to me like a Zope to virtual file system mapping would
| be useful.

Procfs is a virtual filesystem; devfs is a virtual filesystem. SMB and
NFS mounts are virtual filesystems that shadow actual filesystems, and
those would work out of the box with LocalFS. Until you can mount Hadoop
in some way, it is not a filesystem; it's just an application with an API.

| Unfortunately it also looks like I am the only one who
| wants it so I'm not going to post it to the gsoc mailing list.

If you want a Python library to interact with Hadoop, write one; it's
not hard to turn Java into Python. Then write a Product that consists of:

- a top-level object that acts as a container and talks to Hadoop
  (contains all the logic to create files/directories etc.);
- sub-objects that represent directories inside Hadoop (as a Folder
  inside Zope);
- sub-objects that represent (XML) files inside Hadoop.

Add methods and ZPTs that perform your operations and display results.
Then you can just navigate through your XML data.

The "hard" part will be talking to Hadoop from Python, although see here:

http://www.stat.purdue.edu/~sguha/code.html#hadoopy

It would probably be a lot easier to use ctypes on the C lib and make a
nicer interface using that.
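The object hierarchy described above might be sketched like this. To be clear, this is a toy stand-in, not real Zope or Hadoop code: `HadoopContainer`, `HadoopFolder`, and `HadoopXMLFile` are hypothetical names, and the plain `backend` dict stands in for whatever client you actually write to talk to Hadoop.

```python
class HadoopContainer:
    """Top-level object: would own the connection logic and the
    file/directory-creation logic in a real Product."""

    def __init__(self, backend):
        # `backend` is a stand-in: maps HDFS-style paths to file contents.
        self.backend = backend

    def folder(self, path):
        return HadoopFolder(self.backend, path.rstrip("/"))


class HadoopFolder:
    """Represents a directory inside Hadoop, like a Folder inside Zope."""

    def __init__(self, backend, path):
        self.backend = backend
        self.path = path

    def objectIds(self):
        # List immediate children, mimicking the Zope Folder API.
        prefix = self.path + "/"
        names = set()
        for key in self.backend:
            if key.startswith(prefix):
                names.add(key[len(prefix):].split("/")[0])
        return sorted(names)

    def file(self, name):
        return HadoopXMLFile(self.backend, self.path + "/" + name)


class HadoopXMLFile:
    """Represents one (XML) file inside Hadoop."""

    def __init__(self, backend, path):
        self.backend = backend
        self.path = path

    def read(self):
        return self.backend[self.path]
```

With a real backend in place of the dict, navigating the XML data is just `root.folder("/tnash/xml").objectIds()` and then `folder.file(name).read()`, and the ZPTs render whatever those return.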
Once you can turn a URL (http://hadoop.example.com/tnash/xml/xml_001)
into a Hadoop "URI" (hadoop xml/001), you're pretty much done. You can
use popen to run your map/reduce command from inside your "object" and
to fetch the results to display inside Zope (probably fairly
inefficient, but it'd work). Or just get the job number and scrape the
webserver...

Oh, but you wanted to store the files IN Zope... so you can ignore all
that.

--
Andrew Milton
[EMAIL PROTECTED]

_______________________________________________
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )
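For what it's worth, the URL-to-path translation plus the popen step described above might look like the sketch below. It assumes the standard `hadoop fs -cat` command-line tool is on PATH and that the Zope URL path maps directly onto the HDFS path; `url_to_hdfs_path`, `cat_command`, and `fetch` are made-up names for illustration.

```python
import subprocess
from urllib.parse import urlparse

def url_to_hdfs_path(url):
    """Turn a Zope-facing URL like
    http://hadoop.example.com/tnash/xml/xml_001 into the path the
    `hadoop` CLI expects (here, assumed to be just the URL's path)."""
    return urlparse(url).path

def cat_command(url):
    """Build the command that fetches a file's contents via the
    standard `hadoop fs -cat` shell interface."""
    return ["hadoop", "fs", "-cat", url_to_hdfs_path(url)]

def fetch(url):
    """Run the command popen-style and return stdout; fairly
    inefficient, as noted above, but it'd work."""
    proc = subprocess.Popen(cat_command(url), stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out
```

Running a map/reduce job is the same shape: build the `hadoop jar ...` argument list, hand it to `subprocess.Popen` from inside the container object, and either wait for the output or grab the job number and scrape the job tracker.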