> So the idea is to produce index pages and index sites and anything else
> required? IMHO if it's to be used for search indexes (e.g. using the
> Librarian plugin), it should support keyword indexing of the content of
> freesites.
>

I agree. For that, we can use plugins to extract metadata and keywords
from documents (HTML, ODT, etc.). But won't keyword indexing make the
index too large? One solution would be to skip the content of documents
that are too large and index only their names and metadata.
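To make the size cap concrete, here is a rough sketch of what I have in
mind. All names here (SizeCappedIndexer, keywordsFor, MAX_CONTENT_BYTES)
and the 64 KiB threshold are assumptions made up for illustration, not an
existing Freenet API. Tokenization is deliberately naive.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; none of these names come from Freenet. */
final class SizeCappedIndexer {
    /** Assumed threshold; the right value would need to be tuned. */
    static final int MAX_CONTENT_BYTES = 64 * 1024;

    /** Returns the keywords that would be stored for one document. */
    static Set<String> keywordsFor(String name, String metadata,
                                   String content, int contentBytes) {
        Set<String> keywords = new LinkedHashSet<>();
        // The name and metadata are always indexed, whatever the size.
        addTokens(keywords, name);
        addTokens(keywords, metadata);
        // The content is indexed only when the document is small enough.
        if (contentBytes <= MAX_CONTENT_BYTES) {
            addTokens(keywords, content);
        }
        return keywords;
    }

    /** Splits on anything that is not a letter or digit; drops short tokens. */
    private static void addTokens(Set<String> out, String text) {
        if (text == null) {
            return;
        }
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.length() >= 3) {
                out.add(token);
            }
        }
    }
}
```

With this, a large video would still be found by its file name and tags,
but would not add its content to the keyword index.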
> On Mon, May 08, 2006 at 04:10:00AM +0200, Jerome Flesch wrote:
> > Greetings,
> >
> > Please find below the second project proposal I've submitted to Google.
> >
> >
> > Project proposal
> > ----------------
> >
> > The main goal of this project would be the creation of a freesite and
> > file spider.
> >
> >
> > 1) My proposition
> >
> > My idea would be to make two programs: the spider and an index
> > viewer.
> >
> > The spider would start indexing from a given Freenet URI, going from
> > link to link. After reaching a given recursion depth, the spider would
> > restart from the starting point and update its index. It could be set
> > to publish the resulting index on Freenet at a given rate (for example,
> > daily, every two days, etc.). The spider could index files and
> > freesites on different criteria: we could, for example, use metadata
> > for freesites, but for files like ogg, odt, mpg, etc., we could also
> > use their internal tags. It would require a set of filters, one
> > for each kind of file, but indexing would be more complete.
> >
> > Using a specific index format and an index viewer may offer various
> > advantages. The main advantage is that once loaded, indexes can be
> > sorted and displayed in different ways. For example, we can imagine a
> > tree view (for this one, we would need to take care of link loops), or
> > more simply, a list view. The index viewer would make it possible
> > to sort entries by a given order of metadata fields.
> >
> >
> > 2) Technical aspect
> >
> > For the spider, we have 2 possibilities: make a plugin for the node,
> > or make a separate program. I think a separate program would be
> > better, because it would avoid overloading the node.
> > To limit bandwidth use, the user will have to specify a delay between
> > each request made by the spider.
> >
> > To allow the spider to parse as many file formats as possible, to find
> > specific tags and new URIs to explore, a plugin mechanism could be a
> > good solution.
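
To illustrate the crawl described above, here is a minimal sketch of one
pass: breadth-first from a starting URI, stopping at a maximum depth,
skipping URIs already seen (so link loops are harmless), waiting between
requests, and delegating parsing to one filter per file extension. The
names SpiderSketch, Fetcher and Filter are hypothetical, and Fetcher only
stands in for whatever would really talk to the node. Picking the filter
by extension is also just an assumption.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; not an existing Freenet class. */
final class SpiderSketch {
    /** Stand-in for the code that retrieves a URI through the node. */
    interface Fetcher { byte[] fetch(String uri); }

    /** Per-format plugin: extracts tags and outgoing links. */
    interface Filter {
        Map<String, String> extractTags(byte[] data);
        List<String> extractLinks(byte[] data);
    }

    private final Fetcher fetcher;
    private final long delayMillis;
    private final Map<String, Filter> filters = new HashMap<>();

    SpiderSketch(Fetcher fetcher, long delayMillis) {
        this.fetcher = fetcher;
        this.delayMillis = delayMillis;
    }

    void registerFilter(String extension, Filter filter) {
        filters.put(extension.toLowerCase(Locale.ROOT), filter);
    }

    /** One crawl pass; returns URI -> tags, in discovery order. */
    Map<String, Map<String, String>> crawl(String startUri, int maxDepth) {
        Map<String, Map<String, String>> index = new LinkedHashMap<>();
        Map<String, Integer> depthOf = new HashMap<>(); // doubles as the visited set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(startUri, 0);
        queue.add(startUri);
        boolean firstRequest = true;
        while (!queue.isEmpty()) {
            String uri = queue.remove();
            Filter filter = filters.get(extensionOf(uri));
            if (filter == null) {
                // Unknown format: record the name only, without fetching.
                index.put(uri, Collections.<String, String>emptyMap());
                continue;
            }
            if (!firstRequest) {
                try {
                    Thread.sleep(delayMillis); // user-chosen delay between requests
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            firstRequest = false;
            byte[] data = fetcher.fetch(uri);
            index.put(uri, filter.extractTags(data));
            int depth = depthOf.get(uri);
            if (depth < maxDepth) {
                for (String link : filter.extractLinks(data)) {
                    if (!depthOf.containsKey(link)) { // skip already-seen URIs
                        depthOf.put(link, depth + 1);
                        queue.add(link);
                    }
                }
            }
        }
        return index;
    }

    static String extensionOf(String uri) {
        int slash = uri.lastIndexOf('/');
        int dot = uri.lastIndexOf('.');
        return dot > slash ? uri.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }
}
```

The periodic restart would then just be a loop calling crawl() again from
the same starting point and merging the result into the published index.
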
> >
> > To avoid portability problems, I think writing the spider and viewer
> > in Java would be better. The viewer would then use Swing for the GUI.
> >
> >
> > 3) Possible evolution
> >
> > One interesting evolution would be to reuse already existing indexes:
> > if a spider discovers an already existing index, it could add a
> > link to it in its own index, and try to avoid indexing sites already
> > covered by that index.
> >
> > Another interesting feature would be to let the user export the index
> > as HTML and upload it to Freenet. We can even imagine that the spider
> > could do it automatically each time it uploads a new version of its
> > index to Freenet.
> >
> >
> > Brief biography
> > ---------------
> >
> > I'm 20 years old and French. I'm currently studying software
> > engineering at the UTBM, Université de Technologie de
> > Belfort-Montbéliard (a French university). I've already obtained a
> > two-year technical degree (DUT) in Telecommunications and Networking.
> >
> > During my DUT final training period, which was at IrES (Subatomic
> > Research Institute of Strasbourg, France), I had to work with various
> > Java technologies, like Struts, OJB, Tomcat, etc.
> >
> > Thanks to some university projects, I already have a good knowledge of
> > Swing graphical interfaces [2].
> >
> > Until now, my only contribution to the Open Source movement has been
> > to write some articles about the GrSecurity patch and the Prelude
> > Intrusion Detector. That's why I see the Google Summer of Code as a
> > good opportunity to join an Open Source project such as Freenet.
> >
> > Until 1st July, I will have various exams and projects to hand in, so
> > my availability may vary, but I will try to do my best to keep time
> > for this project. After 1st July, I will be able to dedicate all my
> > time to this project.
> >
> >
> > About this proposal
> > -------------------
> >
> > Even though this second proposal interests me, please consider that I
> > would prefer, if possible, to work on my first proposal (file
> > upload and download utility).
> >
> >
> > Best regards,
> >
> > --
> > Jerome Flesch.
> >
> >
> > [1]
> > http://archives.freenetproject.org/message/20060504.164033.3c90cb65.en.html
> > [2] https://jflesch.kwain.net/articles/90.php : One of my Java
> > university projects: a train / bus / subway / tramway network
> > simulator.
> > _______________________________________________
> > Tech mailing list
> > Tech at freenetproject.org
> > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech

--
Jerome Flesch.
