> So the idea is to produce index pages and index sites and anything else
> required? IMHO if it's to be used for search indexes (e.g. using the
> Librarian plugin), it should support keyword indexing of the content of
> freesites.
>
I agree.
For that, we can use plugins to extract meta-data and keywords from documents
(HTML, ODT, etc.).
But won't keyword indexing make the index too large? My solution would be to
skip the content of documents that are too heavy and index only their names
and meta-data.
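
To make the idea concrete, here is a rough Java sketch of a keyword index
that skips the body of heavy documents. It is only an illustration: the class
name, the method names and the size threshold are all made up for this mail,
and nothing here comes from Freenet or from the Librarian plugin.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: these names are illustrative, not an existing Freenet API.
class SizeCappedIndexer {
    // Assumed limit: bodies longer than this are not keyword-indexed.
    private final int maxBodyChars;

    // keyword -> URIs of the documents in which the keyword was found
    private final Map<String, Set<String>> index = new HashMap<>();

    SizeCappedIndexer(int maxBodyChars) {
        this.maxBodyChars = maxBodyChars;
    }

    // Name and meta-data are always indexed; the body only when it is small enough.
    void add(String uri, String name, String metaData, String body) {
        addWords(uri, name);
        addWords(uri, metaData);
        if (body != null && body.length() <= maxBodyChars) {
            addWords(uri, body);
        }
    }

    // Splits text on anything that is not a letter or a digit and records each word.
    private void addWords(String uri, String text) {
        if (text == null) {
            return;
        }
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (word.length() < 3) {
                continue; // skip empty tokens and very short words
            }
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(uri);
        }
    }

    // Returns the URIs recorded for a keyword, or an empty set if there are none.
    Set<String> lookup(String keyword) {
        Set<String> uris = index.get(keyword.toLowerCase(Locale.ROOT));
        return uris == null ? Collections.<String>emptySet() : uris;
    }
}
```

With such a cap, a heavy document can still be found through its name and
meta-data, and the index size grows with the number of documents rather than
with the size of the largest ones. The right threshold would have to be
tuned, or left to the user as an option.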


> On Mon, May 08, 2006 at 04:10:00AM +0200, Jerome Flesch wrote:
> > Greetings,
> >
> > Please find below the second project proposal I've submitted to Google.
> >
> >
> > Project proposal
> > ----------------
> >
> > The main goal of this project would be the creation of a freesites and
> > files spider.
> >
> >
> > 1) My proposition
> >
> > My idea would be to write two programs: the spider and an index
> > viewer.
> >
> > The spider would start indexing from a given Freenet URI, following
> > links from page to page. After reaching a given recursion depth, the
> > spider would restart from the starting point and update its index. It
> > could be set to publish the resulting index on Freenet at a given rate
> > (for example, daily, every two days, etc.). The spider could index
> > files and freesites on different criteria: we could, for example, use
> > meta-data for freesites, but for files like ogg, odt, mpg, etc., we
> > could also use their internal tags. This would require a set of
> > filters, one for each kind of file, but indexing would be more
> > complete.
> >
> > Using a specific index format and an index viewer may offer various
> > advantages. The main one is that, once loaded, indexes can be
> > sorted and displayed in different ways. For example, we can imagine a
> > tree view (for this one, we would need to take care of link loops), or
> > more simply, a list. The index viewer would make it possible
> > to sort entries by a given order of meta-data fields.
> >
> >
> > 2) Technical aspect
> >
> > For the spider, we have two possibilities: make a plugin for the node,
> > or make a separate program. I think a separate program would be
> > better, because it would avoid overloading the node.
> > To limit bandwidth use, the user will have to specify a delay between
> > successive requests made by the spider.
> >
> > To allow the spider to parse as many file formats as possible, in
> > order to find specific tags and new URIs to explore, a plugin
> > mechanism could be a good solution.
> >
> > To avoid portability problems, I think writing the spider and the
> > viewer in Java would be better. The viewer would then use Swing for
> > the GUI.
> >
> >
> > 3) Possible evolution
> >
> > One interesting evolution would be to reuse already existing indexes:
> > if a spider discovers an already existing index, it could add a
> > link to it in its own index, and try to avoid indexing sites already
> > covered by that index.
> >
> > Another interesting feature would be to allow the user to export the
> > index as HTML and upload it to Freenet. We can even imagine that the
> > spider could do this automatically each time it uploads a new version
> > of its index to Freenet.
> >
> >
> > Brief biography
> > ---------------
> >
> > I'm 20 years old and French. I'm currently studying software engineering
> > at the UTBM, Université de Technologie de Belfort-Montbéliard
> > (a French university). I have already obtained a two-year technical
> > degree (DUT) in Telecommunications and Networking.
> >
> > During my DUT final training period which was at IrES (Subatomic
> > Research Institute of Strasbourg, France), I had to work with various
> > Java technologies, like Struts, OJB, Tomcat, etc.
> >
> > Thanks to some university projects, I already have a good knowledge of
> > Swing graphical interfaces [2].
> >
> > Until now, my only contribution to the Open Source movement has been
> > writing some articles about the GrSecurity patch and the Prelude
> > intrusion detector. That is why I see the Google Summer of Code as a
> > good opportunity to join an Open Source project such as Freenet.
> >
> > Until 1st July, I will have various exams and projects to hand in, so
> > my availability may vary, but I will do my best to keep time
> > for this project. After 1st July, I will be able to dedicate all my
> > time to this project.
> >
> >
> > About this proposal
> > -------------------
> >
> > Even though this second proposal interests me, please consider that I
> > would prefer, if possible, to work on my first proposal (file
> > upload and download utility).
> >
> >
> > Best regards,
> >
> > --
> > Jerome Flesch.
> >
> >
> > [1]
> > http://archives.freenetproject.org/message/20060504.164033.3c90cb65.en.html
> > [2] https://jflesch.kwain.net/articles/90.php : One of my Java
> >   university projects: a train / bus / subway / tramway network
> >   simulator.

-- 
Jerome Flesch.
