If you are using Nutch 2.2.1, the crawled data is already stored in a NoSQL database, e.g., Apache HBase. What you need to develop is client code that reads the data out of this database. To do that you will probably need to understand how the fields are stored. I recommend having a look at the mapping configuration file, e.g., conf/gora-hbase-mapping.xml.
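
As a rough illustration of what the mapping file tells you, here is a small Python sketch that parses a fragment shaped like conf/gora-hbase-mapping.xml and prints which HBase column each field lives in. The embedded XML is only an assumed sample (the field names and qualifiers may differ in your installation), and the helper names load_field_mapping and hbase_column are hypothetical, not part of Nutch or Gora. Point it at your own file to get the real layout.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment in the shape of conf/gora-hbase-mapping.xml.
# Field names and qualifiers are assumptions; check your own file.
SAMPLE_MAPPING = """
<gora-orm>
  <table name="webpage">
    <family name="f" maxVersions="1"/>
    <family name="p" maxVersions="1"/>
  </table>
  <class table="webpage" keyClass="java.lang.String"
         name="org.apache.nutch.storage.WebPage">
    <field name="baseUrl" family="f" qualifier="bas"/>
    <field name="content" family="f" qualifier="cnt"/>
    <field name="title" family="p" qualifier="t"/>
    <field name="text" family="p" qualifier="c"/>
  </class>
</gora-orm>
"""


def load_field_mapping(xml_text):
    """Return {field name: (column family, qualifier or None)}."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for cls in root.iter("class"):
        for field in cls.iter("field"):
            mapping[field.get("name")] = (field.get("family"),
                                          field.get("qualifier"))
    return mapping


def hbase_column(mapping, field_name):
    """Format a field's location as 'family:qualifier' (or just 'family')."""
    family, qualifier = mapping[field_name]
    return family if qualifier is None else f"{family}:{qualifier}"


if __name__ == "__main__":
    mapping = load_field_mapping(SAMPLE_MAPPING)
    for name in sorted(mapping):
        print(name, "->", hbase_column(mapping, name))
```

Once you know the family and qualifier for the fields you care about (for example the page title and the parsed text), your client code can request just those columns from the table instead of reading whole rows.
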

On Jul 30, 2556 BE, at 4:28 PM, Weder Carlos Vieira wrote:

> Hello everyone,
>
> I would like to say that Nutch 2.2.1 is working very well.
> I spent the last few days testing this new version, I liked a lot,
> congratulations.
>
> Now I would like to receive some tips of you, I want to create a new
> website interface to read the urls crawled, parsed and saved on the
> database and show its contents on the pages.
>
> My doubt is, what the best way that I can read this data?
> What can I use for middle way, between database and my application, to
> facilitate the selection of data and obtain the best results.
>
>
> Could you share with me some stuff to read, some tips, some experience? how
> can I design this structure?
>
>
> Thanks.

