Hi Leo,

From the times both the fetching and parsing took, I suspect that Nutch didn't actually fetch the URL; however, this may not be the case, as I have nothing to benchmark it against. Unfortunately, http://wiki.apache.org actually redirects to http://wiki.apache.org/general/ so I'm going to post my log output from the last URL you specified in an attempt to clear this one up. The following confirms that your observations are accurate: not only does this produce invalid segments, but nothing is fetched in the process.
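As a quick aside, a redirect like this is easy to confirm outside Nutch with a plain HTTP HEAD request (a generic check, nothing Nutch-specific):

  curl -I http://wiki.apache.org/
  # expect a 3xx status and a Location: header pointing at /general/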
Therefore, the reason we are getting the "skipping invalid segment" message is that we are not actually fetching any content. My initial thought was that your URL filters were not set properly, and I think that is part of the problem. Please follow the syntax very carefully and it will work perfectly for you, as follows:

regex-urlfilter.txt
--------------------------
# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# crawl URLs in the following domains.
+^http://([a-z0-9]*\.)*seek.com.au/

# accept anything else
#+.

seed file
----------------------
http://www.seek.com.au

It sounds really trivial, but I think that the trailing '/' in your seed file may have been making all of the difference. Please try it, test with readdb and readseg, and comment back. Sorry for the delayed posts on this one; I have not had much time to get to it. Hope all goes to plan. Evidence can be seen below.

lewis@lewis-01:~/ASF/branch-1.4/runtime/local$ bin/nutch readdb crawldb -stats
CrawlDb statistics start: crawldb
Statistics for CrawlDb: crawldb
TOTAL urls:     48
retry 0:        48
min score:      0.017
avg score:      0.041125
max score:      1.175
status 1 (db_unfetched):        47
status 2 (db_fetched):  1
CrawlDb statistics: done
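For completeness, here is a rough sketch of the individual-command cycle I run locally to reproduce and verify this, with readdb and readseg checks at the end. The paths are placeholders for your own layout, and the URLFilterChecker invocation is from memory, so treat this as a sketch rather than a recipe. Note also that I pass the segment directory itself to updatedb; as far as I recall, -dir expects the parent segments directory rather than an individual segment.

----------------------------------------------------------
#!/bin/bash
# Placeholder paths -- substitute your own.
NUTCH=/usr/share/nutch/runtime/local/bin/nutch
CRAWLDB=$HOME/nutchData/crawl/crawldb
SEGMENTS=$HOME/nutchData/crawl/segments
SEEDS=$HOME/nutchData/seed/urls

# Optional sanity check: run a seed URL through the configured
# URL filters (reads URLs from stdin; a leading '+' in the output
# means the URL was accepted, '-' means it was rejected).
echo "http://www.seek.com.au/" | \
  $NUTCH org.apache.nutch.net.URLFilterChecker -allCombined

$NUTCH inject $CRAWLDB $SEEDS
$NUTCH generate $CRAWLDB $SEGMENTS -topN 100

# Pick up the segment the generator just created.
SEGMENT=$SEGMENTS/$(ls -t $SEGMENTS | head -1)

$NUTCH fetch $SEGMENT
$NUTCH parse $SEGMENT

# The segment itself is passed here; '-dir $SEGMENT' would make
# updatedb treat each segment subdirectory (content, crawl_fetch,
# and so on) as a segment in its own right.
$NUTCH updatedb $CRAWLDB $SEGMENT

# Verify: readdb should report db_fetched > 0, and readseg -list
# should show non-zero fetched/parsed counts for the segment.
$NUTCH readdb $CRAWLDB -stats
$NUTCH readseg -list $SEGMENT
----------------------------------------------------------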
On Thu, Jul 21, 2011 at 3:30 AM, Leo Subscriptions <llsub...@zudiewiener.com> wrote:

> Following are the suggested commands and the results. As suggested, I
> left the redirect as 0, since 'crawl' works without any issues. The
> problem only occurs when running the individual commands.
>
> ------- nutch-site.xml -------------------------------
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> <property>
>  <name>http.agent.name</name>
>  <value>listers spider</value>
> </property>
>
> <property>
>  <name>fetcher.verbose</name>
>  <value>true</value>
>  <description>If true, fetcher will log more verbosely.</description>
> </property>
>
> <property>
>  <name>http.verbose</name>
>  <value>true</value>
>  <description>If true, HTTP will log more verbosely.</description>
> </property>
>
> </configuration>
> ---------------------------------------------------------------
>
> ------ Individual commands and results -------------------------
>
> llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed/urls
> Injector: starting at 2011-07-21 12:24:52
> Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> Injector: urlDir: /home/llist/nutchData/seed/urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-21 12:24:55, elapsed: 00:00:02
>
> llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> generate /home/llist/nutchData/crawl/crawldb
> /home/llist/nutchData/crawl/segments -topN 100
> Generator: starting at 2011-07-21 12:25:16
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 100
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: /home/llist/nutchData/crawl/segments/20110721122519
> Generator: finished at 2011-07-21 12:25:20, elapsed: 00:00:03
>
> llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> fetch /home/llist/nutchData/crawl/segments/20110721122519
> Fetcher: Your 'http.agent.name' value should be listed first in
> 'http.robots.agents' property.
> Fetcher: starting at 2011-07-21 12:26:36
> Fetcher: segment: /home/llist/nutchData/crawl/segments/20110721122519
> Fetcher: threads: 10
> QueueFeeder finished: total 1 records + hit by time limit :0
> -finishing thread FetcherThread, activeThreads=1
> fetching http://wiki.apache.org/
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2011-07-21 12:26:40, elapsed: 00:00:04
>
> llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> parse /home/llist/nutchData/crawl/segments/20110721122519
> ParseSegment: starting at 2011-07-21 12:27:22
> ParseSegment: segment: /home/llist/nutchData/crawl/segments/20110721122519
> ParseSegment: finished at 2011-07-21 12:27:24, elapsed: 00:00:01
>
> llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> updatedb /home/llist/nutchData/crawl/crawldb
> -dir /home/llist/nutchData/crawl/segments/20110721122519
> CrawlDb update: starting at 2011-07-21 12:28:03
> CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> CrawlDb update: segments:
> [file:/home/llist/nutchData/crawl/segments/20110721122519/parse_text,
> file:/home/llist/nutchData/crawl/segments/20110721122519/content,
> file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_parse,
> file:/home/llist/nutchData/crawl/segments/20110721122519/parse_data,
> file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_fetch,
> file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_generate]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: false
> CrawlDb update: URL filtering: false
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110721122519/parse_text
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110721122519/content
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_parse
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110721122519/parse_data
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_fetch
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_generate
> CrawlDb update: Merging segment data into db.
> CrawlDb update: finished at 2011-07-21 12:28:04, elapsed: 00:00:01
>
> ------------------------------------------------------------------------------------
>
> On Wed, 2011-07-20 at 21:58 +0100, lewis john mcgibbney wrote:
>
> > There is no documentation for the individual commands used to run a
> > Nutch 1.3 crawl, so I'm not sure where there has been a mislead. In the
> > instance that this was required I would direct newer users to the
> > legacy documentation for the time being.
> >
> > My comment to Leo was to understand whether he managed to correct the
> > invalid segments problem.
> >
> > Leo, if this still persists may I ask you to try again; I will do the
> > same and will be happy to provide feedback.
> >
> > May I suggest the following: use these commands
> >
> > inject
> > generate
> > fetch
> > parse
> > updatedb
> >
> > At this stage we should be able to ascertain if something is incorrect
> > and hopefully debug. May I add the following... please make these
> > additions to nutch-site:
> >
> > fetcher verbose - true
> > http verbose - true
> > check for redirects and set accordingly
> >
> > On Wed, Jul 20, 2011 at 1:39 PM, Julien Nioche <
> > lists.digitalpeb...@gmail.com> wrote:
> >
> > > The wiki can be edited and you are welcome to suggest improvements
> > > if there is something missing.
> > >
> > > On 20 July 2011 13:31, Cam Bazz <camb...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I think there is a mislead in the documentation; it does not tell
> > > > us that we have to parse.
> > > >
> > > > On Wed, Jul 20, 2011 at 11:42 AM, Julien Nioche
> > > > <lists.digitalpeb...@gmail.com> wrote:
> > > > > Haven't you forgotten to call parse?
> > > > >
> > > > > On 19 July 2011 23:40, Leo Subscriptions <llsub...@zudiewiener.com>
> > > > > wrote:
> > > > >
> > > > >> Hi Lewis,
> > > > >>
> > > > >> You are correct about the last post not showing any errors. I just
> > > > >> wanted to show that I don't get any errors if I use 'crawl' and to
> > > > >> prove that I do not have any faults in the conf files or the
> > > > >> directories.
> > > > >>
> > > > >> I still get the errors if I use the individual commands inject,
> > > > >> generate, fetch....
> > > > >>
> > > > >> Cheers,
> > > > >>
> > > > >> Leo
> > > > >>
> > > > >> On Tue, 2011-07-19 at 22:09 +0100, lewis john mcgibbney wrote:
> > > > >>
> > > > >> > Hi Leo,
> > > > >> >
> > > > >> > Did you resolve?
> > > > >> >
> > > > >> > Your second log data doesn't appear to show any errors; however,
> > > > >> > the problem you specify is one I have witnessed myself a while
> > > > >> > ago. Since you posted, have you been able to replicate... or
> > > > >> > resolve?
> > > > >> >
> > > > >> > On Sun, Jul 17, 2011 at 1:03 AM, Leo Subscriptions
> > > > >> > <llsub...@zudiewiener.com> wrote:
> > > > >> >
> > > > >> > I've used crawl to ensure config is correct and I don't get
> > > > >> > any errors, so I must be doing something wrong with the
> > > > >> > individual steps, but can't see what.
> > > > >> >
> > > > >> > --------------------------------------------------------------------------------------------------------------------
> > > > >> >
> > > > >> > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> > > > >> > crawl /home/llist/nutchData/seed/urls -dir /home/llist/nutchData/crawl
> > > > >> > -depth 3 -topN 5
> > > > >> > solrUrl is not set, indexing will be skipped...
> > > > >> > crawl started in: /home/llist/nutchData/crawl
> > > > >> > rootUrlDir = /home/llist/nutchData/seed/urls
> > > > >> > threads = 10
> > > > >> > depth = 3
> > > > >> > solrUrl=null
> > > > >> > topN = 5
> > > > >> > Injector: starting at 2011-07-17 09:31:19
> > > > >> > Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> > > > >> > Injector: urlDir: /home/llist/nutchData/seed/urls
> > > > >> > Injector: Converting injected urls to crawl db entries.
> > > > >> > Injector: Merging injected urls into crawl db.
> > > > >> > Injector: finished at 2011-07-17 09:31:22, elapsed: 00:00:02
> > > > >> > Generator: starting at 2011-07-17 09:31:22
> > > > >> > Generator: Selecting best-scoring urls due for fetch.
> > > > >> > Generator: filtering: true
> > > > >> > Generator: normalizing: true
> > > > >> > Generator: topN: 5
> > > > >> > Generator: jobtracker is 'local', generating exactly one partition.
> > > > >> > Generator: Partitioning selected urls for politeness.
> > > > >> > Generator: segment: /home/llist/nutchData/crawl/segments/20110717093124
> > > > >> > Generator: finished at 2011-07-17 09:31:26, elapsed: 00:00:04
> > > > >> > Fetcher: Your 'http.agent.name' value should be listed first in
> > > > >> > 'http.robots.agents' property.
> > > > >> > Fetcher: starting at 2011-07-17 09:31:26
> > > > >> > Fetcher: segment: /home/llist/nutchData/crawl/segments/20110717093124
> > > > >> > Fetcher: threads: 10
> > > > >> > QueueFeeder finished: total 1 records + hit by time limit :0
> > > > >> > fetching http://www.seek.com.au/
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -finishing thread FetcherThread, activeThreads=1
> > > > >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > > > >> > -finishing thread FetcherThread, activeThreads=0
> > > > >> > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > >> > -activeThreads=0
> > > > >> > Fetcher: finished at 2011-07-17 09:31:29, elapsed: 00:00:03
> > > > >> > ParseSegment: starting at 2011-07-17 09:31:29
> > > > >> > ParseSegment: segment: /home/llist/nutchData/crawl/segments/20110717093124
> > > > >> > ParseSegment: finished at 2011-07-17 09:31:32, elapsed: 00:00:02
> > > > >> > CrawlDb update: starting at 2011-07-17 09:31:32
> > > > >> > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > > > >> > CrawlDb update: segments:
> > > > >> > [/home/llist/nutchData/crawl/segments/20110717093124]
> > > > >> > CrawlDb update: additions allowed: true
> > > > >> > CrawlDb update: URL normalizing: true
> > > > >> > CrawlDb update: URL filtering: true
> > > > >> > CrawlDb update: Merging segment data into db.
> > > > >> > CrawlDb update: finished at 2011-07-17 09:31:34, elapsed: 00:00:02
> > > > >> > :
> > > > >> > :
> > > > >> > :
> > > > >> > :
> > > > >> >
> > > > >> > -----------------------------------------------------------------------------------------------
> > > > >> >
> > > > >> > On Sat, 2011-07-16 at 12:14 +1000, Leo Subscriptions wrote:
> > > > >> >
> > > > >> > > Done, but now get additional errors:
> > > > >> > >
> > > > >> > > -------------------
> > > > >> > > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> > > > >> > > updatedb /home/llist/nutchData/crawl/crawldb
> > > > >> > > -dir /home/llist/nutchData/crawl/segments/20110716105826
> > > > >> > > CrawlDb update: starting at 2011-07-16 11:03:56
> > > > >> > > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > > > >> > > CrawlDb update: segments:
> > > > >> > > [file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_fetch,
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/content,
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_parse,
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/parse_data,
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_generate,
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/parse_text]
> > > > >> > > CrawlDb update: additions allowed: true
> > > > >> > > CrawlDb update: URL normalizing: false
> > > > >> > > CrawlDb update: URL filtering: false
> > > > >> > > - skipping invalid segment
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_fetch
> > > > >> > > - skipping invalid segment
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/content
> > > > >> > > - skipping invalid segment
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_parse
> > > > >> > > - skipping invalid segment
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/parse_data
> > > > >> > > - skipping invalid segment
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_generate
> > > > >> > > - skipping invalid segment
> > > > >> > > file:/home/llist/nutchData/crawl/segments/20110716105826/parse_text
> > > > >> > > CrawlDb update: Merging segment data into db.
> > > > >> > > CrawlDb update: finished at 2011-07-16 11:03:57, elapsed: 00:00:01
> > > > >> > > -------------------------------------------
> > > > >> > >
> > > > >> > > On Sat, 2011-07-16 at 02:36 +0200, Markus Jelsma wrote:
> > > > >> > >
> > > > >> > > > fetch, then parse.
> > > > >> > > >
> > > > >> > > > > I'm running nutch 1.3 on 64 bit Ubuntu; following are the
> > > > >> > > > > commands and relevant output.
> > > > >> > > > >
> > > > >> > > > > ----------------------------------
> > > > >> > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > > > >> > > > > inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed
> > > > >> > > > > Injector: starting at 2011-07-15 18:32:10
> > > > >> > > > > Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> > > > >> > > > > Injector: urlDir: /home/llist/nutchData/seed
> > > > >> > > > > Injector: Converting injected urls to crawl db entries.
> > > > >> > > > > Injector: Merging injected urls into crawl db.
> > > > >> > > > > Injector: finished at 2011-07-15 18:32:13, elapsed: 00:00:02
> > > > >> > > > > =================
> > > > >> > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > > > >> > > > > generate /home/llist/nutchData/crawl/crawldb
> > > > >> > > > > /home/llist/nutchData/crawl/segments
> > > > >> > > > > Generator: starting at 2011-07-15 18:32:41
> > > > >> > > > > Generator: Selecting best-scoring urls due for fetch.
> > > > >> > > > > Generator: filtering: true
> > > > >> > > > > Generator: normalizing: true
> > > > >> > > > > Generator: jobtracker is 'local', generating exactly one partition.
> > > > >> > > > > Generator: Partitioning selected urls for politeness.
> > > > >> > > > > Generator: segment: /home/llist/nutchData/crawl/segments/20110715183244
> > > > >> > > > > Generator: finished at 2011-07-15 18:32:45, elapsed: 00:00:03
> > > > >> > > > > ==================
> > > > >> > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > > > >> > > > > fetch /home/llist/nutchData/crawl/segments/20110715183244
> > > > >> > > > > Fetcher: Your 'http.agent.name' value should be listed first in
> > > > >> > > > > 'http.robots.agents' property.
> > > > >> > > > > Fetcher: starting at 2011-07-15 18:34:55
> > > > >> > > > > Fetcher: segment: /home/llist/nutchData/crawl/segments/20110715183244
> > > > >> > > > > Fetcher: threads: 10
> > > > >> > > > > QueueFeeder finished: total 1 records + hit by time limit :0
> > > > >> > > > > fetching http://www.seek.com.au/
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=2
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=1
> > > > >> > > > > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > > > >> > > > > -finishing thread FetcherThread, activeThreads=0
> > > > >> > > > > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > >> > > > > -activeThreads=0
> > > > >> > > > > Fetcher: finished at 2011-07-15 18:34:59, elapsed: 00:00:03
> > > > >> > > > > =================
> > > > >> > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > > > >> > > > > updatedb /home/llist/nutchData/crawl/crawldb
> > > > >> > > > > -dir /home/llist/nutchData/crawl/segments/20110715183244
> > > > >> > > > > CrawlDb update: starting at 2011-07-15 18:36:00
> > > > >> > > > > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > > > >> > > > > CrawlDb update: segments:
> > > > >> > > > > [file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch,
> > > > >> > > > > file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate,
> > > > >> > > > > file:/home/llist/nutchData/crawl/segments/20110715183244/content]
> > > > >> > > > > CrawlDb update: additions allowed: true
> > > > >> > > > > CrawlDb update: URL normalizing: false
> > > > >> > > > > CrawlDb update: URL filtering: false
> > > > >> > > > > - skipping invalid segment
> > > > >> > > > > file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch
> > > > >> > > > > - skipping invalid segment
> > > > >> > > > > file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate
> > > > >> > > > > - skipping invalid segment
> > > > >> > > > > file:/home/llist/nutchData/crawl/segments/20110715183244/content
> > > > >> > > > > CrawlDb update: Merging segment data into db.
> > > > >> > > > > CrawlDb update: finished at 2011-07-15 18:36:01, elapsed: 00:00:01
> > > > >> > > > > -----------------------------------
> > > > >> > > > >
> > > > >> > > > > Appreciate any hints on what I'm missing.
> > > > >> >
> > > > >> > --
> > > > >> > Lewis
> > > > >
> > > > >
> > > > > --
> > > > > Open Source Solutions for Text Engineering
> > > > >
> > > > > http://digitalpebble.blogspot.com/
> > > > > http://www.digitalpebble.com
> > >
> > >
> > > --
> > > Open Source Solutions for Text Engineering
> > >
> > > http://digitalpebble.blogspot.com/
> > > http://www.digitalpebble.com

--
Lewis