CLASSIFICATION: UNCLASSIFIED

Damn email policy. You get it.

Thanks,
Kris
~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.
US Army Research Lab
Aberdeen Proving Ground
Application Management & Development Branch
410-278-7251
kris.t.musshorn....@mail.mil
~~~~~~~~~~~~~~~~~~~~~~~~~~

-----Original Message-----
From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) [mailto:kris.t.musshorn....@mail.mil]
Sent: Thursday, July 21, 2016 8:02 AM
To: user@nutch.apache.org
Subject: RE: [Non-DoD Source] tutorial work thru (UNCLASSIFIED)

Clarification: urls/seed.txt contains https://the.website.mil/inside/. The "Caution-" prefixes are added by the mail gateway and are not part of the file.

Thanks,
Kris

-----Original Message-----
From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) [mailto:kris.t.musshorn....@mail.mil]
Sent: Thursday, July 21, 2016 7:38 AM
To: user@nutch.apache.org
Subject: [Non-DoD Source] tutorial work thru (UNCLASSIFIED)

Working through the tutorial for Nutch v1.

urls/seed.txt contains:

    https://the.website.mil/inside/

regex-urlfilter.txt contains these edits:

    # accept anything else
    #+.
    # limit to the.website.mil
    +^https://([a-z0-9]*\.)the.website.mil/inside

Yet nothing gets populated in the crawldb:

    bin/nutch inject crawl/crawldb urls
    Injector: starting at 2016-07-21 07:32:02
    Injector: crawlDb: crawl/crawldb
    Injector: urlDir: urls
    Injector: Converting injected urls to crawl db entries.
    Injector: Total number of urls rejected by filters: 1
    Injector: Total number of urls after normalization: 0
    Injector: Merging injected urls into crawl db.
    Injector: overwrite: false
    Injector: update: false
    Injector: URLs merged: 0
    Injector: Total new urls injected: 0

Thanks,
Kris
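The "+" rule can be checked outside Nutch with plain java.util.regex. This is a minimal sketch. UrlFilterCheck is a hypothetical stand-alone class and not part of Nutch. It assumes, as the comments in the stock regex-urlfilter.txt describe, that the first matching pattern decides and that a URL matching no pattern is rejected. It also assumes each pattern is applied with Matcher.find(). The class tests the seed URL against the rule as quoted above, and against a variant with '*' after the group, which allows zero or more subdomain labels before the host.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone helper (not part of Nutch) for checking one
// regex-urlfilter.txt rule against a seed URL with plain java.util.regex.
public class UrlFilterCheck {

    static final String SEED = "https://the.website.mil/inside/";

    // The "+" rule quoted above, minus the leading '+'. The group has no
    // quantifier, so exactly one "label." must come right after "https://".
    static final String AS_WRITTEN = "^https://([a-z0-9]*\\.)the.website.mil/inside";

    // Same rule with '*' after the group: zero or more "label." prefixes.
    static final String WITH_STAR = "^https://([a-z0-9]*\\.)*the.website.mil/inside";

    // Assumption: each pattern is applied with Matcher.find(); '^' still
    // anchors the match to the start of the URL.
    static boolean accepts(String pattern, String url) {
        return Pattern.compile(pattern).matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println("as written: " + accepts(AS_WRITTEN, SEED));
        System.out.println("with '*':   " + accepts(WITH_STAR, SEED));
    }
}
```

With Java 11 or later, this can be run as "java UrlFilterCheck.java". The rule as written prints false for the seed URL. The group ([a-z0-9]*\.) can only consume "the.", and that leaves "website.mil/inside/" to match "the.website.mil". The variant with '*' prints true.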