RE: [Non-DoD Source] Re: Solr 6.1.0 issue (UNCLASSIFIED)

2016-08-05 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
doesn't work, the only real solution is "don't do that". You'll have to intercept the doc and omit that data, perhaps write a custom update processor to throw out huge fields or the like. Best, Erick On Fri, Aug 5, 2016 at 10:59 AM, Musshorn, Kris T CTR USARMY RDECOM ARL
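One possible reading of the advice above, as a rough sketch rather than Erick's actual suggestion: filter each document on the client side before it is sent to Solr, and drop any string field whose UTF-8 encoding is larger than a size limit. The function name `drop_huge_fields`, the sample field names, and the default limit are assumptions for illustration; the limit of 32766 bytes is chosen because Lucene rejects a single indexed term longer than that. A server-side custom update processor would be written in Java against Solr's own API and is not shown here.

```python
from typing import Any, Dict

# Assumed limit: Lucene rejects a single indexed term longer than 32766 bytes.
MAX_FIELD_BYTES = 32766


def drop_huge_fields(doc: Dict[str, Any], max_bytes: int = MAX_FIELD_BYTES) -> Dict[str, Any]:
    """Return a copy of doc without string fields whose UTF-8 size exceeds max_bytes."""
    kept = {}
    for name, value in doc.items():
        if isinstance(value, str) and len(value.encode("utf-8")) > max_bytes:
            continue  # omit the oversized field entirely
        kept[name] = value
    return kept


if __name__ == "__main__":
    # Hypothetical document: "content" is far over the limit and gets dropped.
    sample = {"id": "doc-1", "title": "short", "content": "x" * 40000}
    print(sorted(drop_huge_fields(sample)))
```

Dropping the field outright loses that text from the index; truncating it instead is another option, at the cost of indexing a partial value.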

Solr 6.1.0 issue (UNCLASSIFIED)

2016-08-05 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED I am trying to index from nutch 1.12 to SOLR 6.1.0. Got this error. java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/ARLInside: Exception writing document id https://emcstage.

RE: [Non-DoD Source] Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
Lynch. He looked at me like I had three heads and >> didn’t even answer me. >> >> Ultraseek also has great support for sites that need login. If you >> use that, you’ll need to find a way to do that with another crawler. >> >> wunder >> Walter Und

SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED We are currently using ultraseek and looking to deprecate it in favor of solr/nutch. Ultraseek runs all the time and auto detects when pages have changed and automatically reindexes them. Is this possible with SOLR/nutch? Thanks, Kris ~~ Kri

RE: [Non-DoD Source] Re: config question (UNCLASSIFIED)

2016-07-28 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
://www.solr-start.com/ On 28 July 2016 at 23:22, Musshorn, Kris T CTR USARMY RDECOM ARL (US) wrote: > CLASSIFICATION: UNCLASSIFIED > > I am trying to integrate nutch 1.12 with solr 5.5.2. > > In the setup documents from here... > Caution-https://wiki.apache.org/nutch/NutchTutor

config question (UNCLASSIFIED)

2016-07-28 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED I am trying to integrate nutch 1.12 with solr 5.5.2. In the setup documents from here... https://wiki.apache.org/nutch/NutchTutorial it says to replace the schema.xml file in the core with the schema.xml from nutch. The install of solr does not have a schema.xml f

config question (UNCLASSIFIED)

2016-07-19 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED SOLR 6.1.0 + Nutch 2.3.1 works? Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor - Catapult Technology Inc. US Army Research Lab Aberdeen Proving Ground Application Management & Development Branch 410-278-7251 kris.t.

RE: [Non-DoD Source] Re: SimplePostTool error (UNCLASSIFIED)

2016-07-15 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
15, 2016 at 9:01 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) wrote: > CLASSIFICATION: UNCLASSIFIED > > How do I correct this error when running the simple post tool against a > website? > The tool successfully indexed for about 30 mins before throwing this error > and termi

SimplePostTool error (UNCLASSIFIED)

2016-07-15 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED How do I correct this error when running the simple post tool against a website? The tool successfully indexed for about 30 mins before throwing this error and terminating. [Fatal Error] :642:15: XML document structures must start and end within the same entity. Exc

Simple Post Tool result question (UNCLASSIFIED)

2016-07-14 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED POSTed web resource https://xx/inside/news/dispatches///view.cfm?id=9128 (depth: 4) What is the significance of the /// ? Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor - Catapult Technology Inc. US Army Research

RE: [Non-DoD Source] Re: SimplePost tool (UNCLASSIFIED)

2016-07-14 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
tool (UNCLASSIFIED) No, the tool itself is just a glorified Solr-savvy `curl`. Any deduplication would have to be in the Solr-side update handling (or before the post tool is invoked by selectively choosing what to send it). Erik > On Jul 14, 2016, at 12:36 PM, Musshorn, Kris T CTR
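The second option Erik mentions, choosing what to send before the post tool is invoked, could look roughly like the sketch below. It keeps the first file seen for each distinct content hash. The function name `unique_by_content`, the use of SHA-256, and the idea of handing the resulting paths to the post tool afterwards are assumptions; the thread does not specify any of them.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def unique_by_content(paths: Iterable[Path]) -> List[Path]:
    """Keep the first path seen for each distinct file content (SHA-256 of the bytes)."""
    seen = set()
    unique = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            continue  # identical bytes were already selected under another path
        seen.add(digest)
        unique.append(path)
    return unique
```

This only catches byte-for-byte duplicates. Pages that differ in a timestamp or navigation block would still be sent, so near-duplicate handling would have to happen on the Solr side.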

SimplePostTool details (UNCLASSIFIED)

2016-07-14 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED Does the simple post tool index the contents of a document (word,txt,xls,pdf etc) that it encounters? Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor - Catapult Technology Inc. US Army Research Lab Aberdeen Proving G

SimplePost tool (UNCLASSIFIED)

2016-07-14 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED Does the simple post tool accomplish deduplication? Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor - Catapult Technology Inc. US Army Research Lab Aberdeen Proving Ground Application Management & Development Branch

POST options (UNCLASSIFIED)

2016-07-13 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED I am looking for documentation of every option you can set when indexing a core with POST. Has anyone seen such a document? Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor - Catapult Technology Inc. US Army Research Lab

simple setup help (UNCLASSIFIED)

2016-07-05 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED Can someone walk a noob through setting up a dataimport handler? I need to index a ColdFusion website. Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor – Catapult Technology Inc. US Army Research Lab Aberdeen Proving Gro