Thanks Karl,

I'm guessing it is (1).
According to the logs, a 401 response was returned at first; the connector
then passed authentication and tried to fetch the document.
After that, the message quoted below was logged.
I'll try to debug isDataIngestable().
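
As a starting point for that debugging, here is a rough sketch of the six
checks Karl lists below, with each rejection returning a reason that could be
logged. This is not the actual ManifoldCF code: the class name, the
rejectionReason() helper, and the predicate parameters are all hypothetical
stand-ins for the connector's real checks.

```java
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names exist in ManifoldCF.
class IngestabilitySketch {

    // Returns null when every check passes, otherwise a reason naming the
    // first check that failed, numbered as in Karl's list.
    static String rejectionReason(int responseCode, long length, String url, String mimeType,
                                  Predicate<Long> outputAcceptsLength,
                                  Predicate<String> outputAcceptsUrl,
                                  Predicate<String> jobAllowsUrl,
                                  Predicate<String> outputAcceptsMimeType) {
        if (responseCode != 200)
            return "(1) response code was " + responseCode + ", not 200";
        if (!outputAcceptsLength.test(length))
            return "(2) output connector rejected length " + length;
        if (!outputAcceptsUrl.test(url))
            return "(3) output connector rejected URL " + url;
        if (!jobAllowsUrl.test(url))
            return "(4) URL is not allowed for indexing by the job: " + url;
        if (mimeType == null)
            return "(5) document has no mime type";
        if (!outputAcceptsMimeType.test(mimeType))
            return "(6) output connector rejected mime type " + mimeType;
        return null;
    }

    public static void main(String[] args) {
        // Assumed permissive predicates, so only checks (1) and (5) can fail.
        Predicate<Long> anyLength = len -> true;
        Predicate<String> anyString = s -> true;
        System.out.println(rejectionReason(401, 0L, "https://server.com/url/", null,
                anyLength, anyString, anyString, anyString));
    }
}
```

Logging the returned reason just before the "Decided not to ingest" message
should be enough to tell case (1) from case (5).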

Thank you,
Shinichiro Abe

On 2013/10/08, at 15:42, Karl Wright <[email protected]> wrote:

> Hi Abe-san,
> 
> This means that the method:
> 
>   protected boolean isDataIngestable(IFingerprintActivity activities, String 
> documentIdentifier, DocumentURLFilter filter)
>     throws ServiceInterruption, ManifoldCFException
> 
> ... has returned false.  If you look at the method, you will see that it 
> checks the following:
> 
> (1) That the response code is 200 (could be this if the login is failing, 
> because then 401 is returned).
> (2) That the output connector will accept documents of that length.
> (3) Whether the output connector will accept the URL as given.
> (4) Whether the url matches the urls allowed for indexing in the web job.
> (5) Whether there is a mime type at all; if there is none, the document is 
> rejected (which I think is not correct; it should probably ask the output 
> connector even in this case).
> (6) Whether the output connector will accept the document's mime type.
> 
> In most cases, the method logs its decision, so you may see additional output 
> that could clarify why the document is being excluded.  If no additional 
> message is being output, then it is either case (1) or (5).  You would have 
> to add logging code to figure out which one it is.
> 
> Thanks,
> Karl
> 
> 
> 
> On Tue, Oct 8, 2013 at 2:24 AM, Shinichiro Abe <[email protected]> 
> wrote:
> Hi,
> I'm sure that the Web Connector supports Basic Authentication and can crawl 
> http sites, but I'm not sure about an https (SSL) site that uses Basic 
> Authentication.
> I can register Basic auth credentials with an https:// regex, user, and 
> password, but crawling fails.
> Is Basic Authentication over https supported?
> Looking at the log, I see "WEB: Decided not to ingest
> 'https://server.com/url/' because it did not match ingestability criteria".
> What does this message mean?
> 
> Regards,
> Shinichiro Abe
> 
