Re: Google Drive processing

Ethan Wilansky Mon, 27 Oct 2014 12:20:18 -0700

Hi Karl,
In Simple History there are no indexing activity records showing a size of 0.
All of the content on this Google Drive endpoint is either small uploaded files
(docx, pptx, pdf, txt) or documents, spreadsheets and presentations created in
Google Docs.


With regard to opening a ticket, it might not be worth your while. Ultimately,
our use case is to use an ES Output Connection to store metadata for retrieval
and to store the binaries on the file system. We don’t want to use the ES
Attachment plug-in, which is why I thought we might be able to combine the ES
Output Connection and a File System Output Connection in a job. I suppose
another option would be to involve Tika, but I’m not clear on whether that
would allow me to store the metadata in ES with a pointer to the binary in the
file system.

Thanks,
Ethan


> On Oct 27, 2014, at 2:27 PM, Karl Wright <[email protected]> wrote:
> 
> Hi Ethan,
> 
> This does not sound like it is related in any way to the Google Drive
> connection, unless for some reason the Google API is treating some of the
> documents fetched as having only metadata and no content.  In that case, you'd
> see a size of zero in the Simple History indexing activity records.  Is that
> what you see?
> 
> As for the file name issues: the File System Output Connection is supposed to
> emulate WGET.  However, there are a number of known issues with this
> connector, for example CONNECTORS-814, and I believe the handling of "&" is
> one such issue.  I don't think these characters are allowed in file names on
> several operating systems.
> 
> Please open a ticket and describe how you think it should behave (e.g., how
> it should map "&" characters in URLs to legal file name characters), and I'll
> try to come up with a quick patch.
> 
> Karl
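
One way to answer the mapping question, sketched in Python purely as an illustration (this is not how the File System Output Connection is implemented): keep the WGET-style layout visible in the error path from the original report (scheme, then host, then path segments, with the query string appended to the last segment), but percent-encode only the characters Windows rejects in file and directory names (< > : " / \ | ? *), control characters, and "%" itself, so the mapping stays reversible. Note that "&" and "=" are legal in NTFS file names; in that error path, the character Windows most likely objects to is the "?" that starts the query string. The function names and the "index.html" default for directory URLs are hypothetical choices for this sketch.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit

# Characters Windows rejects in file and directory names.
WINDOWS_ILLEGAL = '<>:"/\\|?*'


def escape_segment(segment: str) -> str:
    """Percent-encode '%', control characters and Windows-illegal characters.

    Encoding '%' as well keeps the mapping reversible. Legal characters such
    as '&' and '=' are left untouched.
    """
    out = []
    for ch in segment:
        if ch == "%" or ch in WINDOWS_ILLEGAL or ord(ch) < 32:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def url_to_relative_path(url: str) -> list[str]:
    """Split a URL into WGET-style components: scheme, host, path segments.

    The query string, if any, is appended to the last segment (matching the
    error path reported in this thread) before escaping. A URL whose path is
    empty or ends in '/' gets a hypothetical default file name, 'index.html'.
    """
    parts = urlsplit(url)
    path_segments = [s for s in parts.path.split("/") if s]
    if not path_segments or parts.path.endswith("/"):
        path_segments.append("index.html")
    if parts.query:
        path_segments[-1] += "?" + parts.query
    return [escape_segment(s) for s in [parts.scheme, parts.netloc, *path_segments]]


def url_to_windows_path(root: str, url: str) -> PureWindowsPath:
    """Place the escaped components under the output root directory."""
    return PureWindowsPath(root, *url_to_relative_path(url))


if __name__ == "__main__":
    print(url_to_windows_path(r"d:\temp\mf",
                              "https://example.com/docs/file?h=1&e=download"))
    # prints d:\temp\mf\https\example.com\docs\file%3Fh=1&e=download
```

This sketch does not handle Windows reserved device names (CON, NUL, and so on), trailing dots or spaces, or the 260-character MAX_PATH limit, all of which a real patch would probably also need to consider.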
> 
> 
> On Mon, Oct 27, 2014 at 12:15 PM, Ethan Wilansky <[email protected]> wrote:
> I’ve run a job that uses a Google Drive Repository Connection and a File System
> Output Connection. My output is pointing to d:\temp\mf on the machine running
> ManifoldCF.
> 
> Upon running the job, job status shows:
> Error: Could not create file
> 'd:\temp\mf\https\doc-0g-1c-docs.googleusercontent.com\docs\securesc\288dijb8lhptipmnpc6n3dap4bdki35j\ek70aeovi25lp7aibkar61h90pi1i2c3\1414418400000\14058876669334088852\07105634325979498590\0B4rsPDZwaBMUZjI3VGpzZi10dUU?h=00194472260389282923&e=download&gd=true'
> (The filename, directory name, or volume label syntax is incorrect)
> 
> The same "filename, directory name, or volume label syntax is incorrect"
> error is reported for one more file, so 10 of the 12 files are processed.
> However, none of the files reported as successfully processed appear in the
> file system.
> 
> I think the file system path is unusual beyond the directory I specified for
> the job (d:\temp\mf). I’m seeing something like the following path structure:
> D:\temp\mf\https\doc-0g-1c-docs.googleusercontent.com\docs\securesc\288dijb8lhptipmnpc6n3dap4bdki35j\ek3m4mhv978b7a2elgov6cm9nipbv36e\1414418400000\13058876669334088852\07105634445979498592
> 
> Document Status and Queue Status show nothing unusual. I’m running
> ManifoldCF release 1.7.1.
> 
> Could this be an issue with the way I’m configuring the File System Output
> Connection, or is there something else I need to configure? I have properly
> configured the refresh token, client ID and client secret in the Repository
> Connection.
> 
> I’ve attached the JSON for the Repository Connection (with client id, client 
> secret and refresh token values removed), my Output Connection and Job 
> Definition.
> 
> Thanks in advance for your feedback
> Ethan
