Here's the full code for this class: https://svn.apache.org/repos/asf/manifoldcf/trunk/connectors/email/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/email/EmailConnector.java
Karl

On Tue, Feb 7, 2017 at 5:14 PM, Karl Wright <daddy...@gmail.com> wrote:

> Hi Cihad,
>
> The variable attachmentIndex is *supposed* to be null except when an
> attachment is being processed. The code should look like this:
>
>     if (attachmentIndex == null) {
>       // It's an email
>       ...
>     } else {
>       // It's an attachment
>       attachmentNumber = attachmentIndex;
>       ...
>     }
>
> Karl
>
> On Tue, Feb 7, 2017 at 4:43 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>
>> Hi Karl,
>>
>> I added a LOG line for testing. It looks like attachmentIndex is null.
>>
>> 2017-02-08 0:11 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>
>>> I attached a second patch (to apply on top of the first patch). Please
>>> let me know if that fixes the issue.
>>>
>>> Karl
>>>
>>> On Tue, Feb 7, 2017 at 3:59 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I get the following error:
>>>>
>>>> FATAL 2017-02-07 23:56:09,483 (Worker thread '29') - Error tossed: For input string: "myFolder/test:<CADNgPDgSXHeWo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
>>>> java.lang.NumberFormatException: For input string: "myFolder/test:<cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
>>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>>>>     at java.lang.Integer.parseInt(Integer.java:580)
>>>>     at java.lang.Integer.parseInt(Integer.java:615)
>>>>     at org.apache.manifoldcf.crawler.connectors.email.EmailConnector.processDocuments(EmailConnector.java:705)
>>>>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>>>
>>>> 2017-02-07 22:50 GMT+03:00 Cihad Guzel <cguz...@gmail.com>:
>>>>
>>>>> Thanks Karl,
>>>>>
>>>>> I will try it.
>>>>>
>>>>> Regards
>>>>> Cihad Guzel
>>>>>
>>>>> 2017-02-07 22:36 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>>>>
>>>>>> I've created a ticket and attached a patch to it. CONNECTORS-1375.
>>>>>> Please let me know if it works for you; if not, I'll fix what doesn't
>>>>>> work.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>>
>>>>>>> Correction: the only metadata attribute we set is the attachment MIME
>>>>>>> type(s) (as a multivalued field); this doesn't currently include the
>>>>>>> attachment data.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Cihad,
>>>>>>>>
>>>>>>>> The email connector is providing the attachment data unextracted to
>>>>>>>> the output connector as metadata attribute data. There are no
>>>>>>>> transformation connectors that look at this metadata. Solr Cell also
>>>>>>>> probably does not handle binary data in arbitrary metadata attributes
>>>>>>>> properly.
>>>>>>>>
>>>>>>>> The connector's attachment code therefore seems to be designed only
>>>>>>>> to deal with textual attachments. The right solution is to have
>>>>>>>> individual IDs for each attachment. But that would also require there
>>>>>>>> to be a URL we could construct for each attachment. We could provide
>>>>>>>> an additional URI template for attachments, but I wonder if your
>>>>>>>> system has the ability to serve attachments by their own URLs. Please
>>>>>>>> let me know if this would work, and if so I can create a ticket and
>>>>>>>> work on making these changes.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am trying the email connector with Gmail. I attached the file [1]
>>>>>>>>> to a new email and sent it to my test email address.
>>>>>>>>>
>>>>>>>>> My mail body is: "this is test mail for mfc"
>>>>>>>>>
>>>>>>>>> Then I ran my email job, and the email was indexed to Solr
>>>>>>>>> successfully. But Solr's content field does not contain my
>>>>>>>>> attachment's content. The Solr content field looks like:
>>>>>>>>>
>>>>>>>>> "content":" \n \n \n \n \n \n \n \n \n \n --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative; boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<div dir=\"ltr\">this is test mail for mfc.\r\n</div>\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf; name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment; filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding: base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDAvRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2JqDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDAxNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM ..."
>>>>>>>>>
>>>>>>>>> Does the MFC email connector know that the attachment's file type
>>>>>>>>> is PDF? Doesn't it extract the contents?
>>>>>>>>>
>>>>>>>>> [1] http://www.orimi.com/pdf-test.pdf
>>>>>>>>> --
>>>>>>>>> Regards
>>>>>>>>> Cihad Güzel
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Cihad Güzel
>>>>
>>>> --
>>>> Thanks,
>>>> Cihad Güzel
>>
>> --
>> Thanks,
>> Cihad Güzel
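The null check Karl describes, and the NumberFormatException in the stack trace, can be illustrated with a small self-contained Java sketch. The document-ID layout it assumes ("<folder>:<messageId>" for an email, plus a hypothetical ":<attachmentIndex>" suffix for an attachment), the class name EmailDocId, and the sample message ID are all made up for illustration; this is not the actual EmailConnector format or code. The idea is simply that Integer.parseInt should only ever see a numeric suffix, and that attachmentIndex stays null for the email itself.

```java
// Sketch only. Hypothetical document-ID layout, NOT the real EmailConnector format:
//   email:      "<folder>:<messageId>"
//   attachment: "<folder>:<messageId>:<attachmentIndex>"
public final class EmailDocId {
    final String emailId;          // the "<folder>:<messageId>" part
    final Integer attachmentIndex; // null for the email itself, non-null only for attachments

    private EmailDocId(String emailId, Integer attachmentIndex) {
        this.emailId = emailId;
        this.attachmentIndex = attachmentIndex;
    }

    static EmailDocId parse(String id) {
        int lastColon = id.lastIndexOf(':');
        String suffix = lastColon >= 0 ? id.substring(lastColon + 1) : "";
        // Parse only a short, all-ASCII-digit suffix. Calling Integer.parseInt on the
        // whole ID (e.g. "myFolder/test:<...>") throws NumberFormatException.
        boolean numeric = !suffix.isEmpty() && suffix.length() <= 9
                && suffix.chars().allMatch(c -> c >= '0' && c <= '9');
        if (numeric) {
            return new EmailDocId(id.substring(0, lastColon), Integer.parseInt(suffix));
        }
        return new EmailDocId(id, null);
    }

    public static void main(String[] args) {
        String email = "myFolder/test:<abc@mail.gmail.com>"; // hypothetical message ID
        for (String id : new String[] { email, email + ":0" }) {
            EmailDocId doc = parse(id);
            if (doc.attachmentIndex == null) {
                // It's an email
                System.out.println("email " + doc.emailId);
            } else {
                // It's an attachment
                int attachmentNumber = doc.attachmentIndex;
                System.out.println("attachment " + attachmentNumber + " of " + doc.emailId);
            }
        }
    }
}
```

Running main prints "email myFolder/test:<abc@mail.gmail.com>" followed by "attachment 0 of myFolder/test:<abc@mail.gmail.com>". Checking the suffix before parsing keeps the email path from ever reaching Integer.parseInt.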
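Whether the attachment is a PDF can be seen from the raw content quoted in the thread: the MIME part declares "Content-Type: application/pdf", and its base64 body decodes to bytes starting with the PDF signature "%PDF-". The Java sketch below decodes the first two base64 lines from that part with java.util.Base64's MIME decoder and checks the signature. The class and method names are hypothetical, and this is not something the EmailConnector does; actually extracting the PDF text would still require a content extractor to be applied to the attachment data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class AttachmentSniff {
    // Hypothetical helper: true when the decoded bytes start with the PDF signature "%PDF-".
    static boolean looksLikePdf(String base64Body) {
        // The MIME decoder ignores the \r\n line breaks used with
        // Content-Transfer-Encoding: base64.
        byte[] bytes = Base64.getMimeDecoder().decode(base64Body);
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (bytes.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (bytes[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First two base64 lines of the pdf-test.pdf part quoted in the thread.
        String body = "JVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDAvRSAx\r\n"
                + "NDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2JqDSAgICAgICAgICAgICAgICAg\r\n";
        byte[] decoded = Base64.getMimeDecoder().decode(body);
        System.out.println(looksLikePdf(body));
        // Print the 8-byte file header, i.e. the PDF version line.
        System.out.println(new String(decoded, 0, 8, StandardCharsets.US_ASCII));
    }
}
```

Running main prints "true" and then "%PDF-1.6", which confirms that the base64 text stored in Solr's content field is an unextracted PDF.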