Hi,

>> I'd like to get the pdf.pm plugin working to convert the text from
>> within a PDF attachment into text to be scanned for bad links, etc.
>>
>> I've downloaded it from here:
>>
>> http://sa.zmi.at/
>>
>> When I try to run it, I receive:
>>
>> Apr 25 09:29:38.733 [15956] warn: Use of uninitialized value $name in
>> lc at /etc/mail/spamassassin/pdf.pm line 54.
>> Apr 25 09:29:38.774 [15956] warn: readline() on closed filehandle PDF
>> at /etc/mail/spamassassin/pdf.pm line 105.
>>
>> It fails to detect the name of the attached PDF. Is someone able to
>> take a look at how it works for me and use their Perl skills to fix
>> it?
>>
>> It may even be related to SpamAssassin itself, as it uses
>> parse_content_type() to figure it out:
>>
>>   my ( $ctype, $boundary, $charset, $name ) =
>>     Mail::SpamAssassin::Util::parse_content_type(
>>       $p->get_header('content-type') );
>>   $ctype = lc($ctype);
>>   $name = lc($name);
>
> The MIME headers for an attachment typically look something like this:
>
>   Content-Type: application/octet-stream; name="79421672.pdf"
>   Content-Transfer-Encoding: base64
>   Content-Disposition: attachment; filename="79421672.pdf"
>
> with the filename duplicated. While this is very common, the "name" in
> the Content-Type is superfluous and can be legitimately omitted. The
> plugin doesn't allow for this possibility.
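
The fallback described in the quoted reply can be sketched in plain Perl. This is only an illustration under stated assumptions, not a patch to pdf.pm: `header_param` and `attachment_name` are hypothetical helpers that do not exist in the plugin or in SpamAssassin, and they work on raw header strings. Only simple quoted and unquoted parameter values are handled; RFC 2231 encoded forms such as `filename*=` are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of pdf.pm or SpamAssassin): extract one
# parameter value from a raw MIME header string. Returns undef when the
# header is missing or the parameter is not present.
sub header_param {
    my ($header, $param) = @_;
    return unless defined $header;
    if ($header =~ /;\s*\Q$param\E\s*=\s*(?:"([^"]*)"|([^\s;]+))/i) {
        return defined $1 ? $1 : $2;
    }
    return;
}

# Prefer name= from Content-Type, fall back to filename= from
# Content-Disposition, and return '' instead of undef so that lc()
# never sees an uninitialized value.
sub attachment_name {
    my ($content_type, $content_disposition) = @_;
    my $name = header_param($content_type, 'name')
            // header_param($content_disposition, 'filename')
            // '';
    return lc $name;
}

# The headers from the quoted example, with name= left out of Content-Type.
my $ct = 'application/octet-stream';
my $cd = 'attachment; filename="79421672.pdf"';
print attachment_name($ct, $cd), "\n";    # prints 79421672.pdf
```

Inside the plugin, the second header would presumably come from `$p->get_header('content-disposition')`, in the same way the Content-Type header is fetched now. That wiring is untested against pdf.pm, and it only silences the warning if a missing `name=` really is the cause.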

This is the sample that triggered the uninitialized value error:

http://pastebin.com/eVCVWX8L

Would you be willing to take a look for me?

> If this is the cause, it might work well enough to allow you to
> evaluate the plugin without putting any effort into fixing it until
> you know it's worth it. Not all of the "uninitialized value"
> warnings will be from pdf files anyway - it can happen on any
> application/octet-stream without "name=".

We're getting slammed with PDF spam that just has a link to a phishing
site. Is it going to work well enough to extract a URL from a PDF that
can be fed back into DecodeShortURLs, for example? Or at least process
the text with bayes?

Is anyone else seeing these? How are you dealing with them?

Thanks,
Alex
