On Tue, 2009-09-08 at 15:47 +0100, rui.carne...@portugalmail.net wrote:
Now I am trying to find a way to get the MIME part ID of the parts
used in fts_build_mail. Is that already possible, or do I need to do that
on my own?
If you already get the MIME structure, then I guess you have struct
Hi again!
After some time using my changes to this plugin I found one major
problem. When a message has two attachments with the same name, or one
with a content-type of message/*, my Solr schema design does not
work, because I used the attachment's name as its unique identifier,
which is
Quoting Timo Sirainen t...@iki.fi:
So valgrind didn't find anything wrong.
Should we ignore the LEAK SUMMARY?
What does gdb show as the backtrace?
My gdb is not writing where it should (or not writing at all). Shouldn't this
be enough?
mail_executable = /usr/local/libexec/dovecot/gdbhelper
On May 26, 2009, at 5:46 AM, Rui Carneiro wrote:
Quoting Timo Sirainen t...@iki.fi:
So valgrind didn't find anything wrong.
Should we ignore the LEAK SUMMARY?
At least for now. Memory leaks don't cause crashes.
What does gdb show as the backtrace?
My gdb is not writing where it should (or
Quoting Timo Sirainen t...@iki.fi:
At least for now. Memory leaks don't cause crashes.
Ok.
gdb -p `pidof imap`
cont
make it crash
bt full
I don't think it will be necessary. It is not crashing anymore. Maybe it was a bug
in my code.
Tomorrow (or the day after) I will send you the code.
Quoting Timo Sirainen t...@iki.fi:
I guess it works around some other bug then. If it's a memory-related
bug you could also see if valgrind complains about something:
protocol imap {
..
mail_executable = /usr/bin/valgrind /usr/local/libexec/dovecot/imap
}
Here is the output (I cloned the
On Mon, 2009-05-25 at 14:20 +0100, Rui Carneiro wrote:
Quoting Timo Sirainen t...@iki.fi:
I guess it works around some other bug then. If it's a memory-related
bug you could also see if valgrind complains about something:
protocol imap {
..
mail_executable = /usr/bin/valgrind
Hi Timo,
I have almost finished the changes to the fts plugin. So far, it seems to work fine with
attachments (extracting and sending them to Solr). I only have a problem with
the max size of the command (cmd) that we can send to Solr:
#define SOLR_CMDBUF_SIZE (1024*64)
Currently, if we send some message
On Fri, 2009-05-22 at 18:24 +0100, Rui Carneiro wrote:
Hi Timo,
I have almost finished the changes to the fts plugin. So far, it seems to work fine with
attachments (extracting and sending them to Solr). I only have a problem with
the max size of the command (cmd) that we can send to Solr:
#define
Quoting Timo Sirainen t...@iki.fi:
The problem is something else. The Solr code simply tries to keep the
send buffer smaller than that, nothing would break if you sent a larger
buffer. Show gdb backtrace of the crash?
I said it was from the buffer size because when I increased it Dovecot
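The point made in the quoted reply, that the buffer size is a soft limit rather than a hard cap, can be sketched as follows. This is a minimal illustration and not the Solr backend's code: `struct cmd_buffer`, `cmd_buffer_append()`, `cmd_buffer_flush()` and `CMDBUF_SOFT_LIMIT` are hypothetical names, and the "send" step only counts bytes instead of talking to Solr.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical soft limit, playing the role of SOLR_CMDBUF_SIZE. */
#define CMDBUF_SOFT_LIMIT (1024*64)

struct cmd_buffer {
	char *data;
	size_t used, alloc;
	size_t flush_count;	/* how many times the buffer was "sent" */
	size_t bytes_sent;	/* total bytes handed to the "send" step */
};

/* Stand-in for the real send: only records what would have been sent. */
static void cmd_buffer_flush(struct cmd_buffer *buf)
{
	assert(buf != NULL);
	if (buf->used == 0)
		return;
	buf->bytes_sent += buf->used;
	buf->flush_count++;
	buf->used = 0;
}

/* Append a block. The buffer grows as needed, so a block larger than the
   soft limit is still accepted; the limit only decides when to flush.
   Returns 0 on success, -1 if memory allocation fails. */
static int cmd_buffer_append(struct cmd_buffer *buf,
			     const void *data, size_t size)
{
	assert(buf != NULL && data != NULL);
	if (buf->used + size > buf->alloc) {
		size_t new_alloc = buf->alloc == 0 ? 1024 : buf->alloc;
		char *new_data;

		while (new_alloc < buf->used + size)
			new_alloc *= 2;
		new_data = realloc(buf->data, new_alloc);
		if (new_data == NULL)
			return -1;
		buf->data = new_data;
		buf->alloc = new_alloc;
	}
	memcpy(buf->data + buf->used, data, size);
	buf->used += size;
	if (buf->used >= CMDBUF_SOFT_LIMIT)
		cmd_buffer_flush(buf);
	return 0;
}
```

In this sketch an oversized block simply triggers an early flush, which is why raising the constant would not be expected to fix a crash on its own.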
On Fri, 2009-05-22 at 18:57 +0100, Rui Carneiro wrote:
Quoting Timo Sirainen t...@iki.fi:
The problem is something else. The Solr code simply tries to keep the
send buffer smaller than that, nothing would break if you sent a larger
buffer. Show gdb backtrace of the crash?
I said it
Now, with attachment.
/* Copyright (c) 2006-2009 Dovecot authors, see the included COPYING file */
#include "lib.h"
#include "buffer.h"
#include "base64.h"
#include "str.h"
#include "unichar.h"
#include "charset-utf8.h"
#include "quoted-printable.h"
#include "rfc822-parser.h"
#include "rfc2231-parser.h"
#include
Quoting Timo Sirainen t...@iki.fi:
All the data comes from lib-mail/message-decoder.c. Hmm. Looks like it
tries to force giving only valid UTF-8 output. I guess it should have
some flag or something that makes it do that only for text/* parts, not
for binary parts. OK, implemented, see if it
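Why forcing valid UTF-8 output damages binary attachments can be shown with a small sketch. This is not the code in lib-mail/message-decoder.c: `utf8_seq_len()` and `force_valid_utf8()` are hypothetical helpers, the validation is simplified (overlong forms and surrogates are not rejected), and replacing each invalid byte with U+FFFD is an assumption about what "forcing" means here.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the UTF-8 sequence starting at p, or 0 if it is not a
   well-formed sequence (simplified check, see the note above). */
static size_t utf8_seq_len(const unsigned char *p, size_t left)
{
	size_t len, i;

	if (p[0] < 0x80)
		return 1;
	else if ((p[0] & 0xE0) == 0xC0)
		len = 2;
	else if ((p[0] & 0xF0) == 0xE0)
		len = 3;
	else if ((p[0] & 0xF8) == 0xF0)
		len = 4;
	else
		return 0;
	if (len > left)
		return 0;
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return 0;
	}
	return len;
}

/* Copy src to dest, replacing every invalid byte with U+FFFD
   (EF BF BD). dest must have room for 3 * size bytes.
   Returns the number of bytes written. */
static size_t force_valid_utf8(const unsigned char *src, size_t size,
			       unsigned char *dest)
{
	size_t in = 0, out = 0, len, i;

	assert(src != NULL && dest != NULL);
	while (in < size) {
		len = utf8_seq_len(src + in, size - in);
		if (len == 0) {
			dest[out++] = 0xEF;
			dest[out++] = 0xBF;
			dest[out++] = 0xBD;
			in++;
		} else {
			for (i = 0; i < len; i++)
				dest[out++] = src[in++];
		}
	}
	return out;
}
```

Plain ASCII text passes through unchanged, while arbitrary high bytes from a binary part come out longer and different, so a file rebuilt from that output no longer matches the original. Applying such a step only to text/* parts, as the message suggests, avoids the problem.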
On Tue, 2009-05-19 at 14:40 +0100, Rui Carneiro wrote:
http://hg.dovecot.org/dovecot-1.2/rev/44548a7fb10d
It is working now but I needed to make some changes to your code.
OK.
Please see the attachment to check for any problems that may exist.
You forgot the attachment.
On Tue, May 19, 2009 at 8:51 PM, Timo Sirainen t...@iki.fi wrote:
You forgot the attachment.
Oh, sorry, I am not at the office now (almost 10pm here). I will send it
tomorrow morning.
Rui Carneiro
---
Portugalmail, Comunicações S.A.
www.portugalmail.net
Hi again,
I am having some trouble writing all the data to a file. When I finish writing all
the data to the file and try to open it, the file is corrupted.
The first thing I noticed is that all chars are capitalized, which destroys
the file format.
Where are the chars capitalized?
Any other idea
On May 18, 2009, at 6:42 AM, Rui Carneiro wrote:
I am having some trouble writing all the data to a file. When I finish
writing all the data to the file and try to open it, the file is
corrupted.
The first thing I noticed is that all chars are capitalized, which
destroys the file format.
Quoting Timo Sirainen t...@iki.fi:
Nope. If you still see corruption, try with some simple test mails and
see if it's adding garbage, losing contents or adding more content.
I tried something more advanced than that. I hexdumped my pdf test file and on
the first line I get:
25 50 44
On Mon, 2009-05-18 at 17:35 +0100, Rui Carneiro wrote:
I think binary data is being corrupted somewhere before
fts_backend_build_more() and I don't have any idea where.
All the data comes from lib-mail/message-decoder.c. Hmm. Looks like it
tries to force giving only valid UTF-8 output. I guess
Quoting Timo Sirainen t...@iki.fi:
1. You notice a non-text/* content-type and initialize text extraction
for the MIME part. Like:
struct attachment_extract_context *
attachment_extract_init(const char *content_type);
2. After this you feed all the input belonging to that MIME part to:
On Tue, 2009-05-05 at 12:08 +0100, Rui Carneiro wrote:
- fts_build_mail() indexes a single mail. It parses the messages and
returns the data in small blocks. For text/* and message/rfc822 parts
those blocks are currently sent to FTS backend. This is where I think
you should look into
Hi again,
On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen t...@iki.fi wrote:
- fts_build_mail() indexes a single mail. It parses the messages and
returns the data in small blocks. For text/* and message/rfc822 parts
those blocks are currently sent to FTS backend. This is where I think
you
On Wed, 22 Apr 2009, Rui Carneiro wrote:
I will talk with the developers of those applications about the possibility
of supporting stdin input (if not supported yet).
I think the API that fts plugin uses to do the conversion should be
generic
On Thu, Apr 23, 2009 at 5:47 AM, to...@tuxteam.de wrote:
Note that some formats might require seeking to some point in the file [1]
(typically the end), so reading from stdin is awkward (it would require
stdin to be seekable, so either the app or the caller would have to put
the
On Thu, Apr 23, 2009 at 12:27:47PM +0100, rui.carne...@portugalmail.net wrote:
On Thu, Apr 23, 2009 at 5:47 AM, to...@tuxteam.de wrote:
Note that some formats might require seeking to some point in the file [1]
[...]
I hadn't thought on
Hi,
Almost all the full text search engines (C/C++) I looked at (Swish-E, Wumpus, Lemur and
Xapian) do not use any kind of library or parser. Instead, they use other
applications like pdftotext, catdoc, catppt (etc) and call them with execvp
(or equivalent). Using this approach in my project has some pros
On Wed, 2009-04-22 at 15:51 +0100, Rui Carneiro wrote:
Hi,
Almost all the full text search engines (C/C++) I looked at (Swish-E, Wumpus,
Lemur and Xapian) do not use any kind of library or parser. Instead,
they use other applications like pdftotext, catdoc, catppt (etc) and
call them with execvp (or
On Wed, Apr 22, 2009 at 5:38 PM, Timo Sirainen t...@iki.fi wrote:
Maybe those programs could be changed and just require the newer
versions?..
I will talk with the developers of those applications about the possibility
of supporting stdin input (if not supported yet).
I think the API that
On Wed, Apr 22, 2009 at 03:51:45PM +0100, Rui Carneiro wrote:
[...]
Cons:
- Some programs to parse special formats (e.g. catppt and pdftotext) do not
accept input from stdin (we need to create temporary files).
[from the peanut gallery here]
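The temporary-file workaround named in the Cons item above can be sketched like this. It is an illustration under assumptions, not the plugin's code: `extract_via_tempfile()` is a hypothetical helper, the /tmp path template is arbitrary, and the converter is assumed to take the file name as its only argument and write text to stdout. Real converters may need additional arguments.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write data to a temporary file, run "<program> <tmpfile>" and collect
   its stdout into out (NUL-terminated; output beyond out_size-1 bytes
   is read and discarded). Returns bytes stored, or -1 on error or if
   the program exits with a non-zero status. */
static ssize_t extract_via_tempfile(const char *program,
				    const void *data, size_t size,
				    char *out, size_t out_size)
{
	char path[] = "/tmp/fts-attachment-XXXXXX";
	char scratch[512];
	int fd, pipefd[2], status;
	size_t total = 0, room;
	ssize_t ret;
	pid_t pid;

	assert(program != NULL && data != NULL && out != NULL && out_size > 0);
	if ((fd = mkstemp(path)) == -1)
		return -1;
	if (write(fd, data, size) != (ssize_t)size) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);
	if (pipe(pipefd) == -1) {
		unlink(path);
		return -1;
	}
	if ((pid = fork()) == -1) {
		close(pipefd[0]);
		close(pipefd[1]);
		unlink(path);
		return -1;
	}
	if (pid == 0) {
		/* Child: redirect stdout into the pipe and run the converter. */
		char *argv[] = { (char *)program, path, NULL };

		close(pipefd[0]);
		dup2(pipefd[1], STDOUT_FILENO);
		close(pipefd[1]);
		execvp(program, argv);
		_exit(127);	/* exec failed */
	}
	/* Parent: read until the child closes its end of the pipe. */
	close(pipefd[1]);
	for (;;) {
		room = out_size - 1 - total;
		ret = room > 0 ? read(pipefd[0], out + total, room) :
			read(pipefd[0], scratch, sizeof(scratch));
		if (ret <= 0)
			break;
		if (room > 0)
			total += (size_t)ret;
	}
	close(pipefd[0]);
	out[total] = '\0';
	unlink(path);
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return (ssize_t)total;
}
```

A call such as `extract_via_tempfile("cat", data, size, out, sizeof(out))` exercises the plumbing with `cat` standing in for a converter. The cost this sketch makes visible is one extra file write and one process per attachment, which is the trade-off the Cons item points at.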
Hi again,
Does anyone know some good libraries to handle the content of files like pdf,
ppt, doc, etc? I am already indexing attachments; all I need now is to extract
their text.
Regards,
Rui Carneiro
On Mon, Apr 20, 2009 at 3:29 PM, Rui Carneiro rui@gmail.com wrote:
Hi,
The problem was on
On Apr 21, 2009, at 6:25 AM, Rui Carneiro wrote:
Does anyone know some good libraries to handle the content of files like
pdf, ppt, doc, etc? I am already indexing attachments; all I need now is
to extract their text.
I've no idea, but you could at least look at some of the other full
text
Great idea!
I will give news soon.
On Tue, Apr 21, 2009 at 5:32 PM, Timo Sirainen t...@iki.fi wrote:
I've no idea, but you could at least look at some of the other full text
search engines. I remember them advertising indexing support for all kinds
of formats. Maybe they're using some
Hi,
The problem was with the flag. My hex to binary conversion was wrong.
Regards,
Rui Carneiro
On Fri, Apr 17, 2009 at 10:03 AM, Rui Carneiro rui@gmail.com wrote:
Thank you for all the tips. The design looks clearer to me now.
I have one more question. I looked into
Thank you for all the tips. The design looks clearer to me now.
I have one more question. I looked into fts_build_want_index_part() and I
saw that I need to add some flags to message_part_flags, what values should
I choose? My first approach was to follow your schema and set
On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
I don't yet understand the plugin's design or how the plugins are
called from the core system, and I was wondering if anyone could help me with
that.
fts-storage.c hooks into all the functions in mail-storage API that it
needs to.
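The hooking idea in that answer can be illustrated with function pointers: the plugin saves the original function table and substitutes its own functions, which call the originals and then do extra work. The sketch below is deliberately simplified and every name in it (`struct mailbox_vfuncs`, `save_mail`, `fts_mailbox_hook()` and so on) is hypothetical; it is not Dovecot's mail-storage API.

```c
#include <assert.h>
#include <stddef.h>

struct mailbox;

/* Hypothetical, heavily reduced table of storage functions. */
struct mailbox_vfuncs {
	int (*save_mail)(struct mailbox *box, const char *body);
};

struct mailbox {
	struct mailbox_vfuncs v;
	int saved_count;
	void *plugin_context;
};

struct fts_plugin_context {
	struct mailbox_vfuncs super;	/* the original functions */
	int indexed_count;
};

/* The core implementation that exists before any plugin is loaded. */
static int core_save_mail(struct mailbox *box, const char *body)
{
	(void)body;
	box->saved_count++;
	return 0;
}

/* The plugin's replacement: call the original, then do the extra work
   (here only a counter stands in for indexing). */
static int fts_save_mail(struct mailbox *box, const char *body)
{
	struct fts_plugin_context *ctx = box->plugin_context;

	if (ctx->super.save_mail(box, body) < 0)
		return -1;
	ctx->indexed_count++;
	return 0;
}

/* Install the hook: remember the originals, then override them. */
static void fts_mailbox_hook(struct mailbox *box,
			     struct fts_plugin_context *ctx)
{
	assert(box != NULL && ctx != NULL);
	ctx->super = box->v;
	ctx->indexed_count = 0;
	box->plugin_context = ctx;
	box->v.save_mail = fts_save_mail;
}
```

Callers keep invoking `box->v.save_mail()` as before and never need to know a plugin is present, which is what lets a plugin hook only the functions it needs.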
Hi all,
Currently I am developing some changes to the solr plugin. I want this
plugin to also index the attachments' content. I have already started to
look at the plugin's source but I am having some problems understanding how it
works.
I don't yet understand the plugin's design or how
36 matches