Thanks, parseDelimitedFrom()/writeDelimitedTo() is exactly what I needed. I
see that the LimitedStream underneath restricts reading from the original
stream, so we do not need to wrap the stream ourselves.
On Fri, Feb 19, 2010 at 11:53 AM, Kenton Varda wrote:
> If the underlying stream does not provide its own boundaries …
If the underlying stream does not provide its own boundaries then you need
to prefix the protocol message with a size. Hacking an end-of-record
"feature" into the protobuf code is probably not a good idea. We already
provide parseDelimitedFrom()/writeDelimitedTo(), which prefix the message
with a size.
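A minimal sketch of that round trip, assuming a hypothetical generated class
MyMessage; writeDelimitedTo() and parseDelimitedFrom() are the actual
protobuf Java API:

    import java.io.*;

    // Writes two messages to one stream and reads them back.
    // Each writeDelimitedTo() call emits a varint size prefix followed
    // by exactly that many message bytes, so the parser knows where
    // each record ends.
    static void roundTrip(MyMessage msg1, MyMessage msg2, File file)
        throws IOException {
      OutputStream out = new FileOutputStream(file);
      msg1.writeDelimitedTo(out);
      msg2.writeDelimitedTo(out);
      out.close();

      InputStream in = new FileInputStream(file);
      MyMessage first = MyMessage.parseDelimitedFrom(in);
      MyMessage second = MyMessage.parseDelimitedFrom(in);
      // parseDelimitedFrom() returns null once the stream is exhausted.
      in.close();
    }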
For your last comment: yes, the end-of-record indicator was another hack I
put in.
But both your options above ultimately require the underlying stream to
provide exact record boundaries.
In the last email I pointed out that this may or may not be a valid
requirement for the underlying InputStream.
Two options:
1) Do not use parseFrom(InputStream). Use parseFrom(byte[]). Read the byte
array from the stream yourself, so you can make sure to read only the
correct number of bytes.
2) Create a FilterInputStream subclass which limits reading to some number
of bytes. Wrap your InputStream in this before handing it to parseFrom() (a
sketch follows below).
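A sketch of option 2, built on the standard java.io.FilterInputStream; the
class name LimitedInputStream is my own, not from the library, and it pairs
with option 1's requirement that you know each record's size up front:

    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Refuses to read past a fixed byte limit, so parseFrom() sees
    // end-of-stream exactly at the record boundary instead of running
    // into the next record.
    class LimitedInputStream extends FilterInputStream {
      private long remaining;

      LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
      }

      @Override public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = super.read();
        if (b >= 0) remaining--;
        return b;
      }

      @Override public int read(byte[] buf, int off, int len)
          throws IOException {
        if (remaining <= 0) return -1;
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
      }
    }

    // Usage, given a record size obtained some other way:
    //   MyMessage msg = MyMessage.parseFrom(new LimitedInputStream(in, size));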
I found the issue; this has the same root cause as a previous issue I
reported on this forum.
Basically I think PB assumes that it stops only where the provided stream
has ended; otherwise it keeps on reading.
In the last issue the buffer was too long and it read in further junk, so I
put in an end-of-record indicator.
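A small demonstration of the behavior being described, with MyMessage again a
hypothetical generated class: parseFrom() consumes everything it is handed,
and by protobuf's merging semantics two messages written back to back parse
as one merged message rather than raising an error:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    static void demo(MyMessage msg1, MyMessage msg2) throws IOException {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      msg1.writeTo(buf);   // no boundary between the two messages
      msg2.writeTo(buf);
      // The parser does not stop after msg1; it keeps reading to the
      // end of the input and merges msg2's fields over msg1's.
      MyMessage merged = MyMessage.parseFrom(buf.toByteArray());
    }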
Is this a case of needing to delimit the input? I'm not familiar with
SplitterInputStream, but I'm wondering if it does the right thing for this
to work.
--Chris
On Thu, Feb 18, 2010 at 12:56 PM, Kenton Varda wrote:
> Please reply-all so the mailing list stays CC'd. I don't know anything
> about the libraries you are using so I can't really help you further. Maybe
> someone else can.
-- Forwarded message --
From: Yang
Date: Thu, Feb 18, 2010 at 12:47 PM
Subject: Re: [protobuf] ProtocolBuffer + compression in hadoop?
To: Kenton Varda
btw, I used other input file formats, and they worked (TFile and RCFile
specifically); only SequenceFile had issues.
On Thu …
Please reply-all so the mailing list stays CC'd. I don't know anything
about the libraries you are using so I can't really help you further. Maybe
someone else can.
On Thu, Feb 18, 2010 at 12:46 PM, Yang wrote:
> thanks Kenton,
>
> I thought about the same,
> what I did was that I used a SplitterInputStream …
You should verify that the bytes that come out of the InputStream really are
the exact same bytes that were written by the serializer to the OutputStream
originally. You could do this by computing a checksum at both ends and
printing it, then inspecting visually. You'll probably find that the bytes
do not match.
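A sketch of that checksum comparison using java.util.zip.CRC32 (the variable
names in the comments are assumptions, not from the thread):

    import java.util.zip.CRC32;

    // Run the same bytes through CRC32 on both sides and compare the
    // printed values.
    static long checksum(byte[] data) {
      CRC32 crc = new CRC32();
      crc.update(data);
      return crc.getValue();
    }

    // Writer side:
    //   System.out.println("wrote crc=" + checksum(message.toByteArray()));
    // Reader side:
    //   System.out.println("read  crc=" + checksum(bytesFromStream));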
I tried to use Protocol Buffers in Hadoop.
So far it works fine with SequenceFile, after I hooked it up with a simple
wrapper,
but after I put a compressor into the SequenceFile, it fails: it reads all
the messages and yet still wants to advance the read pointer, and then
readTag() returns 0, so …
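One way such a wrapper can make record boundaries explicit inside a
SequenceFile, sketched as a hypothetical ProtobufWritable (the thread does
not show Yang's actual wrapper, and MyMessage is again an assumed generated
class):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Stores an explicit length prefix with each record so the reader
    // consumes exactly one message, regardless of where the underlying
    // (possibly compressed) stream ends.
    class ProtobufWritable implements Writable {
      private MyMessage message;

      void set(MyMessage m) { message = m; }
      MyMessage get() { return message; }

      @Override public void write(DataOutput out) throws IOException {
        byte[] bytes = message.toByteArray();
        out.writeInt(bytes.length);   // explicit boundary
        out.write(bytes);
      }

      @Override public void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);          // read exactly one record, no more
        message = MyMessage.parseFrom(bytes);
      }
    }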