Looks like passing it in through the client is working. This is huge for
our system: it brought a 30s operation down to a couple hundred ms. The
only problem left to solve is that the iterator parameters are written to
the log file with every query, including the 17k base64-encoded string.
Is there a good way to prevent that, so the log files stay a reasonable
size? Can I hook into the logging to truncate that parameter, or is there
some other way?
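
For illustration, the truncation step itself could look like the sketch
below. It is only a minimal example under stated assumptions:
OptionLogSanitizer, truncate, and forLogging are hypothetical names, not
part of Accumulo, and the sketch does not show where (or whether) such a
step can be hooked into the logging that currently writes the parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an Accumulo class): shortens long option values
// before they are handed to a logger, leaving the real options untouched.
final class OptionLogSanitizer {
    private OptionLogSanitizer() {}

    // Returns the value unchanged if it is short enough; otherwise keeps the
    // first maxChars characters and notes how many characters were dropped.
    static String truncate(String value, int maxChars) {
        if (value == null || value.length() <= maxChars) {
            return value;
        }
        int dropped = value.length() - maxChars;
        return value.substring(0, maxChars) + "...[" + dropped + " more chars]";
    }

    // Builds a copy of the options map that is safe to log. The original map
    // is not modified, so the iterator still receives the full values.
    static Map<String, String> forLogging(Map<String, String> options, int maxChars) {
        Map<String, String> copy = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            copy.put(e.getKey(), truncate(e.getValue(), maxChars));
        }
        return copy;
    }
}
```

Logging forLogging(options, 64) instead of the raw map would keep each
logged value to a short prefix plus a count of the omitted characters.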

On Mon, May 3, 2021 at 2:16 PM Christopher <[email protected]> wrote:

> It really depends on your specific system. I think you could try
> passing it in via the client as you are now, but if you want to
> experiment storing it elsewhere (like in ZooKeeper or on HDFS, or on
> an external REST endpoint or whatever), you could modify your iterator
> to accept a String containing the location of the content, rather than
> the content itself. It's just an idea to experiment with, if your
> first attempt doesn't meet your needs.
>
> On Mon, May 3, 2021 at 8:27 AM Sanjay Deshmukh <[email protected]> wrote:
> >
> > Ok, that makes sense. This is a scan-time iterator. The data will be
> > about 18kb in length. The data is a new row that was just inserted into
> > the table, that's going to be used as input in a computation with a
> > bunch of other rows. Are you saying it'd be better to read that row in
> > the init method of the Iterator on the tablet servers, vs passing the
> > data in from the client? What's the best way to do that?
> >
> > On Sun, May 2, 2021 at 10:32 PM Christopher <[email protected]> wrote:
> >>
> >> Iterator parameters are passed as strings, so you have to encode
> >> binary data if you need to send that. The limit should be reasonable,
> >> but there's no hard-coded limit. If you are storing options for an
> >> iterator configured on a table, they should be small enough to be
> >> stored easily in a ZooKeeper node, possibly alongside other options
> >> and table configuration, so that ZooKeeper doesn't have trouble
> >> with them. If you are passing the
> >> option as part of a scan-time iterator, it should fit in an RPC
> >> message over Thrift.
> >>
> >> I would keep them small (a few hundred characters or less), no more
> >> than a few thousand, if you really need to. If you need to pass large
> >> parameters, consider passing them indirectly, like passing the name of
> >> a file in HDFS that stores the binary data that the iterator reads
> >> when initialized.
> >>
> >> Your experience will vary depending on the configuration of your
> >> system's components and the hardware resources your machine has
> >> available.
> >>
> >> On Sun, May 2, 2021 at 8:39 PM Sanjay Deshmukh <[email protected]> wrote:
> >> >
> >> > Is there a limit to how long an Iterator parameter can be? And does
> >> > it have to be a String, or is there a way to send arbitrary binary
> >> > without encoding it in a string?
> >> >
> >> >
> >> > --
> >> > Sanjay Deshmukh
> >> > [email protected]
> >
> >
> >
> > --
> > Sanjay Deshmukh
> > [email protected]
>
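
As a concrete illustration of the quoted point that iterator parameters
are passed as strings and binary data has to be encoded, here is a
minimal sketch using the JDK's Base64 support. BinaryOptionCodec and the
"payload" option key are hypothetical names chosen for this example; the
only assumption about the iterator side is that it receives its options
as a Map<String, String>.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not an Accumulo class): round-trips binary data
// through a String-valued option map.
final class BinaryOptionCodec {
    // Assumed option name; any key agreed on by client and iterator works.
    static final String OPTION_KEY = "payload";

    private BinaryOptionCodec() {}

    // Client side: encode the bytes as Base64 text and place them in the
    // options map that is attached to the scan-time iterator.
    static Map<String, String> toOptions(byte[] data) {
        Map<String, String> options = new HashMap<>();
        options.put(OPTION_KEY, Base64.getEncoder().encodeToString(data));
        return options;
    }

    // Iterator side: read the option back and decode it to the original bytes.
    static byte[] fromOptions(Map<String, String> options) {
        String encoded = options.get(OPTION_KEY);
        if (encoded == null) {
            throw new IllegalArgumentException("missing option: " + OPTION_KEY);
        }
        return Base64.getDecoder().decode(encoded);
    }
}
```

Base64 text is roughly a third larger than the raw bytes, which is worth
keeping in mind given the size guidance in the quoted replies.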


-- 
Sanjay Deshmukh
[email protected]
