Looks like passing it in through the client is working. This is huge for our system: it brought a 30 s operation down to a couple hundred ms. The only problem I have left to solve is that the iterator parameters, including the 17k base64-encoded string, are written to the log file with every query. Is there a good way to prevent that, so the log files stay a reasonable size? Can I hook into the logging to truncate that parameter, or is there some other way?
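One possible approach assumes the parameters are being logged by code you control, such as your own debug statements that print the iterator options. In that case you can log an abbreviated copy of the options map instead of the original. This is a sketch under that assumption only. It does not change anything Accumulo itself logs, and the class `OptionLogFormatter` and the 64-character cap are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OptionLogFormatter {
    // Hypothetical cap on how many characters of each option value are logged.
    static final int MAX_LOGGED_CHARS = 64;

    /** Returns a copy of the options in which long values are shortened, for logging only. */
    static Map<String, String> abbreviate(Map<String, String> options, int maxChars) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            String value = e.getValue();
            if (value != null && value.length() > maxChars) {
                // Keep a prefix plus the original length so the log still shows what was sent.
                value = value.substring(0, maxChars) + "...(" + e.getValue().length() + " chars)";
            }
            out.put(e.getKey(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("mode", "compare");               // hypothetical short option
        opts.put("payload", "A".repeat(17_000));   // stands in for the 17k base64 string
        // The iterator still receives opts unchanged; only the logged copy is shortened.
        System.out.println(abbreviate(opts, MAX_LOGGED_CHARS));
    }
}
```

The original map is never modified, so what the iterator receives is unaffected.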
On Mon, May 3, 2021 at 2:16 PM Christopher wrote:
> It really depends on your specific system. I think you could try
> passing it in via the client as you are now, but if you want to
> experiment with storing it elsewhere (like in ZooKeeper or on HDFS, or
> on an external REST endpoint or whatever), you could modify your
> iterator to accept a String containing the location of the content,
> rather than the content itself. It's just an idea to experiment with,
> if your first attempt doesn't meet your needs.
>
> On Mon, May 3, 2021 at 8:27 AM Sanjay Deshmukh wrote:
> >
> > OK, that makes sense. This is a scan-time iterator. The data will be
> > about 18 KB in length. The data is a new row that was just inserted
> > into the table, which is going to be used as input to a computation
> > with a bunch of other rows. Are you saying it'd be better to read that
> > row in the init method of the iterator on the tablet servers, rather
> > than passing the data in from the client? What's the best way to do
> > that?
> >
> > On Sun, May 2, 2021 at 10:32 PM Christopher wrote:
> > >
> > > Iterator parameters are passed as strings, so you have to encode
> > > binary data if you need to send it. The size should be reasonable,
> > > but there's no hard-coded limit. If you are storing options for an
> > > iterator configured on a table, one would expect them to be small
> > > enough to store easily in a ZooKeeper node, possibly alongside other
> > > options and table configuration, so they shouldn't be big enough to
> > > give ZooKeeper trouble. If you are passing the option as part of a
> > > scan-time iterator, it should fit in an RPC message over Thrift.
> > >
> > > I would keep them small (a few hundred characters or fewer), and no
> > > more than a few thousand if you really need to. If you need to pass
> > > large parameters, consider passing them indirectly, such as passing
> > > the name of a file in HDFS that stores the binary data, which the
> > > iterator reads when initialized.
> > >
> > > Your experience will vary depending on the configuration of your
> > > system's components and the hardware resources your machine has
> > > available.
> > >
> > > On Sun, May 2, 2021 at 8:39 PM Sanjay Deshmukh wrote:
> > > >
> > > > Is there a limit to how long an iterator parameter can be? And does
> > > > it have to be a String, or is there a way to send arbitrary binary
> > > > data without encoding it in a string?
> > > >
> > > > --
> > > > Sanjay Deshmukh
> >
> > --
> > Sanjay Deshmukh

--
Sanjay Deshmukh
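Christopher's two suggestions can be sketched in plain Java: encode binary data as a base64 option, or pass only a location that the iterator reads when initialized. This is a minimal, standard-library-only illustration and not Accumulo's API. The class `IteratorOptionCodec` and the option names `payload.base64` and `payload.path` are hypothetical. The `Map<String, String>` stands in for the options map an iterator receives in `SortedKeyValueIterator.init`, and a local file read stands in for reading the file from HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class IteratorOptionCodec {
    // Hypothetical option names; use whatever names your iterator expects.
    static final String INLINE_KEY = "payload.base64";
    static final String LOCATION_KEY = "payload.path";

    /** Client side: options are strings, so binary data is base64-encoded inline. */
    static Map<String, String> inlineOptions(byte[] data) {
        Map<String, String> opts = new HashMap<>();
        opts.put(INLINE_KEY, Base64.getEncoder().encodeToString(data));
        return opts;
    }

    /** Client side: pass only a short location, so the option itself stays small. */
    static Map<String, String> referenceOptions(String location) {
        Map<String, String> opts = new HashMap<>();
        opts.put(LOCATION_KEY, location);
        return opts;
    }

    /** Iterator side (e.g. called from init): recover the bytes from whichever option is set. */
    static byte[] loadPayload(Map<String, String> options) {
        String inline = options.get(INLINE_KEY);
        if (inline != null) {
            return Base64.getDecoder().decode(inline);
        }
        String location = options.get(LOCATION_KEY);
        if (location != null) {
            try {
                // Local file read as a stand-in for opening the file on HDFS.
                return Files.readAllBytes(Path.of(location));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException("no payload option set");
    }

    public static void main(String[] args) throws IOException {
        byte[] row = "example row bytes".getBytes(StandardCharsets.UTF_8);

        // Inline: the whole payload travels (and may be logged) as base64 text.
        byte[] viaInline = loadPayload(inlineOptions(row));

        // Reference: only the path travels; the iterator reads the bytes itself.
        Path tmp = Files.createTempFile("payload", ".bin");
        Files.write(tmp, row);
        byte[] viaReference = loadPayload(referenceOptions(tmp.toString()));
        Files.delete(tmp);

        System.out.println(Arrays.equals(viaInline, row) && Arrays.equals(viaReference, row));
    }
}
```

On the client, each entry of the returned map would go into an `IteratorSetting` via `addOption(key, value)` before the setting is added to the scanner. The reference variant costs an extra read each time the iterator is initialized. In return, the options stay short, and so does anything that logs them.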
