Hi Steffen,

I think I understood your description correctly from the beginning. However, the problem you described should not happen with a static (unchanged) table, because of the inner logic of TableUtils. I assume that the agent does not return the rows in lexicographic order. That would have the same effect as a row appearing dynamically during retrieval.

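One way to test that assumption is to look at the order of the row indexes the agent returns. The following is a minimal sketch in plain Java; the class and method names are made up for illustration and are not part of SNMP4J. It treats each row index as an array of sub-identifiers and reports the first position where the sequence stops being strictly ascending in lexicographic order. SNMP4J's own OID class is comparable, so its compareTo could presumably serve the same purpose.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SNMP4J: checks whether row indexes
// arrive in strictly ascending lexicographic order.
final class IndexOrderCheck {

    // Returns the position of the first index that is not greater than
    // its predecessor, or -1 if the whole sequence is strictly ascending.
    static int firstOutOfOrder(List<long[]> indexes) {
        for (int i = 1; i < indexes.size(); i++) {
            // Arrays.compare on long[] is a lexicographic comparison (Java 9+).
            if (Arrays.compare(indexes.get(i - 1), indexes.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Index values taken from the ifXTable example in this thread.
        List<long[]> ascending = List.of(
                new long[] {278}, new long[] {279}, new long[] {283}, new long[] {373});
        // Same values with two rows swapped, to show a violation.
        List<long[]> shuffled = List.of(
                new long[] {278}, new long[] {283}, new long[] {279}, new long[] {373});
        System.out.println(firstOutOfOrder(ascending));
        System.out.println(firstOutOfOrder(shuffled));
    }
}
```

A result other than -1 for the indexes seen on the wire would support the idea that the agent's ordering, rather than a changing table, triggers the incomplete rows.
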
I do not want to exclude an off-by-one error in TableUtils, but none of the unit tests I have run so far indicate one. What agent are you using?

Nevertheless, the new version will not show the issue you observed when the mode denseTableDoubleCheckIncompleteRows is used.

Best regards
Frank

> On 19.07.2018 at 17:20, Steffen Brüntjen <steffen.bruent...@macmon.eu> wrote:
>
> Hi Frank
>
> I'm not sure whether we're talking about the same thing. The problem I
> described is *not* a timing problem with rows being added to or removed from
> the table while retrieving rows. The table I am querying doesn't change at
> all and the problem is highly reproducible. Let's see the example again:
>
> This is what the List<TableEvent> result should look like, and what it
> actually looks like - always - when max-bindings is set to 1 or 32 or some
> other value:
>
> [ ... 75 normal rows ... ]
> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
> [ ... everything normal ... ]
>
> When setting max-bindings to 4 (I'm requesting 7 columns), I - always - get
> these TableEvents:
>
> [ ... 75 normal rows ... ]
> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, null]
> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, null]
> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, null]
> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, null]
> [ ... everything normal ... ]
>
> The returned List<TableEvent> contains 4 more results, because 4 table rows
> are split into two TableEvents.
> We can see that these indexes seem to have two rows:
> index=283
> index=373
> index=774
> index=783
>
> It's as if this table
>
> IDX | A    | B    | C    | D
> ----+------+------+------+------
> 0   | 1    | 2    | 3    | 4
> 1   | 5    | 6    | 7    | 8
> 2   | 9    | 10   | 11   | 12
> 3   | 13   | 14   | 15   | 16
>
> becomes something like this when obtained by TableUtils:
>
> IDX | A    | B    | C    | D
> ----+------+------+------+------
> 0   | 1    | 2    | 3    | 4
> 1   | null | null | 7    | 8     <-- index=1
> 2   | null | null | 11   | 12    <-- index=2
> 1   | 5    | 6    | null | null  <-- index=1
> 2   | 9    | 10   | null | null  <-- index=2
> 3   | 13   | 14   | 15   | 16
>
> I tried to describe the reason for this, but I admit it's a bit complicated.
> Of course it's also possible that I didn't understand your answer correctly.
> Sorry for the confusion in that case. If so, I'd like to understand how
> sparse and dense tables cause this problem.
>
> Thanks for the clarification on tooBig errors with GETBULK requests!
>
> Best regards
> Steffen Brüntjen
>
> -----Original Message-----
> From: Frank Fock [mailto:f...@agentpp.com]
> Sent: Thursday, 12 July 2018 08:41
> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
> Cc: snmp4j@agentpp.org
> Subject: Re: [SNMP4J] max-bindings with big tables
>
> Hi Steffen,
>
> If the agent sends a tooBig error on a GETBULK request, then this is an error
> in the agent. See RFC3416 4.2.3:
>
>     If the size of the message encapsulating the Response-PDU
>     containing the requested number of variable bindings would be
>     greater than either a local constraint or the maximum message
>     size of the originator, then the response is generated with a
>     lesser number of variable bindings. This lesser number is the
>     ordered set of variable bindings with some of the variable
>     bindings at the end of the set removed, such that the size of
>     the message encapsulating the Response-PDU is approximately
>     equal to but no greater than either a local constraint or the
>     maximum message size of the originator.
>     Note that the number of variable bindings removed has no
>     relationship to the values of N, M, or R.
>
> For the issue you reported, there is no general solution, because it
> interferes with sparse tables. A solution would either decrease the
> performance for sparse tables or filter out sparse rows. The latter is not
> acceptable for intentionally sparse tables. For dense tables, the filtering
> could be the best option, although it would hide new rows even though the
> command generator has already detected them.
>
> I am about to add an option for getDenseTable that activates filtering of
> new rows that appear during the table retrieval and are therefore received
> incompletely. Would that help you?
>
> Best regards,
> Frank
>
>> On 9 Jul 2018, at 19:45, Steffen Brüntjen <steffen.bruent...@macmon.eu>
>> wrote:
>>
>> Hi Frank
>>
>> Thank you for having a look at it. I agree, the performance with many
>> bindings is indeed *much* higher and yes, values should be retrieved
>> row-by-row in order to avoid data inconsistencies. But there are also
>> problems with many bindings:
>>
>> 1. Since the agent cannot - in contrast to max-repetition-count - decide
>> how many values to send, the packet size might get too big if you have a
>> table with many (big) columns.
>>
>> 2. There are agents that get into trouble when many columns are requested.
>> This often results in timeouts (no tooBig error) and then there's no other
>> option than requesting fewer bindings.
>>
>> Maybe the proposed change is the way to go, it's decent, but effective (I
>> believe).
>>
>> Best regards
>> Steffen
>>
>> -----Original Message-----
>> From: Frank Fock [mailto:f...@agentpp.com]
>> Sent: Friday, 6 July 2018 18:55
>> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
>> Cc: snmp4j@agentpp.org
>> Subject: Re: [SNMP4J] max-bindings with big tables
>>
>> Hi Steffen,
>> I will try to reproduce this issue.
>> Independent of the result, the parameters for TableUtils are not suitable
>> for your setup. maxNumColumnsPerPDU has to be as large as possible.
>> Otherwise the overall performance will be bad and the likelihood of
>> incomplete table rows increases significantly (through changes in the agent
>> while TableUtils operates).
>> Best regards
>> Frank
>>
>>> On 06.07.2018 at 10:20, Steffen Brüntjen
>>> <steffen.bruent...@macmon.eu> wrote:
>>>
>>> Hi!
>>>
>>> I'm using SNMP4J version 2.6.2.
>>>
>>> Best regards
>>> Steffen
>>>
>>> -----Original Message-----
>>> From: Frank Fock [mailto:f...@agentpp.com]
>>> Sent: Thursday, 5 July 2018 19:37
>>> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
>>> Cc: snmp4j@agentpp.org
>>> Subject: Re: [SNMP4J] max-bindings with big tables
>>>
>>> Hi Steffen
>>> What SNMP4J version are you using?
>>> Best regards
>>> Frank
>>>
>>>> On 05.07.2018 at 17:04, Steffen Brüntjen
>>>> <steffen.bruent...@macmon.eu> wrote:
>>>>
>>>> Hi Frank
>>>>
>>>> I believe I found an issue in the TableUtils class. In certain scenarios,
>>>> the List<TableEvent> returned from getTable(Target target, OID[]
>>>> columnOIDs, OID lowerBoundIndex, OID upperBoundIndex) will contain
>>>> incomplete and duplicate rows.
>>>>
>>>> Here's an extract of an exemplary List<TableEvent> for a "good" result:
>>>>
>>>> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
>>>> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
>>>> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
>>>> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
>>>> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
>>>> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
>>>>
>>>> But in some specific circumstances, I get results like these:
>>>>
>>>> [ ... 75 normal rows ... ]
>>>> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
>>>> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
>>>> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, null]
>>>> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, null]
>>>> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, null]
>>>> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, null]
>>>> [ ... everything normal ... ]
>>>>
>>>> Here we find some rows split into two: one block with the first 4 columns
>>>> set to null, and another block with the last 3 columns set to null.
>>>>
>>>> Here's the setting which produces the second result:
>>>>
>>>> - max-bindings is set to 4 (TableUtils.setMaxNumColumnsPerPDU(int))
>>>> - max-repetitions is set to 30 (TableUtils.setMaxNumRowsPerPDU(int))
>>>> - the device returns many rows (like 120)
>>>> - the table request contains more columns than max-bindings
>>>> - the number of requested columns is not a multiple of max-bindings
>>>> - the problem will also depend on MTU size, but that's not important here
>>>>
>>>> This is what happens:
>>>>
>>>> 1. TableUtils will request the first 4 columns
>>>> 2. device returns 60 variable bindings, that's 15 cells per column
>>>> 3. TableUtils will request the latter 3 columns
>>>> 4. device returns 60 variable bindings, that's 20 cells per column
>>>>
>>>> This repeats until all bindings are retrieved. So far, so good. The
>>>> problem now is that all second requests (step 3) will receive more rows,
>>>> and so these requests will reach index 283 (as in the example above)
>>>> earlier. I did some debugging and I think I found the reason: When the
>>>> first results with index 283 are received (step 3), TableUtils creates a
>>>> row for this index. That row is filled up with null values for the first 4
>>>> columns so that its size equals 7 (and not 3). Having size=7, the row is
>>>> considered finished too soon. TableUtils then prunes these incomplete but
>>>> finished rows from rowCache. When TableUtils receives the other 4 columns
>>>> for row 283, it creates a new row with the same index.
>>>>
>>>> How to fix?
>>>>
>>>> I believe a moderately easy, but not very good, way to fix this is to have
>>>> the small chunk contain the first 3 columns rather than the last 3
>>>> columns:
>>>>
>>>> max-bindings = 4
>>>> columns: .1, .2, .3, .4, .5, .6, .7
>>>> 1st packet should contain: .1, .2, and .3
>>>> 2nd packet should contain: .4, .5, .6, and .7
>>>>
>>>> Number of columns for the first packet is NumColumnsTotal % maxBindings.
>>>> Number of columns for the other packets is maxBindings.
>>>>
>>>> Please tell me if you need more information or if my method invocation is
>>>> wrong.
>>>>
>>>> Best regards
>>>> Steffen Brüntjen
>>>> _______________________________________________
>>>> SNMP4J mailing list
>>>> SNMP4J@agentpp.org
>>>> https://oosnmp.net/mailman/listinfo/snmp4j
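
The column split proposed in the original message can be sketched in plain Java. This is only an illustration of that proposal and not how TableUtils actually partitions columns; the class name ColumnChunks and the method split are hypothetical. The first chunk receives NumColumnsTotal % maxBindings columns and every later chunk receives maxBindings columns. The sketch assumes that a remainder of zero simply means there is no short chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the proposed split; not SNMP4J code.
final class ColumnChunks {

    // Splits the columns into chunks so that the short chunk (if any) comes first.
    static List<List<String>> split(List<String> columns, int maxBindings) {
        if (maxBindings < 1) {
            throw new IllegalArgumentException("maxBindings must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        int remainder = columns.size() % maxBindings;
        int start = 0;
        // Assumption: when the remainder is zero there is no short chunk at all.
        if (remainder > 0) {
            chunks.add(new ArrayList<>(columns.subList(0, remainder)));
            start = remainder;
        }
        // The remaining column count is a multiple of maxBindings,
        // so every later chunk is exactly maxBindings wide.
        for (int i = start; i < columns.size(); i += maxBindings) {
            chunks.add(new ArrayList<>(columns.subList(i, i + maxBindings)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(".1", ".2", ".3", ".4", ".5", ".6", ".7");
        // prints [[.1, .2, .3], [.4, .5, .6, .7]]
        System.out.println(split(columns, 4));
    }
}
```

Whether this ordering actually avoids the split rows depends on TableUtils internals that are described, but not shown, in this thread.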