Re: [Tech] Very frequent data store and routing table corruption

Gordan Bobic Sat, 01 Feb 2003 16:27:49 -0800

On Saturday 01 Feb 2003 10:14 pm, Matthew Toseland wrote:

> > I've got a problem. Just about every time I stop my node to upgrade the
> > library to a newer version, the node refuses to start afterwards. The
> > last
>
> Are you sure you stopped it properly? If it was still running it might
> cause some nasty symptoms.


Positively sure. I always verify that all the Freenet Java processes have 
terminated before re-starting.

> > message in the log is usually starting mainport. Sometimes, quite
> > infrequently, it will stop a bit before that.
>
> Set logLevel=debug, try again, mail me the log.

Will do next time it happens. It may be a few days, but if past experience is 
anything to go by, it is going to happen again...

> > The only way to get it to start again after that is to delete the routing
> > table files and the data store. Otherwise it will always refuse to start.
>
> Deleting only the routing table files, or only the datastore, does not
> cause it to break?

Once or twice, deleting only one of them has allowed it to re-start (e.g. 
renaming one of the directories and re-starting). But most of the time, both 
need to be deleted.

> > I have also found that in the latest downloadable version, having too
> > many node references in the seednodes.ref file will make the node crash
> > during the process of initializing the routing table. I had about 400
> > fresh node references dumped into the file before stopping for the
> > upgrade last time, and after cleaning out the node to get it to re-start
> > again, it would not start again until I used an older, smaller
> > seednodes.ref file.
>
> 400 is ridiculous.

That is how many there were in the routing table that I dumped into a file. 
What is a good number? Should a seed file that large cause it to crash?

> > Is this normal? What is the maximum number of entries allowed in
> > seednodes.ref? Why does my node refuse to start again after being stopped
>
> Your node only has 50 references in its routing table normally. 400
> seednodes is ludicrous.

I increased the number of routing table entries in the config file.

> > after 10-20+ hours? Is this a known problem? It seems very
> > counter-productive to have to delete several GB of data store on every
> > re-start.
>
> No, it's not a known problem, it has not been reported by anyone else,
> we would like to fix it.

I've seen at least two other similar reports on the Frost discussion boards... 
I will post you the log file next time I encounter the problem. I usually 
clear the log file before re-starting, but I will make sure I leave it 
intact, as it could be something during the previous run-time that broke the 
data store.

> > What configuration options could be to blame? Here are some of the
> > options I use that I think could be relevant to the problem.
> >
> > Any ideas where it's all going wrong?
> >
> > rtMaxRefs=64
>
> Hmm.
>
> > rtMaxNodes=1024
>
> HMMMM. This is um, a bit high. It's possible that your routing table is
> always corrupted because it takes so long to write it out that it never
> actually reaches disk whole.

Possible. But the same was happening with this setting as low as 64.

I have now commented out the above two options, so we'll see if that reduces 
the frequency of the problem occurring, now that both are at default levels.
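
For reference, the usual guard against the partial-write scenario described 
above (a table that never reaches disk whole) is to write to a temporary file, 
flush it, and rename it over the old copy. The following is only a sketch of 
that general technique in Python, not the Freenet node's actual code; the 
function name atomic_write and its arguments are made up for illustration.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) in one step.

    Hypothetical helper: a reader sees either the complete old file or the
    complete new one, never a half-written mixture.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this approach a stop or kill in the middle of a long write leaves the 
previous file untouched, so a slow write of a large table would cost a stale 
table rather than a broken one.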

Another thing I have noticed is that the node sometimes crashes. The routing 
time will go to 0ms, and the number of threads will keep growing. The node 
will start refusing all the connections as the thread limit hits 100%+. After 
that, only kill -9 will terminate the java process. Everything less final 
will be ignored.

> > Is there something in the handling of the routing tables that dies with
> > the entries of rtMaxRefs=64 and rtMaxNodes=1024? I upped the rtMaxNodes
> > number because 256 filled up quite quickly. It looks like it tops out at
> > 384, presumably, due to the lack of more permanent nodes? I have seen in
> > the configuration options that the default values for both of these are
> > 50. I didn't think that upping rtMaxRefs to 64 would hurt things, but the
> > maximum of 50x50=2500 is a lot less than 64x1024=65536 entries in the
> > routing table.
> >
> > But if it was just the routing table getting corrupted, then why isn't
> > deleting just the routing table enough to get the node to start? Why does
> > the data store need to be deleted as well? And why doesn't the node start
> > if a big seednodes.ref file is used when the routing table needs to be
> > pre-seeded (the one I had dumped to a file was over 400KB)?
>
> I can't tell you why until I find out exactly what it does rather than
> working. See above. And once the node is up it should find new nodes
> without needing a seednodes file.

Unless the routing table files get corrupted and the node doesn't start any 
more... :-/

But most of the time, the incoming contacts make the routing table get up to 
speed quite quickly, as the other nodes still "remember" where my node was...
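
To put numbers on the sizes quoted above: assuming, as the quoted message 
does, that the worst case is simply rtMaxNodes multiplied by rtMaxRefs, the 
two configurations compare as follows. This is a rough Python sketch; the 
helper name is invented, and the node may well count entries differently.

```python
def max_routing_entries(rt_max_nodes, rt_max_refs):
    """Worst-case entry count, assuming nodes x refs (an assumption)."""
    return rt_max_nodes * rt_max_refs


defaults = max_routing_entries(50, 50)    # default values mentioned above
mine = max_routing_entries(1024, 64)      # rtMaxNodes=1024, rtMaxRefs=64

print("defaults:", defaults)
print("mine:    ", mine)
print("ratio:   ", round(mine / defaults, 1))
```

So under that assumption the modified settings allow roughly 26 times as many 
entries as the defaults.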

> > Could any of this be related to the heisenbug warning/panic I
> > occasionally see in the log?
>
> I doubt it, but you never know.

I will post you the log next time. I try not to re-start my node given the 
described problems unless there is a good reason, such as a new version being 
available.

Thanks.

Gordan

_______________________________________________
Tech mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/tech
