[Patch 12/12] tabled: print hostname always

2010-04-17 Thread Pete Zaitcev
This code clearly was obsolete and wishful thinking. Let's just be simple. Most importantly print something that tells the sysadmin what node is the problem. Signed-off-by: Pete Zaitcev --- server/storage.c | 19 +++ server/tabled.h |2 +- 2 files changed, 4 insertions(+)

[Patch 11/12] tabled: check argument of -D better

2010-04-17 Thread Pete Zaitcev
The atoi() really does not cut it, as I discovered when I supplied -D -E to tabled. Other arguments may benefit from such checking too, but -D is unique in that nothing gets logged in case of this mistake. So let's just add it here for now; others will at least report something. Signed-off-by: Pet
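
For illustration only, a minimal sketch of the kind of stricter check the message describes, assuming strtol() is used in place of atoi(). The helper name parse_level is hypothetical and is not taken from the patch; the actual change in tabled may look different.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): strictly parse a
 * non-negative integer option argument. Returns 0 and stores the
 * value in *out on success. Returns -1 if arg is empty, contains
 * no digits, has trailing junk, is negative, or overflows an int.
 */
static int parse_level(const char *arg, int *out)
{
	char *end;
	long v;

	if (arg == NULL || *arg == '\0')
		return -1;

	errno = 0;
	v = strtol(arg, &end, 10);

	/* end == arg means no digits were consumed, e.g. for "-E" */
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (v < 0 || v > INT_MAX)
		return -1;

	*out = (int)v;
	return 0;
}
```

With a check like this, an argument such as "-E" is rejected up front, whereas atoi() would quietly return 0 for it.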

[Patch 10/12] tabled: retry initial CLD session open etc.

2010-04-17 Thread Pete Zaitcev
This was an error in the conversion to ncld. In the cldc code, we kick the state machine and the natural retries do the rest. Any failures occur there. But in ncld the original kick can fail too. Five retries give the CLD server time to reboot. If it's down, then clients refuse to start. This may be
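
A rough sketch of a bounded retry around an initial open, as described above. Everything here is an assumption made for illustration: open_with_retries, the callback types, and the flaky_open stand-in are hypothetical names, and the real patch presumably wraps the ncld session open call directly.

```c
#include <stddef.h>

/* Hypothetical callback types; both are assumptions for this sketch. */
typedef int (*open_fn)(void *ctx);              /* 0 on success */
typedef void (*wait_fn)(unsigned int seconds);  /* e.g. a sleep wrapper */

/*
 * Call try_open up to max_tries times, pausing between attempts.
 * Returns 0 as soon as one attempt succeeds, otherwise the result
 * of the last failed attempt (or -1 if max_tries is not positive).
 */
static int open_with_retries(open_fn try_open, void *ctx, int max_tries,
			     wait_fn wait_between, unsigned int delay_sec)
{
	int i, rc = -1;

	for (i = 0; i < max_tries; i++) {
		rc = try_open(ctx);
		if (rc == 0)
			return 0;
		/* no point in waiting after the final attempt */
		if (wait_between != NULL && i + 1 < max_tries)
			wait_between(delay_sec);
	}
	return rc;
}

/* Stand-in for a server that is still rebooting: fails N times. */
struct flaky {
	int failures_left;
	int calls;
};

static int flaky_open(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return -1;
	}
	return 0;
}
```

Passing 5 as max_tries mirrors the five retries mentioned in the message; the delay between attempts is left to the caller because the snippet does not state it.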

[Patch 09/12] tabled: drop double prefixing

2010-04-17 Thread Pete Zaitcev
On Fedora 14, the following is seen in syslog: Apr 17 19:58:52 niphredil tabled: tabled: connecting to site hitlain.zaitcev.lan:8083: No route to host Apr 17 19:58:56 niphredil tabled: tabled: DB_ENV->rep_elect:WARNING: nvotes (1) is sub-majority with nsites (2) Drop the extra prefix, it only w

[Patch 08/12] Chunk: fix wrong message

2010-04-17 Thread Pete Zaitcev
The message makes no sense. It was a carry-over from cldc where there were many failure modes (fh is NULL, fh->valid false, etc.). Signed-off-by: Pete Zaitcev --- server/cldu.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit b723269c703c60560e18923ef49fdbe2a46be133 Author: Mast

[Patch 07/12] Chunk: retry initial CLD session open

2010-04-17 Thread Pete Zaitcev
This was an error in the conversion to ncld. In the cldc code, we kick the state machine and the natural retries do the rest. Any failures occur there. But in ncld the original kick can fail too. Five retries give the CLD server time to reboot. If it's down, then clients refuse to start. This may be

[Patch 06/12] Chunk: do not call timer_add from event context

2010-04-17 Thread Pete Zaitcev
No matter if timer or cld_timer is used, this was not valid. Obviously, locking is missing, so only one thread can access a certain tlist. But the actual hang was more interesting than a race and crash. Suppose that we add the first timer. In that case, main thread invokes poll() with no timeout. I

[Patch 05/12] Chunk: Use CLD timers

2010-04-17 Thread Pete Zaitcev
Since ncld uses CLD timers and thus we had to have them in libcldc, we may as well use them in Chunk. This gives us an automatic importation of bugfixes. Signed-off-by: Pete Zaitcev --- server/chunkd.h | 27 ++ server/cldu.c |4 +- server/util.c | 84 +---

[Patch 04/12] CLD: remove g_list_foreach

2010-04-17 Thread Pete Zaitcev
In practice this is not a bug, because g_list_foreach is implemented cleverly, and so the old code works. But it looks more direct with one less function, and we do not depend on g_list_foreach doing the right thing, since it's not in the documentation anywhere. Signed-off-by: Pete Zaitcev ---

[Patch 03/12] CLD: fix commentary

2010-04-17 Thread Pete Zaitcev
Add and fix some comments regarding the reasons behind the pipe etc. No code changes. Signed-off-by: Pete Zaitcev --- lib/cldc.c | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) commit e675f2f316bbb24ca84c1bc23e4d1c6d53b029de Author: Master Date: Sat Apr

[Patch 02/12] CLD: fix hang in ncld_sess_close

2010-04-17 Thread Pete Zaitcev
The use model of ncld made one thing obvious: cldc_close has two functions: it disposes of the handle in the client memory (in theory -- in practice we do not free those until the session terminates), and it also talks to the server about that. Our operations have no timers of their own, so if session goes dow

[Patch 01/12] CLD: fix crash in retransmissions

2010-04-17 Thread Pete Zaitcev
For the longest time I was plagued by (very infrequent) crashes like this: Program received signal SIGSEGV, Segmentation fault. sess_retry_output (timer=0x92070c0) at session.c:532 532 if (!next_retry || (op->next_retry < next_retry)) (gdb) info threads * 1 Thread 0xb72f96c0 (LWP

[Patch 0/12] Start-up and timer bugfixes

2010-04-17 Thread Pete Zaitcev
The most important thing to know is that this series does not fix the failure to build in Koji. But at least I can run tests now. The segfaults in CLD were driving me mad. With this I do not even have a shell on the CLD server anymore. Please rest assured that I'll get to the bottom of the Koji problem.

Re: Trivial Q about chunkd's main_loop

2010-04-17 Thread Jeff Garzik
On 04/17/2010 09:36 PM, Pete Zaitcev wrote: Is there a reason why the main_loop in chunkd uses a naked g_hash_table_lookup instead of srv_poll_lookup? Performance? @@ -1681,8 +1681,7 @@ static int main_loop(void) fired++; - sp = g_hash_table_lookup

Trivial Q about chunkd's main_loop

2010-04-17 Thread Pete Zaitcev
Is there a reason why the main_loop in chunkd uses a naked g_hash_table_lookup instead of srv_poll_lookup? Performance? @@ -1681,8 +1681,7 @@ static int main_loop(void) fired++; - sp = g_hash_table_lookup(chunkd_srv.fd_info, -

Re: [Patch 1/8] CLD: cleanup: add cld_msg_rpc.x

2010-04-17 Thread Jeff Garzik
On 04/16/2010 10:18 PM, Pete Zaitcev wrote: On Wed, 14 Apr 2010 15:55:01 -0400 Jeff Garzik wrote: +++ b/lib/Makefile.am @@ -27,6 +27,7 @@ libcldc_la_SOURCES= \ common.c\ libtimer.c \ pkt.c \ + cld_msg_rpc.x