Hi Ying, I have a new patch sets which fixes this issue using fixes from your patches. It deviates from your patch the following way: In my solution, the subscription refcount keeps track of a subscription with or without timer. I do not increment refcount for timer, and use the subscriber lock plus the del_timer to find outstanding timers. I will send the series shortly.
I applied your series and ran some tests with it. If I run test against each patch individually, they seem to introduce new warnings/panics. I think every patch should be as correct as possible. I tried to moved around the patches in this series to get every patch correct, which led me to my series above. 1. tipc: advance the time of deleting subscription from subscriber->subscrp_list I get several warnings like: WARNING: CPU: 15 PID: 356 at lib/refcount.c:128 refcount_sub_and_test+0x5e/0x70 refcount_t: underflow; use-after-free. Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio CPU: 15 PID: 356 Comm: topsrv_tester Tainted: G W 4.10.0+ #111 Hardware name: Ericsson AB CXC1060320/ROJ208840/5, BIOS 4.6.5 11/07/2013 Call Trace: dump_stack+0x4d/0x72 __warn+0xd1/0xf0 warn_slowpath_fmt+0x4f/0x60 ? conn_put+0x12/0x30 [tipc] refcount_sub_and_test+0x5e/0x70 refcount_dec_and_test+0x11/0x20 tipc_subscrp_put+0x12/0x30 [tipc] tipc_subscrp_report_overlap+0xa7/0xc0 [tipc] tipc_nameseq_remove_publ+0x179/0x210 [tipc] tipc_nametbl_remove_publ+0x59/0xf0 [tipc] tipc_nametbl_withdraw+0x6c/0x130 [tipc] tipc_sk_withdraw+0xc0/0x100 [tipc] tipc_release+0x56/0x290 [tipc] sock_release+0x1f/0x80 sock_close+0x12/0x20 __fput+0xbe/0x200 ____fput+0xe/0x10 task_work_run+0x7e/0xb0 do_exit+0x3fd/0xc10 ? __do_page_fault+0x27a/0x500 do_group_exit+0x43/0xc0 SyS_exit_group+0x14/0x20 entry_SYSCALL_64_fastpath+0x13/0x94 RIP: 0033:0x7fcc289d22e9 RSP: 002b:00007ffc804f0ef8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcc289d22e9 RDX: 0000000000000000 RSI: 00007fcc28cbe323 RDI: 0000000000000000 RBP: 00007fcc28cb9860 R08: 000000000000003c R09: 00000000000000e7 R10: ffffffffffffff90 R11: 0000000000000246 R12: 00007fcc28cb9860 R13: 00007fcc28cbec60 R14: 0000000000000000 R15: 0000000000000000 And, the one below as you forgot to delete the subscription list at subscription timeout. general protection fault: 0000 [#1] SMP Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio CPU: 17 PID: 222 Comm: kworker/u40:2 Tainted: G W 4.10.0+ #111 Hardware name: Ericsson AB CXC1060320/ROJ208840/5, BIOS 4.6.5 11/07/2013 Workqueue: tipc_send tipc_send_work [tipc] task: ffff880c444eef00 task.stack: ffffc90007734000 RIP: 0010:tipc_subscrb_subscrp_delete+0x6c/0x110 [tipc] RSP: 0018:ffffc90007737d48 EFLAGS: 00010246 RAX: dead000000000200 RBX: ffff880c3cebd480 RCX: 0000000000000101 RDX: dead000000000100 RSI: 0000000000000001 RDI: ffff880c3cebd480 RBP: ffffc90007737d70 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000011 R11: 0000000000000000 R12: dead0000000000a8 R13: 0000000000000000 R14: ffff880c44638088 R15: ffff880c446380a0 FS: 0000000000000000(0000) GS:ffff880c7f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000084f0c0 CR3: 0000000001c0b000 CR4: 00000000001406e0 Call Trace: tipc_subscrb_release_cb+0x17/0x30 [tipc] tipc_close_conn+0x80/0x90 [tipc] tipc_send_work+0x1a7/0x1b0 [tipc] process_one_work+0x145/0x410 worker_thread+0x69/0x4c0 kthread+0x128/0x160 ? process_one_work+0x410/0x410 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x29/0x40 Code: 49 89 c4 4d 85 ed 74 18 48 8d b3 80 00 00 00 ba 1c 00 00 00 4c 89 ef e8 e3 51 2e e1 85 c0 75 76 48 8b 53 58 48 8b 43 60 48 89 df <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 58 RIP: tipc_subscrb_subscrp_delete+0x6c/0x110 [tipc] RSP: ffffc90007737d48 2. tipc: adjust policy that sub->timer holds subscription kref My testcases fail, i think the waitq should fix it. So these commits should be squashed together. - create max_subscriptions for a non existent subscriber - wait for subscription timeout - create again the same max subscription and cancel them. They fail as: [16761.924321] Subscription rejected, limit reached (65535) [16761.929661] Subscription rejected, limit reached (65535) [16761.934966] Subscription rejected, limit reached (65535) [16761.940281] Subscription rejected, limit reached (65535) And with one of the your patches (the last one "tipc: delete expired subscription" which fixed it), I got: general protection fault: 0000 [#1] SMP Modules linked in: tipc(-) ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel] CPU: 15 PID: 8105 Comm: modprobe Tainted: G W 4.10.0+ #111 Hardware name: Ericsson AB CXC1060320/ROJ208840/5, BIOS 4.6.5 11/07/2013 task: ffff880c3c4fe2c0 task.stack: ffffc9002227c000 RIP: 0010:tipc_nametbl_stop+0xf9/0x190 [tipc] RSP: 0018:ffffc9002227fe00 EFLAGS: 00010286 RAX: 0200000000000000 RBX: ffff880c4d8f5700 RCX: ffff880c3c5b8ab0 RDX: ffffffff81c56cc0 RSI: 00000000000000c3 RDI: ffffea002ad57750 RBP: ffffc9002227fe50 R08: 00000000ffff880c R09: 00000000ffffffff R10: 00000000ffffffff R11: 0000000000000000 R12: 01ffffffffffff90 R13: ffffffff81d02e00 R14: ffff880c4bba1360 R15: 01ffffffffffff90 FS: 0000000000852880(0000) GS:ffff880c7f3c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000878038 CR3: 0000000c3bda5000 CR4: 00000000001406e0 Call Trace: tipc_exit_net+0x2a/0x40 [tipc] ops_exit_list.isra.8+0x38/0x60 unregister_pernet_operations+0x90/0xe0 unregister_pernet_subsys+0x21/0x30 tipc_exit+0x15/0xc6d [tipc] SyS_delete_module+0x192/0x200 ? SyS_read+0x49/0xb0 entry_SYSCALL_64_fastpath+0x13/0x94 RIP: 0033:0x4abc07 RSP: 002b:00007ffd9adbfe18 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000000000877e20 RCX: 00000000004abc07 RDX: 000000000000002e RSI: 0000000000000880 RDI: 00007ffd9adbfe20 RBP: 0000000000000110 R08: 0000000000000004 R09: 0000000000000000 R10: 000000000084cd30 R11: 0000000000000246 R12: 000000000084cce0 R13: 0000000000877f30 R14: 0000000000000200 R15: 0000000000000110 Code: 45 8b 4c 24 18 45 8b 44 24 14 4c 89 ef e8 40 ee ff ff 49 8d bc 24 80 00 00 00 be 80 00 00 00 4d 89 fc e8 3b 16 05 e1 49 8d 47 70 <49> 8b 57 70 4c 39 f0 4c 8d 7a 90 75 bb 48 8b 43 20 48 85 c0 74 RIP: tipc_nametbl_stop+0xf9/0x190 [tipc] RSP: ffffc9002227fe00 /Partha On 03/09/2017 03:37 PM, Ying Xue wrote: > To be honest, I don't very like this solution. But I cannot think one > solution is better than this one although I tried at least the following > two methods to solve the issue. > > - Introduce one timeout list to subscriber. When subscription is > expired, it will be moved to its timeout list. When name service is > deleted or inserted, we are going to release all subscriptions linked in > timeout list. After I tried to do the experiment, it's found we have to > make a big change for current code; > > - In subscription's timeout handler, when subsription timeout event is > sent to server through tipc_subscrp_send_event(), we can add a new flag > in the message. When server sends the timeout message to user space send > workqueue, it can send back a TIPC_SUB_CANCEL message to subscriber > through tipc_subscrb_rcv_cb() if that new flag is set. When the > TIPC_SUB_CANCEL message arrives at subscriber, the expired subscription > can be deleted at the moment. The approach is still infeasible due to > the same reason as previous one. > > In all, this solution is a bit elegant compared to above two. > > If you have any suggestion, please let me know. > > Thanks, > Ying > > On 03/09/2017 10:21 PM, Ying Xue wrote: >> After the design policy of subscription timeout is adjusted, >> subscription is still present in name table after it's expired, and >> hence the subscriber will receive new events for this subscription >> until the subscriber terminates. >> >> Now when subscription is expired, it will be first deleted from name >> table and its subscriber, and then schedule its work. Finally the >> subscription will be destroyed in workqueue asynchronously. >> >> Signed-off-by: Ying Xue <ying....@windriver.com> >> --- >> net/tipc/server.h | 2 ++ >> net/tipc/subscr.c | 32 ++++++++++++++++++++++++++++++++ >> net/tipc/subscr.h | 1 + >> 3 files changed, 35 insertions(+) >> >> diff --git a/net/tipc/server.h b/net/tipc/server.h >> index 34f8055..7d5588f 100644 >> --- a/net/tipc/server.h >> +++ b/net/tipc/server.h >> @@ -59,6 +59,7 @@ >> * @name: server name >> * @imp: message importance >> * @type: socket type >> + * @wq: workqueue >> */ >> struct tipc_server { >> struct idr conn_idr; >> @@ -78,6 +79,7 @@ struct tipc_server { >> char name[TIPC_SERVER_NAME_LEN]; >> int imp; >> int type; >> + struct workqueue_struct *wq; >> }; >> >> int tipc_conn_sendmsg(struct tipc_server *s, int conid, >> diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c >> index ea67cce..3dcb009 100644 >> --- a/net/tipc/subscr.c >> +++ b/net/tipc/subscr.c >> @@ -137,14 +137,34 @@ void tipc_subscrp_report_overlap(struct >> tipc_subscription *sub, u32 found_lower, >> static void tipc_subscrp_timeout(unsigned long data) >> { >> struct tipc_subscription *sub = (struct tipc_subscription *)data; >> + struct tipc_subscriber *subscriber = sub->subscriber; >> + struct tipc_net *tn = net_generic(sub->net, tipc_net_id); >> >> /* Notify subscriber of timeout */ >> tipc_subscrp_send_event(sub, sub->evt.s.seq.lower, sub->evt.s.seq.upper, >> TIPC_SUBSCR_TIMEOUT, 0, 0); >> >> + spin_lock_bh(&subscriber->lock); >> + list_del(&sub->subscrp_list); >> + tipc_nametbl_unsubscribe(sub); >> + spin_unlock_bh(&subscriber->lock); >> + >> + queue_work(tn->topsrv->wq, &sub->work); >> + >> tipc_subscrp_put(sub); >> } >> >> +static void tipc_subscrp_work(struct work_struct *work) >> +{ >> + struct tipc_subscription *sub = container_of(work, >> + struct tipc_subscription, work); >> + struct tipc_subscriber *subscriber = sub->subscriber; >> + >> + spin_lock_bh(&subscriber->lock); >> + tipc_subscrp_delete(sub); >> + spin_unlock_bh(&subscriber->lock); >> +} >> + >> static void tipc_subscrb_kref_release(struct kref *kref) >> { >> kfree(container_of(kref,struct tipc_subscriber, kref)); >> @@ -282,6 +302,9 @@ static struct tipc_subscription >> *tipc_subscrp_create(struct net *net, >> memcpy(&sub->evt.s, s, sizeof(*s)); >> atomic_inc(&tn->subscription_count); >> kref_init(&sub->kref); >> + >> + INIT_WORK(&sub->work, tipc_subscrp_work); >> + >> return sub; >> } >> >> @@ -379,6 +402,13 @@ int tipc_topsrv_start(struct net *net) >> topsrv->tipc_conn_release = tipc_subscrb_release_cb; >> >> strncpy(topsrv->name, name, strlen(name) + 1); >> + topsrv->wq = create_singlethread_workqueue(topsrv->name); >> + if (!topsrv->wq) { >> + kfree(saddr); >> + kfree(topsrv); >> + return -ENOMEM; >> + } >> + >> tn->topsrv = topsrv; >> atomic_set(&tn->subscription_count, 0); >> >> @@ -391,6 +421,8 @@ void tipc_topsrv_stop(struct net *net) >> struct tipc_server *topsrv = tn->topsrv; >> >> tipc_server_stop(topsrv); >> + flush_workqueue(topsrv->wq); >> + destroy_workqueue(topsrv->wq); >> kfree(topsrv->saddr); >> kfree(topsrv); >> } >> diff --git a/net/tipc/subscr.h b/net/tipc/subscr.h >> index ee52957..50fc64b 100644 >> --- a/net/tipc/subscr.h >> +++ b/net/tipc/subscr.h >> @@ -65,6 +65,7 @@ struct tipc_subscription { >> struct list_head subscrp_list; >> int swap; >> struct tipc_event evt; >> + struct work_struct work; >> }; >> >> int tipc_subscrp_check_overlap(struct tipc_name_seq *seq, u32 found_lower, >> ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion