Hi Partha,

Thank you for the review and improvement.

On 03/14/2017 01:43 AM, Parthasarathy Bhuvaragan wrote:
> Hi Ying,
> 
> I have a new patch sets which fixes this issue using fixes from your 
> patches. It deviates from your patch the following way:
> In my solution, the subscription refcount keeps track of a subscription 
> with or without timer. I do not increment refcount for timer, and use 
> the subscriber lock plus the del_timer to find outstanding timers. I 
> will send the series shortly.

I have carefully reviewed your solution as well as your revised patches.
Your method is pretty simpler than mine. Instead the thing I image is
too complex. Now I can confirm that it's unnecessary to increase
subscription refcount before its timer is started, and it's absolutely
safe for us now. Good work Parth!

> 
> I applied your series and ran some tests with it. If I run test against 
> each patch individually, they seem to introduce new warnings/panics. I 
> think every patch should be as correct as possible. I tried to moved 
> around the patches in this series to get every patch correct, which led 
> me to my series above.
> 

If we can ensure every single patch of a patchset can independently work
very well, that's very good. But in many cases, it's hard to reach that
goal. The most reason is that on one hand, we must have patch easily
readable for reviewer, on another hand, we must keep every single work
well. So it is sometimes very hard.

Anyway, your revised patches are very good. If Jon or other guys have no
any different opinion, please submit them to net-next as soon as possible.

Of course, if possible, please let John verify them again.

Thanks,
Ying

> 1. tipc: advance the time of deleting subscription from
>      subscriber->subscrp_list
> I get several warnings like:
> WARNING: CPU: 15 PID: 356 at lib/refcount.c:128 
> refcount_sub_and_test+0x5e/0x70
> refcount_t: underflow; use-after-free.
> Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet 
> virtio_net virtio_pci virtio_ring virtio
> CPU: 15 PID: 356 Comm: topsrv_tester Tainted: G  W    4.10.0+ #111
> Hardware name: Ericsson AB CXC1060320/ROJ208840/5, BIOS 4.6.5 11/07/2013
> Call Trace:
>   dump_stack+0x4d/0x72
>   __warn+0xd1/0xf0
>   warn_slowpath_fmt+0x4f/0x60
>   ? conn_put+0x12/0x30 [tipc]
>   refcount_sub_and_test+0x5e/0x70
>   refcount_dec_and_test+0x11/0x20
>   tipc_subscrp_put+0x12/0x30 [tipc]
>   tipc_subscrp_report_overlap+0xa7/0xc0 [tipc]
>   tipc_nameseq_remove_publ+0x179/0x210 [tipc]
>   tipc_nametbl_remove_publ+0x59/0xf0 [tipc]
>   tipc_nametbl_withdraw+0x6c/0x130 [tipc]
>   tipc_sk_withdraw+0xc0/0x100 [tipc]
>   tipc_release+0x56/0x290 [tipc]
>   sock_release+0x1f/0x80
>   sock_close+0x12/0x20
>   __fput+0xbe/0x200
>   ____fput+0xe/0x10
>   task_work_run+0x7e/0xb0
>   do_exit+0x3fd/0xc10
>   ? __do_page_fault+0x27a/0x500
>   do_group_exit+0x43/0xc0
>   SyS_exit_group+0x14/0x20
>   entry_SYSCALL_64_fastpath+0x13/0x94
> RIP: 0033:0x7fcc289d22e9
> RSP: 002b:00007ffc804f0ef8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcc289d22e9
> RDX: 0000000000000000 RSI: 00007fcc28cbe323 RDI: 0000000000000000
> RBP: 00007fcc28cb9860 R08: 000000000000003c R09: 00000000000000e7
> R10: ffffffffffffff90 R11: 0000000000000246 R12: 00007fcc28cb9860
> R13: 00007fcc28cbec60 R14: 0000000000000000 R15: 0000000000000000
> 
> And, the one below as you forgot to delete the subscription list at 
> subscription timeout.
> general protection fault: 0000 [#1] SMP
> Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet 
> virtio_net virtio_pci virtio_ring virtio
> CPU: 17 PID: 222 Comm: kworker/u40:2 Tainted: G        W       4.10.0+ #111
> Hardware name: Ericsson AB CXC1060320/ROJ208840/5, BIOS 4.6.5 11/07/2013
> Workqueue: tipc_send tipc_send_work [tipc]
> task: ffff880c444eef00 task.stack: ffffc90007734000
> RIP: 0010:tipc_subscrb_subscrp_delete+0x6c/0x110 [tipc]
> RSP: 0018:ffffc90007737d48 EFLAGS: 00010246
> RAX: dead000000000200 RBX: ffff880c3cebd480 RCX: 0000000000000101
> RDX: dead000000000100 RSI: 0000000000000001 RDI: ffff880c3cebd480
> RBP: ffffc90007737d70 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000011 R11: 0000000000000000 R12: dead0000000000a8
> R13: 0000000000000000 R14: ffff880c44638088 R15: ffff880c446380a0
> FS:  0000000000000000(0000) GS:ffff880c7f440000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000084f0c0 CR3: 0000000001c0b000 CR4: 00000000001406e0
> Call Trace:
>   tipc_subscrb_release_cb+0x17/0x30 [tipc]
>   tipc_close_conn+0x80/0x90 [tipc]
>   tipc_send_work+0x1a7/0x1b0 [tipc]
>   process_one_work+0x145/0x410
>   worker_thread+0x69/0x4c0
>   kthread+0x128/0x160
>   ? process_one_work+0x410/0x410
>   ? kthread_create_on_node+0x40/0x40
>   ret_from_fork+0x29/0x40
> Code: 49 89 c4 4d 85 ed 74 18 48 8d b3 80 00 00 00 ba 1c 00 00 00 4c 89 
> ef e8 e3 51 2e e1 85 c0 75 76 48 8b 53 58 48 8b 43 60 48 89 df <48> 89 
> 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 58
> RIP: tipc_subscrb_subscrp_delete+0x6c/0x110 [tipc] RSP: ffffc90007737d48
> 
> 2. tipc: adjust policy that sub->timer holds subscription kref
> My testcases fail, i think the waitq should fix it. So these commits 
> should be squashed together.
> - create max_subscriptions for a non existent subscriber
> - wait for subscription timeout
> - create again the same max subscription and cancel them.
> They fail as:
> [16761.924321] Subscription rejected, limit reached (65535)
> [16761.929661] Subscription rejected, limit reached (65535)
> [16761.934966] Subscription rejected, limit reached (65535)
> [16761.940281] Subscription rejected, limit reached (65535)
> 
> And with one of the your patches (the last one "tipc: delete expired 
> subscription" which fixed it), I got:
> general protection fault: 0000 [#1] SMP
> Modules linked in: tipc(-) ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 
> 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: 
> ip6_udp_tunnel]
> CPU: 15 PID: 8105 Comm: modprobe Tainted: G        W       4.10.0+ #111
> Hardware name: Ericsson AB CXC1060320/ROJ208840/5, BIOS 4.6.5 11/07/2013
> task: ffff880c3c4fe2c0 task.stack: ffffc9002227c000
> RIP: 0010:tipc_nametbl_stop+0xf9/0x190 [tipc]
> RSP: 0018:ffffc9002227fe00 EFLAGS: 00010286
> RAX: 0200000000000000 RBX: ffff880c4d8f5700 RCX: ffff880c3c5b8ab0
> RDX: ffffffff81c56cc0 RSI: 00000000000000c3 RDI: ffffea002ad57750
> RBP: ffffc9002227fe50 R08: 00000000ffff880c R09: 00000000ffffffff
> R10: 00000000ffffffff R11: 0000000000000000 R12: 01ffffffffffff90
> R13: ffffffff81d02e00 R14: ffff880c4bba1360 R15: 01ffffffffffff90
> FS:  0000000000852880(0000) GS:ffff880c7f3c0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000878038 CR3: 0000000c3bda5000 CR4: 00000000001406e0
> Call Trace:
>   tipc_exit_net+0x2a/0x40 [tipc]
>   ops_exit_list.isra.8+0x38/0x60
>   unregister_pernet_operations+0x90/0xe0
>   unregister_pernet_subsys+0x21/0x30
>   tipc_exit+0x15/0xc6d [tipc]
>   SyS_delete_module+0x192/0x200
>   ? SyS_read+0x49/0xb0
>   entry_SYSCALL_64_fastpath+0x13/0x94
> RIP: 0033:0x4abc07
> RSP: 002b:00007ffd9adbfe18 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
> RAX: ffffffffffffffda RBX: 0000000000877e20 RCX: 00000000004abc07
> RDX: 000000000000002e RSI: 0000000000000880 RDI: 00007ffd9adbfe20
> RBP: 0000000000000110 R08: 0000000000000004 R09: 0000000000000000
> R10: 000000000084cd30 R11: 0000000000000246 R12: 000000000084cce0
> R13: 0000000000877f30 R14: 0000000000000200 R15: 0000000000000110
> Code: 45 8b 4c 24 18 45 8b 44 24 14 4c 89 ef e8 40 ee ff ff 49 8d bc 24 
> 80 00 00 00 be 80 00 00 00 4d 89 fc e8 3b 16 05 e1 49 8d 47 70 <49> 8b 
> 57 70 4c 39 f0 4c 8d 7a 90 75 bb 48 8b 43 20 48 85 c0 74
> RIP: tipc_nametbl_stop+0xf9/0x190 [tipc] RSP: ffffc9002227fe00
> 
> /Partha
> 
> On 03/09/2017 03:37 PM, Ying Xue wrote:
>> To be honest, I don't very like this solution. But I cannot think one
>> solution is better than this one although I tried at least the following
>> two methods to solve the issue.
>>
>> - Introduce one timeout list to subscriber. When subscription is
>> expired, it will be moved to its timeout list. When name service is
>> deleted or inserted, we are going to release all subscriptions linked in
>> timeout list. After I tried to do the experiment, it's found we have to
>> make a big change for current code;
>>
>> - In subscription's timeout handler, when subsription timeout event is
>> sent to server through tipc_subscrp_send_event(), we can add a new flag
>> in the message. When server sends the timeout message to user space send
>> workqueue, it can send back a TIPC_SUB_CANCEL message to subscriber
>> through tipc_subscrb_rcv_cb() if that new flag is set. When the
>> TIPC_SUB_CANCEL message arrives at subscriber, the expired subscription
>> can be deleted at the moment. The approach is still infeasible due to
>> the same reason as previous one.
>>
>> In all, this solution is a bit elegant compared to above two.
>>
>> If you have any suggestion, please let me know.
>>
>> Thanks,
>> Ying
>>
>> On 03/09/2017 10:21 PM, Ying Xue wrote:
>>> After the design policy of subscription timeout is adjusted,
>>> subscription is still present in name table after it's expired, and
>>> hence the subscriber will receive new events for this subscription
>>> until the subscriber terminates.
>>>
>>> Now when subscription is expired, it will be first deleted from name
>>> table and its subscriber, and then schedule its work. Finally the
>>> subscription will be destroyed in workqueue asynchronously.
>>>
>>> Signed-off-by: Ying Xue <ying....@windriver.com>
>>> ---
>>>  net/tipc/server.h |  2 ++
>>>  net/tipc/subscr.c | 32 ++++++++++++++++++++++++++++++++
>>>  net/tipc/subscr.h |  1 +
>>>  3 files changed, 35 insertions(+)
>>>
>>> diff --git a/net/tipc/server.h b/net/tipc/server.h
>>> index 34f8055..7d5588f 100644
>>> --- a/net/tipc/server.h
>>> +++ b/net/tipc/server.h
>>> @@ -59,6 +59,7 @@
>>>   * @name: server name
>>>   * @imp: message importance
>>>   * @type: socket type
>>> + * @wq: workqueue
>>>   */
>>>  struct tipc_server {
>>>     struct idr conn_idr;
>>> @@ -78,6 +79,7 @@ struct tipc_server {
>>>     char name[TIPC_SERVER_NAME_LEN];
>>>     int imp;
>>>     int type;
>>> +   struct workqueue_struct *wq;
>>>  };
>>>
>>>  int tipc_conn_sendmsg(struct tipc_server *s, int conid,
>>> diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
>>> index ea67cce..3dcb009 100644
>>> --- a/net/tipc/subscr.c
>>> +++ b/net/tipc/subscr.c
>>> @@ -137,14 +137,34 @@ void tipc_subscrp_report_overlap(struct 
>>> tipc_subscription *sub, u32 found_lower,
>>>  static void tipc_subscrp_timeout(unsigned long data)
>>>  {
>>>     struct tipc_subscription *sub = (struct tipc_subscription *)data;
>>> +   struct tipc_subscriber *subscriber = sub->subscriber;
>>> +   struct tipc_net *tn = net_generic(sub->net, tipc_net_id);
>>>
>>>     /* Notify subscriber of timeout */
>>>     tipc_subscrp_send_event(sub, sub->evt.s.seq.lower, sub->evt.s.seq.upper,
>>>                             TIPC_SUBSCR_TIMEOUT, 0, 0);
>>>
>>> +   spin_lock_bh(&subscriber->lock);
>>> +   list_del(&sub->subscrp_list);
>>> +   tipc_nametbl_unsubscribe(sub);
>>> +   spin_unlock_bh(&subscriber->lock);
>>> +
>>> +   queue_work(tn->topsrv->wq, &sub->work);
>>> +
>>>     tipc_subscrp_put(sub);
>>>  }
>>>
>>> +static void tipc_subscrp_work(struct work_struct *work)
>>> +{
>>> +   struct tipc_subscription *sub = container_of(work,
>>> +                                   struct tipc_subscription, work);
>>> +   struct tipc_subscriber *subscriber = sub->subscriber;
>>> +
>>> +   spin_lock_bh(&subscriber->lock);
>>> +   tipc_subscrp_delete(sub);
>>> +   spin_unlock_bh(&subscriber->lock);
>>> +}
>>> +
>>>  static void tipc_subscrb_kref_release(struct kref *kref)
>>>  {
>>>     kfree(container_of(kref,struct tipc_subscriber, kref));
>>> @@ -282,6 +302,9 @@ static struct tipc_subscription 
>>> *tipc_subscrp_create(struct net *net,
>>>     memcpy(&sub->evt.s, s, sizeof(*s));
>>>     atomic_inc(&tn->subscription_count);
>>>     kref_init(&sub->kref);
>>> +
>>> +   INIT_WORK(&sub->work, tipc_subscrp_work);
>>> +
>>>     return sub;
>>>  }
>>>
>>> @@ -379,6 +402,13 @@ int tipc_topsrv_start(struct net *net)
>>>     topsrv->tipc_conn_release       = tipc_subscrb_release_cb;
>>>
>>>     strncpy(topsrv->name, name, strlen(name) + 1);
>>> +   topsrv->wq = create_singlethread_workqueue(topsrv->name);
>>> +   if (!topsrv->wq) {
>>> +           kfree(saddr);
>>> +           kfree(topsrv);
>>> +           return -ENOMEM;
>>> +   }
>>> +
>>>     tn->topsrv = topsrv;
>>>     atomic_set(&tn->subscription_count, 0);
>>>
>>> @@ -391,6 +421,8 @@ void tipc_topsrv_stop(struct net *net)
>>>     struct tipc_server *topsrv = tn->topsrv;
>>>
>>>     tipc_server_stop(topsrv);
>>> +   flush_workqueue(topsrv->wq);
>>> +   destroy_workqueue(topsrv->wq);
>>>     kfree(topsrv->saddr);
>>>     kfree(topsrv);
>>>  }
>>> diff --git a/net/tipc/subscr.h b/net/tipc/subscr.h
>>> index ee52957..50fc64b 100644
>>> --- a/net/tipc/subscr.h
>>> +++ b/net/tipc/subscr.h
>>> @@ -65,6 +65,7 @@ struct tipc_subscription {
>>>     struct list_head subscrp_list;
>>>     int swap;
>>>     struct tipc_event evt;
>>> +   struct work_struct work;
>>>  };
>>>
>>>  int tipc_subscrp_check_overlap(struct tipc_name_seq *seq, u32 found_lower,
>>>
> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to