Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-11-11 Thread Tung-Han Hsieh
Dear Nathan, Thank you very much for sharing this info. LU-14124 and LU-14125 describe exactly the problem we have encountered. So far we have upgraded to Lustre-2.12.5. We tried the following workarounds: 1. Set grant_shrink=0 at MGS only (not in the clients):

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-11-02 Thread Simon Guilbault
Hi, If you set it on the MGS, it becomes the new default for all clients and new mounts on the FS. The catch is that your clients need LU-12759 (fixed in 2.12.4), since a bug in older clients kept that setting from working correctly. On Mon, Nov 2, 2020 at 12:38 AM Tung-Han

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-11-01 Thread Tung-Han Hsieh
Dear Simon, Following your suggestions, we have now confirmed that the problem of a client's I/O performance dropping while there is continuous I/O in the background is solved. It works like a charm. Thank you so much! Here is a final question. We found that this command: lctl set_param

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-29 Thread Tung-Han Hsieh
Dear Simon, Thank you very much for your useful information. We are now scheduling a system maintenance date in order to upgrade to Lustre-2.12.5. Then we will follow your suggestion to see whether this problem can be fixed. Here I report a test of how, under continuous I/O, the

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-29 Thread Simon Guilbault
Our current workaround was to use the following command on the MGS with Lustre 2.12.5, which includes the patches in LU-12651 and LU-12759 (we were using a patched 2.12.4 a few months ago): lctl set_param -P osc.*.grant_shrink=0 We could not find the root cause of the underlying problem, dynamic

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-28 Thread Tung-Han Hsieh
Dear Simon, Thank you very much for your hint. Yes, you are right. We compared the grant size of two clients by running on each client: lctl get_param osc.*.cur_grant_bytes - Client A: It has been running the following large data transfer for over 36 hrs. while [ 1 ]; do

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-28 Thread Simon Guilbault
Hi, we had a similar performance problem on our login/DTN nodes a few months ago. The problem was that the grant size was shrinking and getting stuck under 1MB. Once under 1MB, the client had to send every request to the OST using sync IO. Check the output of the following command: lctl get_param
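
The check described in this thread (looking for OSC grants stuck under 1MB) could be automated roughly as follows. This is only a sketch: it assumes the output of `lctl get_param osc.*.cur_grant_bytes` arrives as `name=value` lines, it treats "1MB" as 1 MiB, and the device names in the sample are made up for illustration. The helpers `parse_grants` and `low_grant_devices` are hypothetical, not part of Lustre.

```python
import re

# Threshold taken from the thread ("under 1MB"); reading it as 1 MiB is an assumption.
LOW_GRANT_BYTES = 1024 * 1024

# Assumed line format: osc.<device>.cur_grant_bytes=<integer>
LINE_RE = re.compile(r"^osc\.(?P<device>.+)\.cur_grant_bytes=(?P<bytes>\d+)$")


def parse_grants(text):
    """Map each OSC device name to its current grant in bytes."""
    grants = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            grants[match.group("device")] = int(match.group("bytes"))
    return grants


def low_grant_devices(grants, threshold=LOW_GRANT_BYTES):
    """Return the sorted device names whose grant is below the threshold."""
    return sorted(dev for dev, size in grants.items() if size < threshold)


# Made-up sample output; real device names and values will differ.
SAMPLE = """\
osc.demo-OST0000-osc-ffff0001.cur_grant_bytes=8388608
osc.demo-OST0001-osc-ffff0001.cur_grant_bytes=720896
"""

if __name__ == "__main__":
    for device in low_grant_devices(parse_grants(SAMPLE)):
        print(f"low grant: {device}")
```

On a real client, the text passed to `parse_grants` would be the captured output of the `lctl` command rather than `SAMPLE`.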

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-27 Thread Tung-Han Hsieh
Dear All, Sorry, I am not sure whether this mail was successfully posted to the lustre-discuss mailing list, so I am resending it. Please ignore it if you have already read it. === Dear Andreas, Thank you very

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-08 Thread Andreas Dilger
On Oct 8, 2020, at 10:37 AM, Tung-Han Hsieh wrote: > > Dear All, > > In the past months, we encountered several episodes of Lustre I/O > slowing down abnormally. It is quite mysterious: there seems to be no problem with the > network hardware, nor with Lustre itself, since there is no error message >

[lustre-discuss] Hidden QoS in Lustre ?

2020-10-08 Thread Tung-Han Hsieh
Dear All, In the past months, we encountered several episodes of Lustre I/O slowing down abnormally. It is quite mysterious: there seems to be no problem with the network hardware, nor with Lustre itself, since there is no error message at all on the MDT/OST/client sides. Recently we probably found a way to