This is the expected behaviour. In the original implementation of PFL, when
a file was opened in append mode, the lock from 0 to EOF initialized all
stripes of the PFL file. We have a PFL layout on our system with 1 stripe
up to 1 GB, then 4 stripes, and then 32 stripes when the file was
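For reference, a comparable composite layout can be set with lfs setstripe;
the second boundary below is only a placeholder since the real value is cut
off above:
# Sketch of a similar PFL layout: 1 stripe up to 1 GiB, 4 stripes up to an
# assumed 16 GiB boundary, then 32 stripes to EOF.
lfs setstripe -E 1G -c 1 -E 16G -c 4 -E -1 -c 32 /lustre/project/dir
# Check the components actually applied to a new file in that directory:
lfs getstripe /lustre/project/dir/newfile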
We are not using RoCE in production, but a few years ago we tested it with
25Gb/s cards. From what I recall, RDMA was working as expected with the
MOFED stack: running an LNet benchmark used a few CPU cores with TCP, while
the same benchmark over RoCE used almost no CPU.
The only change I had
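For anyone wanting to reproduce that kind of TCP-versus-RoCE comparison,
lnet_selftest can drive a bulk test between two nodes while you watch CPU
usage; the NIDs and transfer size below are placeholders, not the ones from
the test described above:
# Rough lnet_selftest sketch; run on a node that can reach both peers.
modprobe lnet_selftest
export LST_SESSION=$$
lst new_session rw_test
lst add_group clients 10.0.0.10@o2ib
lst add_group servers 10.0.0.20@o2ib
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw write size=1M
lst run bulk
lst stat clients servers      # bandwidth numbers; check CPU load separately
lst end_session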
Hi, the grant_shrink bug was fixed in 2.15.0 according to this JIRA:
https://jira.whamcloud.com/browse/LU-14124
On Fri, Sep 30, 2022 at 3:59 AM Tung-Han Hsieh <
thhs...@twcp1.phys.ntu.edu.tw> wrote:
> Dear Peter,
>
> Thank you very much for your prompt reply.
>
> Actually we just encountered OST
Hi,
Start a ZFS scrub on your pool; this will ensure that all the content is
fine, since the short resilver that runs when re-adding dead disks to a pool
does not check everything, only what changed on the pool while that disk was
gone.
Sadly, I often see that kind of error on my personal NAS due to some bad
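Concretely, something along these lines (the pool name is a placeholder):
zpool scrub tank          # start a full scrub of the pool
zpool status -v tank      # check scrub progress and any errors it finds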
Hi,
One failure that the ZFS Pacemaker resource agent does not seem to pick up
is when MMP fails due to some problem with the SAS bus. We added this short
script, running as a systemd daemon, to trigger a failover when this
happens. The other check in this script uses NHC, mostly to check if
t
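The script itself is not included in the message; a minimal sketch of that
kind of watchdog could look like the following, assuming NHC is installed as
/usr/sbin/nhc, Pacemaker is driven through pcs, and an unhealthy or suspended
pool is a good enough signal of an MMP/multihost problem (all of which are
assumptions, not the actual checks used above):
#!/bin/bash
# Hypothetical watchdog sketch (not the actual script from the message above).
# If the pool becomes unhealthy (e.g. suspended after a multihost/MMP write
# failure) or NHC reports a problem, put this node in standby so Pacemaker
# fails the resources over to the other server.
while true; do
    if ! zpool status -x | grep -q 'all pools are healthy' || ! /usr/sbin/nhc; then
        pcs node standby "$(hostname -s)"
        exit 0
    fi
    sleep 30
done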
Hi,
You can mount your MDT directly in ldiskfs mode (if using ldiskfs) instead
of Lustre; your complete metadata tree will be visible in the ROOT
directory inside the mount.
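For example (the device path and mount point are placeholders):
mount -t ldiskfs -o ro /dev/mapper/mdt0 /mnt/mdt-ldiskfs   # read-only is safest
ls /mnt/mdt-ldiskfs/ROOT      # the filesystem namespace is under ROOT
umount /mnt/mdt-ldiskfs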
On Thu, Nov 12, 2020 at 2:41 AM Zeeshan Ali Shah
wrote:
> Dear All, is it possible to get a list of files and directories f
cur_grant_bytes=797097
> >
> >
> > The value 797097 seems to be the minimum. When it dropped to 1062795,
> > the time of cp dramatically increased from around 1 sec to 1 min. In
> > addition, during the test, the cluster was completely idle. And it
> > is obviou
namically adjusted
> ?
>
> Thank you very much for your comment in advance.
>
> Best Regards,
>
> T.H.Hsieh
>
> On Wed, Oct 28, 2020 at 02:00:21PM -0400, Simon Guilbault wrote:
> > Hi, we had a similar performance problem on our login/DTNs node a few
> > months a
Hi, we had a similar performance problem on our login/DTN nodes a few
months ago: the grant size was shrinking and getting stuck under 1 MB. Once
under 1 MB, the client had to send every request to the OST using sync IO.
Check the output of the following command:
lctl get_param o
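The command is cut off above, but given the cur_grant_bytes value quoted
later in the thread, it was presumably along the lines of:
lctl get_param osc.*.cur_grant_bytes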
Hi
The Compute Canada S3/Ceph copytool was never deployed in production since
the S3-backed storage was never installed.
We moved to a tape-based system with TSM instead and have been running that
copytool on 3 different sites for the past year.
https://github.com/guilbaults/ct_tsm
On Sun, Aug 23, 20
Hi, you can try these patches and check their respective Jira entry:
git cherry-pick --strategy=recursive -X theirs
97823e65efd85bb2db325232584d65646be5644f # for centos 7.8 LU-13347
git cherry-pick --strategy=recursive -X theirs
851ba18ee0424a3a4bf27d54d0c1af20eaf04ed6 # for centos 7.8 LU-13347
gi
Hi,
We seem to be hitting a performance issue with Lustre clients 2.12.2 and
2.12.3. Over time, the grant size on the OSCs shrinks below 1 MB and does
not grow back. This lowers the performance of the client to a few MB/s,
even down to kB/s for some OSTs. This does not seem to happen
If it can be of any help, here is the script I use in a throwaway
OpenStack VM to build Lustre+ZFS without ldiskfs on a vanilla CentOS kernel:
yum install -y epel-release
yum install -y wget git
yum install -y asciidoc audit-libs-devel automake bc binutils-devel bison
device-mapper-devel elfutils
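The package list is cut off above; once the build dependencies and ZFS are in
place, the remaining steps typically look something like this (the exact
configure flags are assumptions, not necessarily the ones from that script):
git clone https://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
sh autogen.sh
./configure --disable-ldiskfs --with-zfs    # build the server bits against ZFS only
make rpms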
Hi,
I had the same error message with ./configure, but it was for an x86 server
with some old QLogic card. The relevant information for finding the error
was in config.log.
In my case it was a macro problem (IS_ENABLED)
in source/arch/x86/include/asm/pgtable.h:504 that was added for the
Meltdown/Spectr
Hi,
If everything is connected with SAS JBODs and controllers, you could
probably run 1 OST on each server and get better performance that way. With
both servers reaching the same SAS drives, you could also have a failover in
case one server goes down.
You can forget about failover if you are u
Hi, your lnet.conf looks fine. I tested LNet with RoCE v2 a while back with
a pair of servers using ConnectX-4 cards with a single 25Gb interface, and
RDMA was working with CentOS 7.3, the stock RHEL OFED, and Lustre 2.9. The
only setting I had to use in Lustre's config was this one:
options lnet networks=
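The option itself is truncated above, but for RoCE the o2ib LND is used over
the Ethernet interface, so the line generally looks something like this (the
interface name is a placeholder, typically placed in /etc/modprobe.d/lustre.conf):
options lnet networks="o2ib0(enp3s0f0)"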