Re: [Y2038] [PATCH] vfs: replace current_kernel_time64 with ktime equivalent
On Fri, Jun 22, 2018 at 03:24:48PM +0200, Arnd Bergmann wrote: > On Thu, Jun 21, 2018 at 10:23 PM, Dave Chinner wrote: > > On Wed, Jun 20, 2018 at 05:01:24PM +0200, Arnd Bergmann wrote: > > >> diff --git a/fs/inode.c b/fs/inode.c > >> index 2c300e981796..e27bd9334939 100644 > >> --- a/fs/inode.c > >> +++ b/fs/inode.c > >> @@ -2133,7 +2133,9 @@ EXPORT_SYMBOL(timespec64_trunc); > >> */ > >> struct timespec64 current_time(struct inode *inode) > >> { > >> - struct timespec64 now = current_kernel_time64(); > >> + struct timespec64 now; > >> + > >> + ktime_get_coarse_real_ts64(); > > > > Can I just say as a filesystem dev who has no idea at all about > > kernel timer implementations: this is an awful API change. There > > are hundreds of callers of current_time(), so I'm not going to be > > the only person looking at this function who has no clue about WTF > > "ktime_get_coarse_real" actually means or does. Further, this > > function is not documented, and jumps straight into internal time > > implementation stuff, so I'm lost straight away if somebody asks me > > "what does that function do"?. i.e. I have *no clue* what this > > function returns or why this code uses it. > > You definitely have a point about the documentation. I meant to > fix that as part of the recent rework of the timekeeping.h header > but haven't finished it, partly because the header is still being > changed as we get rid of the older interfaces. The interface documentation should be introduced with the new interfaces, not left for later as that leaves people like me with no fucking clue about what the changes actually mean or why they are being done. Perhaps you'd have done better to explain this API as an internal implementation of clock_gettime(CLOCK_REALTIME_COARSE), which is clearly documented in the man page as: "Use when you need very fast, but not fine-grained timestamps." Put that comment on ktime_get_coarse_real_ts64(), and almost all the questions about "WTF does this whacky function do?" go away Cheers, Dave. -- Dave Chinner da...@fromorbit.com ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038
Re: [Y2038] [PATCH] scsi_transport_fc: use 64-bit timestamps consistently
On 6/18/2018 8:29 AM, Arnd Bergmann wrote: The get_seconds() helper returns an 'unsigned long' value, which can overflow on 32-bit architectures. Since the interface we pass it into already uses a 64-bit type, we can just use ktime_get_real_seconds() instead. While we generally prefer local timestamps in CLOCK_MONOTONIC format (ktime_get_seconds), this keeps using the CLOCK_REALTIME version in order to maintain compatibility with existing code. Signed-off-by: Arnd Bergmann --- drivers/scsi/scsi_transport_fc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c index 1da3d71e9f61..bb6de88aa724 100644 --- a/drivers/scsi/scsi_transport_fc.c +++ b/drivers/scsi/scsi_transport_fc.c @@ -567,7 +567,7 @@ fc_host_post_event(struct Scsi_Host *shost, u32 event_number, INIT_SCSI_NL_HDR(>snlh, SCSI_NL_TRANSPORT_FC, FC_NL_ASYNC_EVENT, len); - event->seconds = get_seconds(); + event->seconds = ktime_get_real_seconds(); event->vendor_id = 0; event->host_no = shost->host_no; event->event_datalen = sizeof(u32); /* bytes */ @@ -635,7 +635,7 @@ fc_host_post_vendor_event(struct Scsi_Host *shost, u32 event_number, INIT_SCSI_NL_HDR(>snlh, SCSI_NL_TRANSPORT_FC, FC_NL_ASYNC_EVENT, len); - event->seconds = get_seconds(); + event->seconds = ktime_get_real_seconds(); event->vendor_id = vendor_id; event->host_no = shost->host_no; event->event_datalen = data_len; /* bytes */ looks good Reviewed-by: James Smart ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038
Re: [Y2038] [PATCH] scsi_transport_fc: use 64-bit timestamps consistently
On 06/18/2018 08:29 AM, Arnd Bergmann wrote: > The get_seconds() helper returns an 'unsigned long' value, which can > overflow on 32-bit architectures. Since the interface we pass it into > already uses a 64-bit type, we can just use ktime_get_real_seconds() > instead. > > While we generally prefer local timestamps in CLOCK_MONOTONIC format > (ktime_get_seconds), this keeps using the CLOCK_REALTIME version > in order to maintain compatibility with existing code. > > Signed-off-by: Arnd Bergmann > --- Reviewed-by: Tyrel Datwyler ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038
Re: [Y2038] [PATCH v2 2/2] rusage: allow 64-bit times ru_utime/ru_stime
Ingo Molnar writes: > * Eric W. Biederman wrote: > >> Ingo Molnar writes: >> >> > * Eric W. Biederman wrote: >> > >> >> The trouble with attributes is that means you can't filter your system >> >> call arguments with seccomp. [...] >> > >> > There's nothing keeping seccomp from securely fetching those arguments and >> > extending filtering to them as well ... >> > >> > Allowing that would make sense for a lot of other system calls as >> > well. >> >> Possibly. The challenge is that if the fetch for the kernel to use >> those arguments is different from the fetch of seccomp to test those >> arguments you have a time of test vs time of use race. > > Those fetched values should obviously then be used to call permitted > system calls. Agreed. To my knowledge no one has figured out how to make that work yet. For the most part it has been unnecessary. >> Given the location of the seccomp hook at the kernel user space border >> there is no easy way for seccomp to share the fetch with the system >> call itself. >> >> So I don't see how seccomp could perform the fetch securely. > > Looks like more of a seccomp mis-design/mis-implementation than some > fundamental > problem. Frankly. Given that there are some very good solutions in other operating systems, I think the misdesign is in unix/linux not providing a good answer to what to do when you need more than 6 arguments to a system call. > Mis-designed security features should not hinder system call design. I certainly agree that seccomp should not be the sole reason for not doing something. However there are lots of reasons to avoid extensibility in general. Excess extensibility has been the cause of more than one security issue. Lots of flexibility comes at the price of lots of conditional execution which tends to explode the test matrix of possibilities to test, with the result that some combinations are never thought about or tested because they don't make sense to combine. Then someone with mischievious intent see that combination and thinks what happens when I do this. Further that conditional execution can frequently be the cause of slow code as well. So while there are many nice features of tagged values. I don't think they are a general solution. The lack of seccomp support (today) is just one downside among many. I do think it would be nice to have a general pattern for those system calls that require extensibility. My gut feel says something like the L4 pseudo registers (to give a maxium request size) combined with something like netlink encoding would make a very nice template for making fast and flexible system calls. Eric ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038
Re: [Y2038] [PATCH] vfs: replace current_kernel_time64 with ktime equivalent
On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann wrote: > On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen wrote: >> Arnd Bergmann writes: >>> >>> To clarify: current_kernel_time() uses at most millisecond resolution rather >>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the >>> timer tick. >> >> Ah you're right. I remember now: the motivation was to make sure there >> is basically no overhead. In some setups the full gtod can be rather >> slow, particularly if it falls back to some crappy timer. > > This means, we're probably fine with a compile-time option that > distros can choose to enable depending on what classes of hardware > they are targetting, like > > struct timespec64 current_time(struct inode *inode) > { > struct timespec64 now; > u64 gran = inode->i_sb->s_time_gran; > > if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) && > gran <= NSEC_PER_JIFFY) > ktime_get_real_ts64(); > else > ktime_get_coarse_real_ts64(); > > return timespec64_trunc(now, gran); > } > > With that implementation, we could still let file systems choose > to get coarse timestamps by tuning the granularity in the > superblock s_time_gran, which would result in nice round > tv_nsec values that represent the actual accuracy. I've done some simple tests and found that on a variety of x86, arm32 and arm64 CPUs, it takes between 70 and 100 CPU cycles to read the TSC and add it to the coarse clock, e.g. on a 3.1GHz Ryzen, using the little test program below: vdso hires: 37.18ns vdso coarse:6.44ns sysc hires: 161.62ns sysc coarse: 133.87ns On the same machine, it takes around 400ns (1240 cycles) to write one byte into a tmpfs file with pwrite(). Adding 5% to 10% overhead for accurate timestamps would definitely be noticed, so I guess we wouldn't enable that unconditionally, but could do it as an opt-in mount option if someone had a use case. Arnd --- /* measure times for high-resolution clocksource access from userspace */ #include #include #include #include #include static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso) { if (vdso) return clock_gettime(clkid, tp); return syscall(__NR_clock_gettime, clkid, tp); } static int loop1sec(int clkid, bool vdso) { int i; struct timespec t, start; do_clock_gettime(clkid, , vdso); i = 0; do { do_clock_gettime(clkid, , vdso); i++; } while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec); return i; } int main(void) { printf("vdso hires: %7.2fns\n", 10.0 / loop1sec(CLOCK_REALTIME, true)); printf("vdso coarse:%7.2fns\n", 10.0 / loop1sec(CLOCK_REALTIME_COARSE, true)); printf("sysc hires: %7.2fns\n", 10.0 / loop1sec(CLOCK_REALTIME, false)); printf("sysc coarse:%7.2fns\n", 10.0 / loop1sec(CLOCK_REALTIME_COARSE, false)); return 0; } ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038
Re: [Y2038] [PATCH v2 2/2] rusage: allow 64-bit times ru_utime/ru_stime
On Fri, Jun 22, 2018 at 7:45 PM, Eric W. Biederman wrote: > Ingo Molnar writes: > > So I suspect the simplest thing to do would be to set a flag in the > idtype member of waitid that says give me rusage64 and then we would > be done. It would have to be a flag in both the 'idtype' field of waitid(), and also 'who' field of getrusage(), which unfortunately uses a separate set of flags. Not hard to do, but still a bit more complexity. > Alternately we could use the low bits of the resource usage > pointer. Assuming we don't want to introduce another syscall that is. > I really don't see much incremental extensibility potential in the wait > or rusage interface right now. This is also my conclusion after looking at how various other operating systems implement getrusage() and wait4() today. It seems that this is one of the most stable APIs, everyone uses exactly the same structure layout (Linux/x32 being one exception, they have the 64-bit Linux compatible layout using __s64 instead of long members). For the other ~20 system calls we introduce for y2038, the general idea has been to stay mostly compatible with the source level interface, just using a new syscall number. statx() is a notable exception here, with clock_adjtime() and getitimer()/setitimer() still being undecided. If we don't do an extensible layout or any other new fields, there are still the open questions about whether any types should change: - changing everything to 64-bit would allow sharing the kernel code between compat and native - changing only __old_kernel_timeval to new 64-bit timeval would be the simplest user space change (only the syscall number changes with sizeof(time_t)), avoiding an extra copy thorough the user space stack. - changing timeval to (64-bit) timespec would seem the most sensible update, since it avoids the silly nanosecond-to- microsecond conversion in the kernel (glibc would still need to do it for compatibility). This is what I'm considering for getitimer/setitimer, too. Arnd ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038
Re: [Y2038] [PATCH] scsi: lpfc: use monotonic timestamps for statistics
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038
Re: [Y2038] [PATCH] scsi_transport_fc: use 64-bit timestamps consistently
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ___ Y2038 mailing list Y2038@lists.linaro.org https://lists.linaro.org/mailman/listinfo/y2038