Re: [Y2038] [PATCH] vfs: replace current_kernel_time64 with ktime equivalent

2018-06-25 Thread Dave Chinner
On Fri, Jun 22, 2018 at 03:24:48PM +0200, Arnd Bergmann wrote:
> On Thu, Jun 21, 2018 at 10:23 PM, Dave Chinner  wrote:
> > On Wed, Jun 20, 2018 at 05:01:24PM +0200, Arnd Bergmann wrote:
> 
> >> diff --git a/fs/inode.c b/fs/inode.c
> >> index 2c300e981796..e27bd9334939 100644
> >> --- a/fs/inode.c
> >> +++ b/fs/inode.c
> >> @@ -2133,7 +2133,9 @@ EXPORT_SYMBOL(timespec64_trunc);
> >>   */
> >>  struct timespec64 current_time(struct inode *inode)
> >>  {
> >> - struct timespec64 now = current_kernel_time64();
> >> + struct timespec64 now;
> >> +
> >> + ktime_get_coarse_real_ts64();
> >
> > Can I just say as a filesystem dev who has no idea at all about
> > kernel timer implementations: this is an awful API change.  There
> > are hundreds of callers of current_time(), so I'm not going to be
> > the only person looking at this function who has no clue about WTF
> > "ktime_get_coarse_real" actually means or does. Further, this
> > function is not documented, and jumps straight into internal time
> > implementation stuff, so I'm lost straight away if somebody asks me
> > "what does that function do"?. i.e. I have *no clue* what this
> > function returns or why this code uses it.
> 
> You definitely have a point about the documentation. I meant to
> fix that as part of the recent rework of the timekeeping.h header
> but haven't finished it, partly because the header is still being
> changed as we get rid of the older interfaces.

The interface documentation should be introduced with the new
interfaces, not left for later as that leaves people like me with no
fucking clue about what the changes actually mean or why they are
being done.  Perhaps you'd have done better to explain this API as
an internal implementation of clock_gettime(CLOCK_REALTIME_COARSE),
which is clearly documented in the man page as:

"Use when you need very fast, but not fine-grained timestamps."

Put that comment on ktime_get_coarse_real_ts64(), and almost all the
questions about "WTF does this whacky function do?" go away

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038


Re: [Y2038] [PATCH] scsi_transport_fc: use 64-bit timestamps consistently

2018-06-25 Thread James Smart

On 6/18/2018 8:29 AM, Arnd Bergmann wrote:

The get_seconds() helper returns an 'unsigned long' value, which can
overflow on 32-bit architectures. Since the interface we pass it into
already uses a 64-bit type, we can just use ktime_get_real_seconds()
instead.

While we generally prefer local timestamps in CLOCK_MONOTONIC format
(ktime_get_seconds), this keeps using the CLOCK_REALTIME version
in order to maintain compatibility with existing code.

Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/scsi_transport_fc.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 1da3d71e9f61..bb6de88aa724 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -567,7 +567,7 @@ fc_host_post_event(struct Scsi_Host *shost, u32 
event_number,
  
  	INIT_SCSI_NL_HDR(>snlh, SCSI_NL_TRANSPORT_FC,

FC_NL_ASYNC_EVENT, len);
-   event->seconds = get_seconds();
+   event->seconds = ktime_get_real_seconds();
event->vendor_id = 0;
event->host_no = shost->host_no;
event->event_datalen = sizeof(u32);  /* bytes */
@@ -635,7 +635,7 @@ fc_host_post_vendor_event(struct Scsi_Host *shost, u32 
event_number,
  
  	INIT_SCSI_NL_HDR(>snlh, SCSI_NL_TRANSPORT_FC,

FC_NL_ASYNC_EVENT, len);
-   event->seconds = get_seconds();
+   event->seconds = ktime_get_real_seconds();
event->vendor_id = vendor_id;
event->host_no = shost->host_no;
event->event_datalen = data_len; /* bytes */



looks good

Reviewed-by:  James Smart  

___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038


Re: [Y2038] [PATCH] scsi_transport_fc: use 64-bit timestamps consistently

2018-06-25 Thread Tyrel Datwyler
On 06/18/2018 08:29 AM, Arnd Bergmann wrote:
> The get_seconds() helper returns an 'unsigned long' value, which can
> overflow on 32-bit architectures. Since the interface we pass it into
> already uses a 64-bit type, we can just use ktime_get_real_seconds()
> instead.
> 
> While we generally prefer local timestamps in CLOCK_MONOTONIC format
> (ktime_get_seconds), this keeps using the CLOCK_REALTIME version
> in order to maintain compatibility with existing code.
> 
> Signed-off-by: Arnd Bergmann 
> ---

Reviewed-by: Tyrel Datwyler 

___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038


Re: [Y2038] [PATCH v2 2/2] rusage: allow 64-bit times ru_utime/ru_stime

2018-06-25 Thread Eric W. Biederman
Ingo Molnar  writes:

> * Eric W. Biederman  wrote:
>
>> Ingo Molnar  writes:
>> 
>> > * Eric W. Biederman  wrote:
>> >
>> >> The trouble with attributes is that means you can't filter your system
>> >> call arguments with seccomp. [...]
>> >
>> > There's nothing keeping seccomp from securely fetching those arguments and 
>> > extending filtering to them as well ...
>> >
>> > Allowing that would make sense for a lot of other system calls as
>> > well.
>> 
>> Possibly.  The challenge is that if the fetch for the kernel to use
>> those arguments is different from the fetch of seccomp to test those
>> arguments you have a time of test vs time of use race.
>
> Those fetched values should obviously then be used to call permitted
> system calls.

Agreed.  To my knowledge no one has figured out how to make that work
yet.  For the most part it has been unnecessary.

>> Given the location of the seccomp hook at the kernel user space border
>> there is no easy way for seccomp to share the fetch with the system
>> call itself.
>> 
>> So I don't see how seccomp could perform the fetch securely.
>
> Looks like more of a seccomp mis-design/mis-implementation than some 
> fundamental 
> problem.

Frankly.  Given that there are some very good solutions in other
operating systems, I think the misdesign is in unix/linux not providing
a good answer to what to do when you need more than 6 arguments to a
system call.

> Mis-designed security features should not hinder system call design.

I certainly agree that seccomp should not be the sole reason for not
doing something.  However there are lots of reasons to avoid
extensibility in general.

Excess extensibility has been the cause of more than one security issue.

Lots of flexibility comes at the price of lots of conditional execution
which tends to explode the test matrix of possibilities to test, with
the result that some combinations are never thought about or tested
because they don't make sense to combine.  Then someone with
mischievious intent see that combination and thinks what happens when I
do this.

Further that conditional execution can frequently be the cause of slow
code as well.

So while there are many nice features of tagged values.  I don't think
they are a general solution.  The lack of seccomp support (today) is
just one downside among many.


I do think it would be nice to have a general pattern for those
system calls that require extensibility.  My gut feel says something
like the L4 pseudo registers (to give a maxium request size)
combined with something like netlink encoding would make a very
nice template for making fast and flexible system calls.

Eric
___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038


Re: [Y2038] [PATCH] vfs: replace current_kernel_time64 with ktime equivalent

2018-06-25 Thread Arnd Bergmann
On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann  wrote:
> On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen  wrote:
>> Arnd Bergmann  writes:
>>>
>>> To clarify: current_kernel_time() uses at most millisecond resolution rather
>>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the
>>> timer tick.
>>
>> Ah you're right. I remember now: the motivation was to make sure there
>> is basically no overhead. In some setups the full gtod can be rather
>> slow, particularly if it falls back to some crappy timer.
>
> This means, we're probably fine with a compile-time option that
> distros can choose to enable depending on what classes of hardware
> they are targetting, like
>
> struct timespec64 current_time(struct inode *inode)
> {
> struct timespec64 now;
> u64 gran = inode->i_sb->s_time_gran;
>
> if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
> gran <= NSEC_PER_JIFFY)
>   ktime_get_real_ts64();
> else
>   ktime_get_coarse_real_ts64();
>
> return timespec64_trunc(now, gran);
> }
>
> With that implementation, we could still let file systems choose
> to get coarse timestamps by tuning the granularity in the
> superblock s_time_gran, which would result in nice round
> tv_nsec values that represent the actual accuracy.

I've done some simple tests and found that on a variety of
x86, arm32 and arm64 CPUs, it takes between 70 and 100
CPU cycles to read the TSC and add it to the coarse
clock, e.g. on a 3.1GHz Ryzen, using the little test program
below:

vdso hires:   37.18ns
vdso coarse:6.44ns
sysc hires: 161.62ns
sysc coarse: 133.87ns

On the same machine, it takes around 400ns (1240 cycles)
to write one byte into a tmpfs file with pwrite(). Adding 5% to
10% overhead for accurate timestamps would definitely be
noticed, so I guess we wouldn't enable that unconditionally,
but could do it as an opt-in mount option if someone had a
use case.

   Arnd

---
/* measure times for high-resolution clocksource access from userspace */
#include 
#include 
#include 
#include 
#include 

static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso)
{
if (vdso)
return clock_gettime(clkid, tp);

return syscall(__NR_clock_gettime, clkid, tp);
}

static int loop1sec(int clkid, bool vdso)
{
int i;
struct timespec t, start;

do_clock_gettime(clkid, , vdso);
i = 0;
do {
do_clock_gettime(clkid, , vdso);
i++;
} while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec);

return i;
}

int main(void)
{
printf("vdso hires: %7.2fns\n", 10.0 /
loop1sec(CLOCK_REALTIME, true));
printf("vdso coarse:%7.2fns\n", 10.0 /
loop1sec(CLOCK_REALTIME_COARSE, true));
printf("sysc hires: %7.2fns\n", 10.0 /
loop1sec(CLOCK_REALTIME, false));
printf("sysc coarse:%7.2fns\n", 10.0 /
loop1sec(CLOCK_REALTIME_COARSE, false));

return 0;
}
___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038


Re: [Y2038] [PATCH v2 2/2] rusage: allow 64-bit times ru_utime/ru_stime

2018-06-25 Thread Arnd Bergmann
On Fri, Jun 22, 2018 at 7:45 PM, Eric W. Biederman
 wrote:
> Ingo Molnar  writes:
>
> So I suspect the simplest thing to do would be to set a flag in the
> idtype member of waitid that says give me rusage64 and then we would
> be done.

It would have to be a flag in both the 'idtype' field of waitid(), and also
'who' field of getrusage(), which unfortunately uses a separate set of
flags. Not hard to do, but still a bit more complexity.

> Alternately we could use the low bits of the resource usage
> pointer.  Assuming we don't want to introduce another syscall that is.
> I really don't see much incremental extensibility potential in the wait
> or rusage interface right now.

This is also my conclusion after looking at how various other
operating systems implement getrusage() and wait4() today.
It seems that this is one of the most stable APIs, everyone uses
exactly the same structure layout (Linux/x32 being one exception,
they have the 64-bit Linux compatible layout using __s64 instead of
long members).

For the other ~20 system calls we introduce for y2038, the general
idea has been to stay mostly compatible with the source level interface,
just using a new syscall number. statx() is a notable exception here,
with clock_adjtime() and getitimer()/setitimer() still being undecided.

If we don't do an extensible layout or any other new fields, there
are still the open questions about whether any types should change:

- changing everything to 64-bit would allow sharing the kernel
  code between compat and native
- changing only __old_kernel_timeval to new 64-bit timeval would
  be the simplest user space change (only the syscall number
  changes with sizeof(time_t)), avoiding an extra copy thorough
  the user space stack.
- changing timeval to (64-bit) timespec would seem the most
  sensible update, since it avoids the silly nanosecond-to-
  microsecond conversion in the kernel (glibc would still need
  to do it for compatibility). This is what I'm considering for
  getitimer/setitimer,  too.

 Arnd
___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038


Re: [Y2038] [PATCH] scsi: lpfc: use monotonic timestamps for statistics

2018-06-25 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038


Re: [Y2038] [PATCH] scsi_transport_fc: use 64-bit timestamps consistently

2018-06-25 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
___
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038