Re: [uml-devel] [PATCH v6] um: Add a high resolution timer subsystem

Anton Ivanov Mon, 11 May 2015 08:07:26 -0700

Hi Thomas, hi Richard,

I can now hang it reproducibly. I have not been able to concoct a 
synthetic test (yet), but a non-synthetic one works: installing an 
update to base-files on Debian is a guaranteed hang. So IO on its own 
does not hang it, CPU on its own does not, but a mix of the two does.


It hangs in userspace, spinning at 100% CPU on that thread. If you whack 
the offending thread with signal 11 (kill -11, i.e. SIGSEGV) from the 
host, UML recovers, killing the affected process. I cannot look at this 
in detail for a few days though - the earliest I can pick it up is on 
Sat (in my free time).

On the positive side - the behavior we are getting now is better, so we 
just need to figure out the root cause for the hang(s) and stabilize it.

A.


On 11/05/15 13:52, Anton Ivanov wrote:
> Hurray, Houston we have ignition.
>
> We now have working userspace timers.
>
> It is still schizophrenic - userspace is HZ, kernel is NOHZ, because
> userspace has to keep checking "did the kernel timer fire yet" at a HZ
> interval. However, even that is major progress compared to having
> userspace timer behavior determined by the phase of the moon, the
> position of a black goat relative to a silver knife, etc. It is now
> "spot on" - you set HZ=100 in the .config, you get 100. Before, you
> used to get something like 39-45, depending on the weather.
>
> The userspace is now significantly more responsive and snappy (that is
> expected, as it now gets a decent clock). Kernel timer behavior also
> looks correct and NOHZ-ish at first glance (traffic shapers work).
>
> I am going to hit it with the "torture" suite now to see if there is
> any significant difference with respect to other known bugs like the
> ext4 writeout (my original patch versions seemed to aggravate it).
>
> I will try to get around to restoring my virtual desktop setup over X
> to see what difference it makes. Judging by the way userspace behaves
> after the changes, it should be better than before.
>
> A.
>
>
> On 10/05/15 15:34, Thomas Meyer wrote:
>>> On 10.05.2015 at 14:35, Richard Weinberger 
>>> <[email protected]> wrote:
>>>
>>>> On Sun, May 10, 2015 at 1:14 AM, Thomas Meyer <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> Changes:
>>>> - also create posix timer in stub_clone_handler()
>>>> - incorporated Anton's remarks
>>> Hm, this patch does a *lot* more than the changelog says.
>> Hi, yes, PATCH was probably the wrong keyword in the subject line. It 
>> should have been RFC.
>> I just wanted to get feedback on the current state of this patch/work.
>>
>> I'm currently working on cleaning up the patch and switching from 
>> SIGUSR2 to SIGALRM, which seems to be the natural thing for posix 
>> timers. I will send this next patch as something that should be 
>> includable into the kernel, i.e. with a correct description, a 
>> Signed-off-by line and so on.
>>
>> But feel free to have a look at v6 and give feedback.
>>
>> With kind regards
>> Thomas
>>
>>>> diff --git a/arch/um/Makefile b/arch/um/Makefile
>>>> index 17d4460..a4a434f 100644
>>>> --- a/arch/um/Makefile
>>>> +++ b/arch/um/Makefile
>>>> @@ -130,7 +130,7 @@ export LDS_ELF_FORMAT := $(ELF_FORMAT)
>>>> # The wrappers will select whether using "malloc" or the kernel allocator.
>>>> LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc
>>>>
>>>> -LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt))
>>>> +LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt)) -lrt
>>>>
>>>> # Used by link-vmlinux.sh which has special support for um link
>>>> export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
>>>> diff --git a/arch/um/include/asm/irq.h b/arch/um/include/asm/irq.h
>>>> index 4a2037f..0f2a5b1 100644
>>>> --- a/arch/um/include/asm/irq.h
>>>> +++ b/arch/um/include/asm/irq.h
>>>> @@ -16,8 +16,9 @@
>>>> #define TELNETD_IRQ            12
>>>> #define XTERM_IRQ              13
>>>> #define RANDOM_IRQ             14
>>>> +#define HRTIMER_IRQ            15
>>>>
>>>> -#define LAST_IRQ RANDOM_IRQ
>>>> +#define LAST_IRQ HRTIMER_IRQ
>>>> #define NR_IRQS (LAST_IRQ + 1)
>>>>
>>>> #endif
>>>> diff --git a/arch/um/include/shared/as-layout.h 
>>>> b/arch/um/include/shared/as-layout.h
>>>> index ca1843e..798aa6e 100644
>>>> --- a/arch/um/include/shared/as-layout.h
>>>> +++ b/arch/um/include/shared/as-layout.h
>>>> @@ -17,7 +17,7 @@
>>>>
>>>> /* Some constant macros are used in both assembler and
>>>>    * C code.  Therefore we cannot annotate them always with
>>>> - * 'UL' and other type specifiers unilaterally.  We
>>>> + * 'UL' and other type specifiers unilaterally. We
>>>>    * use the following macros to deal with this.
>>>>    */
>>>>
>>>> @@ -28,6 +28,13 @@
>>>> #define _UML_AC(X, Y)  __UML_AC(X, Y)
>>>> #endif
>>>>
>>>> +/**
>>>> + * userspace stub address space layout:
>>>> + * Below macros define the layout of the stub code and data
>>>> + * which are mapped in each userspace process:
>>>> + *  - one page of code located at 0x100000 followed by
>>>> + *  - one page of data
>>>> + */
>>>> #define STUB_START _UML_AC(, 0x100000)
>>>> #define STUB_CODE _UML_AC((unsigned long), STUB_START)
>>>> #define STUB_DATA _UML_AC((unsigned long), STUB_CODE + UM_KERN_PAGE_SIZE)
>>>> diff --git a/arch/um/include/shared/kern_util.h 
>>>> b/arch/um/include/shared/kern_util.h
>>>> index 83a91f9..0282b36 100644
>>>> --- a/arch/um/include/shared/kern_util.h
>>>> +++ b/arch/um/include/shared/kern_util.h
>>>> @@ -37,6 +37,7 @@ extern void initial_thread_cb(void (*proc)(void *), void 
>>>> *arg);
>>>> extern int is_syscall(unsigned long addr);
>>>>
>>>> extern void timer_handler(int sig, struct siginfo *unused_si, struct 
>>>> uml_pt_regs *regs);
>>>> +extern void hrtimer_handler(int sig, struct siginfo *unused_si, struct 
>>>> uml_pt_regs *regs);
>>>>
>>>> extern int start_uml(void);
>>>> extern void paging_init(void);
>>>> diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
>>>> index d824528..7f7368b 100644
>>>> --- a/arch/um/include/shared/os.h
>>>> +++ b/arch/um/include/shared/os.h
>>>> @@ -217,7 +217,8 @@ extern int set_umid(char *name);
>>>> extern char *get_umid(void);
>>>>
>>>> /* signal.c */
>>>> -extern void timer_init(void);
>>>> +extern void uml_timer_set_signal_handler(void);
>>>> +extern void uml_hrtimer_set_signal_handler(void);
>>>> extern void set_sigstack(void *sig_stack, int size);
>>>> extern void remove_sigstack(void);
>>>> extern void set_handler(int sig);
>>>> @@ -238,12 +239,16 @@ extern void um_early_printk(const char *s, unsigned 
>>>> int n);
>>>> extern void os_fix_helper_signals(void);
>>>>
>>>> /* time.c */
>>>> -extern void idle_sleep(unsigned long long nsecs);
>>>> -extern int set_interval(void);
>>>> -extern int timer_one_shot(int ticks);
>>>> -extern long long disable_timer(void);
>>>> +extern void os_idle_sleep(unsigned long long nsecs);
>>>> +extern int os_timer_create(void* timer);
>>>> +extern int os_timer_set_interval(void* timer, void* its);
>>>> +extern int os_timer_one_shot(int ticks);
>>>> +extern long long os_timer_disable(void);
>>>> +extern long os_timer_remain(void* timer);
>>>> extern void uml_idle_timer(void);
>>>> +extern long long os_persistent_clock_emulation(void);
>>>> extern long long os_nsecs(void);
>>>> +extern long long os_vnsecs(void);
>>>>
>>>> /* skas/mem.c */
>>>> extern long run_syscall_stub(struct mm_id * mm_idp,
>>>> diff --git a/arch/um/include/shared/skas/stub-data.h 
>>>> b/arch/um/include/shared/skas/stub-data.h
>>>> index f6ed92c..f98b9e2 100644
>>>> --- a/arch/um/include/shared/skas/stub-data.h
>>>> +++ b/arch/um/include/shared/skas/stub-data.h
>>>> @@ -6,12 +6,12 @@
>>>> #ifndef __STUB_DATA_H
>>>> #define __STUB_DATA_H
>>>>
>>>> -#include <sys/time.h>
>>>> +#include <time.h>
>>>>
>>>> struct stub_data {
>>>> -       long offset;
>>>> +       unsigned long offset;
>>>>          int fd;
>>>> -       struct itimerval timer;
>>>> +       struct itimerspec timer;
>>>>          long err;
>>>> };
>>>>
>>>> diff --git a/arch/um/include/shared/timer-internal.h 
>>>> b/arch/um/include/shared/timer-internal.h
>>>> new file mode 100644
>>>> index 0000000..afdc6dc
>>>> --- /dev/null
>>>> +++ b/arch/um/include/shared/timer-internal.h
>>>> @@ -0,0 +1,18 @@
>>>> +/*
>>>> + * Copyright (C) 2012 - 2014 Cisco Systems
>>>> + * Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
>>>> + * Licensed under the GPL
>>>> + */
>>>> +
>>>> +#ifndef __TIMER_INTERNAL_H__
>>>> +#define __TIMER_INTERNAL_H__
>>>> +
>>>> +#define TIMER_MULTIPLIER 256
>>>> +#define TIMER_MIN_DELTA  500
>>>> +
>>>> +extern void timer_lock(void);
>>>> +extern void timer_unlock(void);
>>>> +
>>>> +extern long long hrtimer_disable(void);
>>>> +
>>>> +#endif
>>>> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
>>>> index 23cb935..4c1966a 100644
>>>> --- a/arch/um/kernel/irq.c
>>>> +++ b/arch/um/kernel/irq.c
>>>> @@ -338,20 +338,20 @@ static struct irq_chip normal_irq_type = {
>>>>          .irq_unmask = dummy,
>>>> };
>>>>
>>>> -static struct irq_chip SIGVTALRM_irq_type = {
>>>> -       .name = "SIGVTALRM",
>>>> -       .irq_disable = dummy,
>>>> -       .irq_enable = dummy,
>>>> -       .irq_ack = dummy,
>>>> -       .irq_mask = dummy,
>>>> -       .irq_unmask = dummy,
>>>> +static struct irq_chip SIGUSR2_irq_type = {
>>>> +       .name = "SIGUSR2",
>>>> +       .irq_disable = dummy,
>>>> +       .irq_enable = dummy,
>>>> +       .irq_ack = dummy,
>>>> +       .irq_mask = dummy,
>>>> +       .irq_unmask = dummy,
>>>> };
>>>>
>>>> void __init init_IRQ(void)
>>>> {
>>>>          int i;
>>>>
>>>> -       irq_set_chip_and_handler(TIMER_IRQ, &SIGVTALRM_irq_type, 
>>>> handle_edge_irq);
>>>> +       irq_set_chip_and_handler(HRTIMER_IRQ, &SIGUSR2_irq_type, 
>>>> handle_edge_irq);
>>>>
>>>>          for (i = 1; i < NR_IRQS; i++)
>>>>                  irq_set_chip_and_handler(i, &normal_irq_type, 
>>>> handle_edge_irq);
>>>> diff --git a/arch/um/kernel/physmem.c b/arch/um/kernel/physmem.c
>>>> index 9034fc8..5f6642d 100644
>>>> --- a/arch/um/kernel/physmem.c
>>>> +++ b/arch/um/kernel/physmem.c
>>>> @@ -119,14 +119,23 @@ void __init setup_physmem(unsigned long start, 
>>>> unsigned long reserve_end,
>>>>                       len - bootmap_size - reserve);
>>>> }
>>>>
>>>> +/**
>>>> + * phys_mapping() - maps a physical address to an offset address
>>>> + * phys:    the physical address
>>>> + * offset_out:  the offset in the memory map area
>>>> + *
>>>> + * Returns a file descriptor, or -1 for an unknown physical address
>>>> + */
>>>> int phys_mapping(unsigned long phys, unsigned long long *offset_out)
>>>> {
>>>>          int fd = -1;
>>>>
>>>> +       /* first check normal memory */
>>>>          if (phys < physmem_size) {
>>>>                  fd = physmem_fd;
>>>>                  *offset_out = phys;
>>>>          }
>>>> +       /* then check io memory */
>>>>          else if (phys < __pa(end_iomem)) {
>>>>                  struct iomem_region *region = iomem_regions;
>>>>
>>>> @@ -140,6 +149,7 @@ int phys_mapping(unsigned long phys, unsigned long 
>>>> long *offset_out)
>>>>                          region = region->next;
>>>>                  }
>>>>          }
>>>> +       /* last check highmem */
>>>>          else if (phys < __pa(end_iomem) + highmem) {
>>>>                  fd = physmem_fd;
>>>>                  *offset_out = phys - iomem_size;
>>>> diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
>>>> index 68b9119..b8a8d10 100644
>>>> --- a/arch/um/kernel/process.c
>>>> +++ b/arch/um/kernel/process.c
>>>> @@ -27,6 +27,7 @@
>>>> #include <kern_util.h>
>>>> #include <os.h>
>>>> #include <skas.h>
>>>> +#include <timer-internal.h>
>>>>
>>>> /*
>>>>    * This is a per-cpu array.  A processor only modifies its entry and it 
>>>> only
>>>> @@ -201,12 +202,8 @@ void initial_thread_cb(void (*proc)(void *), void 
>>>> *arg)
>>>>
>>>> void arch_cpu_idle(void)
>>>> {
>>>> -       unsigned long long nsecs;
>>>> -
>>>>          cpu_tasks[current_thread_info()->cpu].pid = os_getpid();
>>>> -       nsecs = disable_timer();
>>>> -       idle_sleep(nsecs);
>>>> -       local_irq_enable();
>>>> +       os_idle_sleep(UM_NSEC_PER_SEC / UM_HZ);
>>>> }
>>>>
>>>> int __cant_sleep(void) {
>>>> diff --git a/arch/um/kernel/skas/clone.c b/arch/um/kernel/skas/clone.c
>>>> index 289771d..5f283b1 100644
>>>> --- a/arch/um/kernel/skas/clone.c
>>>> +++ b/arch/um/kernel/skas/clone.c
>>>> @@ -20,37 +20,63 @@
>>>>    * on some systems.
>>>>    */
>>>>
>>>> +/**
>>>> + * stub_clone_handler() - userspace clone handler stub
>>>> + *
>>>> + * this stub clone handler is mmaped(?)/available in all userspace
>>>> + * processes. It's used to copy an mm context from a fork syscall in the
>>>> + * traced userspace process
>>>> + */
>>>> void __attribute__ ((__section__ (".__syscall_stub")))
>>>> stub_clone_handler(void)
>>>> {
>>>>          struct stub_data *data = (struct stub_data *) STUB_DATA;
>>>> +       struct sigevent sev;
>>>> +       timer_t timerid;
>>>>          long err;
>>>>
>>>> +       /* clone "from" process */
>>>>          err = stub_syscall2(__NR_clone, CLONE_PARENT | CLONE_FILES | 
>>>> SIGCHLD,
>>>>                              STUB_DATA + UM_KERN_PAGE_SIZE / 2 - 
>>>> sizeof(void *));
>>>> -       if (err != 0)
>>>> +       /* Parent: exit here, child, continue */
>>>> +       if (err != 0) {
>>>>                  goto out;
>>>> +       }
>>>>
>>>> +       /* set child to ptrace */
>>>>          err = stub_syscall4(__NR_ptrace, PTRACE_TRACEME, 0, 0, 0);
>>>>          if (err)
>>>>                  goto out;
>>>>
>>>> -       err = stub_syscall3(__NR_setitimer, ITIMER_VIRTUAL,
>>>> -                           (long) &data->timer, 0);
>>>> +       /* create a new posix interval timer */
>>>> +       sev.sigev_notify = SIGEV_SIGNAL;
>>>> +       sev.sigev_signo = SIGUSR2;
>>>> +       sev.sigev_value.sival_ptr = NULL;
>>>> +
>>>> +       err = stub_syscall3(__NR_timer_create, CLOCK_MONOTONIC,
>>>> +                               (long) &sev, (long) &timerid);
>>>>          if (err)
>>>>                  goto out;
>>>>
>>>> +       /* set interval to the given value from copy_context_skas0() */
>>>> +       err = stub_syscall4(__NR_timer_settime, (long) timerid, 0l,
>>>> +                                               (long) &data->timer, 0l);
>>>> +       if (err)
>>>> +               goto out;
>>>> +
>>>> +       /* switch to new stack */
>>>>          remap_stack(data->fd, data->offset);
>>>>          goto done;
>>>>
>>>>    out:
>>>>          /*
>>>> -        * save current result.
>>>> -        * Parent: pid;
>>>> -        * child: retcode of mmap already saved and it jumps around this
>>>> -        * assignment
>>>> +        * Save current result.
>>>> +        * - Parent: pid from clone() call
>>>> +        * - Child:  "retcode of mmap already saved and it jumps around 
>>>> this
>>>> +        *            assignment"???
>>>>           */
>>>>          data->err = err;
>>>> +
>>>>    done:
>>>>          trap_myself();
>>>> }
>>>> diff --git a/arch/um/kernel/skas/mmu.c b/arch/um/kernel/skas/mmu.c
>>>> index 94abdcc..df9c9ab 100644
>>>> --- a/arch/um/kernel/skas/mmu.c
>>>> +++ b/arch/um/kernel/skas/mmu.c
>>>> @@ -47,6 +47,13 @@ static int init_stub_pte(struct mm_struct *mm, unsigned 
>>>> long proc,
>>>>          return -ENOMEM;
>>>> }
>>>>
>>>> +/**
>>>> + * init_new_context() - creates or copies an mm context
>>>> + * @task:      the belonging task
>>>> + * @mm:                the mm struct to be setup/allocated
>>>> + *
>>>> + * called by mm_init() (kernel/fork.c)
>>>> + */
>>>> int init_new_context(struct task_struct *task, struct mm_struct *mm)
>>>> {
>>>>          struct mm_context *from_mm = NULL;
>>>> @@ -59,13 +66,15 @@ int init_new_context(struct task_struct *task, struct 
>>>> mm_struct *mm)
>>>>                  goto out;
>>>>
>>>>          to_mm->id.stack = stack;
>>>> -       if (current->mm != NULL && current->mm != &init_mm)
>>>> +       if (current->mm != NULL && current->mm != &init_mm) {
>>>>                  from_mm = &current->mm->context;
>>>> +       }
>>>>
>>>> -       if (from_mm)
>>>> -               to_mm->id.u.pid = copy_context_skas0(stack,
>>>> -                                                    from_mm->id.u.pid);
>>>> -       else to_mm->id.u.pid = start_userspace(stack);
>>>> +       if (from_mm) {
>>>> +               to_mm->id.u.pid = copy_context_skas0(stack, 
>>>> from_mm->id.u.pid);
>>>> +       } else {
>>>> +               to_mm->id.u.pid = start_userspace(stack);
>>>> +       }
>>>>
>>>>          if (to_mm->id.u.pid < 0) {
>>>>                  ret = to_mm->id.u.pid;
>>>> diff --git a/arch/um/kernel/skas/process.c b/arch/um/kernel/skas/process.c
>>>> index 527fa58..2b0c35a 100644
>>>> --- a/arch/um/kernel/skas/process.c
>>>> +++ b/arch/um/kernel/skas/process.c
>>>> @@ -43,6 +43,9 @@ int __init start_uml(void)
>>>>                                   &init_task.thread.switch_buf);
>>>> }
>>>>
>>>> +/**
>>>> + * current_stub_stack() - returns the address of the current mm stack
>>>> + */
>>>> unsigned long current_stub_stack(void)
>>>> {
>>>>          if (current->mm == NULL)
>>>> diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
>>>> index 117568d..ed64037 100644
>>>> --- a/arch/um/kernel/time.c
>>>> +++ b/arch/um/kernel/time.c
>>>> @@ -1,4 +1,5 @@
>>>> /*
>>>> + * Copyright (C) 2012-2014 Cisco Systems
>>>>    * Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
>>>>    * Licensed under the GPL
>>>>    */
>>>> @@ -8,32 +9,36 @@
>>>> #include <linux/interrupt.h>
>>>> #include <linux/jiffies.h>
>>>> #include <linux/threads.h>
>>>> +#include <linux/spinlock.h>
>>>> #include <asm/irq.h>
>>>> #include <asm/param.h>
>>>> #include <kern_util.h>
>>>> #include <os.h>
>>>> +#include <timer-internal.h>
>>>>
>>>> -void timer_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs 
>>>> *regs)
>>>> +void hrtimer_handler(int sig, struct siginfo *unused_si, struct 
>>>> uml_pt_regs *regs)
>>>> {
>>>>          unsigned long flags;
>>>>
>>>>          local_irq_save(flags);
>>>> -       do_IRQ(TIMER_IRQ, regs);
>>>> +       do_IRQ(HRTIMER_IRQ, regs);
>>>>          local_irq_restore(flags);
>>>> }
>>>>
>>>> -static void itimer_set_mode(enum clock_event_mode mode,
>>>> +static void timer_set_mode(enum clock_event_mode mode,
>>>>                              struct clock_event_device *evt)
>>>> {
>>>>          switch (mode) {
>>>>          case CLOCK_EVT_MODE_PERIODIC:
>>>> -               set_interval();
>>>> +               os_timer_set_interval(NULL, NULL);
>>>>                  break;
>>>>
>>>> +       case CLOCK_EVT_MODE_ONESHOT:
>>>> +               os_timer_one_shot(1);
>>>> +
>>>>          case CLOCK_EVT_MODE_SHUTDOWN:
>>>>          case CLOCK_EVT_MODE_UNUSED:
>>>> -       case CLOCK_EVT_MODE_ONESHOT:
>>>> -               disable_timer();
>>>> +               os_timer_disable();
>>>>                  break;
>>>>
>>>>          case CLOCK_EVT_MODE_RESUME:
>>>> @@ -41,68 +46,74 @@ static void itimer_set_mode(enum clock_event_mode mode,
>>>>          }
>>>> }
>>>>
>>>> -static int itimer_next_event(unsigned long delta,
>>>> +static int timer_next_event(unsigned long delta,
>>>>                               struct clock_event_device *evt)
>>>> {
>>>> -       return timer_one_shot(delta + 1);
>>>> +       return os_timer_one_shot(delta);
>>>> }
>>>>
>>>> -static struct clock_event_device itimer_clockevent = {
>>>> -       .name           = "itimer",
>>>> +static struct clock_event_device timer_clockevent = {
>>>> +       .name           = "timer",
>>>>          .rating         = 250,
>>>>          .cpumask        = cpu_all_mask,
>>>>          .features       = CLOCK_EVT_FEAT_PERIODIC | 
>>>> CLOCK_EVT_FEAT_ONESHOT,
>>>> -       .set_mode       = itimer_set_mode,
>>>> -       .set_next_event = itimer_next_event,
>>>> -       .shift          = 32,
>>>> +       .set_mode       = timer_set_mode,
>>>> +       .set_next_event = timer_next_event,
>>>> +       .shift          = 0,
>>>> +       .max_delta_ns   = 0xffffffff,
>>>> +       .min_delta_ns   = TIMER_MIN_DELTA, //microsecond resolution should 
>>>> be enough for anyone, same as 640K RAM
>>>>          .irq            = 0,
>>>> +       .mult           = 1,
>>>> };
>>>>
>>>> -static irqreturn_t um_timer(int irq, void *dev)
>>>> +static irqreturn_t um_timer_irq(int irq, void *dev)
>>>> {
>>>> -       (*itimer_clockevent.event_handler)(&itimer_clockevent);
>>>> +       (*timer_clockevent.event_handler)(&timer_clockevent);
>>>>
>>>>          return IRQ_HANDLED;
>>>> }
>>>>
>>>> -static cycle_t itimer_read(struct clocksource *cs)
>>>> +static cycle_t timer_read(struct clocksource *cs)
>>>> {
>>>> -       return os_nsecs() / 1000;
>>>> +       return os_nsecs() / TIMER_MULTIPLIER;
>>>> }
>>>>
>>>> -static struct clocksource itimer_clocksource = {
>>>> -       .name           = "itimer",
>>>> +static struct clocksource timer_clocksource = {
>>>> +       .name           = "timer",
>>>>          .rating         = 300,
>>>> -       .read           = itimer_read,
>>>> +       .read           = timer_read,
>>>>          .mask           = CLOCKSOURCE_MASK(64),
>>>>          .flags          = CLOCK_SOURCE_IS_CONTINUOUS,
>>>> };
>>>>
>>>> -static void __init setup_itimer(void)
>>>> +static void __init timer_setup(void)
>>>> {
>>>>          int err;
>>>>
>>>> -       err = request_irq(TIMER_IRQ, um_timer, 0, "timer", NULL);
>>>> -       if (err != 0)
>>>> +       err = request_irq(HRTIMER_IRQ, um_timer_irq, IRQF_TIMER, "hr 
>>>> timer", NULL);
>>>> +       if (err != 0) {
>>>>                  printk(KERN_ERR "register_timer : request_irq failed - "
>>>>                         "errno = %d\n", -err);
>>>> +               return;
>>>> +    }
>>>> +
>>>> +    err = os_timer_create(NULL);
>>>> +    if (err != 0) {
>>>> +        printk(KERN_ERR "creation of timer failed - errno = %d\n", -err);
>>>> +        return;
>>>> +    }
>>>>
>>>> -       itimer_clockevent.mult = div_sc(HZ, NSEC_PER_SEC, 32);
>>>> -       itimer_clockevent.max_delta_ns =
>>>> -               clockevent_delta2ns(60 * HZ, &itimer_clockevent);
>>>> -       itimer_clockevent.min_delta_ns =
>>>> -               clockevent_delta2ns(1, &itimer_clockevent);
>>>> -       err = clocksource_register_hz(&itimer_clocksource, USEC_PER_SEC);
>>>> +       err = clocksource_register_hz(&timer_clocksource, 
>>>> NSEC_PER_SEC/TIMER_MULTIPLIER);
>>>>          if (err) {
>>>>                  printk(KERN_ERR "clocksource_register_hz returned %d\n", 
>>>> err);
>>>>                  return;
>>>>          }
>>>> -       clockevents_register_device(&itimer_clockevent);
>>>> +       clockevents_register_device(&timer_clockevent);
>>>> }
>>>>
>>>> void read_persistent_clock(struct timespec *ts)
>>>> {
>>>> -       long long nsecs = os_nsecs();
>>>> +       long long nsecs = os_persistent_clock_emulation();
>>>>
>>>>          set_normalized_timespec(ts, nsecs / NSEC_PER_SEC,
>>>>                                  nsecs % NSEC_PER_SEC);
>>>> @@ -110,6 +121,6 @@ void read_persistent_clock(struct timespec *ts)
>>>>
>>>> void __init time_init(void)
>>>> {
>>>> -       timer_init();
>>>> -       late_time_init = setup_itimer;
>>>> +       uml_hrtimer_set_signal_handler();
>>>> +       late_time_init = timer_setup;
>>>> }
>>>> diff --git a/arch/um/os-Linux/internal.h b/arch/um/os-Linux/internal.h
>>>> deleted file mode 100644
>>>> index 0dc2c9f..0000000
>>>> --- a/arch/um/os-Linux/internal.h
>>>> +++ /dev/null
>>>> @@ -1 +0,0 @@
>>>> -void alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc);
>>>> diff --git a/arch/um/os-Linux/main.c b/arch/um/os-Linux/main.c
>>>> index df9191a..bd5907e 100644
>>>> --- a/arch/um/os-Linux/main.c
>>>> +++ b/arch/um/os-Linux/main.c
>>>> @@ -168,8 +168,8 @@ int __init main(int argc, char **argv, char **envp)
>>>>           * some time) and cause a segfault.
>>>>           */
>>>>
>>>> -       /* stop timers and set SIGVTALRM to be ignored */
>>>> -       disable_timer();
>>>> +       /* stop timers and set timer signal to be ignored */
>>>> +       os_timer_disable();
>>>>
>>>>          /* disable SIGIO for the fds and set SIGIO to be ignored */
>>>>          err = deactivate_all_fds();
>>>> diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
>>>> index 7b605e4..ee6db2e 100644
>>>> --- a/arch/um/os-Linux/signal.c
>>>> +++ b/arch/um/os-Linux/signal.c
>>>> @@ -13,7 +13,6 @@
>>>> #include <kern_util.h>
>>>> #include <os.h>
>>>> #include <sysdep/mcontext.h>
>>>> -#include "internal.h"
>>>>
>>>> void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
>>>>          [SIGTRAP]       = relay_signal,
>>>> @@ -23,7 +22,8 @@ void (*sig_info[NSIG])(int, struct siginfo *, struct 
>>>> uml_pt_regs *) = {
>>>>          [SIGBUS]        = bus_handler,
>>>>          [SIGSEGV]       = segv_handler,
>>>>          [SIGIO]         = sigio_handler,
>>>> -       [SIGVTALRM]     = timer_handler };
>>>> +       [SIGUSR2]       = hrtimer_handler
>>>> +};
>>>>
>>>> static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
>>>> {
>>>> @@ -38,7 +38,7 @@ static void sig_handler_common(int sig, struct siginfo 
>>>> *si, mcontext_t *mc)
>>>>          }
>>>>
>>>>          /* enable signals if sig isn't IRQ signal */
>>>> -       if ((sig != SIGIO) && (sig != SIGWINCH) && (sig != SIGVTALRM))
>>>> +       if ((sig != SIGIO) && (sig != SIGWINCH) && (sig != SIGVTALRM) && 
>>>> (sig != SIGUSR2))
>>>>                  unblock_signals();
>>>>
>>>>          (*sig_info[sig])(sig, si, &r);
>>>> @@ -55,8 +55,8 @@ static void sig_handler_common(int sig, struct siginfo 
>>>> *si, mcontext_t *mc)
>>>> #define SIGIO_BIT 0
>>>> #define SIGIO_MASK (1 << SIGIO_BIT)
>>>>
>>>> -#define SIGVTALRM_BIT 1
>>>> -#define SIGVTALRM_MASK (1 << SIGVTALRM_BIT)
>>>> +#define SIGUSR2_BIT 2
>>>> +#define SIGUSR2_MASK (1 << SIGUSR2_BIT)
>>>>
>>>> static int signals_enabled;
>>>> static unsigned int signals_pending;
>>>> @@ -78,46 +78,47 @@ void sig_handler(int sig, struct siginfo *si, 
>>>> mcontext_t *mc)
>>>>          set_signals(enabled);
>>>> }
>>>>
>>>> -static void real_alarm_handler(mcontext_t *mc)
>>>> +static void real_hralarm_handler(mcontext_t *mc)
>>>> {
>>>>          struct uml_pt_regs regs;
>>>>
>>>>          if (mc != NULL)
>>>>                  get_regs_from_mc(&regs, mc);
>>>>          regs.is_user = 0;
>>>> -       unblock_signals();
>>>> -       timer_handler(SIGVTALRM, NULL, &regs);
>>>> +       hrtimer_handler(SIGUSR2, NULL, &regs);
>>>> }
>>>>
>>>> -void alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
>>>> +void hralarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
>>>> {
>>>>          int enabled;
>>>>
>>>>          enabled = signals_enabled;
>>>>          if (!signals_enabled) {
>>>> -               signals_pending |= SIGVTALRM_MASK;
>>>> +               signals_pending |= SIGUSR2_MASK;
>>>>                  return;
>>>>          }
>>>>
>>>>          block_signals();
>>>> -
>>>> -       real_alarm_handler(mc);
>>>> +       real_hralarm_handler(mc);
>>>>          set_signals(enabled);
>>>> }
>>>>
>>>> -void timer_init(void)
>>>> +void uml_hrtimer_set_signal_handler(void)
>>>> {
>>>> -       set_handler(SIGVTALRM);
>>>> +       set_handler(SIGUSR2);
>>>> }
>>>>
>>>> void set_sigstack(void *sig_stack, int size)
>>>> {
>>>> -       stack_t stack = ((stack_t) { .ss_flags  = 0,
>>>> -                                    .ss_sp     = (__ptr_t) sig_stack,
>>>> -                                    .ss_size   = size - sizeof(void *) });
>>>> +       stack_t stack = ((stack_t) {
>>>> +                   .ss_flags = 0,
>>>> +                               .ss_sp    = (__ptr_t) sig_stack,
>>>> +                               .ss_size  = size - sizeof(void *)
>>>> +       });
>>>>
>>>> -       if (sigaltstack(&stack, NULL) != 0)
>>>> +       if (sigaltstack(&stack, NULL) != 0) {
>>>>                  panic("enabling signal stack failed, errno = %d\n", 
>>>> errno);
>>>> +       }
>>>> }
>>>>
>>>> static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t 
>>>> *mc) = {
>>>> @@ -129,10 +130,9 @@ static void (*handlers[_NSIG])(int sig, struct 
>>>> siginfo *si, mcontext_t *mc) = {
>>>>
>>>>          [SIGIO] = sig_handler,
>>>>          [SIGWINCH] = sig_handler,
>>>> -       [SIGVTALRM] = alarm_handler
>>>> +       [SIGUSR2] = hralarm_handler
>>>> };
>>>>
>>>> -
>>>> static void hard_handler(int sig, siginfo_t *si, void *p)
>>>> {
>>>>          struct ucontext *uc = p;
>>>> @@ -176,6 +176,13 @@ static void hard_handler(int sig, siginfo_t *si, void 
>>>> *p)
>>>>          } while (pending);
>>>> }
>>>>
>>>> +/**
>>>> + * set_handler() - enable signal in process' signal mask
>>>> + * @sig:    The signal to enable
>>>> + *
>>>> + * Enable the given signal in the process' signal mask and
>>>> + * attach hard_handler() as handler routine
>>>> + */
>>>> void set_handler(int sig)
>>>> {
>>>>          struct sigaction action;
>>>> @@ -186,9 +193,9 @@ void set_handler(int sig)
>>>>
>>>>          /* block irq ones */
>>>>          sigemptyset(&action.sa_mask);
>>>> -       sigaddset(&action.sa_mask, SIGVTALRM);
>>>>          sigaddset(&action.sa_mask, SIGIO);
>>>>          sigaddset(&action.sa_mask, SIGWINCH);
>>>> +       sigaddset(&action.sa_mask, SIGUSR2);
>>>>
>>>>          if (sig == SIGSEGV)
>>>>                  flags |= SA_NODEFER;
>>>> @@ -281,8 +288,8 @@ void unblock_signals(void)
>>>>                  if (save_pending & SIGIO_MASK)
>>>>                          sig_handler_common(SIGIO, NULL, NULL);
>>>>
>>>> -               if (save_pending & SIGVTALRM_MASK)
>>>> -                       real_alarm_handler(NULL);
>>>> +               if (save_pending & SIGUSR2_MASK)
>>>> +                       real_hralarm_handler(NULL);
>>>>          }
>>>> }
>>>>
>>>> @@ -298,9 +305,11 @@ int set_signals(int enable)
>>>>                  return enable;
>>>>
>>>>          ret = signals_enabled;
>>>> -       if (enable)
>>>> +       if (enable) {
>>>>                  unblock_signals();
>>>> -       else block_signals();
>>>> +       } else {
>>>> +           block_signals();
>>>> +    }
>>>>
>>>>          return ret;
>>>> }
>>>> diff --git a/arch/um/os-Linux/skas/process.c 
>>>> b/arch/um/os-Linux/skas/process.c
>>>> index 7a97775..30065e1 100644
>>>> --- a/arch/um/os-Linux/skas/process.c
>>>> +++ b/arch/um/os-Linux/skas/process.c
>>>> @@ -45,7 +45,7 @@ static int ptrace_dump_regs(int pid)
>>>>    * Signals that are OK to receive in the stub - we'll just continue it.
>>>>    * SIGWINCH will happen when UML is inside a detached screen.
>>>>    */
>>>> -#define STUB_SIG_MASK ((1 << SIGVTALRM) | (1 << SIGWINCH))
>>>> +#define STUB_SIG_MASK ((1 << SIGVTALRM) | (1 << SIGWINCH) | (1 << 
>>>> SIGUSR2))
>>>>
>>>> /* Signals that the stub will finish with - anything else is an error */
>>>> #define STUB_DONE_MASK (1 << SIGTRAP)
>>>> @@ -176,17 +176,59 @@ static void handle_trap(int pid, struct uml_pt_regs 
>>>> *regs,
>>>>
>>>> extern int __syscall_stub_start;
>>>>
>>>> +/**
>>>> + * userspace_tramp() - userspace trampoline
>>>> + * @stack:  The address of the stub stack used for the new process
>>>> + *          (used for SIGSEGV handling).
>>>> + *
>>>> + * The trampoline executes as a new process after clone().
>>>> + * For each new userspace process the below code sets up
>>>> + * all necessary data:
>>>> + * 1.) enable ptrace from parent (the uml kernel)
>>>> + * 2.) Set up signal handling. Signal handlers are inherited from the
>>>> + *     parent, i.e. the uml kernel
>>>> + * 3.) Create and start a posix (interval) timer for this process.
>>>> + *     This timer will emulate the kernel timer ticks.
>>>> + *     The timer signal will be processed by the kernel process in 
>>>> userspace()
>>>> + * 4.) Map stub code page in the new process, i.e. the
>>>> + *     userspace process:
>>>> + *     The stub code is used to catch syscalls from the userspace to
>>>> + *     the kernel.
>>>> + *     See linker scripts arch/um/kernel/dyn.lds.S (dynamic) resp.
>>>> + *                        arch/um/kernel/uml.lds.S (static)
>>>> + *     for __syscall_stub_start defintion and
>>>> + *     arch/um/kernel/skas/clone.c for the stub_handler itself.
>>>> + * 5.) Map stub data page in the new process, i.e. the
>>>> + *     userspace process:
>>>> + *     Setup an SIGSEGV handler into the new process.
>>>> + *     Page faults will be catched and signaled to the kernel via this
>>>> + *     mechanism.
>>>> + *     See arch/x86/um/stub_segv.c for the handler itself.
>>>> + * 6.) Stop the new process and wait for the kernel to SIGCONT it agian
>>>> + *     when it will get scheduled()
>>>> + */
>>>> static int userspace_tramp(void *stack)
>>>> {
>>>>          void *addr;
>>>>          int err, fd;
>>>>          unsigned long long offset;
>>>> +       timer_t timer;
>>>> +
>>>> +       struct stub_data *data = (struct stub_data *) stack;
>>>>
>>>>          ptrace(PTRACE_TRACEME, 0, 0, 0);
>>>>
>>>>          signal(SIGTERM, SIG_DFL);
>>>>          signal(SIGWINCH, SIG_IGN);
>>>> -       err = set_interval();
>>>> +
>>>> +       err = os_timer_create(&timer);
>>>> +       if (err) {
>>>> +               printk(UM_KERN_ERR "userspace_tramp - creation of timer failed, "
>>>> +                      "errno = %d\n", err);
>>>> +               exit(1);
>>>> +       }
>>>> +
>>>> +       err = os_timer_set_interval(&timer, &data->timer);
>>>>          if (err) {
>>>>                  printk(UM_KERN_ERR "userspace_tramp - setting timer failed, "
>>>>                         "errno = %d\n", err);
>>>> @@ -246,11 +288,18 @@ static int userspace_tramp(void *stack)
>>>> #define NR_CPUS 1
>>>> int userspace_pid[NR_CPUS];
>>>>
>>>> +/**
>>>> + * start_userspace() - start a new userspace process with a new mm context
>>>> + * @stub_stack: Address of the new process' stack
>>>> + *
>>>> + * called by init_new_context()
>>>> + */
>>>> int start_userspace(unsigned long stub_stack)
>>>> {
>>>>          void *stack;
>>>>          unsigned long sp;
>>>>          int pid, status, n, flags, err;
>>>> +       struct stub_data *data = (struct stub_data *) stub_stack;
>>>>
>>>>          stack = mmap(NULL, UM_KERN_PAGE_SIZE,
>>>>                       PROT_READ | PROT_WRITE | PROT_EXEC,
>>>> @@ -266,6 +315,14 @@ int start_userspace(unsigned long stub_stack)
>>>>
>>>>          flags = CLONE_FILES | SIGCHLD;
>>>>
>>>> +       *data = ((struct stub_data) {
>>>> +                       .timer  = ((struct itimerspec)
>>>> +                               { .it_value.tv_sec  = 0,
>>>> +                                 .it_value.tv_nsec = os_timer_remain(NULL),
>>>> +                                 .it_interval.tv_sec  = 0,
>>>> +                                 .it_interval.tv_nsec = UM_NSEC_PER_SEC / UM_HZ })
>>>> +       });
>>>> +
>>>>          pid = clone(userspace_tramp, (void *) sp, flags, (void *) stub_stack);
>>>>          if (pid < 0) {
>>>>                  err = -errno;
>>>> @@ -313,10 +370,15 @@ int start_userspace(unsigned long stub_stack)
>>>>          return err;
>>>> }
>>>>
>>>> +/**
>>>> + * userspace() - user space control loop
>>>> + * @regs:      the register's save memory
>>>> + *
>>>> + * The main loop that traces and controls each spawned userspace
>>>> + * process
>>>> + */
>>>> void userspace(struct uml_pt_regs *regs)
>>>> {
>>>> -       struct itimerval timer;
>>>> -       unsigned long long nsecs, now;
>>>>          int err, status, op, pid = userspace_pid[0];
>>>>          /* To prevent races if using_sysemu changes under us.*/
>>>>          int local_using_sysemu;
>>>> @@ -325,13 +387,8 @@ void userspace(struct uml_pt_regs *regs)
>>>>          /* Handle any immediate reschedules or signals */
>>>>          interrupt_end();
>>>>
>>>> -       if (getitimer(ITIMER_VIRTUAL, &timer))
>>>> -               printk(UM_KERN_ERR "Failed to get itimer, errno = %d\n", errno);
>>>> -       nsecs = timer.it_value.tv_sec * UM_NSEC_PER_SEC +
>>>> -               timer.it_value.tv_usec * UM_NSEC_PER_USEC;
>>>> -       nsecs += os_nsecs();
>>>> -
>>>>          while (1) {
>>>> +
>>>>                  /*
>>>>                   * This can legitimately fail if the process loads a
>>>>                   * bogus value into a segment register.  It will
>>>> @@ -388,32 +445,19 @@ void userspace(struct uml_pt_regs *regs)
>>>>                          switch (sig) {
>>>>                          case SIGSEGV:
>>>>                                  if (PTRACE_FULL_FAULTINFO) {
>>>> -                                       get_skas_faultinfo(pid,
>>>> -                                                          &regs->faultinfo);
>>>> -                                       (*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)&si,
>>>> -                                                            regs);
>>>> +                                       get_skas_faultinfo(pid,&regs->faultinfo);
>>>> +                                       (*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)&si, regs);
>>>> +                               } else {
>>>> +                                       handle_segv(pid, regs);
>>>>                                  }
>>>> -                               else handle_segv(pid, regs);
>>>>                                  break;
>>>>                          case SIGTRAP + 0x80:
>>>> -                               handle_trap(pid, regs, local_using_sysemu);
>>>> +                               handle_trap(pid, regs, local_using_sysemu);
>>>>                                  break;
>>>>                          case SIGTRAP:
>>>>                                  relay_signal(SIGTRAP, (struct siginfo *)&si, regs);
>>>>                                  break;
>>>> -                       case SIGVTALRM:
>>>> -                               now = os_nsecs();
>>>> -                               if (now < nsecs)
>>>> -                                       break;
>>>> -                               block_signals();
>>>> -                               (*sig_info[sig])(sig, (struct siginfo *)&si, regs);
>>>> -                               unblock_signals();
>>>> -                               nsecs = timer.it_value.tv_sec *
>>>> -                                       UM_NSEC_PER_SEC +
>>>> -                                       timer.it_value.tv_usec *
>>>> -                                       UM_NSEC_PER_USEC;
>>>> -                               nsecs += os_nsecs();
>>>> -                               break;
>>>> +                       case SIGUSR2:
>>>>                          case SIGIO:
>>>>                          case SIGILL:
>>>>                          case SIGBUS:
>>>> @@ -448,8 +492,7 @@ static int __init init_thread_regs(void)
>>>>          thread_regs[REGS_IP_INDEX] = STUB_CODE +
>>>>                                  (unsigned long) stub_clone_handler -
>>>>                                  (unsigned long) &__syscall_stub_start;
>>>> -       thread_regs[REGS_SP_INDEX] = STUB_DATA + UM_KERN_PAGE_SIZE -
>>>> -               sizeof(void *);
>>>> +       thread_regs[REGS_SP_INDEX] = STUB_DATA + UM_KERN_PAGE_SIZE - sizeof(void *);
>>>> #ifdef __SIGNAL_FRAMESIZE
>>>>          thread_regs[REGS_SP_INDEX] -= __SIGNAL_FRAMESIZE;
>>>> #endif
>>>> @@ -458,26 +501,51 @@ static int __init init_thread_regs(void)
>>>>
>>>> __initcall(init_thread_regs);
>>>>
>>>> +/**
>>>> + * copy_context_skas0() - copy an mm context
>>>> + * @new_stack: address of the new stack, a zeroed page
>>>> + * @pid:       the pid of the mm parent; this process is cloned
>>>> + *             into a new one
>>>> + *
>>>> + * Copy an mm context from an existing task
>>>> + * 1.) get file descriptor and offset of the mmaped new_stack
>>>> + * 2.) set current stub stack's data: file descriptor, offset and timer data
>>>> + * 3.) Restore parent's registers to the values saved by init_thread_regs()
>>>> + * 4.) Continue parent (==from_mm) in stub_clone_handler(), see also
>>>> + *     init_thread_regs(). This will clone a new process with same
>>>> + *     mm.
>>>> + * 5.)
>>>> + *
>>>> + * Returns the PID of the new process
>>>> + */
>>>> int copy_context_skas0(unsigned long new_stack, int pid)
>>>> {
>>>> -       struct timeval tv = { .tv_sec = 0, .tv_usec = UM_USEC_PER_SEC / UM_HZ };
>>>>          int err;
>>>>          unsigned long current_stack = current_stub_stack();
>>>>          struct stub_data *data = (struct stub_data *) current_stack;
>>>>          struct stub_data *child_data = (struct stub_data *) new_stack;
>>>>          unsigned long long new_offset;
>>>> +
>>>>          int new_fd = phys_mapping(to_phys((void *)new_stack), &new_offset);
>>>>
>>>>          /*
>>>>           * prepare offset and fd of child's stack as argument for parent's
>>>>           * and child's mmap2 calls
>>>>           */
>>>> -       *data = ((struct stub_data) { .offset   = MMAP_OFFSET(new_offset),
>>>> -                                     .fd       = new_fd,
>>>> -                                     .timer    = ((struct itimerval)
>>>> -                                                  { .it_value = tv,
>>>> -                                                    .it_interval = tv }) });
>>>> -
>>>> +       *data = ((struct stub_data) {
>>>> +                       .offset = MMAP_OFFSET(new_offset),
>>>> +                       .fd     = new_fd,
>>>> +                       .timer  = ((struct itimerspec)
>>>> +                                            { .it_value.tv_sec  = 0,
>>>> +                                              .it_value.tv_nsec = os_timer_remain(NULL),
>>>> +                                              .it_interval.tv_sec  = 0,
>>>> +                                              .it_interval.tv_nsec = UM_NSEC_PER_SEC / UM_HZ })
>>>> +       });
>>>> +
>>>> +       /* set parent's regs:
>>>> +        * this sets the registers to the values saved by the initcall
>>>> +        * init_thread_regs()
>>>> +        */
>>>>          err = ptrace_setregs(pid, thread_regs);
>>>>          if (err < 0) {
>>>>                  err = -errno;
>>>> @@ -486,6 +554,7 @@ int copy_context_skas0(unsigned long new_stack, int pid)
>>>>                  return err;
>>>>          }
>>>>
>>>> +       /* set parent's fp registers */
>>>>          err = put_fp_registers(pid, thread_fp_regs);
>>>>          if (err < 0) {
>>>>                  printk(UM_KERN_ERR "copy_context_skas0 : put_fp_registers "
>>>> @@ -493,7 +562,9 @@ int copy_context_skas0(unsigned long new_stack, int pid)
>>>>                  return err;
>>>>          }
>>>>
>>>> -       /* set a well known return code for detection of child write failure */
>>>> +       /* set a well known return code for detection of child write failure,
>>>> +        * i.e. on the new stack
>>>> +        */
>>>>          child_data->err = 12345678;
>>>>
>>>>          /*
>>>> @@ -508,8 +579,10 @@ int copy_context_skas0(unsigned long new_stack, int pid)
>>>>                  return err;
>>>>          }
>>>>
>>>> +       /* wait for parent's stub_clone_handler() to finish */
>>>>          wait_stub_done(pid);
>>>>
>>>> +       /* get child's pid, the pid of the cloned parent process */
>>>>          pid = data->err;
>>>>          if (pid < 0) {
>>>>                  printk(UM_KERN_ERR "copy_context_skas0 - stub-parent reports "
>>>> diff --git a/arch/um/os-Linux/time.c b/arch/um/os-Linux/time.c
>>>> index e9824d5..5a7f49c 100644
>>>> --- a/arch/um/os-Linux/time.c
>>>> +++ b/arch/um/os-Linux/time.c
>>>> @@ -1,4 +1,5 @@
>>>> /*
>>>> + * Copyright (C) 2012-2014 Cisco Systems
>>>>    * Copyright (C) 2000 - 2007 Jeff Dike (jdike{addtoit,linux.intel}.com)
>>>>    * Licensed under the GPL
>>>>    */
>>>> @@ -10,177 +11,177 @@
>>>> #include <sys/time.h>
>>>> #include <kern_util.h>
>>>> #include <os.h>
>>>> -#include "internal.h"
>>>> +#include <string.h>
>>>> +#include <timer-internal.h>
>>>>
>>>> -int set_interval(void)
>>>> -{
>>>> -       int usec = UM_USEC_PER_SEC / UM_HZ;
>>>> -       struct itimerval interval = ((struct itimerval) { { 0, usec },
>>>> -                                                         { 0, usec } });
>>>> -
>>>> -       if (setitimer(ITIMER_VIRTUAL, &interval, NULL) == -1)
>>>> -               return -errno;
>>>> +static timer_t event_high_res_timer = 0;
>>>>
>>>> -       return 0;
>>>> +static inline long long timeval_to_ns(const struct timeval *tv)
>>>> +{
>>>> +       return ((long long) tv->tv_sec * UM_NSEC_PER_SEC) +
>>>> +               tv->tv_usec * UM_NSEC_PER_USEC;
>>>> }
>>>>
>>>> -int timer_one_shot(int ticks)
>>>> +static inline long long timespec_to_ns(const struct timespec *ts)
>>>> {
>>>> -       unsigned long usec = ticks * UM_USEC_PER_SEC / UM_HZ;
>>>> -       unsigned long sec = usec / UM_USEC_PER_SEC;
>>>> -       struct itimerval interval;
>>>> -
>>>> -       usec %= UM_USEC_PER_SEC;
>>>> -       interval = ((struct itimerval) { { 0, 0 }, { sec, usec } });
>>>> +       return ((long long) ts->tv_sec * UM_NSEC_PER_SEC) +
>>>> +               ts->tv_nsec;
>>>> +}
>>>>
>>>> -       if (setitimer(ITIMER_VIRTUAL, &interval, NULL) == -1)
>>>> -               return -errno;
>>>> +long long os_persistent_clock_emulation (void) {
>>>> +       struct timespec realtime_tp;
>>>>
>>>> -       return 0;
>>>> +       clock_gettime(CLOCK_REALTIME, &realtime_tp);
>>>> +       return timespec_to_ns(&realtime_tp);
>>>> }
>>>>
>>>> /**
>>>> - * timeval_to_ns - Convert timeval to nanoseconds
>>>> - * @ts:                pointer to the timeval variable to be converted
>>>> - *
>>>> - * Returns the scalar nanosecond representation of the timeval
>>>> - * parameter.
>>>> - *
>>>> - * Ripped from linux/time.h because it's a kernel header, and thus
>>>> - * unusable from here.
>>>> + * os_timer_create() - create a new POSIX (interval) timer
>>>>    */
>>>> -static inline long long timeval_to_ns(const struct timeval *tv)
>>>> -{
>>>> -       return ((long long) tv->tv_sec * UM_NSEC_PER_SEC) +
>>>> -               tv->tv_usec * UM_NSEC_PER_USEC;
>>>> -}
>>>> +int os_timer_create(void* timer) {
>>>>
>>>> -long long disable_timer(void)
>>>> -{
>>>> -       struct itimerval time = ((struct itimerval) { { 0, 0 }, { 0, 0 } });
>>>> -       long long remain, max = UM_NSEC_PER_SEC / UM_HZ;
>>>> +       struct sigevent sev;
>>>> +       timer_t* t = timer;
>>>>
>>>> -       if (setitimer(ITIMER_VIRTUAL, &time, &time) < 0)
>>>> -               printk(UM_KERN_ERR "disable_timer - setitimer failed, "
>>>> -                      "errno = %d\n", errno);
>>>> +       if(t == NULL) {
>>>> +               t = &event_high_res_timer;
>>>> +       }
>>>>
>>>> -       remain = timeval_to_ns(&time.it_value);
>>>> -       if (remain > max)
>>>> -               remain = max;
>>>> +       sev.sigev_notify = SIGEV_SIGNAL;
>>>> +       sev.sigev_signo = SIGUSR2; /* note - hrtimer now has its own signal */
>>>> +       sev.sigev_value.sival_ptr = &event_high_res_timer;
>>>>
>>>> -       return remain;
>>>> +       if (timer_create(
>>>> +               CLOCK_MONOTONIC,
>>>> +               &sev,
>>>> +               t) == -1) {
>>>> +               return -1;
>>>> +       }
>>>> +       return 0;
>>>> }
>>>>
>>>> -long long os_nsecs(void)
>>>> +int os_timer_set_interval(void* timer, void* i)
>>>> {
>>>> -       struct timeval tv;
>>>> +       struct itimerspec its;
>>>> +       unsigned long long nsec;
>>>> +       timer_t* t = timer;
>>>> +       struct itimerspec* its_in = i;
>>>>
>>>> -       gettimeofday(&tv, NULL);
>>>> -       return timeval_to_ns(&tv);
>>>> -}
>>>> +       if(t == NULL) {
>>>> +               t = &event_high_res_timer;
>>>> +       }
>>>> +
>>>> +       nsec = UM_NSEC_PER_SEC / UM_HZ;
>>>> +
>>>> +       if(its_in != NULL) {
>>>> +               its.it_value.tv_sec = its_in->it_value.tv_sec;
>>>> +               its.it_value.tv_nsec = its_in->it_value.tv_nsec;
>>>> +       } else {
>>>> +               its.it_value.tv_sec = 0;
>>>> +               its.it_value.tv_nsec = nsec;
>>>> +       }
>>>> +
>>>> +       its.it_interval.tv_sec = 0;
>>>> +       its.it_interval.tv_nsec = nsec;
>>>> +
>>>> +       if(timer_settime(*t, 0, &its, NULL) == -1) {
>>>> +               return -errno;
>>>> +       }
>>>>
>>>> -#ifdef UML_CONFIG_NO_HZ_COMMON
>>>> -static int after_sleep_interval(struct timespec *ts)
>>>> -{
>>>>          return 0;
>>>> }
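
For anyone who wants to poke at the create-then-arm sequence outside of UML, it can be sketched as a small host-side helper. This is only a sketch under assumptions, not the patch code: the sketch_* names and the tick-counting handler are made up for illustration, and it assumes a Linux host with POSIX timers (older glibc needs -lrt at link time). Note that timer_settime() takes the timer_t by value, hence the dereference.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Tick counter bumped by the signal handler (hypothetical helper state). */
static volatile sig_atomic_t sketch_ticks;

static void sketch_on_tick(int sig)
{
	(void) sig;
	sketch_ticks++;
}

/*
 * Create a CLOCK_MONOTONIC timer that raises SIGUSR2 and arm it to fire
 * hz times per second. Returns 0 on success, -1 on failure (errno is set).
 */
static int sketch_start_tick_timer(timer_t *timer, long hz)
{
	struct sigaction sa;
	struct sigevent sev;
	struct itimerspec its;

	/* tv_nsec must stay below one second, so hz == 1 is not allowed here */
	assert(hz > 1 && hz <= SKETCH_NSEC_PER_SEC);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_on_tick;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR2, &sa, NULL) == -1)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGUSR2;
	sev.sigev_value.sival_ptr = timer;
	if (timer_create(CLOCK_MONOTONIC, &sev, timer) == -1)
		return -1;

	/* First expiry after one period, then periodic with the same period. */
	its.it_value.tv_sec = 0;
	its.it_value.tv_nsec = SKETCH_NSEC_PER_SEC / hz;
	its.it_interval = its.it_value;
	return timer_settime(*timer, 0, &its, NULL);
}

/* Block until at least n ticks have been counted, then delete the timer. */
static int sketch_wait_ticks(timer_t timer, int n)
{
	while (sketch_ticks < n)
		pause();	/* woken by each SIGUSR2 delivery */
	return timer_delete(timer);
}
```

Calling sketch_start_tick_timer(&t, 100) followed by sketch_wait_ticks(t, 3) should return after roughly 30 ms on an idle host. The sketch installs a handler only so the ticks can be counted; in the patch the signal is instead picked up by the tracing kernel process.
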
>>>>
>>>> -static void deliver_alarm(void)
>>>> +/**
>>>> + * os_timer_remain() - returns the remaining nanoseconds of the given
>>>> + *                     interval timer
>>>> + * Because this is the remaining time of an interval timer, which corresponds
>>>> + * to HZ, this value can never be bigger than one second. Just
>>>> + * the nanosecond part of the timer is returned.
>>>> + * The returned time is relative to the start time of the interval timer.
>>>> + * Returns a negative value on error.
>>>> + */
>>>> +long os_timer_remain(void* timer)
>>>> {
>>>> -       alarm_handler(SIGVTALRM, NULL, NULL);
>>>> -}
>>>> +       struct itimerspec its;
>>>> +       timer_t* t = timer;
>>>>
>>>> -static unsigned long long sleep_time(unsigned long long nsecs)
>>>> -{
>>>> -       return nsecs;
>>>> -}
>>>> +       if(t == NULL) {
>>>> +               t = &event_high_res_timer;
>>>> +       }
>>>>
>>>> -#else
>>>> -unsigned long long last_tick;
>>>> -unsigned long long skew;
>>>> +       if(timer_gettime(t, &its) == -1) {
>>>> +               return -errno;
>>>> +       }
>>>>
>>>> -static void deliver_alarm(void)
>>>> -{
>>>> -       unsigned long long this_tick = os_nsecs();
>>>> -       int one_tick = UM_NSEC_PER_SEC / UM_HZ;
>>>> +       return its.it_value.tv_nsec;
>>>> +}
>>>>
>>>> -       /* Protection against the host's time going backwards */
>>>> -       if ((last_tick != 0) && (this_tick < last_tick))
>>>> -               this_tick = last_tick;
>>>> +int os_timer_one_shot(int ticks)
>>>> +{
>>>> +       struct itimerspec its;
>>>> +       unsigned long long nsec;
>>>> +       unsigned long sec;
>>>>
>>>> -       if (last_tick == 0)
>>>> -               last_tick = this_tick - one_tick;
>>>> +    nsec = (ticks + 1);
>>>> +    sec = nsec / UM_NSEC_PER_SEC;
>>>> +       nsec = nsec % UM_NSEC_PER_SEC;
>>>>
>>>> -       skew += this_tick - last_tick;
>>>> +       its.it_value.tv_sec = nsec / UM_NSEC_PER_SEC;
>>>> +       its.it_value.tv_nsec = nsec;
>>>>
>>>> -       while (skew >= one_tick) {
>>>> -               alarm_handler(SIGVTALRM, NULL, NULL);
>>>> -               skew -= one_tick;
>>>> -       }
>>>> +       its.it_interval.tv_sec = 0;
>>>> +       its.it_interval.tv_nsec = 0; // we cheat here
>>>>
>>>> -       last_tick = this_tick;
>>>> +       timer_settime(event_high_res_timer, 0, &its, NULL);
>>>> +       return 0;
>>>> }
>>>>
>>>> -static unsigned long long sleep_time(unsigned long long nsecs)
>>>> +/**
>>>> + * os_timer_disable() - disable the posix (interval) timer
>>>> + * Returns the remaining interval timer time in nanoseconds
>>>> + */
>>>> +long long os_timer_disable(void)
>>>> {
>>>> -       return nsecs > skew ? nsecs - skew : 0;
>>>> +       struct itimerspec its;
>>>> +
>>>> +       memset(&its, 0, sizeof(struct itimerspec));
>>>> +       timer_settime(event_high_res_timer, 0, &its, &its);
>>>> +
>>>> +       return its.it_value.tv_sec * UM_NSEC_PER_SEC + its.it_value.tv_nsec;
>>>> }
>>>>
>>>> -static inline long long timespec_to_us(const struct timespec *ts)
>>>> +long long os_vnsecs(void)
>>>> {
>>>> -       return ((long long) ts->tv_sec * UM_USEC_PER_SEC) +
>>>> -               ts->tv_nsec / UM_NSEC_PER_USEC;
>>>> +       struct timespec ts;
>>>> +
>>>> +       clock_gettime(CLOCK_PROCESS_CPUTIME_ID,&ts);
>>>> +       return timespec_to_ns(&ts);
>>>> }
>>>>
>>>> -static int after_sleep_interval(struct timespec *ts)
>>>> +long long os_nsecs(void)
>>>> {
>>>> -       int usec = UM_USEC_PER_SEC / UM_HZ;
>>>> -       long long start_usecs = timespec_to_us(ts);
>>>> -       struct timeval tv;
>>>> -       struct itimerval interval;
>>>> -
>>>> -       /*
>>>> -        * It seems that rounding can increase the value returned from
>>>> -        * setitimer to larger than the one passed in.  Over time,
>>>> -        * this will cause the remaining time to be greater than the
>>>> -        * tick interval.  If this happens, then just reduce the first
>>>> -        * tick to the interval value.
>>>> -        */
>>>> -       if (start_usecs > usec)
>>>> -               start_usecs = usec;
>>>> -
>>>> -       start_usecs -= skew / UM_NSEC_PER_USEC;
>>>> -       if (start_usecs < 0)
>>>> -               start_usecs = 0;
>>>> -
>>>> -       tv = ((struct timeval) { .tv_sec  = start_usecs / UM_USEC_PER_SEC,
>>>> -                                .tv_usec = start_usecs % UM_USEC_PER_SEC });
>>>> -       interval = ((struct itimerval) { { 0, usec }, tv });
>>>> -
>>>> -       if (setitimer(ITIMER_VIRTUAL, &interval, NULL) == -1)
>>>> -               return -errno;
>>>> +       struct timespec ts;
>>>>
>>>> -       return 0;
>>>> +       clock_gettime(CLOCK_MONOTONIC,&ts);
>>>> +       return timespec_to_ns(&ts);
>>>> }
>>>> -#endif
>>>>
>>>> -void idle_sleep(unsigned long long nsecs)
>>>> +/**
>>>> + * os_idle_sleep() - sleep for a given time of nsecs
>>>> + * @nsecs: nanoseconds to sleep
>>>> + */
>>>> +void os_idle_sleep(unsigned long long nsecs)
>>>> {
>>>>          struct timespec ts;
>>>>
>>>> -       /*
>>>> -        * nsecs can come in as zero, in which case, this starts a
>>>> -        * busy loop.  To prevent this, reset nsecs to the tick
>>>> -        * interval if it is zero.
>>>> -        */
>>>> -       if (nsecs == 0)
>>>> -               nsecs = UM_NSEC_PER_SEC / UM_HZ;
>>>> -
>>>> -       nsecs = sleep_time(nsecs);
>>>> -       ts = ((struct timespec) { .tv_sec       = nsecs / UM_NSEC_PER_SEC,
>>>> -                                 .tv_nsec      = nsecs % UM_NSEC_PER_SEC });
>>>> -
>>>> -       if (nanosleep(&ts, &ts) == 0)
>>>> -               deliver_alarm();
>>>> -       after_sleep_interval(&ts);
>>>> +       if (nsecs <= 0) {
>>>> +               return;
>>>> +       }
>>>> +
>>>> +       ts = ((struct timespec) {
>>>> +                       .tv_sec  = nsecs / UM_NSEC_PER_SEC,
>>>> +                       .tv_nsec = nsecs % UM_NSEC_PER_SEC
>>>> +       });
>>>> +
>>>> +       clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
>>>> }
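
The monotonic clock read and the relative sleep can be tried on the host in isolation. The following is a sketch under assumptions rather than the patch code: the sketch_* names are hypothetical, and unlike os_idle_sleep() it hands the clock_nanosleep() result back to the caller so an interrupted sleep can be told apart from a completed one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define SKETCH_NS_PER_SEC 1000000000ULL

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
static unsigned long long sketch_mono_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);	/* not expected to fail for a valid clock id */
	(void) rc;
	return (unsigned long long) ts.tv_sec * SKETCH_NS_PER_SEC +
		(unsigned long long) ts.tv_nsec;
}

/*
 * Relative sleep on CLOCK_MONOTONIC. Returns 0 after a full sleep or an
 * errno value (for example EINTR when a signal cut the sleep short, in
 * which case *remain, if not NULL, holds the unslept time).
 */
static int sketch_sleep_ns(unsigned long long nsecs, struct timespec *remain)
{
	struct timespec ts;

	if (nsecs == 0)
		return 0;	/* nothing to wait for */

	ts.tv_sec = (time_t) (nsecs / SKETCH_NS_PER_SEC);
	ts.tv_nsec = (long) (nsecs % SKETCH_NS_PER_SEC);

	/* clock_nanosleep() returns the error number directly, not via errno */
	return clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, remain);
}
```

As in the new os_idle_sleep(), a zero nsecs returns at once instead of being rounded up to one tick the way the old idle_sleep() did.
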
>>>> diff --git a/arch/um/os-Linux/util.c b/arch/um/os-Linux/util.c
>>>> index faee55e..10ecc06 100644
>>>> --- a/arch/um/os-Linux/util.c
>>>> +++ b/arch/um/os-Linux/util.c
>>>> @@ -102,6 +102,7 @@ void os_fix_helper_signals(void)
>>>>          signal(SIGWINCH, SIG_IGN);
>>>>          signal(SIGINT, SIG_DFL);
>>>>          signal(SIGTERM, SIG_DFL);
>>>> +       signal(SIGUSR2, SIG_IGN);
>>>> }
>>>>
>>>> void os_dump_core(void)
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> One dashboard for servers and applications across Physical-Virtual-Cloud
>>>> Widest out-of-the-box monitoring support with 50+ applications
>>>> Performance metrics, stats and reports that give you Actionable Insights
>>>> Deep dive visibility with transaction tracing using APM Insight.
>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>>>> _______________________________________________
>>>> User-mode-linux-devel mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>>>
>>> --
>>> Thanks,
>>> //richard
>>
>


