Module Name:    src
Committed By:   pooka
Date:           Tue Apr 30 21:18:40 UTC 2013

Modified Files:
        src/lib/librumpuser: rumpuser.3

Log Message:
document the hypercall interface


To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 src/lib/librumpuser/rumpuser.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/librumpuser/rumpuser.3
diff -u src/lib/librumpuser/rumpuser.3:1.2 src/lib/librumpuser/rumpuser.3:1.3
--- src/lib/librumpuser/rumpuser.3:1.2	Mon Mar  1 17:20:44 2010
+++ src/lib/librumpuser/rumpuser.3	Tue Apr 30 21:18:40 2013
@@ -1,6 +1,6 @@
-.\"     $NetBSD: rumpuser.3,v 1.2 2010/03/01 17:20:44 pooka Exp $
+.\"     $NetBSD: rumpuser.3,v 1.3 2013/04/30 21:18:40 pooka Exp $
 .\"
-.\" Copyright (c) 2010 Antti Kantee.  All rights reserved.
+.\" Copyright (c) 2013 Antti Kantee.  All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
@@ -23,42 +23,587 @@
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 .\" SUCH DAMAGE.
 .\"
-.Dd March 1, 2010
+.Dd April 30, 2013
 .Dt RUMPUSER 3
 .Os
 .Sh NAME
 .Nm rumpuser
-.Nd rump hypervisor interface
+.Nd rump kernel hypercall interface
 .Sh LIBRARY
 rump User Library (librumpuser, \-lrumpuser)
 .Sh SYNOPSIS
 .In rump/rumpuser.h
 .Sh DESCRIPTION
+The
 .Nm
-is the hypervisor interface for
-.Xr rump 3
-style kernel virtualization.
-A virtual rump kernel can make calls to the host operating system
-libraries and kernel (system calls) using
-.Nm
-interfaces.
-Any "slow" hypervisor calls such as file I/O, sychronization wait,
-or sleep will cause rump to unschedule the calling kernel thread
-from the virtual CPU and free it for other consumers.
-When the hypervisor call returns to the kernel, a new scheduling
-operation takes place.
-.Pp
-For example, rump implements kernel threads directly as hypervisor
-calls to host
-.Xr pthread 3 .
-This avoids the common virtualization drawback of multiple overlapping
-and possibly conflicting implementations of same functionality in
-the software stack.
+hypercall interfaces allow a rump kernel to access host resources.
+A hypervisor implementation must implement the routines described in
+this document to allow a rump kernel to run on the host.
+The implementation included in
+.Nx
+is for POSIX hosts.
+This document is divided into sections based on the functionality
+group of each hypercall.
+.Sh UPCALLS AND RUMP KERNEL CONTEXT
+A hypercall is always entered with the calling thread scheduled in
+the rump kernel.
+In case the hypercall intends to block while waiting for an event,
+the hypervisor must first release the rump kernel scheduling context.
+In other words, the rump kernel context is a resource and holding
+on to it while waiting for a rump kernel event/resource may lead
+to a deadlock.
+Even when there is no possibility of deadlock in the strict sense
+of the term, holding on to the rump kernel context while performing
+a slow hypercall such as reading a device will prevent other threads
+(including the clock interrupt) from using that rump kernel context.
+.Pp
+Releasing the context is done by calling the
+.Fn hyp_backend_unschedule
+upcall which the hypervisor received from rump kernel as a parameter
+for
+.Fn rumpuser_init .
+Before a hypercall returns back to the rump kernel, the returning thread
+must carry a rump kernel context.
+In case the hypercall unscheduled itself, it must reschedule itself
+by calling
+.Fn hyp_backend_schedule .
+.Sh HYPERCALL INTERFACES
+.Ss Initialization
+.Ft int
+.Fn rumpuser_init "int version" "struct rump_hyperup *hyp"
+.Pp
+Initialize the hypervisor.
+.Bl -tag -width "xalignmentx"
+.It Fa version
+hypercall interface version number that the kernel expects to be used.
+In case the hypervisor cannot provide an exact match, this routine must
+return a non-zero value.
+.It Fa hyp
+pointer to a set of upcalls the hypervisor can make into the rump kernel
+.El
+.Ss Memory allocation
+.Ft int
+.Fn rumpuser_malloc "size_t len" "int alignment" "void **memp"
+.Bl -tag -width "xalignmentx"
+.It Fa len
+amount of memory to allocate
+.It Fa alignment
+size the returned memory must be aligned to.
+For example, if the value passed is 4096, the returned memory
+must be aligned to a 4k boundary.
+.It Fa memp
+return pointer for allocated memory
+.El
+.Pp
+.Ft void
+.Fn rumpuser_free "void *mem" "size_t len"
+.Bl -tag -width "xalignmentx"
+.It Fa mem
+memory to free
+.It Fa len
+length of allocation.
+This is always equal to the amount the caller requested from the
+.Fn rumpuser_malloc
+which returned
+.Fa mem .
+.El
+.Ss Files and I/O
+.Ft int
+.Fn rumpuser_open "const char *name" "int mode" "int *fdp"
+.Pp
+Open a file for I/O.
+Notably, there needs to be no mapping between
+.Fa name
+and the host, but for example on a POSIX system it may be convenient
+to let
+.Fa name
+denote the host file system namespace.
+.Bl -tag -width "xalignmentx"
+.It Fa name
+the identifier of the file to open for I/O
+.It Fa mode
+combination of the following:
+.Bl -tag -width "XRUMPUSER_OPEN_CREATE"
+.It Dv RUMPUSER_OPEN_RDONLY
+open only for reading
+.It Dv RUMPUSER_OPEN_WRONLY
+open only for writing
+.It Dv RUMPUSER_OPEN_RDWR
+open for reading and writing
+.It Dv RUMPUSER_OPEN_CREATE
+do not treat missing
+.Fa name
+as an error
+.It Dv RUMPUSER_OPEN_EXCL
+combined with
+.Dv RUMPUSER_OPEN_CREATE ,
+flag an error if
+.Fa name
+already exists
+.It Dv RUMPUSER_OPEN_BIO
+the caller will use this file for block I/O, usually used in
+conjunction with accessing file system media.
+The hypervisor should treat this flag as advisory and possibly
+enable some optimizations for
+.Fa *fdp
+based on it.
+.El
+Notably, the permissions of the created file are left up to the
+hypervisor implementation.
+.It Fa fdp
+An integer value denoting the open file is returned here.
+.El
+.Pp
+.Ft int
+.Fn rumpuser_close "int fd"
+.Pp
+Close a previously opened file descriptor.
+.Pp
+.Ft int
+.Fn rumpuser_getfileinfo "const char *name" "uint64_t *size" "int *type"
+.Bl -tag -width "xalignmentx"
+.It Fa name
+file for which information is returned.
+The namespace is equal to that of
+.Fn rumpuser_open .
+.It Fa size
+If non-NULL, size of the file is returned here.
+.It Fa type
+If non-NULL, type of the file is returned here.
+The options are
+.Dv RUMPUSER_FT_DIR ,
+.Dv RUMPUSER_FT_REG ,
+.Dv RUMPUSER_FT_BLK ,
+.Dv RUMPUSER_FT_CHR ,
+or
+.Dv RUMPUSER_FT_OTHER
+for directory, regular file, block device, character device or unknown,
+respectively.
+.El
+.Pp
+.Ft void
+.Fo rumpuser_bio
+.Fa "int fd" "int op" "void *data" "size_t dlen" "off_t off"
+.Fa "rump_biodone_fn biodone" "void *donearg"
+.Fc
+.Pp
+Initiate block I/O and return immediately.
+.Bl -tag -width "xalignmentx"
+.It Fa fd
+perform I/O on this file descriptor.
+The file descriptor must have been opened with
+.Dv RUMPUSER_OPEN_BIO .
+.It Fa op
+Transfer data from the file descriptor with
+.Dv RUMPUSER_BIO_READ
+and transfer data to the file descriptor with
+.Dv RUMPUSER_BIO_WRITE .
+Unless
+.Dv RUMPUSER_BIO_SYNC
+is specified, the hypervisor may cache a write instead of
+committing it to permanent storage.
+.It Fa data
+memory address to transfer data to/from
+.It Fa dlen
+length of I/O.
+The length is guaranteed to be a multiple of 512.
+.It Fa off
+offset into
+.Fa fd
+where I/O is performed
+.It Fa biodone
+To be called when the I/O is complete.
+Accessing
+.Fa data
+is not legal after the call is made.
+.It Fa donearg
+opaque arg that must be passed to
+.Fa biodone .
+.El
+.Pp
+.Ft int
+.Fo rumpuser_iovread
+.Fa "int fd" "struct rumpuser_iovec *ruiov" "size_t iovlen"
+.Fa "off_t off" "size_t *retv"
+.Fc
+.Pp
+.Ft int
+.Fo rumpuser_iovwrite
+.Fa "int fd" "struct rumpuser_iovec *ruiov" "size_t iovlen"
+.Fa "off_t off" "size_t *retv"
+.Fc
+.Pp
+These routines perform scatter-gather I/O which is not
+block I/O by nature and therefore cannot be handled by
+.Fn rumpuser_bio .
+.Pp
+.Bl -tag -width "xalignmentx"
+.It Fa fd
+file descriptor to perform I/O on
+.It Fa ruiov
+an array of I/O descriptors.
+It is defined as follows:
+.Bd -literal -offset indent -compact
+struct rumpuser_iovec {
+	void *iov_base;
+	size_t iov_len;
+};
+.Ed
+.It Fa iovlen
+number of elements in
+.Fa ruiov
+.It Fa off
+offset of
+.Fa fd
+to perform I/O on.
+This can either be a non-negative value or
+.Dv RUMPUSER_IOV_NOSEEK .
+The latter denotes that no attempt to change the underlying objects
+offset should be made.
+Using both types of offsets on a single instance of
+.Fa fd
+results in undefined behavior.
+.It Fa retv
+number of bytes successfully transferred is returned here
+.El
+.Ss Clocks
+The hypervisor should support two clocks, one for wall time and one
+for monotonically increasing time, the latter of which may be based
+on some arbitrary time (e.g. system boot time).
+If this is not possible, the hypervisor must make a reasonable effort to
+retain semantics.
+.Pp
+.Ft int
+.Fn rumpuser_clock_gettime "enum rumpclock clk" "uint64_t *sec" "uint64_t *nsec"
+.Pp
+.Bl -tag -width "xalignmentx"
+.It Fa clk
+specifies the clock type.
+In case of
+.Dv RUMPUSER_CLOCK_RELWALL
+the wall time should be returned.
+In case of
+.Dv RUMPUSER_CLOCK_ABSMONO
+the time of a monotonic clock should be returned.
+.It Fa sec
+return value for seconds
+.It Fa nsec
+return value for nanoseconds
+.El
+.Pp
+.Ft int
+.Fn rumpuser_clock_sleep "enum rumpclock clk" "uint64_t sec" "uint64_t nsec"
+.Bl -tag -width "xalignmentx"
+.It Fa clk
+In case of
+.Dv RUMPUSER_CLOCK_RELWALL ,
+the sleep should last at least as long as specified.
+In case of
+.Dv RUMPUSER_CLOCK_ABSMONO ,
+the sleep should last until the hypervisor monotonic clock hits
+the specified absolute time.
+.It Fa sec
+sleep duration, seconds.
+exact semantics depend on
+.Fa clk .
+.It Fa nsec
+sleep duration, nanoseconds.
+exact semantics depend on
+.Fa clk .
+.El
+.Ss Parameter retrieval
+.Ft int
+.Fn rumpuser_getparam "const char *name" "void *buf" "size_t buflen"
+.Pp
+Retrieve a configuration parameter from the hypervisor.
+It is up to the hypervisor to decide how the parameters can be set.
+.Bl -tag -width "xalignmentx"
+.It Fa name
+name of the parameter.
+If the name starts with an underscore, it means a mandatory parameter.
+The mandatory parameters are
+.Dv RUMPUSER_PARAM_NCPU
+which specifies the amount of virtual CPUs bootstrapped by the
+rump kernel and
+.Dv RUMPUSER_PARAM_HOSTNAME
+which returns a preferably unique instance name for the rump kernel.
+.It Fa buf
+buffer to return the data in as a string
+.It Fa buflen
+length of buffer
+.El
+.Ss Termination
+.Ft void
+.Fn rumpuser_exit "int value"
+.Pp
+Terminate the rump kernel with exit value
+.Fa value .
+If
+.Fa value
+is
+.Dv RUMPUSER_PANIC
+the hypervisor should attempt to provide something akin to a core dump.
+.Ss Console output
+.Pp
+Console output is divided into two routines: a per-character
+one and printf-like one.
+The former is used e.g. by the rump kernel's internal printf
+routine.
+The latter can be used for direct debug prints e.g. very early
+on in the rump kernel's bootstrap or when using the in-kernel
+routine causes too much skew in the debug print results
+(the hypercall runs outside of the rump kernel and therefore does not
+cause any locking or scheduling events inside the rump kernel).
+.Pp
+.Ft void
+.Fn rumpuser_putchar "int ch"
+.Pp
+Output
+.Fa ch
+on the console.
+.Pp
+.Ft void
+.Fn rumpuser_dprintf "const char *fmt" "..."
+.Pp
+Do output based on printf-like parameters.
+.Ss Random pool
+.Ft int
+.Fn rumpuser_getrandom "void *buf" "size_t buflen" "int flags" "size_t *retp"
+.Pp
+.Bl -tag -width "xalignmentx"
+.It Fa buf
+buffer that the randomness is written to
+.It Fa buflen
+number of bytes of randomness requested
+.It Fa flags
+The value 0 or a combination of
+.Dv RUMPUSER_RANDOM_HARD
+(return true randomness instead of something from a PRNG)
+and
+.Dv RUMPUSER_RANDOM_NOWAIT
+(do not block in case the requested amount of bytes is not available).
+.It Fa retp
+The number of random bytes written into
+.Fa buf .
+.El
+.Ss Threads
+.Pp
+.Ft int
+.Fo rumpuser_thread_create
+.Fa "void *(*fun)(void *)" "void *arg" "const char *thrname" "int mustjoin"
+.Fa "int priority" "int cpuidx" "void **cookie"
+.Fc
+.Pp
+Create a thread.
+In case the hypervisor wants to optimize the scheduling of the
+threads, it can perform heuristics on the
+.Fa thrname ,
+.Fa priority
+and
+.Fa cpuidx
+parameters.
+.Bl -tag -width "xalignmentx"
+.It Fa fun
+function that the new thread must call
+.It Fa arg
+argument to be passed to
+.Fa fun
+.It Fa thrname
+Name of the new thread.
+.It Fa mustjoin
+If 1, the thread will be waited for by
+.Fn rumpuser_thread_join
+when the thread exits.
+.It Fa priority
+The priority that the kernel requested the thread to be created at.
+Higher values mean higher priority.
+The exact kernel semantics for each value are not available through
+this interface.
+.It Fa cpuidx
+The index of the virtual CPU that the thread is bound to, or \-1
+if the thread is not bound.
+The mapping between the virtual CPUs and physical CPUs, if any,
+is hypervisor implementation specific.
+.It Fa cookie
+In case
+.Fa mustjoin
+is set, the value returned in
+.Fa cookie
+will be passed to
+.Fn rumpuser_thread_join .
+.El
+.Pp
+.Ft void
+.Fn rumpuser_thread_exit "void"
+.Pp
+Called when a thread created with
+.Fn rumpuser_thread_create
+exits.
+.Pp
+.Ft int
+.Fn rumpuser_thread_join "void *cookie"
+.Pp
+Wait for a joinable thread to exit.
+The cookie matches the value from
+.Fn rumpuser_thread_create .
+.Pp
+.Ft void
+.Fn rumpuser_set_curlwp "struct lwp *l"
 .Pp
+Set
+.Fa l
+as the rump kernel thread context for the calling host thread.
+The value
+.Dv NULL
+means that an existing rump kernel context (which must exist)
+must be cleared.
+.Pp
+.Ft struct lwp *
+.Fn rumpuser_get_curlwp "void"
+.Pp
+Retrieve the rump kernel thread context previously set by
+.Fn rumpuser_set_curlwp .
+This routine can be called when a context does not exist and
+the routine must return
+.Dv NULL
+in that case.
+.Pp
+.Ft void
+.Fn rumpuser_seterrno "int errno"
+.Pp
+Set an errno value in the calling thread's TLS.
+Note: this is used only if rump kernel clients make rump system calls.
+.Ss Mutexes, rwlocks and condition variables
+The locking interfaces have standard semantics, so we will not
+discuss each one in detail.
+The data types
+.Vt struct rumpuser_mtx ,
+.Vt struct rumpuser_rw
+and
+.Vt struct rumpuser_cv
+used by these interfaces are opaque to the rump kernel, i.e. the
+hypervisor has complete freedom over them.
+.Pp
+Most of these interfaces will (and must) relinquish the rump kernel
+CPU context in case they block (or intend to block).
+The exceptions are the "nowrap" variants of the interfaces which
+may not relinquish rump kernel context.
+.Pp
+.Ft void
+.Fn rumpuser_mutex_init "struct rumpuser_mtx **mtxp" "int flags"
+.Pp
+.Ft void
+.Fn rumpuser_mutex_enter "struct rumpuser_mtx *mtx"
+.Pp
+.Ft void
+.Fn rumpuser_mutex_enter_nowrap "struct rumpuser_mtx *mtx"
+.Pp
+.Ft int
+.Fn rumpuser_mutex_tryenter "struct rumpuser_mtx *mtx"
+.Pp
+.Ft void
+.Fn rumpuser_mutex_exit "struct rumpuser_mtx *mtx"
+.Pp
+.Ft void
+.Fn rumpuser_mutex_destroy "struct rumpuser_mtx *mtx"
+.Pp
+.Ft void
+.Fn rumpuser_mutex_owner "struct rumpuser_mtx *mtx" "struct lwp **lp"
+.Pp
+Mutexes provide mutually exclusive locking.
+The flags for initialization are as follows:
+.Bl -tag -width "XRUMPUSER_MTX_KMUTEX"
+.It Dv RUMPUSER_MTX_SPIN
+Create a spin mutex.
+Locking this type of mutex must not relinquish rump kernel context
+even when
+.Fn rumpuser_mutex_enter
+is used.
+.It Dv RUMPUSER_MTX_KMUTEX
+The mutex must track and be able to return the rump kernel thread
+that owns the mutex (if any).
+If this flag is not specified,
+.Fn rumpuser_mutex_owner
+will never be called for that particular mutex.
+.El
+.Pp
+.Ft void
+.Fn rumpuser_rw_init "struct rumpuser_rw **rwp"
+.Pp
+.Ft void
+.Fn rumpuser_rw_enter "struct rumpuser_rw *rw" "int writelock"
+.Pp
+.Ft int
+.Fn rumpuser_rw_tryenter "struct rumpuser_rw *rw" "int writelock"
+.Pp
+.Ft void
+.Fn rumpuser_rw_exit "struct rumpuser_rw *rw"
+.Pp
+.Ft void
+.Fn rumpuser_rw_destroy "struct rumpuser_rw *rw"
+.Pp
+.Ft void
+.Fn rumpuser_rw_held "struct rumpuser_rw *rw" "int *heldp"
+.Pp
+.Ft void
+.Fn rumpuser_rw_rdheld "struct rumpuser_rw *rw" "int *heldp"
+.Pp
+.Ft void
+.Fn rumpuser_rw_wrheld "struct rumpuser_rw *rw" "int *heldp"
+.Pp
+Read/write locks acquire an exclusive version of the lock if the
+.Fa writelock
+parameter is non-zero and a shared lock otherwise.
+.Pp
+.Pp
+.Ft void
+.Fn rumpuser_cv_init "struct rumpuser_cv **cvp"
+.Pp
+.Ft void
+.Fn rumpuser_cv_destroy "struct rumpuser_cv *cv"
+.Pp
+.Ft void
+.Fn rumpuser_cv_wait "struct rumpuser_cv *cv" "struct rumpuser_mtx *mtx"
+.Pp
+.Ft void
+.Fn rumpuser_cv_wait_nowrap "struct rumpuser_cv *cv" "struct rumpuser_mtx *mtx"
+.Pp
+.Ft int
+.Fo rumpuser_cv_timedwait
+.Fa "struct rumpuser_cv *cv" "struct rumpuser_mtx *mtx"
+.Fa "int64_t sec" "int64_t nsec"
+.Fc
+.Pp
+.Ft void
+.Fn rumpuser_cv_signal "struct rumpuser_cv *cv";
+.Pp
+.Ft void
+.Fn rumpuser_cv_broadcast "struct rumpuser_cv *cv";
+.Pp
+.Ft void
+.Fn rumpuser_cv_has_waiters "struct rumpuser_cv *cv" "int *waitersp"
+.Pp
+Condition variables wait for an event.
 The
-.Nm
-interface is still under development and interface documentation
-is available only in source form from
-.Pa src/lib/librumpuser .
+.Fa mtx
+interlock eliminates a race between checking the predicate and
+sleeping on the condition variable; the mutex should be released
+for the duration of the sleep in the normal atomic manner.
+The timedwait variant takes a specifier indicating a relative
+sleep duration after which the routine will return with
+.Er ETIMEDOUT .
+If a timedwait is signalled before the timeout expires, the
+routine will return 0.
+.Sh RETURN VALUES
+All routines which return an integer return an errno value.
+The hypervisor must translate the value to the the native errno
+namespace used by the rump kernel.
+Routines which do not return an integer may never fail.
 .Sh SEE ALSO
 .Xr rump 3
+.Rs
+.%A Antti Kantee
+.%D 2012
+.%J Aalto University Doctoral Dissertations
+.%T Flexible Operating System Internals: The Design and Implementation of the Anykernel and Rump Kernerls
+.Re

Reply via email to