thox concepts: contexts and interactions
Talk about processes themselves.
Inter-process communication (IPC) serves, from a process’s point of view, for communicating with the hardware and other processes, allowing them to cooperate to accomplish a task. It is accomplished in thox using two mechanisms:
Remote procedure calls (RPC).
Message queues (not implemented yet).
These mechanisms take place in contexts. Contexts are security objects in which these two interactions take place; they allow processes to manage these elements, amongst others:
Sandboxing of given processes (by whitelisting or blacklisting RPC endpoints they can call).
Logging (by running the process from a process “watcher”).
Compatibility with older thox processes (by converting older RPC endpoints into new ones); see seccomp for a real-world example of such a concept.
They can also correspond to objects, such as file handles or network connections. In that case, the daemon is guaranteed that the handle gets closed, since references to a given context are closed automatically when a process exits.
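As an illustration of the sandboxing case, here is a minimal sketch of a whitelist check that a context-owning process could apply before routing a call. The function name `make_whitelist_filter` and the RPC names are hypothetical and not part of thox; lower-casing follows the case-insensitivity of RPC names described later in this document.

```lua
-- Hypothetical helper, not part of the thox API: builds a predicate that
-- tells whether a sandboxed process may call a given RPC name.
local function make_whitelist_filter(allowed_names)
  local allowed = {}
  for _, name in ipairs(allowed_names) do
    allowed[name:lower()] = true
  end
  return function(rpc_name)
    return allowed[rpc_name:lower()] == true
  end
end

-- The context owner would consult this predicate before forwarding a call.
local may_call = make_whitelist_filter({ "fs.open", "fs.read" })
print(may_call("fs.open"))
print(may_call("net.connect"))
```

A blacklist would be the same predicate negated; which of the two is safer depends on whether unknown endpoints should be reachable by default.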
Regarding contexts, a process:
A context is owned by the process which created it using
os.context(). This process manages the context: it routes the RPC calls
and messages going through it, and is therefore responsible for its security
when it shares this context with multiple processes.
The initial process gets a context managed by the process manager; the initial process can then create contexts for its children, depending on the security it wants to set up. See systemd for more information.
Message queues in a different namespace, also in contexts, with the following functions, probably:
os.push(ctx, name, info)
This requires similar utilities to RPC calls, as we also need routing (although calls must arrive at one destination, while messages can arrive at several).
What happens on OS shutdown for example, in which order are processes closed if they need to close processes? Can processes have a “kill” event in order to be able to make their last actions (e.g. sending disconnect messages on network connections)?
thox processes are event-driven; the process manager prepares the events for the process to read, with an optional filter depending on what it expects.
Events can be the following:
RPC events, such as calls received by the process and answers received from previous calls.
Messages from message queues the process has subscribed to. These messages can represent “real world” events, etc.
Events are represented using
os.Event objects, and are pulled using
os.pull(). It is possible to pull specific events, such as
the answer to a specific call, by specifying additional parameters to
os.pull(); by default, it returns the oldest event not yet gathered
by the process.
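The pulling behaviour described above can be sketched with a small in-memory queue. This is only a model under assumptions: `EventQueue`, the event fields `type` and `cid`, and the filter-function argument are hypothetical, and the real os.pull() may block and take different parameters.

```lua
-- Hypothetical model of a per-process event queue; not the thox API.
local EventQueue = {}
EventQueue.__index = EventQueue

function EventQueue.new()
  return setmetatable({ events = {} }, EventQueue)
end

function EventQueue:push(event)
  self.events[#self.events + 1] = event
end

-- Removes and returns the oldest event accepted by `filter`,
-- or simply the oldest event when no filter is given.
function EventQueue:pull(filter)
  for index, event in ipairs(self.events) do
    if filter == nil or filter(event) then
      return table.remove(self.events, index)
    end
  end
  return nil
end

local queue = EventQueue.new()
queue:push({ type = "call", cid = 7 })
queue:push({ type = "answer", cid = 3 })
queue:push({ type = "answer", cid = 5 })

-- Pull the answer to one specific call, then fall back to the default order.
local answer = queue:pull(function(event)
  return event.type == "answer" and event.cid == 5
end)
local oldest = queue:pull()
```

Pulling with a filter leaves the other events in place, so a process waiting on one answer does not lose the calls that arrived in the meantime.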
What if there are too many events? Where is the limit? What happens?
Remote Procedure Call (RPC)
thox processes communicate mainly in a one-to-one fashion using a remote procedure call protocol.
Since RPC in thox is asynchronous, what about cancelling calls? Some calls might never end, and this might clog up the process’ call space. There should be a timeout mechanism, or we could let the process manage its timeouts using alarms and cancel the call itself. But what happens for the process at the other end of the RPC call? This would be a hang-up, but wouldn’t it be simpler to just leave the RPC call open for it and simply not transmit the answer? But then should we alert it of the hang-up?
Also, what if cancelling is used to DoS another process that performs complex operations or lots of I/O to serve each call? E.g. instead of being limited to the maximum number of calls I can emit, I make a lot of calls, cancel all of them, then do it again in a loop. The process scheduler might see this as time-sharing friendly, so it might give the caller more time to make more system calls compared to how it is managed on the other end.
Picture three processes P1, P2 and P3, where P3 manages the default context of both processes P1 and P2. P1 wants to execute an action using this protocol, and P2 has this function available and wants any other process using its default context to be able to run it.
In order to represent the action, thox uses RPC names such as
my.super.function. When started up, P3 first decides, either for each
action or globally, what it wants to do. Some common possibilities are:
It provides a fixed set of functions, and does not provide any mechanism to “bind” functions.
It transmits all calls from a given context to another, e.g. calls from a context it created to its default context.
It allows RPC name binding.
The basic context most processes on thox will encounter is the context
provided by initd (see systemd for more information).
This context allows binding through its
With binding, daemons such as P2 can then route specific RPC calls made on
their default context to themselves, which means that subsequent calls by any
process to my.super.function on the context provided by P3 will result
in P2 receiving a call from the said process. Therefore, when making a
call to my.super.function on its default context, P1 will receive an
answer from P2.
What happens in order during the call is the following:
P1 calls the procedure using the
rpccall. This actually emits a call to the system using
os.call(), which returns a token in the form of a numerical Call IDentifier (CID).
P3 gets a call event, bundled with the CID with which to answer, the arguments given by P1, and some additional request information. It finds out that P2 is bound to the given name, and transmits the call to it using
P2 gets a call event, bundled with the CID with which to answer, the arguments given by P1, and some additional request information.
P2 treats the request accordingly.
P2 emits an answer using
os.answer(), passing the CID to it, optionally followed by some return values.
P1 gets an answer event, with the CID (to identify which call the answer is for, in case P1 has sent multiple calls).
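The steps above can be simulated in plain Lua to show how the CID ties an answer back to its call. This is a simplified, synchronous sketch: `call`, `answer` and `bind` are hypothetical stand-ins rather than the real os.call() and os.answer(), P3 is reduced to a binding table, and the same CID is reused end to end, which may not match how thox actually forwards calls.

```lua
-- Hypothetical simulation of the P1 -> P3 -> P2 -> P1 round trip.
local next_cid = 0
local function new_cid()
  next_cid = next_cid + 1
  return next_cid
end

-- P3: the context owner, holding the name -> handler bindings.
local bindings = {}
local function bind(name, handler)
  bindings[name:lower()] = handler
end

-- Events delivered to P1, in arrival order.
local p1_events = {}

-- Stand-in for os.answer(): delivers an answer event to the caller (P1).
local function answer(cid, status, ...)
  p1_events[#p1_events + 1] =
    { type = "answer", cid = cid, status = status, values = { ... } }
end

-- Stand-in for os.call(): allocates a CID and lets P3 route the call.
local function call(name, ...)
  local cid = new_cid()
  local handler = bindings[name:lower()]
  if handler == nil then
    answer(cid, 253) -- UNBOUND: nobody is bound to this name
  else
    handler(cid, ...) -- P2 receives the call event
  end
  return cid
end

-- P2 binds the name and answers every call with status 0 (SUCCESS).
bind("my.super.function", function(cid, a, b)
  answer(cid, 0, a + b)
end)

-- P1 emits two calls and later matches answers to calls using the CID.
local first = call("my.super.function", 2, 3)
local second = call("not.bound")
```

In the real system each step is an event pulled by the receiving process, so the answers may arrive in a different order than the calls were emitted; that is why P1 needs the CID.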
Remote Procedure Names (RPN)
Remote Procedure Names (RPN) are names to which remote procedure calls are emitted. They are represented as a dot-joined collection of one or more name components, which are non-empty strings of letters and digits, not beginning with a digit and not being one of the following reserved words:
A Remote Procedure Name:
Can end with an underscore. Note that only the last name component can end with an underscore; for example,
fs_.open is not allowed.
Must neither end with an empty component (i.e. with a dot) nor must the last component be composed solely of an underscore; for example,
fs.open_ are allowed, where
Is of non-zero arbitrary length 1.
Is case-insensitive, which implies that the callee will receive a lower-cased version of it; e.g.
fs.getspaceleft will all be received by the callee as
Note that the underscore has special significance; see Sharing contexts.
The regex for validating names as used for RPC (which requires negative lookaheads) is the following:
A Lua function to validate and return the canonicalized RPC name can be
Some valid and invalid identifiers are the following:
The rationale behind this definition is to be able to integrate these
identifiers into native code using the
os.rpc prefix, for example
os.rpc.sleep(5) to emit a synchronous call to the
Case insensitivity is explained by the confusion that the system-wide
fs.getspaceleft could generate,
leading to potential security problems; see typosquatting for a real-world
problem similar to the one this mitigation addresses.
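To make the naming rules in this section concrete, here is a sketch of a validator written only from the prose rules. It is not the project's own function: the name `canonicalize_rpn` is hypothetical, and since the reserved word list is not reproduced here, `ASSUMED_RESERVED` is a placeholder containing a few Lua keywords as an assumption. Checking reserved words after stripping the trailing underscore is also an assumption.

```lua
-- Placeholder list (assumption): the actual reserved words are not shown here.
local ASSUMED_RESERVED = {
  ["and"] = true, ["end"] = true, ["function"] = true, ["nil"] = true,
}

-- Returns the lower-cased name when valid, or nil plus a reason otherwise.
local function canonicalize_rpn(name)
  if type(name) ~= "string" or name == "" then
    return nil, "empty name"
  end
  local lowered = name:lower()

  -- Appending a dot lets one pattern capture every component, empty ones too.
  local components = {}
  for component in (lowered .. "."):gmatch("([^.]*)%.") do
    components[#components + 1] = component
  end

  for index, component in ipairs(components) do
    local body = component
    -- Only the last component may carry a trailing underscore, and it must
    -- not consist solely of that underscore.
    if index == #components and #body > 1 and body:sub(-1) == "_" then
      body = body:sub(1, -2)
    end
    if not body:match("^[a-z][a-z0-9]*$") then
      return nil, "invalid component: '" .. component .. "'"
    end
    if ASSUMED_RESERVED[body] then
      return nil, "reserved word: '" .. body .. "'"
    end
  end
  return lowered
end

print(canonicalize_rpn("FS.GetSpaceLeft"))
print(canonicalize_rpn("fs_.open"))
```

Returning the canonical form rather than a boolean means the caller and the callee can both work with the same lower-cased name.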
Notice that while this API is asynchronous, most of the time, processes will
want to call RPC functions synchronously; to make this more accessible,
one can use the
RPC calls always return a numeric status code as the first argument.
This status code should always be between 0 and 255; when a
status code provided to
os.answer() is not
within those bounds, the status code will be set to
Special status codes are the following:
SUCCESS(0): returned when the call has succeeded.
UNBOUND(253): returned when the name should be considered as unbound.
UNANSWERED(254): returned when a call has been unanswered. This can be due to it not being picked up from a full event queue, or to bad routing leading a process owning a context to forward it to self.
UNKNOWN(255): returned when an invalid status code has been provided to
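The status code rules can be sketched as a small normalisation step. The four constants come from the list above; the function name `normalize_status` is hypothetical, mapping out-of-range values to UNKNOWN is inferred from the description of that code, and treating non-integer or non-numeric values as invalid is an additional assumption.

```lua
-- Special status codes, as listed above.
local STATUS = { SUCCESS = 0, UNBOUND = 253, UNANSWERED = 254, UNKNOWN = 255 }

-- Hypothetical helper: returns the code unchanged when it is an integer in
-- [0, 255], and STATUS.UNKNOWN otherwise.
local function normalize_status(code)
  if type(code) ~= "number"
    or code ~= math.floor(code)
    or code < 0
    or code > 255
  then
    return STATUS.UNKNOWN
  end
  return code
end

print(normalize_status(0))
print(normalize_status(300))
```

Codes between 1 and 252 are left untouched here, on the assumption that they are available to callees for their own error reporting.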