thox process manager

This is accomplished using two tools we have at our disposal in Lua:

Coroutines are essentially functions that can be paused and restarted as needed. You hand them control by “resuming” them, and they hand control back when they have a value for you (this is called “yielding”), when an error occurs that they do not catch, or when the function finishes executing. They do not run concurrently. Python, a language you might be familiar with, offers a similar mechanism under the name generators.
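As a small illustration of the three ways a coroutine hands control back (yielding, terminating, and failing), here is a minimal sketch using only the standard coroutine library. The variable names are chosen for the example.

```lua
-- A coroutine that yields twice, then terminates with return values.
local co = coroutine.create(function (a)
	local b = coroutine.yield(a + 1)   -- hands a + 1 back to the resumer
	local c = coroutine.yield(b * 2)   -- hands b * 2 back to the resumer
	return "done", c
end)

-- resume() returns true followed by whatever was yielded or returned.
local ok1, v1 = coroutine.resume(co, 1)        -- starts the function with a = 1
local ok2, v2 = coroutine.resume(co, 10)       -- the first yield returns 10
local ok3, v3, v4 = coroutine.resume(co, "x")  -- the second yield returns "x"
local status_after = coroutine.status(co)      -- the function has returned

-- An uncaught error also hands control back: resume() returns false
-- followed by the error message.
local failing = coroutine.create(function () error("boom") end)
local ok4, err = coroutine.resume(failing)
```

Note that the resumer cannot tell a yield from a normal return by looking at the returned values alone; it has to check coroutine.status() afterwards.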

This, however, only lays the groundwork for a cooperative multitasking system. To make it pre-emptive, we need to force processes to yield after a certain amount of execution, using a debug hook, e.g.:

local function hook_autoyield(type, arg)
        coroutine.yield(YIELD.PREEMPT)
end

debug.sethook(co, hook_autoyield, "", QUANTUM)

Debug hooks are actually defined on a per-coroutine basis; defining them for all subcoroutines can only be done by calling debug.sethook() when each subcoroutine is created. The following proof-of-concept demonstrates this:

local t = coroutine.create(function ()
	debug.sethook(function (type, arg)
		print("{HOK} debug hook was called!")
	end, "", 5e4)

	for i = 1, 2e5 do
		if i % 5e4 == 0 then
			print("[lv2] currently at:", i)
		end
	end

	local st = coroutine.create(function ()
		for i = 1, 2e5 do
			if i % 5e4 == 0 then
				print("[lv3] currently at:", i)
			end
		end
	end)

	coroutine.resume(st)
end)

coroutine.resume(t)

for i = 1, 2e5 do
	if i % 5e4 == 0 then
		print("[lv1] currently at:", i)
	end
end

Which gives the following result:

[Figure: setcohook-demo.png, console output of the script above]

As you can see, the hook is only called for the coroutine it was set into.

System calls and coroutines

To set all of this up, the process manager and the various processes must be able to communicate. This is done through coroutines and yielding. If you haven’t already, please read the chapter about Coroutines.

As explained previously, a process is a coroutine wrapped with some information about when and how to resume it; see startup.Process.thread. The BIOS actually does the same thing; see the os.pullEventRaw() definition.

However, we want to do it better than the BIOS, by allowing the user to:

  • Make a system call from a nested coroutine within the process, without the need for every intermediate level in the coroutine stack to transmit it explicitly.

  • Have a coroutine waiting for an answer while others (or the upper level) can continue running.

To achieve this, thox hijacks the normal yielding system by replacing coroutine.yield() and coroutine.resume() with versions that prepend a special value, invisible to the user, to the yielded values. This value represents the yield type, which is one of:

  • basic yield: this means the process has manually called its version of the coroutine.yield() function. This can also happen when the function calls coroutine.yield() from its main thread; this is then understood the same as a pre-empt yield.

  • preempt yield: this means the process has taken too long before yielding, and the auto-yielding function has fired because the quantum has been reached.

  • system call yield: this means the process has made a system call, amongst os.call(), os.answer(), os.bind() and os.unbind().

Note that the user doesn’t have access to the “real” coroutine functions, which are hidden behind the hijacked coroutine functions and the system calls described above. Note also that there is one case in which the special value is absent: when the coroutine dies, possibly returning values. Any code within these functions or the process manager must therefore first check whether the coroutine has died to know whether the special value is present.

Based on these elements, you should now know what the hijacked coroutine functions bring on top of the system ones:

  • coroutine.yield() adds the pm.YIELD.BASIC special value before the arguments given by the user, and calls the real yield function.

  • coroutine.resume() resumes the coroutine using the real resuming function until it yields. Then:

    • If the coroutine has died, then it just returns the arguments received from the real “resume” function to the caller, as the special value isn’t present.

    • Otherwise, if the coroutine has yielded, then it extracts the special value. If this value indicates a normal yield, then the arguments initially passed to coroutine.yield() are returned to the caller.

    • Otherwise, it yields the arguments accompanied by the special value, in the same form it received them, using the real “yield” function.
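The behaviour described above can be sketched as follows. This is not the thox implementation: the YIELD values, the names user_yield, user_resume and syscall, and the loop that forwards the answer back into the nested coroutine are all assumptions made for the example.

```lua
local unpack = table.unpack or unpack  -- works on Lua 5.1 and later
local function pack(...) return { n = select("#", ...), ... } end

-- Hypothetical yield types; thox keeps its own in pm.YIELD.
local YIELD = { BASIC = "basic", PREEMPT = "preempt", SYSCALL = "syscall" }

-- The real functions stay private; user code only sees the wrappers.
local real_yield, real_resume = coroutine.yield, coroutine.resume

local function user_yield(...)
	-- Tag the yield so that upper levels know it is a plain user yield.
	return real_yield(YIELD.BASIC, ...)
end

local function user_resume(co, ...)
	local res = pack(real_resume(co, ...))
	while true do
		-- Dead coroutine (returned or failed): no tag is present.
		if not res[1] or coroutine.status(co) == "dead" then
			return unpack(res, 1, res.n)
		end
		-- Plain user yield: strip the tag before returning to the caller.
		if res[2] == YIELD.BASIC then
			return true, unpack(res, 3, res.n)
		end
		-- Anything else is passed upward untouched (assumption: the answer
		-- received from above is then forwarded to the nested coroutine).
		res = pack(real_resume(co, real_yield(unpack(res, 2, res.n))))
	end
end

-- Hypothetical system call helper, standing in for os.call() and friends.
local function syscall(name, ...)
	return real_yield(YIELD.SYSCALL, name, ...)
end

-- Demo: a system call made two levels deep reaches the top level directly.
local inner = coroutine.create(function ()
	local answer = syscall("os.call", "ping")
	user_yield("inner got " .. answer)
end)

local outer = coroutine.create(function ()
	local ok, msg = user_resume(inner)  -- never sees the system call
	return msg
end)

-- The top level plays the role of the process manager.
local ok, tag, name, arg = real_resume(outer)
local ok2, result = real_resume(outer, "pong")
```

Here the top level receives the tag, "os.call" and "ping" from the first resume even though outer never mentions system calls, and outer only ever sees the plain yield made by inner.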

All of this behaviour is transparent to the user. Note that instead of calling coroutine.resume(), any code in the process can call os.capture(), which does the same thing except that it does not automatically yield system calls but returns them to the caller. The caller can then imitate the system (useful for testing), transmit the calls and answers while sharing time between coroutines in a collaborative fashion (useful for making simple daemons), log system calls and answers in strace fashion, and so on.
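To illustrate the “imitate the system” and “log in strace fashion” use cases, here is a minimal sketch. The capture function below is a hypothetical stand-in for os.capture(), whose actual return convention is not described here; this sketch assumes it returns the yield type first, or "dead" or "error" when the coroutine has stopped. The YIELD values and the fake answers are likewise assumptions.

```lua
-- Hypothetical yield types; thox keeps its own in pm.YIELD.
local YIELD = { BASIC = "basic", SYSCALL = "syscall" }

-- Hypothetical stand-in for os.capture(): resume once and report what
-- happened, without forwarding system calls to the upper level.
local function capture(co, ...)
	local function classify(ok, ...)
		if not ok then return "error", ... end
		if coroutine.status(co) == "dead" then return "dead", ... end
		return ...  -- the first value is the yield type tag
	end
	return classify(coroutine.resume(co, ...))
end

-- A program making two tagged system calls (raw yields, for brevity).
local program = coroutine.create(function ()
	local time = coroutine.yield(YIELD.SYSCALL, "os.call", "time")
	local date = coroutine.yield(YIELD.SYSCALL, "os.call", "date")
	return time .. " " .. date
end)

-- Imitate the system with canned answers and log every call.
local fake_answers = { time = "12:00", date = "2020-01-01" }
local log = {}

local kind, name, arg = capture(program)
while kind == YIELD.SYSCALL do
	local answer = fake_answers[arg]
	log[#log + 1] = name .. "(" .. arg .. ") = " .. answer
	kind, name, arg = capture(program, answer)
end

-- Once the program has finished, the second value is its return value.
local final_kind, final_value = kind, name
```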

Todo

Explain sandboxing, referencing context switching (use the link below).

Also, I should shift from loading the program outside with a distinct environment to starting a process directly, then setting up the sandbox and loading the file from within the created function! This would give me more freedom and make the whole thing simpler; it means that the startup.ProcessManager:add() method would take a function instead of a program, and just run it inside a coroutine, and that’s all.

Startup and main loop

At computer startup, the BIOS is loaded (usually CraftOS, present in the ROM; see bios.lua). Then, if the related ComputerCraft settings are set appropriately, the startup.lua file from thox is executed; the related setting is either shell.allow_startup if thox is installed on the computer’s main disk, or shell.allow_disk_startup if thox is installed on a disk in a peripheral disk drive.

Once started, the startup script from thox creates the process manager, named pm, spawns the initial processes (i.e. basic drivers and the init script) using startup.ProcessManager.spawn(), then enters the main loop through startup.ProcessManager:run().

The main loop consists of the following steps:

  1. Check on each process.

  2. Remove the zombies. If there are no processes left, exit the main loop and shut down (this is not considered normal).

  3. Read hardware events:

    • During the hardware quantum (i.e. the game tick duration, 0.05 seconds) if some processes are running.

    • Otherwise, until the next alarm triggers or until the next hardware event that is pulled by at least one process occurs.

Step 1 runs as follows:

  1. If the process has any alarm set, check if any has already gone past. If that’s the case, add an event for each alarm that has gone past (as an answer to the call using its CID).

  2. If the process is currently waiting and there is an event waiting for the process to receive, set the answer to the received event (as an os.Event) and set the process to be running.

  3. Then, if the process is currently running:

    1. Resume it, passing it the latest answer.

    2. Check the status of the coroutine:

      • If the process has died (the task is finished, or an uncaught error has occurred), set the process to be a zombie.

      • If the process has yielded normally (the user has called a raw coroutine.yield()) or has been pre-empted, we ignore the call and continue on with our loop.

      • If the process has emitted a call, we assign a system-wide unique identifier to that call, and transmit the request to the appropriate process; if there is no such process, set the next answer to false.

      • TODO: bind and answer?
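The per-process check and the zombie removal can be sketched as below. This is a simplified model, not the thox code: the process record layout, the route callback, the "waiting" state after a transmitted call, and the YIELD values are assumptions, and alarms and hardware events are left out entirely.

```lua
-- Hypothetical yield types; thox keeps its own in pm.YIELD.
local YIELD = { BASIC = "basic", PREEMPT = "preempt", SYSCALL = "syscall" }

-- Hypothetical process record: a coroutine plus scheduling state.
local function new_process(fn)
	return { thread = coroutine.create(fn), state = "running", events = {} }
end

local next_cid = 0  -- system-wide call identifier counter

-- Step 1 for one process. `route` is a hypothetical callback that delivers
-- a call to its target process and returns false if there is none.
local function check(proc, route)
	-- A waiting process with a pending event becomes runnable again.
	if proc.state == "waiting" and #proc.events > 0 then
		proc.answer = table.remove(proc.events, 1)
		proc.state = "running"
	end
	if proc.state ~= "running" then return end

	-- Resume with the latest answer, then inspect what came back.
	local ok, kind, name, arg = coroutine.resume(proc.thread, proc.answer)
	proc.answer = nil
	if not ok or coroutine.status(proc.thread) == "dead" then
		proc.state = "zombie"
	elseif kind == YIELD.SYSCALL then
		next_cid = next_cid + 1
		if route(next_cid, proc, name, arg) then
			proc.state = "waiting"  -- assumption: wait for the answer event
		else
			proc.answer = false     -- nobody to transmit the call to
		end
	end
	-- Basic and pre-empt yields need no action: the process stays runnable.
end

-- One pass of the main loop: check every process, then drop the zombies.
-- Returns false once no process is left.
local function pass(procs, route)
	for _, proc in ipairs(procs) do check(proc, route) end
	for i = #procs, 1, -1 do
		if procs[i].state == "zombie" then table.remove(procs, i) end
	end
	return #procs > 0
end

-- Demo: one process calls a target nobody is bound to, another is pre-empted.
local seen = {}
local procs = {
	new_process(function ()
		seen[1] = coroutine.yield(YIELD.SYSCALL, "nobody", "hello")
	end),
	new_process(function ()
		coroutine.yield(YIELD.PREEMPT)
	end),
}
local function route() return false end  -- no process is bound to anything

local passes = 0
while pass(procs, route) do passes = passes + 1 end
```

In this run, the first pass leaves both processes suspended, and the second pass resumes them to completion, turns them into zombies and removes them, which ends the loop. The first process receives false as the answer to its call.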

Todo

How does this procedure manage events? Event filtering? Function binding? Too many calls? Priorities (if any)? Passing calls? Alarms? Many things remain to be defined in this process.

What about hanging calls when a process is killed? Is there a possibility to cancel a call once started, either by process will or when the process is killed?