Init (User space boot) reference for Chaos Calmer: procd
Analysis of how the user space part of the boot sequence is implemented in OpenWrt, Chaos Calmer release.
Procd replaces init
On a fully booted Chaos Calmer system, pid 1 is /sbin/procd:
root@openwrt:~# ps
  PID USER       VSZ STAT COMMAND
    1 root      1440 S    /sbin/procd
  ...
At boot, the Linux kernel starts /sbin/init as the first user process. In Chaos Calmer, /sbin/init does the preinit/failsafe steps, those that depend only on the read-only partition in the flashed image, then execs (that is, is replaced by) /sbin/procd to continue the boot as specified by the configuration in the writable flash partition. Procd started as pid 1 assumes several roles, among them service manager and hotplug events handler; this is as of February 2016, when this research was done. The procd techref wiki page is at this point in time a design document and a work in progress. If you are reading here and understand procd's semantics and API, please update that page.
Procd sources:
http://git.openwrt.org/?p=project/procd.git;a=tree;hb=0da5bf2ff222d1a499172a6e09507388676b5a08
at the commit used to build the procd package in Chaos Calmer release:
PKG_SOURCE_VERSION:=0da5bf2ff222d1a499172a6e09507388676b5a08
/sbin/init
source:
http://git.openwrt.org/?p=project/procd.git;a=blob;f=initd/init.c;hb=0da5bf2ff222d1a499172a6e09507388676b5a08#l71
Life and death of a Chaos Calmer system
This is the source code path followed in logical order of execution by the processor in user space while booting Chaos Calmer.
All links to source repositories should show the code at the commit used in Chaos Calmer release.
Pathnames evaluated at preinit time when / is read only have “(/rom)” prepended, to signify the path where the file is found on a fully booted system.
main(int argc, char **argv) in /sbin/init, line 71
User space life begins here. OpenWrt calls this phase “preinit”.
early() (definition)
- Mount filesystems: /proc, /sys, /sys/fs/cgroup, /dev (a tmpfs), /dev/pts
- Populate /dev with entries from /sys/dev/{char,block}
- Open /dev/console as STDIN/STDOUT/STDERR
- Make directories /tmp (optionally on zram), /tmp/run, /tmp/lock, /tmp/state
This accounts for most of the filesystem layout, noting that /etc/fstab is a broken symlink (line 161), with the following additions:
- procd_coldplug(), invoked at hotplug setup time, will recreate /dev from scratch.
- /etc/rc.d/S10boot will invoke mount_root to set up a writable filesystem based on extroot, a jffs2 overlay, or a tmpfs-backed snapshot-capable overlay, add some directories and files, and mount debugfs.
- Fork /sbin/kmodloader (/rom)/etc/modules-boot.d/ (kmodloader source)
- Wait up to 120 seconds for /sbin/kmodloader to probe the kernel modules declared in (/rom)/etc/modules-boot.d/
At this point in the boot sequence, '/etc/modules-boot.d' is the one from the rom image (/rom/etc/... when boot is done). The overlay filesystem is mounted later.
kmodloader is a multicall binary; invoked as kmodloader, it runs main_loader(), which reads the files in (/rom)/etc/modules-boot.d/, looking for lines starting with the name of a module to load, optionally followed by a space and module parameters. There appears to be special treatment for files with names beginning with a digit: the modules they list are loaded immediately, then the modules from files whose names begin with an ASCII character greater than “9” are loaded all together in a final load_modprobe call.
uloop_init(), line 116 (definition)
Documentation of libubox/uloop.h says:
Uloop is a loop runner for i/o. Gets in charge of polling the different file descriptors you have added to it, gets in charge of running timers, and helps you manage child processes. Supports epoll and kqueue as event running backends.
The uloop.c source in libubox shows that uloop's process management duty is assigned by a call to
int uloop_process_add(struct uloop_process *p)
where p->pid is the process id of a child process to monitor and p->cb is a pointer to a callback function.
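As a rough picture of this registration, the following is a minimal self-contained sketch that does not use libubox. The names child_watch, child_cb_t, watch_spawn, watch_wait and record_exit are hypothetical stand-ins for struct uloop_process, its cb member and the SIGCHLD handling done by the event loop; the blocking waitpid() replaces the event-driven loop only for brevity.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for struct uloop_process: a pid plus a callback. */
struct child_watch;
typedef void (*child_cb_t)(struct child_watch *w, int ret);

struct child_watch {
	pid_t pid;      /* child process to monitor */
	child_cb_t cb;  /* called once the child has exited */
	int last_ret;   /* exit code stored by the example callback */
};

/* Example callback: remember the exit code of the child. */
static void record_exit(struct child_watch *w, int ret)
{
	w->last_ret = ret;
}

/* Fork a child that exits with 'code' and register it in 'w'. */
static int watch_spawn(struct child_watch *w, int code, child_cb_t cb)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);    /* child: terminate immediately */
	w->pid = pid;
	w->cb = cb;
	w->last_ret = -1;
	return 0;
}

/* Blocking stand-in for the event loop: reap the child, run the callback. */
static int watch_wait(struct child_watch *w)
{
	int status;

	if (waitpid(w->pid, &status, 0) != w->pid)
		return -1;
	w->cb(w, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
	return 0;
}
```

Calling watch_spawn(&w, 3, record_exit) followed by watch_wait(&w) leaves the child's exit code in w.last_ret. In init the callback is not a recorder but the function that execs procd.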
When the managed child process exits, uloop_run, running in the parent context to receive the SIGCHLD signal, triggers execution of the callback.
preinit(), line 117 (definition)
- Forks a "plugd instance", line 94
/sbin/procd -h (/rom)/etc/hotplug-preinit.json
to listen to kernel uevents for any required firmware or for notification of a button press, handled by (/rom)/etc/rc.button/failsafe as the request to enter failsafe mode. A flag file /tmp/failsafe_button containing the value of ${BUTTON} is created if failsafe has been requested.
- Forks, at lines 106-111,
PREINIT=1 /bin/sh (/rom)/etc/preinit
a shell to execute (/rom)/etc/preinit with PREINIT=1 in its environment. Submits the child process to uloop management with the callback spawn_procd(), which will exec procd to replace init as pid 1 at completion of (/rom)/etc/preinit.
/etc/preinit
A shell script, fully documented at preinit_operation. In short, it parses the files in (/rom)/lib/preinit to build 5 lists of hooks and an environment, then runs the hooks from some of the lists depending on the state of the environment.
One of the steps in a successful boot sequence is to mount the overlay file system, with a hook set up by (/rom)/lib/preinit/80_mount_root to call mount_root, which, if extroot is not configured, mounts the writable data partition “rootfs_data” as an overlay over the / partition “rootfs”. If the data partition is still being prepared, it overlays a tmpfs in RAM instead.
Filesystem snapshots are supported; this is a feature listed in the Barrier Breaker announcement, and the shell wrapper is the /sbin/snapshot script. The “SNAPSHOT=magic” environment variable is set in mount_snapshot(), line 330.
uloop_run(), line 118
At exit of the (/rom)/etc/preinit shell script, invokes the callback spawn_procd():
- setsid(), line 67
The process group ID and session ID of the calling process are set to the PID of the calling process: man 2 setsid. See also man 7 credentials.
- procd_signal(), line 69 (definition), line 82.
Sets up signal handlers: reboot on SIGTERM or SIGINT, poweroff on SIGUSR1 or SIGUSR2.
- trigger_init(), line 70 (definition)
Procd triggers on config file/network interface changes, see procd_triggers_on_config_filenetwork_interface_changes.
Initialises a run queue. An example is the sole documentation. A queued task has a uloop callback invoked when it is done; here the empty-queue callback is set to do nothing.
- procd_state_next(), line 74 (definition)
Transitions from NONE to EARLY the state of a state machine, implemented in state_enter(void), that is used to sequence the remaining boot steps.
STATE_EARLY in state_enter()
- Emits “- early -” to syslog,
- Initialises the watchdog,
- hotplug(“/etc/hotplug.json”) (definition)
User space device hotplugging handler setup.
Static variables in file scope are important. The filename of the script to execute is kept in hotplug.c at file scope: static char *rule_file;
Opens a netlink socket (man 7 netlink) and hands the file descriptor to uloop, to listen to uevents: kernel messages informing userspace of kernel events. See https://www.kernel.org/doc/pending/hotplug.txt
The uloop instance in pid 1 uses epoll_wait to monitor file descriptors. The kernel netlink socket FD is one of them, and uloop is instructed to invoke the callback hotplug_handler() on uevent arrival.
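To make the data seen by such a handler concrete, here is a minimal sketch, not procd's code, of opening a uevent netlink socket and looking up one variable in a received message. The function names uevent_socket_open and uevent_get are hypothetical. The sketch relies on the kernel's uevent datagram layout: a run of NUL-terminated strings, first "action@devpath", then "KEY=VALUE" pairs.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Open a netlink socket subscribed to kernel uevents; returns fd or -1. */
static int uevent_socket_open(void)
{
	struct sockaddr_nl nls;
	int fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

	if (fd < 0)
		return -1;
	memset(&nls, 0, sizeof(nls));   /* nl_pid 0: the kernel picks an id */
	nls.nl_family = AF_NETLINK;
	nls.nl_groups = 1;              /* multicast group 1: kernel uevents */
	if (bind(fd, (struct sockaddr *)&nls, sizeof(nls)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Return the value of 'key' in a uevent message, or NULL if it is absent.
 * The caller guarantees that the buffer ends with a NUL byte.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t off = 0;

	while (off < len) {
		const char *s = buf + off;
		const char *end = memchr(s, '\0', len - off);
		size_t slen = end ? (size_t)(end - s) : len - off;

		if (slen > klen && s[klen] == '=' && !strncmp(s, key, klen))
			return s + klen + 1;
		off += slen + 1;
	}
	return NULL;
}
```

A handler would typically recv() a datagram from the descriptor returned by uevent_socket_open() and then query keys such as SUBSYSTEM or ACTION with uevent_get().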
This hotplug_handler callback stays active after coldplug, and will handle all uevents the kernel emits.
- procd_coldplug() (definition)
Unmounts /dev/pts and /dev, mounts a tmpfs on /dev, creates the directories /dev/shm and /dev/pts, and forks udevtrigger to regenerate the kernel uevents that went unheard before the netlink socket was opened (“coldplug”).
udevtrigger
Scans /sys/bus/*/devices, /sys/class, and /sys/block if it isn't a subdir of /sys/class, writing “add” to the uevent file of all devices. The kernel then synthesizes an “add” uevent message on netlink. See Injecting events into hotplug via “uevent” in https://www.kernel.org/doc/pending/hotplug.txt
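The core of that step can be sketched in a few lines of C. This is an illustration under assumptions, not the udevtrigger source: uevent_trigger is a hypothetical name, and the directory scan that finds the uevent files is left out.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write "add" to a single uevent file, for example
 * /sys/class/net/lo/uevent, asking the kernel to emit the
 * corresponding uevent again. Returns 0 on success, -1 on error.
 */
static int uevent_trigger(const char *uevent_path)
{
	static const char action[] = "add";
	int fd = open(uevent_path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, action, sizeof(action) - 1);
	close(fd);
	return n == (ssize_t)(sizeof(action) - 1) ? 0 : -1;
}
```

On a real system the write needs sufficient privileges on sysfs; a coldplug pass would call this once per device found under the scanned directories.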
A callback chain, udevtrigger_complete() followed by coldplug_complete(), is attached to completion of the child udevtrigger process, such that the still-to-be-reached uloop_run() in the procd main() function, after all uevents have been processed, will advance procd state to STATE_UBUS, line 31.
uloop_run, line 75
Solicited by udevtrigger in another process, the kernel emits uevents and uloop invokes the user space hotplug handler callback.
The /etc/hotplug.json script
- creates and removes device files and assigns them permissions,
- loads firmware,
- handles buttons by calling scripts in /etc/rc.button/%BUTTON% if the uevent has the “BUTTON” value,
- and invokes /sbin/hotplug-call “%SUBSYSTEM%” to handle all other subsystem related actions.
Subsystems are: “platform”, “net”, “input”, “usb”, “usbmisc”, “ieee1394”, “block”, “atm”, “zaptel”, “tty”, “button” (without BUTTON value, possible?), “usb-serial”. “usb-serial” is aliased to “tty” in hotplug.json.
Documentation of json script syntax? Offline. Use the source. It is the json representation of the abstract syntax tree of a script in a fairly intuitive scripting language.
There are 2 levels at which decisions are taken: hotplug.json acts as fast path executor or lightweight dispatcher, the subsystem scripts in /etc/hotplug.d/%SUBSYSTEM%/ do the heavy lifting.
Uevent messages from the kernel contain key-value pairs passed as environment variables to the scripts. The kernel function
int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...)
creates them. This link http://lxr.free-electrons.com/ident?v=3.18;i=add_uevent_var provides a list of all places in the Linux kernel where it is used. It is an authoritative reference for the upstream-defined uevent variables. Button events are generated by the out-of-tree kernel modules button-hotplug and gpio-button-hotplug, specific to OpenWrt.
/sbin/hotplug-call “%SUBSYSTEM%” is a shell script that scans /etc/hotplug.d/%SUBSYSTEM%/* and sources all scripts assigned to a subsystem. The “button” subsystem is handled here if the uevent lacks the “BUTTON” value (unlikely or impossible?).
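How the key-value pairs of a uevent end up as environment variables can be sketched as follows. This is a simplified illustration, not procd's implementation: uevent_export is a hypothetical name, the 64-byte key limit is an arbitrary choice, and the variables are set in the calling process, whereas a real dispatcher would do this in a forked child just before exec'ing the script.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Export every "KEY=VALUE" string of a uevent message into the
 * environment. Strings without '=' (such as the leading
 * "action@devpath") are skipped. Returns the number of variables set.
 */
static int uevent_export(const char *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		const char *s = buf + off;
		const char *end = memchr(s, '\0', len - off);
		const char *eq;
		char key[64];

		if (!end)
			break;  /* unterminated tail: ignore it */
		eq = memchr(s, '=', (size_t)(end - s));
		if (eq && eq > s && (size_t)(eq - s) < sizeof(key)) {
			memcpy(key, s, (size_t)(eq - s));
			key[eq - s] = '\0';
			if (setenv(key, eq + 1, 1) == 0)
				count++;
		}
		off += (size_t)(end - s) + 1;
	}
	return count;
}
```

After the export, a shell script started from that process sees the values as $ACTION, $SUBSYSTEM and so on, which is how the scripts under the hotplug directories read them.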
STATE_UBUS
At the end of coldplug uevents processing, the callback coldplug_complete calls procd_state_next, which results in advancing procd to STATE_UBUS.
“- ubus -” is logged to the console, the services infrastructure is initialised, then procd schedules a connect to ubus after 1 second (line 67) and starts /sbin/ubusd as the system ubus service.
Transition to the next state is triggered by the callback ubus_connect_cb, which at the end, line 118, calls procd_state_ubus_connect(), line 186, which calls procd_state_next to transition to the next state.
STATE_INIT
“- init -” is logged, /etc/inittab is parsed and the entries
::askconsole:/bin/ash --login
::sysinit:/etc/init.d/rcS S boot
are executed. The inittab format is the same as the one from busybox (Busybox example inittab).
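For readers unfamiliar with that format, each line has four colon-separated fields: id, runlevels, action and process. The following minimal sketch splits one line into the fields relevant here; inittab_entry and inittab_parse are hypothetical names, the buffer sizes are arbitrary, and procd's own parser is written differently.

```c
#include <string.h>

struct inittab_entry {
	char id[16];
	char action[32];
	char process[128];
};

/*
 * Split "id:runlevels:action:process". The runlevels field is skipped.
 * The line must not carry a trailing newline.
 * Returns 0 on success, -1 for comments, empty or malformed lines.
 */
static int inittab_parse(const char *line, struct inittab_entry *e)
{
	const char *c1, *c2, *c3;

	if (line[0] == '#' || line[0] == '\0')
		return -1;
	c1 = strchr(line, ':');
	c2 = c1 ? strchr(c1 + 1, ':') : NULL;
	c3 = c2 ? strchr(c2 + 1, ':') : NULL;
	if (!c3)
		return -1;
	if ((size_t)(c1 - line) >= sizeof(e->id) ||
	    (size_t)(c3 - c2 - 1) >= sizeof(e->action) ||
	    strlen(c3 + 1) >= sizeof(e->process))
		return -1;      /* a field does not fit its buffer */
	memcpy(e->id, line, (size_t)(c1 - line));
	e->id[c1 - line] = '\0';
	memcpy(e->action, c2 + 1, (size_t)(c3 - c2 - 1));
	e->action[c3 - c2 - 1] = '\0';
	strcpy(e->process, c3 + 1);
	return 0;
}
```

Applied to the second entry above, this yields an empty id, the action sysinit and the process string /etc/init.d/rcS S boot.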
The “sysinit” action handler runrc instantiates a queue, whose empty handler rcdone will advance procd state.
runrc ignores the process specification “/etc/init.d/rcS” (there is no such script!), and runs rcS(pattern="S", param="boot", rcdone) (line 159), which invokes the equivalent of
_rc(&q, *path="/etc/rc.d", *file="S", *pattern="*", *param="boot")
to enqueue in glob sort order the scripts
/etc/rc.d/S* boot
with “boot” as the action.
/etc/rc.d/S* are symlinks, made by rc.common enable, to files in /etc/init.d, which are shell scripts with the shebang #!/bin/sh /etc/rc.common.
Invoking a /etc/rc.d/S* script runs rc.common, which sources the /etc/rc.d/S* script to set up a context, then invokes the function named as the action parameter (“boot()”) in that context.
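The “glob sort order” used to enqueue the scripts can be illustrated with the standard glob(3) call, which returns matches sorted unless told otherwise. This is a sketch under assumptions: rc_list is a hypothetical helper that only lists what would be run, one "script action" pair per line, instead of queueing processes as procd does.

```c
#include <glob.h>
#include <stdio.h>

/*
 * List <dir>/<prefix>* in glob sort order, one "script action" line per
 * match, into 'out'. Returns the number of lines written, -1 on error.
 */
static int rc_list(const char *dir, const char *prefix, const char *action,
		   char *out, size_t outlen)
{
	char pattern[256];
	glob_t gl;
	size_t i, used = 0;
	int n;

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	n = snprintf(pattern, sizeof(pattern), "%s/%s*", dir, prefix);
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return -1;
	n = glob(pattern, 0, NULL, &gl);        /* sorted by default */
	if (n == GLOB_NOMATCH)
		return 0;
	if (n != 0) {
		globfree(&gl);
		return -1;
	}
	for (i = 0; i < gl.gl_pathc; i++) {
		n = snprintf(out + used, outlen - used, "%s %s\n",
			     gl.gl_pathv[i], action);
		if (n < 0 || (size_t)n >= outlen - used) {
			out[used] = '\0';       /* drop the partial line */
			break;
		}
		used += (size_t)n;
	}
	n = (int)i;
	globfree(&gl);
	return n;
}
```

With rc_list("/etc/rc.d", "S", "boot", buf, sizeof(buf)), a directory holding S10boot, S20network and S50cron is listed in exactly that order, which is why the numeric prefix of the symlink name determines the start order.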
STATE_RUNNING
Execution arrives here after rcS scripts are done.
“- init complete -” is logged.
This is a stable state, keeping uloop_run in procd.c main() running, mostly waiting on epoll_wait. Upon receipt of SIGTERM or SIGINT (reboot), or of SIGUSR1 or SIGUSR2 (poweroff), procd transitions to the next state.
STATE_SHUTDOWN
“- shutdown -” is logged, the /etc/inittab shutdown entry is executed, and procd sleeps at line 169 while the kernel does poweroff or reboot.
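The whole sequence of states walked through above can be condensed into a small sketch. It is an illustration only: the enum values, the function names state_next, state_banner and signal_to_event, and the shutdown_event type are assumptions made here, and only the states named in this page are modelled. The signal mapping assumes SIGUSR1 is treated like SIGUSR2.

```c
#include <signal.h>

/* States named in this walkthrough, in the order they are entered. */
enum boot_state {
	STATE_NONE,
	STATE_EARLY,
	STATE_UBUS,
	STATE_INIT,
	STATE_RUNNING,
	STATE_SHUTDOWN
};

/* Hypothetical result type for the signal mapping. */
enum shutdown_event { EV_NONE, EV_REBOOT, EV_POWEROFF };

/* Advance to the following state; STATE_SHUTDOWN is the last one here. */
static enum boot_state state_next(enum boot_state s)
{
	return s < STATE_SHUTDOWN ? (enum boot_state)(s + 1) : STATE_SHUTDOWN;
}

/* Banner logged on entering each state, as quoted in the text. */
static const char *state_banner(enum boot_state s)
{
	switch (s) {
	case STATE_EARLY:    return "- early -";
	case STATE_UBUS:     return "- ubus -";
	case STATE_INIT:     return "- init -";
	case STATE_RUNNING:  return "- init complete -";
	case STATE_SHUTDOWN: return "- shutdown -";
	default:             return "";
	}
}

/* Map a signal received in the running state to the requested action. */
static enum shutdown_event signal_to_event(int sig)
{
	switch (sig) {
	case SIGTERM:
	case SIGINT:
		return EV_REBOOT;
	case SIGUSR1:   /* assumption: handled like SIGUSR2 */
	case SIGUSR2:
		return EV_POWEROFF;
	default:
		return EV_NONE;
	}
}
```

In procd the transitions are not driven by a simple counter but by callbacks (coldplug completion, the ubus connection, the emptying of the rc queue), each of which ends by asking for the next state.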