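Chapter 3. System initialization

Table of Contents

3.1. An overview of the boot process
3.1.1. Stage 1: the BIOS
3.1.2. Stage 2: the boot loader
3.1.3. Stage 3: the mini-Debian system
3.1.4. Stage 4: the normal Debian system
3.2. Systemd init
3.2.1. The hostname
3.2.2. The filesystem
3.2.3. Network interface initialization
3.2.4. The kernel message
3.2.5. The system message
3.2.6. System management under systemd
3.2.7. Customizing systemd
3.3. The udev system
3.3.1. The kernel module initialization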

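As a system administrator, it is wise to know roughly how the Debian system is started and configured. Although the exact details are in the installed packages and their documentation, this knowledge is essential for most of us.

I did my best to provide a quick overview of the key points of the Debian system and its configuration for your reference, based on the current and previous knowledge of mine and others. Since the Debian system is a moving target, the situation may have changed. Before making any changes to the system, you should refer to the latest documentation for each package.

[Tip] Tip

bootup(7) describes the system bootup process based on systemd. (Recent Debian)

[Tip] Tip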

boot(7) describes the system bootup process based on UNIX System V Release 4. (Older Debian)

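A computer system goes through several stages of the boot process from the power-on event until it offers the fully functional operating system (OS) to the user.

For simplicity, I limit the discussion to the typical PC platform with the default installation.

The typical boot process is like a four-stage rocket. Each stage hands over the system control to the next stage.

Of course, these stages can be configured differently. For example, if you compiled your own kernel, you may be skipping the step with the mini-Debian system. So please do not assume this is the case for your system until you check it yourself.

[Note] Note

For non-legacy PC platforms such as the SUN or the Macintosh system, the BIOS on ROM and the partition on the disk may be quite different (Section 9.5.2, “Disk partition configuration”). Please seek the platform-specific documentation elsewhere for such a case.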

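The BIOS is the 1st stage of the boot process and is started by the power-on event. The power-on event initializes the program counter of the CPU to a particular memory address, and the BIOS residing in the read-only memory (ROM) is executed from that address.

The BIOS performs the basic initialization of the hardware (POST: power-on self-test) and hands the system control to the next step which you provide. The BIOS is usually provided with the hardware.

The BIOS startup screen usually indicates which key(s) to press to enter the BIOS setup screen. Popular keys are F1, F2, F10, Esc, Ins, and Del. If your BIOS startup screen is hidden by a nice graphics screen, you may press some keys such as Esc to disable this. These keys are highly dependent on the hardware.

The hardware location and the priority of the code started by the BIOS can be selected from the BIOS setup screen. Typically, the first few sectors of the first found selected device (hard disk, floppy disk, CD-ROM, ...) are loaded into memory and this initial code is executed. This initial code can be any one of the following.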

  • The boot loader code

  • The kernel code of a stepping-stone OS such as FreeDOS

  • The kernel code of the target OS if it fits in this small space

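Typically, the system is booted from the specified partition of the primary hard disk. The first sector of the hard disk on a legacy PC contains the master boot record (MBR). The disk partition information, including the boot selection, is recorded at the end of this MBR. The first boot loader code executed from the BIOS occupies the rest of this MBR.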
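The MBR layout can be illustrated with a short Python sketch. It assumes the conventional 512-byte sector: 446 bytes of boot code, four 16-byte partition entries, and the 2-byte signature 0x55 0xAA at the end. The function name parse_mbr and the sample sector are made up for illustration; this is not how any Debian tool is implemented.

```python
import struct

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446   # boot loader code at the start of the sector
ENTRY_SIZE = 16        # size of each of the 4 primary partition entries


def parse_mbr(sector):
    """Return the non-empty partition entries of a classic MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA boot signature")
    entries = []
    for index in range(4):
        offset = BOOT_CODE_SIZE + index * ENTRY_SIZE
        # Fields: status, CHS start (skipped), type, CHS end (skipped),
        # first sector (LBA), number of sectors.
        status, ptype, lba_start, sectors = struct.unpack_from(
            "<B3xB3xII", sector, offset)
        if ptype != 0:  # type 0 marks an unused slot
            entries.append({
                "number": index + 1,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return entries


# Hypothetical sector: empty boot code and one bootable Linux (0x83) partition.
entry = struct.pack("<B3xB3xII", 0x80, 0x83, 2048, 204800)
sample = bytes(BOOT_CODE_SIZE) + entry + bytes(3 * ENTRY_SIZE) + b"\x55\xaa"
print(parse_mbr(sample))
```

To inspect a real disk, the first 512 bytes of the device file would be passed in instead of the sample, which requires root privilege.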

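The boot loader is the 2nd stage of the boot process and is started by the BIOS. It loads the system kernel image and the initrd image into memory and hands control over to them. The initrd image is the root filesystem image, and its support depends on the boot loader used.

The Debian system normally uses the Linux kernel as the default system kernel. The initrd image for the current 2.6/3.x Linux kernel is technically the initramfs (initial RAM filesystem) image. The initramfs image is a gzipped cpio archive of the files in the root filesystem.

[Warning] Warning

The above statement is no longer accurate with the new multi-segment initramfs. See bug #790100.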
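One way to see what an image actually starts with is to look at its leading "magic" bytes. The following Python sketch does only that; the function name identify_leading_format and the chosen list of formats are assumptions for illustration, and the sketch does not walk through later segments of a multi-segment image.

```python
import gzip

# Leading "magic" bytes of formats an initrd image may start with.
MAGIC_NUMBERS = [
    (b"\x1f\x8b", "gzip"),
    (b"070701", "cpio (newc)"),
    (b"070702", "cpio (newc with checksum)"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
]


def identify_leading_format(data):
    """Name the format that the first bytes of an image belong to."""
    for magic, name in MAGIC_NUMBERS:
        if data.startswith(magic):
            return name
    return "unknown"


# Stand-in data; for a real image, read the first few bytes of the file,
# for example open("/initrd.img", "rb").read(8).
print(identify_leading_format(gzip.compress(b"demo")))
print(identify_leading_format(b"070701" + b"0" * 104))
```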

The default install of the Debian system places the first-stage GRUB boot loader code into the MBR for the PC platform. Several boot loaders and configuration options are available.


[Warning] Warning

Do not play with boot loaders without having bootable rescue media (USB memory stick, CD or floppy) created from images in the grub-rescue-pc package. It lets you boot your system even without a functioning boot loader on the hard disk.

For GRUB Legacy, the menu configuration file is located at "/boot/grub/menu.lst". For example, it has entries such as the following.

title           Debian GNU/Linux
root            (hd0,2)
kernel          /vmlinuz root=/dev/hda3 ro
initrd          /initrd.img

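For GRUB 2, the menu configuration file is located at "/boot/grub/grub.cfg". It is automatically generated by "/usr/sbin/update-grub" using templates from "/etc/grub.d/*" and settings from "/etc/default/grub". For example, it has entries such as the following.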

menuentry "Debian GNU/Linux" {
        set root=(hd0,3)
        linux /vmlinuz root=/dev/hda3
        initrd /initrd.img
}

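In these examples, the GRUB parameters have the following meanings.

  • root: the partition GRUB reads from, here the 3rd partition of the first disk, written as "(hd0,2)" in GRUB Legacy and as "(hd0,3)" in GRUB 2

  • kernel (GRUB Legacy) or linux (GRUB 2): the kernel image "/vmlinuz" followed by the kernel parameters passed to it, such as "root=/dev/hda3 ro"

  • initrd: the initrd/initramfs image, here "/initrd.img"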


[Note] Note

The partition numbers used by GRUB Legacy are one less than those used by the Linux kernel and utility tools. GRUB 2 fixes this problem.
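The numbering difference can be shown with a small Python sketch. The function grub_device is hypothetical, and it assumes that the first disk ("hda" or "sda") is the one GRUB sees as "hd0", which in practice depends on the BIOS drive order.

```python
import re


def grub_device(linux_device, grub2=True):
    """Map a name such as "/dev/hda3" to GRUB notation."""
    match = re.fullmatch(r"/dev/[hs]d([a-z])(\d+)", linux_device)
    if match is None:
        raise ValueError(f"unsupported device name: {linux_device}")
    disk = ord(match.group(1)) - ord("a")  # disks count from 0 in both versions
    partition = int(match.group(2))        # Linux counts partitions from 1
    if not grub2:
        partition -= 1                     # GRUB Legacy counts partitions from 0
    return f"(hd{disk},{partition})"


print(grub_device("/dev/hda3", grub2=False))
print(grub_device("/dev/hda3", grub2=True))
```

The two calls correspond to the "root" values used in the menu.lst and grub.cfg examples.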

[Tip] Tip

A UUID (see Section 9.5.3, “Accessing partition using UUID”) may be used to identify a block device instead of its file name such as "/dev/hda3", e.g. "root=UUID=81b289d5-4341-4003-9602-e254a17ac232 ro".

[Tip] Tip

If GRUB is used, the kernel boot parameters are set in /boot/grub/grub.cfg. On the Debian system, you should not edit /boot/grub/grub.cfg directly. Instead, edit the GRUB_CMDLINE_LINUX_DEFAULT value in /etc/default/grub and run update-grub(8) to update /boot/grub/grub.cfg.
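As a rough illustration of that edit, the following Python sketch appends one parameter to GRUB_CMDLINE_LINUX_DEFAULT in the text of /etc/default/grub. The function name add_kernel_parameter, the sample file content, and the "splash" parameter are assumptions; the sketch only handles the common double-quoted, single-line form of the setting.

```python
import re


def add_kernel_parameter(grub_defaults, parameter):
    """Return the text of /etc/default/grub with one more kernel parameter."""
    pattern = re.compile(
        r'^GRUB_CMDLINE_LINUX_DEFAULT="([^"]*)"$', re.MULTILINE)

    def append(match):
        params = match.group(1).split()
        if parameter not in params:  # do not add the same parameter twice
            params.append(parameter)
        return 'GRUB_CMDLINE_LINUX_DEFAULT="{}"'.format(" ".join(params))

    new_text, count = pattern.subn(append, grub_defaults)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT is not set")
    return new_text


# Hypothetical file content.
sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
print(add_kernel_parameter(sample, "splash"))
```

After writing the new text back to /etc/default/grub, update-grub(8) still has to be run for the change to reach /boot/grub/grub.cfg.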

[Tip] Tip

You can start a boot loader from another boot loader using a technique called chain loading.

See "info grub" and grub-install(8).

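The mini-Debian system is the 3rd stage of the boot process and is started by the boot loader. It runs the system kernel with its root filesystem in memory. This is an optional preparatory stage of the boot process.

[Note] Note

The term "the mini-Debian system" is coined by the author to describe this 3rd stage of the boot process in this document. This system is commonly referred to as the initrd or initramfs system. A similar in-memory system is used by the Debian Installer.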

The "/init" program is executed as the first program in this in-memory root filesystem. It is a program which initializes the kernel in user space and hands control over to the next stage. This mini-Debian system offers flexibility to the boot process, such as adding kernel modules before the main boot process or mounting the root filesystem as an encrypted one.

  • The "/init" program is a shell script if the initramfs was created by initramfs-tools. You can interrupt this part of the boot process to gain a root shell by adding "break=init" etc. to the kernel boot parameters. See the "/init" script for more break conditions. This shell environment is sophisticated enough to make a good inspection of your machine's hardware. Commands available in this mini-Debian system are stripped-down ones, mainly provided by a tool called busybox(1).

  • The "/init" program is a binary systemd program if the initramfs was created by dracut. Commands available in this mini-Debian system are a stripped-down systemd(1) environment.

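[Caution] Caution

You need to use the "-n" option for the mount command when you are on a read-only root filesystem.

The normal Debian system is the 4th stage of the boot process and is started by the mini-Debian system. The system kernel of the mini-Debian system continues to run in this environment. The root filesystem is switched from the one in memory to the one on the real hard disk filesystem.

The init program is executed as the first program with PID=1 and starts many other programs to perform the main boot process. The default file path for the init program is "/sbin/init", but it can be changed by a kernel boot parameter such as "init=/path/to/init_program".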
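The effect of the "init=" parameter can be sketched in Python as follows. The function init_program is hypothetical; it assumes that the last "init=" word on the command line wins and ignores the fallback paths the kernel tries when the named program cannot be executed.

```python
def init_program(cmdline, default="/sbin/init"):
    """Return the init path selected by a kernel command line."""
    path = default
    for word in cmdline.split():
        if word.startswith("init="):
            path = word[len("init="):]  # a later "init=" replaces an earlier one
    return path


print(init_program("root=/dev/hda3 ro"))
print(init_program("root=/dev/hda3 ro init=/bin/sh"))
```

On a running system, the command line to pass in can be read from "/proc/cmdline".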

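The default init program has been changing:

  • Debian before squeeze uses the simple SysV-style init.

  • Debian wheezy improves the SysV-style init by ordering the boot sequence with LSB headers and starting boot scripts in parallel.

  • Debian jessie switches its default init to systemd for event-driven and parallel initialization.

[Tip] Tip

The actual init command on your system can be verified by the "ps --pid 1 -f" command.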

[Tip] Tip

"/sbin/init" is symlinked to "/lib/systemd/systemd" after Debian jessie.


[Tip] Tip

See the Debian wiki entry on boot process speedup for the latest tips on speeding up the boot process.

This section describes how the system is started by the systemd(1) program with PID=1 (i.e., the init process).

The systemd init process spawns processes in parallel based on the unit configuration files (see systemd.unit(5)) which are written in declarative style instead of SysV-like procedural style. These are loaded from a set of paths (see systemd-system.conf(5)) as follows:

  • "/lib/systemd/system": OS default configuration files

  • "/etc/systemd/system": system administrator configuration files which override the OS default configuration files

  • "/run/systemd/system": run-time generated configuration files which override the installed configuration files
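The override relationship between these directories can be sketched in Python. The sketch assumes the priority order given in systemd.unit(5), where "/etc/systemd/system" is searched first, then "/run/systemd/system", then "/lib/systemd/system". The function effective_unit_file and the sample directory contents are hypothetical.

```python
# Directories in decreasing priority, as assumed above.
UNIT_DIRS = [
    "/etc/systemd/system",
    "/run/systemd/system",
    "/lib/systemd/system",
]


def effective_unit_file(unit, installed):
    """Return the path of the unit file that takes effect.

    "installed" maps each directory to the set of unit file names in it.
    """
    for directory in UNIT_DIRS:
        if unit in installed.get(directory, set()):
            return f"{directory}/{unit}"
    return None


# Hypothetical directory contents.
installed = {
    "/lib/systemd/system": {"ssh.service", "cron.service"},
    "/etc/systemd/system": {"ssh.service"},
}
print(effective_unit_file("ssh.service", installed))
print(effective_unit_file("cron.service", installed))
```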

Their inter-dependencies are specified by the directives "Wants=", "Requires=", "Before=", "After=", … (see "MAPPING OF UNIT PROPERTIES TO THEIR INVERSES" in systemd.unit(5)). The resource controls are also defined (see systemd.resource-control(5)).
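Ordering directives such as "After=" form a dependency graph. The following Python sketch derives one valid start sequence from such a graph with the standard graphlib module (Python 3.9 or newer). The unit names and their "After=" relations are hypothetical, and the real systemd starts independent units in parallel rather than in a single sequence.

```python
from graphlib import TopologicalSorter


def start_order(after):
    """Return one start sequence that respects the After= relations.

    "after" maps a unit to the units listed in its After= directive.
    """
    sorter = TopologicalSorter()
    for unit, predecessors in after.items():
        sorter.add(unit, *predecessors)
    return list(sorter.static_order())


# Hypothetical After= relations.
SAMPLE_AFTER = {
    "sshd.service": ["network.target"],
    "network.target": ["local-fs.target"],
    "local-fs.target": [],
}
print(start_order(SAMPLE_AFTER))
```

Note that "Wants=" and "Requires=" only pull units in; they do not order them, which is why the sketch looks at "After=" alone.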

The suffix of a unit configuration file encodes its type:

  • *.service describes the process controlled and supervised by systemd. See systemd.service(5).

  • *.device describes the device exposed in the sysfs(5) as udev(7) device tree. See systemd.device(5).

  • *.mount describes the file system mount point controlled and supervised by systemd. See systemd.mount(5).

  • *.automount describes the file system auto mount point controlled and supervised by systemd. See systemd.automount(5).

  • *.swap describes the swap device or file controlled and supervised by systemd. See systemd.swap(5).

  • *.path describes the path monitored by systemd for path-based activation. See systemd.path(5).

  • *.socket describes the socket controlled and supervised by systemd for socket-based activation. See systemd.socket(5).

  • *.timer describes the timer controlled and supervised by systemd for timer-based activation. See systemd.timer(5).

  • *.slice manages resources with the cgroups(7). See systemd.slice(5).

  • *.scope is created programmatically using the bus interfaces of systemd to manage a set of system processes. See systemd.scope(5).

  • *.target groups other unit configuration files to create the synchronization point during start-up. See systemd.target(5).

Upon system start up (i.e., init), the systemd process tries to start the "/lib/systemd/system/default.target" (normally symlinked to "graphical.target"). First, some special target units (see systemd.special(7)) such as "local-fs.target", "swap.target" and "cryptsetup.target" are pulled in to mount the filesystems. Then, other target units are also pulled in by the target unit dependencies. For details, read bootup(7).

systemd offers backward compatibility features. SysV-style boot scripts in "/etc/rc[0123456S].d/[KS]<name>" are still parsed and telinit(8) is translated into systemd unit activation requests.

[Caution] Caution

Emulated runlevels 2 to 4 are all symlinked to the same "multi-user.target".
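The runlevel emulation can be written down as a simple lookup table. The Python sketch below follows the runlevel targets described in systemd.special(7); the function name target_for_runlevel is hypothetical.

```python
# SysV runlevel -> systemd target, as described in systemd.special(7).
RUNLEVEL_TARGETS = {
    "0": "poweroff.target",
    "1": "rescue.target",
    "2": "multi-user.target",
    "3": "multi-user.target",
    "4": "multi-user.target",
    "5": "graphical.target",
    "6": "reboot.target",
}


def target_for_runlevel(runlevel):
    """Return the target unit that emulates the given SysV runlevel."""
    return RUNLEVEL_TARGETS[str(runlevel)]


for level in range(7):
    print(level, target_for_runlevel(level))
```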

The mount options of normal disk and network filesystems are set in "/etc/fstab". See fstab(5) and Section 9.5.7, “Optimization of filesystem by mount options”.
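To show where the mount options sit in an "/etc/fstab" line, here is a minimal Python sketch that splits each line into the six fields described in fstab(5). The function parse_fstab and the sample content are assumptions for illustration; the sketch does not decode escaped characters such as "\040".

```python
def parse_fstab(text):
    """Split fstab lines into their fields, skipping comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry
        entries.append({
            "spec": fields[0],
            "mount_point": fields[1],
            "type": fields[2],
            "options": fields[3].split(","),
            # The last two fields are optional and default to 0.
            "dump": int(fields[4]) if len(fields) > 4 else 0,
            "passno": int(fields[5]) if len(fields) > 5 else 0,
        })
    return entries


# Hypothetical file content.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=81b289d5-4341-4003-9602-e254a17ac232 / ext4 noatime,errors=remount-ro 0 1
/dev/sda2 none swap sw
"""
for entry in parse_fstab(SAMPLE_FSTAB):
    print(entry["mount_point"], entry["options"])
```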

The configuration of the encrypted filesystem is set in "/etc/crypttab". See crypttab(5).

The configuration of software RAID with mdadm(8) is set in "/etc/mdadm/mdadm.conf". See mdadm.conf(5).

[Warning] Warning

After mounting all the filesystems, temporary files in "/tmp", "/var/lock", and "/var/run" are cleaned for each boot up.

systemd offers not only an init system but also generic system management functions such as journal logging, login management, time management, and network management.

systemd(1) is managed by several commands:

  • the systemctl(1) command controls the systemd system and service manager (CLI),

  • the systemadm(1) command controls the systemd system and service manager (GUI),

  • the journalctl(1) command queries the systemd journal,

  • the loginctl(1) command controls the systemd login manager, and

  • the systemd-analyze(1) command analyzes system boot-up performance.

Here is a list of typical systemd management command snippets. For the exact meanings, please read the pertinent manpages.

Table 3.5. List of typical systemd management command snippets

Operation                                        Type         Command snippets
GUI for service manager                          GUI          "systemadm" (systemd-ui package)
List all target unit configuration               Unit         "systemctl list-units --type=target"
List all service unit configuration              Unit         "systemctl list-units --type=service"
List all unit configuration types                Unit         "systemctl list-units --type=help"
List all socket units in memory                  Unit         "systemctl list-sockets"
List all timer units in memory                   Unit         "systemctl list-timers"
Start "$unit"                                    Unit         "systemctl start $unit"
Stop "$unit"                                     Unit         "systemctl stop $unit"
Reload service-specific configuration            Unit         "systemctl reload $unit"
Stop and start all "$unit"                       Unit         "systemctl restart $unit"
Start "$unit" and stop all others                Unit         "systemctl isolate $unit"
Switch to "graphical" (GUI system)               Unit         "systemctl isolate graphical"
Switch to "multi-user" (CLI system)              Unit         "systemctl isolate multi-user"
Switch to "rescue" (single user CLI system)      Unit         "systemctl isolate rescue"
Send kill signal to "$unit"                      Unit         "systemctl kill $unit"
Check if "$unit" service is active               Unit         "systemctl is-active $unit"
Check if "$unit" service is failed               Unit         "systemctl is-failed $unit"
Check status of "$unit|$PID|$device"             Unit         "systemctl status $unit|$PID|$device"
Show properties of "$unit|$job"                  Unit         "systemctl show $unit|$job"
Reset failed "$unit"                             Unit         "systemctl reset-failed $unit"
List dependency of all unit services             Unit         "systemctl list-dependencies --all"
List unit files installed on the system          Unit file    "systemctl list-unit-files"
Enable "$unit" (add symlink)                     Unit file    "systemctl enable $unit"
Disable "$unit" (remove symlink)                 Unit file    "systemctl disable $unit"
Unmask "$unit" (remove symlink to "/dev/null")   Unit file    "systemctl unmask $unit"
Mask "$unit" (add symlink to "/dev/null")        Unit file    "systemctl mask $unit"
Get default-target setting                       Unit file    "systemctl get-default"
Set default-target to "graphical" (GUI system)   Unit file    "systemctl set-default graphical"
Set default-target to "multi-user" (CLI system)  Unit file    "systemctl set-default multi-user"
Show job environment                             Environment  "systemctl show-environment"
Set job environment "variable" to "value"        Environment  "systemctl set-environment variable=value"
Unset job environment "variable"                 Environment  "systemctl unset-environment variable"
Reload all unit files and daemons                Lifecycle    "systemctl daemon-reload"
Shut down the system                             System       "systemctl poweroff"
Shut down and reboot the system                  System       "systemctl reboot"
Suspend the system                               System       "systemctl suspend"
Hibernate the system                             System       "systemctl hibernate"
View job log of "$unit"                          Journal      "journalctl -u $unit"
View job log of "$unit" ("tail -f" style)        Journal      "journalctl -u $unit -f"
Show time spent for each initialization step     Analyze      "systemd-analyze time"
List of all units by the time to initialize      Analyze      "systemd-analyze blame"
Load and detect errors in "$unit" file           Analyze      "systemd-analyze verify $unit"
Track boot process by the cgroups(7)             Cgroup       "systemd-cgls"
Track boot process by the cgroups(7)             Cgroup       "ps xawf -eo pid,user,cgroup,args"
Track boot process by the cgroups(7)             Cgroup       Read sysfs under "/sys/fs/cgroup/systemd/"

Here, "$unit" in the above examples may be a single unit name (suffix such as .service and .target are optional) or, in many cases, multiple unit specifications (shell-style globs "*", "?", "[]" using fnmatch(3) which will be matched against the primary names of all units currently in memory).
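The glob matching of unit names can be tried out with Python's fnmatch module, used here as a stand-in for fnmatch(3). The function match_units and the list of loaded units are hypothetical, and the sketch does not reproduce the way systemctl completes a missing suffix.

```python
from fnmatch import fnmatchcase


def match_units(pattern, loaded_units):
    """Return the loaded unit names matched by a shell-style glob."""
    return [name for name in loaded_units if fnmatchcase(name, pattern)]


# Hypothetical primary names of units currently in memory.
LOADED_UNITS = ["ssh.service", "sshd.socket", "cron.service", "graphical.target"]

print(match_units("ssh*", LOADED_UNITS))
print(match_units("*.target", LOADED_UNITS))
```

When such a pattern is given on a shell command line, it should be quoted so that the shell does not expand it against file names first.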

System state changing commands in the above examples are typically preceded by the "sudo" to attain the required administrative privilege.

The output of "systemctl status $unit|$PID|$device" uses the color of the dot ("●") to summarize the unit state at a glance.

  • White "●" indicates an "inactive" or "deactivating" state.

  • Red "●" indicates a "failed" or "error" state.

  • Green "●" indicates an "active", "reloading" or "activating" state.

With the default installation, many network services (see Chapter 6, Network applications) are started as daemon processes after network.target at boot time by systemd. The "sshd" is no exception. Let's change this to on-demand start of "sshd" as a customization example.

First, disable the system-installed service unit.

 $ sudo systemctl stop sshd.service
 $ sudo systemctl mask sshd.service

On classic Unix systems, on-demand socket activation of services was handled by the inetd superserver. Under systemd, the equivalent can be enabled by adding *.socket and *.service unit configuration files.

sshd.socket for specifying a socket to listen on

[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=22
Accept=yes

[Install]
WantedBy=sockets.target

sshd@.service as the matching service file of sshd.socket

[Unit]
Description=SSH Per-Connection Server

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket

Then reload.

 $ sudo systemctl daemon-reload
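Unit files like the two above use an INI-like syntax, so their settings can be inspected with Python's configparser module as a rough approximation. The function read_unit is hypothetical, and configparser is not the parser systemd uses: for example, a directive repeated within one section is collapsed here to its last value.

```python
import configparser

# The sshd.socket content from the example above.
SSHD_SOCKET = """\
[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=22
Accept=yes

[Install]
WantedBy=sockets.target
"""


def read_unit(text):
    """Return the sections of a unit file as nested dictionaries."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the case of directive names
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


unit = read_unit(SSHD_SOCKET)
print(unit["Socket"])
```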

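For Linux kernel 2.6 and newer, the udev system provides a mechanism for automatic hardware discovery and initialization (see udev(7)). Upon discovery of each device by the kernel, the udev system starts a user process which uses information from the sysfs filesystem (see Section 1.2.12, “procfs and sysfs”), loads the required kernel modules supporting it using the modprobe(8) program (see Section 3.3.1, “The kernel module initialization”), and creates the corresponding device nodes.

[Tip] Tip

If "/lib/modules/<kernel-version>/modules.dep" was not generated properly by depmod(8) for some reason, modules may not be loaded as expected by the udev system. Execute "depmod -a" to fix it.

The names of device nodes can be configured by udev rule files in "/etc/udev/rules.d/". The current default rules tend to create dynamically generated, non-static device names, except for CD and network devices. By adding custom rules similar to those for CD and network devices, you can generate static device names for other devices such as USB memory sticks, too. See "Writing udev rules" or "/usr/share/doc/udev/writing_udev_rules/index.html".

Since the udev system is somewhat of a moving target, I leave the details to other documentation and describe only the minimum information here.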

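[Tip] Tip

For mounting rules in "/etc/fstab", device nodes do not need to be static ones. You can use a UUID to mount devices instead of device names such as "/dev/sda". See Section 9.5.3, “Accessing partition using UUID”.

The modprobe(8) program enables us to configure the running Linux kernel from a user process by adding and removing kernel modules. The udev system (see Section 3.3, “The udev system”) automates its invocation to help the kernel module initialization.

Non-hardware modules and special hardware driver modules that need to be pre-loaded are listed in the "/etc/modules" file (see modules(5)).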

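The configuration files for the modprobe(8) program are located under the "/etc/modprobe.d/" directory as explained in modprobe.conf(5). (If you want to prevent some kernel modules from being auto-loaded, consider blacklisting them in a file such as "/etc/modprobe.d/blacklist.conf".)

The "/lib/modules/<version>/modules.dep" file generated by the depmod(8) program describes the module dependencies used by the modprobe(8) program.

[Note] Note

If you experience module loading issues at boot time or with modprobe(8), "depmod -a" may resolve them by reconstructing "modules.dep".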
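The content of "modules.dep" is plain text, with one module per line followed by a colon and the modules it depends on. The following Python sketch reads that form; the function parse_modules_dep and the sample lines are assumptions for illustration.

```python
def parse_modules_dep(text):
    """Map each module path to the list of module paths it depends on."""
    dependencies = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        module, _, rest = line.partition(":")
        dependencies[module.strip()] = rest.split()
    return dependencies


# Hypothetical excerpt of a modules.dep file.
SAMPLE_DEP = """\
kernel/fs/ext4/ext4.ko: kernel/fs/jbd2/jbd2.ko kernel/fs/mbcache.ko kernel/lib/crc16.ko
kernel/lib/crc16.ko:
"""
for module, needed in parse_modules_dep(SAMPLE_DEP).items():
    print(module, "needs", needed)
```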

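The modinfo(8) program shows information about a Linux kernel module.

The lsmod(8) program nicely formats the contents of "/proc/modules", showing which kernel modules are currently loaded.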
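What lsmod(8) does with "/proc/modules" can be approximated by a few lines of Python. The function parse_proc_modules and the sample lines are hypothetical; only the first four fields of each line (name, size, use count, users) are interpreted.

```python
def parse_proc_modules(text):
    """Return name, size, use count and users for each loaded module."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, size, refcount, users = fields[:4]
        # The users field is "-" or a comma-terminated list of module names.
        used_by = [] if users == "-" else [u for u in users.split(",") if u]
        modules.append({
            "name": name,
            "size": int(size),
            "refcount": int(refcount),
            "used_by": used_by,
        })
    return modules


# Hypothetical excerpt of /proc/modules.
SAMPLE_MODULES = """\
snd_hda_codec 172032 2 snd_hda_intel,snd_hda_codec_hdmi, Live 0x0000000000000000
loop 40960 0 - Live 0x0000000000000000
"""
for module in parse_proc_modules(SAMPLE_MODULES):
    print(module["name"], module["size"], module["refcount"], ",".join(module["used_by"]))
```

On a running system, the text to pass in would be the content of "/proc/modules".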

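[Tip] Tip

You can identify the exact hardware on your system. See Section 9.4.3, “Hardware identification”.

[Tip] Tip

You may configure hardware at boot time to activate expected hardware features. See Section 9.4.4, “Hardware configuration”.

[Tip] Tip

You can add support for your special device by recompiling the kernel. See Section 9.9, “The kernel”.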