Bug #9960

Boot delay/issues because of limited entropy

Added by Carlo Landmeter 4 months ago. Updated 17 days ago.

Status: Closed
Priority: High
Assignee:
Category: Boot sequence
Target version:
Start date: 02/04/2019
Due date:
% Done: 100%
Estimated time:
Affected versions:
Security IDs:

Description

In Alpine Linux 3.9, the booting process may be slowed down by entropy generation.

This is because RDRAND (entropy gathering that requires trusting the CPU) is disabled by default.
This decision was made due to a lack of consensus as to whether or not the hardware can be trusted to perform randomness generation (a security-critical task).

If you trust the CPU manufacturer, you can re-enable it by adding 'random.trust_cpu=on' to your kernel command line in the configuration of your boot manager.
If you do not, but still want a faster boot, consider haveged or a similar entropy-generating daemon.
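
A minimal sketch for checking whether a given kernel command line already carries the option. The helper name has_trust_cpu is hypothetical, and it only matches the exact 'random.trust_cpu=on' spelling used in this report.

```sh
#!/bin/sh
# Sketch: succeed if a kernel command line contains random.trust_cpu=on.
# has_trust_cpu is a hypothetical helper name, not an Alpine tool.

has_trust_cpu() {
    # $1: kernel command line; defaults to that of the running kernel
    cmdline=${1-$(cat /proc/cmdline)}
    # Pad with spaces so the option only matches as a whole word.
    case " $cmdline " in
        *" random.trust_cpu=on "*) return 0 ;;
    esac
    return 1
}
```

Called without arguments, as in `has_trust_cpu && echo "CPU RNG trusted"`, it inspects /proc/cmdline of the booted system.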

We already discussed on IRC how we could work around this issue by detecting low entropy in the installer, but this would not cover users who are upgrading.
Another option would be to warn the user at boot when entropy is too low and services may be slow or fail to start.

Associated revisions

Revision 3dab4b17 (diff)
Added by Natanael Copa 20 days ago

main/linux-vanilla: upgrade to 4.19.36

also enable CONFIG_RANDOM_TRUST_CPU
https://askubuntu.com/questions/1070433/will-ubuntu-enable-random-trust-cpu-in-the-kernel-and-what-would-be-the-effect/1071196#1071196

fixes #9960

(cherry picked from commit e67c2f8bcb163695a5917e059a2c7ba46726ee89)

History

#1 Updated by Jake Buchholz 3 months ago

My workaround was to add haveged to boot runlevel, add an /etc/local.d/99_stop-haveged.start, and add local to the default runlevel.
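
A rough sketch of that workaround, assuming the local.d script simply stops haveged once boot has finished. Only the file name comes from the comment above; the script body and the helper name write_stop_haveged are assumptions.

```sh
#!/bin/sh
# Sketch of the haveged workaround. The script contents are an assumption;
# only the file name 99_stop-haveged.start is taken from the comment.

write_stop_haveged() {
    # $1: local.d directory (defaults to /etc/local.d)
    dir=${1:-/etc/local.d}
    script="$dir/99_stop-haveged.start"
    # Write a two-line script that stops the haveged service.
    printf '%s\n' '#!/bin/sh' 'rc-service haveged stop' > "$script"
    chmod +x "$script"
}

# As root on the affected system, the full sequence would be roughly:
#   apk add haveged
#   rc-update add haveged boot
#   write_stop_haveged
#   rc-update add local default
```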

https://github.com/mcrute/alpine-ec2-ami/issues/39

#2 Updated by John Longe 3 months ago

I'm using Alpine for VMs. "random.trust_cpu=on" didn't work and neither did haveged or rngd. So far, the only thing that seems to be working is adding a graphical Spice console to the VM and bashing a lot of random letters on my keyboard in it, supposedly feeding it entropy.

I was told in the IRC that that boot option doesn't work for VMs. In Alpine's current kernel configs, there is a line: "# CONFIG_RANDOM_TRUST_CPU is not set". I'm not sure what that implies. Does that mean the boot option won't work in any case?

This basically renders my VMs unbootable when unattended, so this is important to me.

#3 Updated by Olivier Duclos 3 months ago

John Longe wrote:

neither did haveged or rngd

Are you sure you added haveged to the boot runlevel? Because it worked for me and Jake.

While I would be willing to enable CONFIG_RANDOM_TRUST_CPU, I have noticed this option is only available for X86, S390 and PPC. It does not support ARM. So it is not an ideal solution.

#4 Updated by Marnix Rijnart 3 months ago

This issue also affects Alpine on Raspberry Pi. The kernel command line option random.trust_cpu=on does not work because it's ARM.

RPis have a hardware RNG that rngd can use to fix the boot delay. Would it be an idea to add and enable it on RPi images by default so users won't have to figure this out themselves?

Something like apks="$apks rng-tools" in mkimg.arm.sh and then something in setup-alpine to (optionally?) add rngd to runlevel boot?

#5 Updated by Natanael Copa 3 months ago

i wonder if it helps to modprobe intel-rng?

#6 Updated by Natanael Copa 3 months ago

  • Target version changed from 3.9.1 to 3.9.2

#7 Updated by Natanael Copa 3 months ago

  • Target version changed from 3.9.2 to 3.9.3

#8 Updated by Milan P. Stanić 2 months ago

Maybe we could add an option to setup-bootable that asks the user installing Alpine
something like this:
Do you trust your CPU manufacturer's implementation of the Random Number Generator? [y/n]
If the answer is no, the next question could be:
Do you want to install haveged (Software Random Number Generator)? [y/n]

To elaborate a little, setup-bootable could be extended to check whether /dev/hwrng exists
(be it RDRAND or TPM); if it does, ask the first question, and if not, just ask the second one.

Something similar could be done for update-extlinux, i.e. check /etc/update-extlinux.conf
for the random.trust_cpu parameter and, if it is present, skip this question.
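
The decision logic proposed here could be sketched roughly as follows. Nothing like this exists in setup-bootable or update-extlinux; the function name entropy_question and its output keywords are made up for illustration, and the two checks are combined into one helper only for brevity.

```sh
#!/bin/sh
# Sketch of the proposed prompt logic. All names are hypothetical.

entropy_question() {
    # $1: hardware RNG device path, $2: update-extlinux configuration file
    hwrng=${1:-/dev/hwrng}
    conf=${2:-/etc/update-extlinux.conf}

    if [ -f "$conf" ] && grep -q 'random\.trust_cpu' "$conf"; then
        # The parameter is already configured, so no question is needed.
        echo skip
    elif [ -e "$hwrng" ]; then
        # A hardware RNG is present: ask whether the user trusts it.
        echo ask-trust-cpu
    else
        # No hardware RNG: only offer the software generator.
        echo ask-haveged
    fi
}
```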

#9 Updated by Natanael Copa about 2 months ago

  • Target version changed from 3.9.3 to 3.9.4

#10 Updated by Rolando A about 1 month ago

Got the delay behavior using KVM; a clean install works just fine, but after upgrading the whole system and installing the new kernel it gets stuck on "caching dependencies" unless I attach a console and start sending random keys to feed the entropy. I tried several workarounds without luck and, as for others, this turns all my alpine VMs (more than 50) useless if they reboot for any reason after the upgrade.

#11 Updated by Henrik Riomar about 1 month ago

Rolando A wrote:

Got the delay behavior using KVM;

does your kvm host provide entropy to the guests with VirtIORNG? (https://wiki.qemu.org/Features/VirtIORNG)

this turns all my alpine VMs (more than 50) useless

do you have any non-alpine VMs running Linux 4.19 that do not show this problem?
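
For reference, a host typically exposes VirtIORNG to a guest with QEMU options along these lines. This is a sketch: the helper name virtio_rng_args, the id rng0 and the /dev/urandom default are illustrative choices, not settings taken from anyone's setup in this thread.

```sh
#!/bin/sh
# Sketch: print QEMU arguments that attach a virtio-rng device to a guest.
# virtio_rng_args and the id "rng0" are illustrative names.

virtio_rng_args() {
    # $1: host entropy source handed to the guest (assumed default)
    src=${1:-/dev/urandom}
    printf '%s\n' "-object rng-random,filename=$src,id=rng0 -device virtio-rng-pci,rng=rng0"
}

# The output is meant to be appended to an existing qemu-system-x86_64
# command line, e.g. via $(virtio_rng_args).
```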

#12 Updated by John Longe about 1 month ago

Hi Henrik, thank you for your efforts so far.

does your kvm host provide entropy to the guests with VirtIORNG?

My machines are using virtio-rng. The host machine seems to have an entropy_avail of ~3650 on average (which is good, no?).

do you have any non-alpine VMs running Linux 4.19 that do not show this problem?

I can't help you here. All my machines are running Alpine or OpenBSD.

#13 Updated by Henrik Riomar about 1 month ago

The host machine seems to have an entropy_avail of ~3650 on average (which is good, no?).

over 3000 in host is good, if it can sustain it when it starts handing it out to guest.

do you have the virtio-rng kernel module loaded in the guest?

Can you provide me with the output of

# cat /sys/devices/virtual/misc/hw_random/rng_available

# cat /sys/devices/virtual/misc/hw_random/rng_current

# rngd --rng-device /dev/hwrng -f
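
The two sysfs reads requested above can be collected in one go with a small helper such as the following sketch. rng_report is a hypothetical name, and the directory is a parameter only so the function can be exercised outside a real guest.

```sh
#!/bin/sh
# Sketch: print the hw_random state from sysfs, one "name: value" per line.
# rng_report is a hypothetical helper name.

rng_report() {
    # $1: hw_random sysfs directory (defaults to the path quoted above)
    dir=${1:-/sys/devices/virtual/misc/hw_random}
    for name in rng_available rng_current; do
        if [ -r "$dir/$name" ]; then
            printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
        else
            # Missing files usually mean no hw_random driver is loaded.
            printf '%s: (not readable)\n' "$name"
        fi
    done
}
```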

#14 Updated by John Longe about 1 month ago

# cat /sys/devices/virtual/misc/hw_random/rng_available
virtio_rng.0
# cat /sys/devices/virtual/misc/hw_random/rng_current
virtio_rng.0
# rngd --rng-device /dev/hwrng -f
Initalizing available sources
Failed to init entropy source 1: TPM RNG Device
Failed to init entropy source 2: Intel RDRAND Instruction RNG

I've added rngd to runlevel boot. Boot is still delayed until dmesg reports "random: crng init done" -- so far only sped up by bashing keys into the console. Once I can login, the guest also has >3000 entropy.

#15 Updated by Henrik Riomar about 1 month ago

output above looks fine, you would get "read error" if /dev/hwrng was not working.

When rngd is running (after the slow bad boot) how does /proc/sys/kernel/random/entropy_avail look in the guest? how does it look if you do

dd if=/dev/random of=/dev/null

for a few seconds? does entropy_avail recover quick when you then quit dd?
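
To watch how entropy_avail behaves while dd runs in another terminal, and after it is stopped, a sampling loop like this sketch may help. sample_entropy is a hypothetical helper; the count and interval defaults are arbitrary.

```sh
#!/bin/sh
# Sketch: print entropy_avail a fixed number of times at a fixed interval.
# sample_entropy is a hypothetical helper name.

sample_entropy() {
    # $1: number of samples, $2: seconds between samples, $3: file to read
    count=${1:-5}
    interval=${2:-1}
    file=${3:-/proc/sys/kernel/random/entropy_avail}
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$file"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_entropy 10 1` prints ten readings one second apart.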

dmesg reports "random: crng init done"

you see this long before rngd starts, right? i.e. not after "Starting rngd" on the console, but before?

You are not using dovecot in this VM by any chance?

#16 Updated by John Longe about 1 month ago

Henrik Riomar wrote:

output above looks fine, you would get "read error" if /dev/hwrng was not working.

When rngd is running (after the slow bad boot) how does /proc/sys/kernel/random/entropy_avail look in the guest?

Around 3100.

how does it look if you do
[...]
for a few seconds?

Around 300.

does entropy_avail recover quick when you then quit dd?

It immediately jumps back to ~3100 (within seconds)

you see this long before rngd starts, right? i.e. not after "Starting rngd" on the console, but before?

Yes.

You are not using dovecot in this VM by any chance?

Yes, I am! This only happens in my VMs running dovecot (I also wrote about this in the IRC channel)
... I've just seen your bug report https://bugs.alpinelinux.org/issues/10320 now. Hah! Thank you for finding this.

#17 Updated by Henrik Riomar about 1 month ago

Nice, so then rng-tools and virtio-rng work like they should for you.

If you are not running multiple dovecot instances you can make this small change while waiting for #10320 to be fixed.

--- a/init.d/dovecot
+++ b/init.d/dovecot
@@ -5,7 +5,8 @@
 description="Secure POP3/IMAP server" 

 cfgfile=/etc/dovecot/dovecot${instance:+.$instance}.conf
-pidfile=$(doveconf -c $cfgfile -h base_dir 2>/dev/null)/master.pid
+#pidfile=$(doveconf -c $cfgfile -h base_dir 2>/dev/null)/master.pid
+pidfile="/run/dovecot/master.pid" 

#18 Updated by Natanael Copa 20 days ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

#19 Updated by Natanael Copa 17 days ago

  • Status changed from Resolved to Closed
