awful: pre-grow propolis-server heap to avoid #1008 (#1032)

iximeow merged 3 commits into oxidecomputer:master from
Conversation
the gory details are in that issue, but for VMs with large address spaces it is relatively easy for a guest picking random pages to cause long streams of page faults as Propolis does I/O against those pages. The faults then starve out anything that would change the address space, most importantly `brk()` and friends, which may need to grow the heap to support an allocation made as part of device operations.

At that point, the device will be (partially) stuck. Bad enough. Then the guest OS may notice the situation and try to restart the device. To do this, a vCPU will do some kind of access to the stuck device, which may be stuck in a way that the vCPU becomes blocked on the device. That vCPU won't be responsive to interrupts, and from the guest perspective the whole machine is extremely broken.

We're not immediately sure how to untangle the faults or AS lock bits, so for the time being we can at least try to not brk() at runtime by growing the heap probably-enough to serve real needs.
hawkw
left a comment
Okay, yeah, I agree that this is "awful", but...if the problem goes away, it's clearly less worse than not doing it!
```rust
// (see propolis::block::crucible::Crucible::WORKER_COUNT)
*wanted_heap += 8 * PER_WORKER_HEAP;
```
nit, sorry: could we perhaps import that WORKER_COUNT constant here?
I'd thought about it, but I think we really should do the // TODO there and make it tunable. then the number here would be that tunable, defaulted to DEFAULT_WORKER_COUNT (also 8)! and be basically the same as nworkers for the file backend.
```rust
// 64 * 1K is a wild over-estimate while we support 1-15 queues
// across virtio-block and nvme.
wanted_heap += 64 * 1024;
```
64k ought to be enough for anybody?
```rust
let balloon = vec![0u8; wanted_heap + 16 * propolis::common::MB];
std::mem::drop(balloon);
```
question: rather than allocating a balloon, zero-initializing it, dropping it, and hoping that it correctly sets the max heap size such that the allocator won't try to brk to get more heap despite us having allocated a wanted_heap and change...why not just call brk ourselves here?
we know in practice that growing the heap ended up at brk(), but the allocator might not use brk/sbrk to actually manage the storage (there's a mmap backend for umem for example which we're not using today and probably won't tomorrow, but..)
so there's still the iffy bit that maybe the allocations have weird alignment constraints, or the allocator gets fragmented, such that 320 contiguous MiB or whatever doesn't cut it later on. to be more confident about that I think we'd want the kind of pooled buffers you were talking about earlier, but also I really really want to fix the OS so we don't need this or the file backend buffers lol
agreed wholeheartedly with...all of this.
huh, seems …

specifically, it seems it was waiting for a login prompt: https://buildomat.eng.oxide.computer/wg/0/artefact/01KGQRWSKSJKNN9QXY8PPY27EG/lo4zfMXju35PmPUhejEF21RN5YPMZAMVs5qcXgIyXfJs0hEJ/01KGQRXWP544KACC9ZPBCCJQ84/01KGQTREWP1ZF5QAFJDN7PFY4M/phd-runner.log?format=x-bunyan#L1290 let's see what the serial output from the guest has to say: LMAO WHAT

well, I took some notes in #1035, reran …
hawkw
left a comment
great sure whatever it's awful but it works
(giving this a quick run on mb-0 before making it a real PR)