RCU CPU stall warning in a multi-core system simulation #51

chiangkd · 2024-08-04T18:29:35Z

Execute semu with multi-core system simulation

make check SMP=4

The RCU CPU stall warning, as discussed in #49 , is accompanied by an increase in timer interrupts and clock_gettime system calls to produce a real-time timer. This causes the CPU to wait longer than a typical grace period, which is usually 21 seconds.

[   33.044071] rcu: INFO: rcu_sched self-detected stall on CPU
[   33.048071] rcu: 	2-....: (5247 ticks this GP) idle=da1c/1/0x40000002 softirq=689/689 fqs=1847
[   33.056071] 	(t=5253 jiffies g=-887 q=11 ncpus=4)
[   33.064071] CPU: 2 PID: 8 Comm: kworker/u8:0 Not tainted 6.1.94 #12
[   33.068071] Hardware name: semu (DT)
[   33.072071] Workqueue: events_unbound async_run_entry_fn
[   33.080071] epc : eat+0x34/0x54
[   33.088071]  ra : do_reset+0x38/0x6c
[   33.096071] epc : c03811fc ra : c038137c sp : c0871e40
[   33.104071]  gp : c04de858 tp : c0852f40 t0 : c0aabc70
[   33.112071]  t1 : 00000008 t2 : 67e0cb47 s0 : c0871e50
[   33.120071]  s1 : c03a48b8 a0 : 00000001 a1 : 00000000
[   33.128071]  a2 : 00000000 a3 : 0053be47 a4 : 00000000
[   33.136071]  a5 : c03a48b8 a6 : 00000000 a7 : 0000000c
[   33.144071]  s2 : 004725a7 s3 : 0038da58 s4 : 00000000
[   33.152071]  s5 : 00000030 s6 : 00000007 s7 : c03a496c
[   33.156071]  s8 : c03a48b8 s9 : c038112c s10: c03a4968
[   33.160071]  s11: c0381538 t3 : c09e4140 t4 : 00000003
[   33.168071]  t5 : ec37d331 t6 : c0871e08
[   33.172071] status: 00000120 badaddr: c03a48c0 cause: 80000005
[   33.180071] [<c03811fc>] eat+0x34/0x54
[   33.184071] [<c038137c>] do_reset+0x38/0x6c
[   33.192071] [<c0381514>] write_buffer+0x40/0x64
[   33.200071] [<c0381d20>] unpack_to_rootfs+0x1c4/0x300
[   33.212072] [<c03824ec>] do_populate_rootfs+0x70/0xd8
[   33.220072] [<c002bd58>] async_run_entry_fn+0x30/0xd0
[   33.232072] [<c002195c>] process_one_work+0x198/0x224
[   33.240072] [<c0021ee4>] worker_thread+0x238/0x294
[   33.248072] [<c002898c>] kthread+0xd4/0xd8
[   33.252072] [<c0002508>] ret_from_exception+0x0/0x1c

Implementing multi-thread support for semu, as discussed in PR #49, might improve the performance of the booting process.

The text was updated successfully, but these errors were encountered:

Mes0903 · 2024-12-25T13:48:51Z

Although implementing multi-threaded system emulation can significantly alleviate this issue, I think it does not mean the problem will no longer occur after multi-threaded system emulation is complete. Perhaps we can start by optimizing the timer.

Currently, the function semu_timer_get relies on clock_gettime, and semu_timer_get is primarily called during reads and writes to mtime. In other words, every time mtime is accessed, clock_gettime is invoked. This leads to an excessive number of timer interrupts, causing performance issues in the emulator.

In #49, it was suggested that lowering the frequency set in semu_timer_init could mitigate this problem. However, this would mean the Linux kernel perceives the clock as no longer being "real-time" but rather a "slowed down" or "imprecise" clock, which goes against the intent of that pull request.

To strike a balance between these two extremes, I think we can maintain a dedicated emulator timer and updating it at the start of each emulation cycle. This way, clock_gettime would only be called once per emulation cycle, while tick updates would be maintained separately. The logic for reading mtime would also be simplified.

I made some modifications to test this approach, and it did result in a slight performance improvement for the emulator. On my machine, with SMP=6, the RCU CPU Stall warning no longer appears. However, with SMP=8, the warning still occurs.

chiangkd · 2024-12-25T17:46:56Z

I completely agree that your implementation can help avoid RCU CPU stall warnings. However, as the number of simulated cores increases, the warning is likely to occur again unless the emulation period per cycle is also increased proportionally to the number of cores. Is my understanding correct?

I rebuilt the Linux kernel with CONFIG_PREEMPT=y to enable kernel preemption. This configuration suppresses the warning when simulating with SMP=32. The system boots successfully, as shown below:

[  351.875391] virtio_blk virtio1: [vda] 4800 512-byte logical blocks (2.46 MB/2.34 MiB)
[  356.664664] virtio_net virtio0: Assigned random MAC address 6e:8d:9a:9d:3f:86
[  360.199865] clk: Disabling unused clocks
[ 1133.368994] Freeing initrd memory: 8188K
[ 1135.085089] Freeing unused kernel image (initmem) memory: 180K
[ 1135.149092] Kernel memory protection not selected by kernel config.
[ 1135.201095] Run /init as init process
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK

Welcome to Buildroot
buildroot login:

But preemption may increase the frequency of context switches, causing overall CPU usage to increase.
For high-throughput tasks, overall performance may be reduced.

Mes0903 · 2024-12-26T11:55:07Z

However, as the number of simulated cores increases, the warning is likely to occur again unless the emulation period per cycle is also increased proportionally to the number of cores. Is my understanding correct?

yes, the improvement is limited, as the number of cores grow up, the warning would occur again.

Mes0903 · 2024-12-26T12:06:22Z

The accuracy of the timer primarily affects user programs. I think that during the boot process, it is unnecessary to use such a high-precision timer. Instead, a less precise timer, or even one that simply increments in a straightforward manner, could be used. After the boot process is complete, the system can switch back to a more precise timer.

Here is a simple example. I added a global flag boot_complete to detect whether the boot process has finished. This can be determined by the first transition to U mode.

static void op_sret(hart_t *vm)
{
    /* Restore from stack */
    vm->pc = vm->sepc;
    mmu_invalidate(vm);
    vm->s_mode = vm->sstatus_spp;
    vm->sstatus_sie = vm->sstatus_spie;

    /* After the booting process is complete, initrd will be loaded. At this
     * point, the sytstem will switch to U mode for the first time. Therefore,
     * by checking whether the switch to U mode has already occurred, we can
     * determine if the boot process has been completed.
     */
    if (!boot_complete && !vm->s_mode)
        boot_complete = true;

    /* Reset stack */
    vm->sstatus_spp = false;
    vm->sstatus_spie = true;
}

Before the boot process is complete, I didn't use clock_gettime. Instead, I simply increment tv_nsec and combine it with the lower frequency mentioned in #49 to eliminate the CPU stall warning.

bool boot_complete = false;
static struct timespec host_time;

// ...

void semu_timer_init(semu_timer_t *timer, uint64_t freq)
{
    timer->freq = freq;
    clock_gettime(CLOCKID, &host_time);
    semu_timer_rebase(timer, 0);
}

static uint64_t semu_timer_clocksource(uint64_t freq)
{
#if defined(HAVE_POSIX_TIMER)
    if (boot_complete) {
        clock_gettime(CLOCKID, &host_time);
        return host_time.tv_sec * freq +
               mult_frac(host_time.tv_nsec, freq, 1e9);
    } else {
        return host_time.tv_sec * freq +
               mult_frac(host_time.tv_nsec++, freq / 1000, 1e9);
    }
    
// ...
#endif
}

uint64_t semu_timer_get(semu_timer_t *timer)
{
    /* Rebase the timer to the current time after the boot process. */
    static bool first = true;
    if (first && boot_complete) {
        first = false;
        timer->begin = semu_timer_clocksource(timer->freq);
    }

    return semu_timer_clocksource(timer->freq) - timer->begin;
}

// ...

This is the sample output:

mes@DESKTOP-HLQ9F6A:~/semu$ make check SMP=32
  CC    riscv.o
  CC    ram.o
  CC    utils.o
  CC    plic.o
  CC    uart.o
  CC    main.o
  CC    aclint.o
  CC    virtio-blk.o
  CC    virtio-net.o
  CC    netdev.o
  LD    semu
 DTC    minimal.dtb
Ready to launch Linux kernel. Please be patient.
failed to allocate TAP device: Operation not permitted
[    0.000000] Linux version 6.1.99 (jserv@node1) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2024.02.4) 12.3.0, GNU ld (GNU Binutils) 2.41) #1 SMP Thu Jul 18 13:04:10 CST 2024
[    0.000000] Machine model: semu
[    0.000000] earlycon: ns16550 at MMIO 0xf4000000 (options '')
[    0.000000] printk: bootconsole [ns16550] enabled
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x999 Version=0x1
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s11604 r8192 d21164 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130048
[    0.000000] Kernel command line: earlycon console=ttyS0
[    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] printk: log_buf_len total cpu_extra contributions: 126976 bytes
[    0.000000] printk: log_buf_len min size: 65536 bytes
[    0.000000] printk: log_buf_len: 262144 bytes
[    0.000000] printk: early log buf free: 63952(97%)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 504100K/524288K available (3578K kernel code, 345K rwdata, 873K rodata, 185K init, 140K bss, 20188K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] plic: interrupt-controller@0: mapped 31 interrupts with 32 handlers for 32 contexts.
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] riscv-timer: riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0xefdb196da, max_idle_ns: 440795204367 ns
[    0.000000] sched_clock: 64 bits at 65MHz, resolution 15ns, wraps every 2199023255550ns
[    0.000007] Console: colour dummy device 80x25
[    0.000008] Calibrating delay loop (skipped), value calculated using timer frequency.. 130.00 BogoMIPS (lpj=260000)
[    0.000009] pid_max: default: 32768 minimum: 301
[    0.000014] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.000015] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.000038] ASID allocator disabled (0 bits)
[    0.000040] rcu: Hierarchical SRCU implementation.
[    0.000040] rcu:     Max phase no-delay instances is 1000.
[    0.000057] smp: Bringing up secondary CPUs ...
[    0.000295] smp: Brought up 1 node, 32 CPUs
[    0.000320] devtmpfs: initialized
[    0.000381] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.000382] futex hash table entries: 8192 (order: 7, 524288 bytes, linear)
[    0.000406] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.000486] platform soc@F0000000: Fixed dependency cycle(s) with /soc@F0000000/interrupt-controller@0
[    0.000662] clocksource: Switched to clocksource riscv_clocksource
[    0.000806] NET: Registered PF_INET protocol family
[    0.000811] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.000858] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.000859] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000860] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.000862] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    0.000865] TCP: Hash tables configured (established 4096 bind 4096)
[    0.000868] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.000869] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.000874] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.000883] Unpacking initramfs...
[    0.005874] Freeing initrd memory: 8188K
[    0.005907] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[    0.006895] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.006917] printk: console [ttyS0] disabled
[    0.006918] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    0.006919] printk: console [ttyS0] enabled
[    0.006919] printk: console [ttyS0] enabled
[    0.006920] printk: bootconsole [ns16550] disabled
[    0.006920] printk: bootconsole [ns16550] disabled
[    0.006952] virtio_blk virtio1: 1/0/0 default/read/poll queues
[    0.006977] virtio_blk virtio1: [vda] 4800 512-byte logical blocks (2.46 MB/2.34 MiB)
[    0.006993] virtio_net virtio0: Assigned random MAC address 9e:21:77:c8:0b:a8
[    0.007019] clk: Disabling unused clocks
[    0.007029] Freeing unused kernel image (initmem) memory: 180K
[    0.007030] Kernel memory protection not selected by kernel config.
[    0.007031] Run /init as init process
[   17.029125] hrtimer: interrupt took 79999969 ns
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK

Welcome to Buildroot
buildroot login: root
# cat /proc/cpuinfo
processor       : 0
hart            : 0
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 1
hart            : 1
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 2
hart            : 2
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 3
hart            : 3
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 4
hart            : 4
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 5
hart            : 5
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 6
hart            : 6
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 7
hart            : 7
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 8
hart            : 8
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 9
hart            : 9
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 10
hart            : 10
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 11
hart            : 11
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 12
hart            : 12
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 13
hart            : 13
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 14
hart            : 14
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 15
hart            : 15
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 16
hart            : 16
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 17
hart            : 17
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 18
hart            : 18
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 19
hart            : 19
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 20
hart            : 20
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 21
hart            : 21
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 22
hart            : 22
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 23
hart            : 23
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 24
hart            : 24
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 25
hart            : 25
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 26
hart            : 26
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 27
hart            : 27
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 28
hart            : 28
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 29
hart            : 29
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 30
hart            : 30
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

processor       : 31
hart            : 31
isa             : rv32ima
mmu             : sv32
mvendorid       : 0x12345678
marchid         : 0x80000001
mimpid          : 0x1

chiangkd · 2024-12-26T18:00:42Z

I agree that your changes can quickly and easily resolve the RCU stall warning issue! However, your log cannot represent the actual boot time in this situation ([ 17.029125]). It should take much more time, am I correct?

I try to reproduce your work, the RCU stall warning is resolved when simulate SMP=32

[    0.007006] Run /init as init process
[    0.026121] hrtimer: interrupt took 12000723 ns
Starting syslogd: OK

another message hrtimer: interrupt took XXXX ns is shown (also in your log)

A quick research that this warning is produced by hrtimer_interrupt() (high resolution timer) because our kernel has config CONFIG_HIGH_RES_TIMERS=y

I'm not sure is any side effect here

Mes0903 · 2024-12-26T18:41:04Z

However, your log cannot represent the actual boot time in this situation ([ 17.029125]). It should take much more time, am I correct?

Yes, the primary reason for incrementing tv_nsec is to allow the boot process time to progress, ensuring the boot process can proceed smoothly. Therefore, during the boot process, the timer does not serve as an actual timing mechanism.

Once the system switches to U mode for the first time, the boot_complete flag is set, and the time reference is reset. As a result, the [17.029125] here represents 17.029125 seconds since the first switch to U mode.

As for the HRT warning, I think it was triggered due to a sudden jump in the system clock right after the boot process completed. I'm not sure is any side effect here too.

jserv · 2024-12-26T23:01:51Z

A quick research that this warning is produced by hrtimer_interrupt() (high resolution timer) because our kernel has config CONFIG_HIGH_RES_TIMERS=y

This option was set for the sake of benchmarking purpose. RT-Tests relies on HRT features.

jserv · 2024-12-26T23:07:26Z

However, your log cannot represent the actual boot time in this situation ([ 17.029125]). It should take much more time, am I correct?

The timer increments should align with the frequency defined in the device tree. We could use an approach similar to BogoMips to make the necessary adjustments.

Mes0903 · 2024-12-29T09:51:00Z

After multiple attempts, I realize that independently maintaining nsec is not a good idea. This results in time intervals between reads being uniform, whereas in reality, we call clock_gettime at varying time intervals.

In contrast, I believe maintaining a frequency scaling factor is a better solution. As mentioned in #49, this achieves the purpose of slowing down time during the boot process. Since it’s merely a scaling factor, we can still derive real-time values from it.

As for the hrtimer: interrupt took XXXX ns issue, after a few days of testing, I found that this doesn’t seem to be caused by time drift. In fact, during these tests, I re-based the timer after the boot process using the recorded time to ensure no drift, but the hrtimer message still appeared. The timing of the message is not consistent—it sometimes appears after the "Welcome to Buildroot" message, and sometimes right after switching to U-mode. But always after switch to U-mode.

Interestingly, on my machine, the warning doesn't appear at all with SMP=16, regardless of how the frequency is adjusted. Moreover, the current implementation only affects the timer during the boot process; after switching to U-mode, the timer behaves exactly as before. Therefore, I believe this warning is due to the current sequential emulation approach. After the multi-threaded emulation is implemented, I think the situation would be mitigated a lot.

Here’s a diagram of the overall flow:

Below is a simple example:

static uint64_t semu_timer_clocksource(uint64_t freq)
{
#if defined(HAVE_POSIX_TIMER)
    struct timespec t;
    clock_gettime(CLOCKID, &t);

    if (boot_complete)
        return t.tv_sec * freq + mult_frac(t.tv_nsec, freq, 1e9);
    else
        return t.tv_sec * (freq / 100) + mult_frac(t.tv_nsec, (freq / 100), 1e9); 
#elif defined(HAVE_MACH_TIMER)
    static mach_timebase_info_data_t t;
    if (t.denom == 0)
        (void) mach_timebase_info(&t);
    return mult_frac(mult_frac(mach_absolute_time(), t.numer, t.denom), freq,
                     1e9);
#else
    return time(0) * freq;
#endif
}

uint64_t semu_timer_get(semu_timer_t *timer)
{
    static bool first = true;
    if (first && boot_complete) {
        first = false;
        semu_timer_rebase(timer, 0);
        printf("\033[1;33m[SEMU LOG]: Switch to real time\033[0m\n");
    }

    return semu_timer_clocksource(timer->freq) - timer->begin;
}

I think this approach is better than maintaining two separate timers during the boot process. Dividing the frequency by 100 means the boot process operates at one one-hundredth of real-time, allowing us to easily derive the actual boot time. This scaling factor can also be configured in the Makefile. I attempted to dynamically measure the cost of clock_gettime to set this value, but realized that using clock_gettime as a benchmark was not accurate. Therefore, in this example, a scale of 100 was used for simplicity.

Here is the output of the test:

mes@DESKTOP-HLQ9F6A:~/MesRv32emu/semu$ make check SMP=32
 DTC    minimal.dtb
Ready to launch Linux kernel. Please be patient.
failed to allocate TAP device: Operation not permitted
[    0.000000] Linux version 6.1.99 (jserv@node1) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2024.02.4) 12.3.0, GNU ld (GNU Binutils) 2.41) #1 SMP Thu Jul 18 13:04:10 CST 2024
[    0.000000] Machine model: semu
[    0.000000] earlycon: ns16550 at MMIO 0xf4000000 (options '')
[    0.000000] printk: bootconsole [ns16550] enabled
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x999 Version=0x1
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s11604 r8192 d21164 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130048
[    0.000000] Kernel command line: earlycon console=ttyS0
[    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] printk: log_buf_len total cpu_extra contributions: 126976 bytes
[    0.000000] printk: log_buf_len min size: 65536 bytes
[    0.000000] printk: log_buf_len: 262144 bytes
[    0.000000] printk: early log buf free: 63952(97%)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 504100K/524288K available (3578K kernel code, 345K rwdata, 873K rodata, 185K init, 140K bss, 20188K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] plic: interrupt-controller@0: mapped 31 interrupts with 32 handlers for 32 contexts.
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] riscv-timer: riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0xefdb196da, max_idle_ns: 440795204367 ns
[    0.000000] sched_clock: 64 bits at 65MHz, resolution 15ns, wraps every 2199023255550ns
[    0.000694] Console: colour dummy device 80x25
[    0.000893] Calibrating delay loop (skipped), value calculated using timer frequency.. 130.00 BogoMIPS (lpj=260000)
[    0.000992] pid_max: default: 32768 minimum: 301
[    0.001489] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.001588] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.004070] ASID allocator disabled (0 bits)
[    0.004269] rcu: Hierarchical SRCU implementation.
[    0.004368] rcu:     Max phase no-delay instances is 1000.
[    0.006056] smp: Bringing up secondary CPUs ...
[    0.049211] smp: Brought up 1 node, 32 CPUs
[    0.055530] devtmpfs: initialized
[    0.071056] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.071360] futex hash table entries: 8192 (order: 7, 524288 bytes, linear)
[    0.077127] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.097399] platform soc@F0000000: Fixed dependency cycle(s) with /soc@F0000000/interrupt-controller@0
[    0.141221] clocksource: Switched to clocksource riscv_clocksource
[    0.176651] NET: Registered PF_INET protocol family
[    0.177847] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.189801] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.190000] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.190199] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.190697] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    0.191394] TCP: Hash tables configured (established 4096 bind 4096)
[    0.192190] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.192390] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.193684] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.196073] Unpacking initramfs...
[    0.217861] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[    0.463628] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.468923] printk: console [ttyS0] disabled
[    0.469123] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    0.469422] printk: console [ttyS0] enabled
[    0.469422] printk: console [ttyS0] enabled
[    0.469622] printk: bootconsole [ns16550] disabled
[    0.469622] printk: bootconsole [ns16550] disabled
[    0.477610] virtio_blk virtio1: 1/0/0 default/read/poll queues
[    0.483698] virtio_blk virtio1: [vda] 4800 512-byte logical blocks (2.46 MB/2.34 MiB)
[    0.487487] virtio_net virtio0: Assigned random MAC address 1e:07:10:f3:00:37
[    0.494367] clk: Disabling unused clocks
[    1.486134] Freeing initrd memory: 8188K
[    1.488423] Freeing unused kernel image (initmem) memory: 180K
[    1.488622] Kernel memory protection not selected by kernel config.
[    1.488821] Run /init as init process
[SEMU LOG]: Switch to real time
[    7.946974] hrtimer: interrupt took 69481784 ns
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK

Welcome to Buildroot
buildroot login:

Another output for the factor set to 50:

mes@DESKTOP-HLQ9F6A:~/MesRv32emu/semu$ make check SMP=32
  CC    riscv.o
  CC    ram.o
  CC    utils.o
  CC    plic.o
  CC    uart.o
  CC    main.o
  CC    aclint.o
  CC    virtio-blk.o
  CC    virtio-net.o
  CC    netdev.o
  LD    semu
 DTC    minimal.dtb
Ready to launch Linux kernel. Please be patient.
failed to allocate TAP device: Operation not permitted
[    0.000000] Linux version 6.1.99 (jserv@node1) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2024.02.4) 12.3.0, GNU ld (GNU Binutils) 2.41) #1 SMP Thu Jul 18 13:04:10 CST 2024
[    0.000000] Machine model: semu
[    0.000000] earlycon: ns16550 at MMIO 0xf4000000 (options '')
[    0.000000] printk: bootconsole [ns16550] enabled
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x999 Version=0x1
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s11604 r8192 d21164 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130048
[    0.000000] Kernel command line: earlycon console=ttyS0
[    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] printk: log_buf_len total cpu_extra contributions: 126976 bytes
[    0.000000] printk: log_buf_len min size: 65536 bytes
[    0.000000] printk: log_buf_len: 262144 bytes
[    0.000000] printk: early log buf free: 63952(97%)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 504100K/524288K available (3578K kernel code, 345K rwdata, 873K rodata, 185K init, 140K bss, 20188K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] plic: interrupt-controller@0: mapped 31 interrupts with 32 handlers for 32 contexts.
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] riscv-timer: riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0xefdb196da, max_idle_ns: 440795204367 ns
[    0.000000] sched_clock: 64 bits at 65MHz, resolution 15ns, wraps every 2199023255550ns
[    0.001390] Console: colour dummy device 80x25
[    0.001589] Calibrating delay loop (skipped), value calculated using timer frequency.. 130.00 BogoMIPS (lpj=260000)
[    0.001788] pid_max: default: 32768 minimum: 301
[    0.002781] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.002980] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.007549] ASID allocator disabled (0 bits)
[    0.007947] rcu: Hierarchical SRCU implementation.
[    0.007947] rcu:     Max phase no-delay instances is 1000.
[    0.011523] smp: Bringing up secondary CPUs ...
[    0.095725] smp: Brought up 1 node, 32 CPUs
[    0.108034] devtmpfs: initialized
[    0.138606] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.139202] futex hash table entries: 8192 (order: 7, 524288 bytes, linear)
[    0.150715] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.190212] platform soc@F0000000: Fixed dependency cycle(s) with /soc@F0000000/interrupt-controller@0
[    0.279523] clocksource: Switched to clocksource riscv_clocksource
[    0.350564] NET: Registered PF_INET protocol family
[    0.352978] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.375270] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.375872] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.376273] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.377275] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    0.378478] TCP: Hash tables configured (established 4096 bind 4096)
[    0.380083] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.380484] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.382890] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.387903] Unpacking initramfs...
[    0.415743] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[    0.910292] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.921461] printk: console [ttyS0] disabled
[    0.921867] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    0.922476] printk: console [ttyS0] enabled
[    0.922476] printk: console [ttyS0] enabled
[    0.922881] printk: bootconsole [ns16550] disabled
[    0.922881] printk: bootconsole [ns16550] disabled
[    0.939299] virtio_blk virtio1: 1/0/0 default/read/poll queues
[    0.951841] virtio_blk virtio1: [vda] 4800 512-byte logical blocks (2.46 MB/2.34 MiB)
[    0.959919] virtio_net virtio0: Assigned random MAC address 8e:5e:2e:78:c2:e5
[    0.974247] clk: Disabling unused clocks
[    2.984914] Freeing initrd memory: 8188K
[    2.989718] Freeing unused kernel image (initmem) memory: 180K
[    2.989918] Kernel memory protection not selected by kernel config.
[    2.990318] Run /init as init process
[SEMU LOG]: Switch to real time
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK

Welcome to Buildroot
buildroot login: [   88.934001] hrtimer: interrupt took 60891661 ns

Also another output of the factor set to 10:

mes@DESKTOP-HLQ9F6A:~/MesRv32emu/semu$ make check SMP=32
  CC    riscv.o
  CC    ram.o
  CC    utils.o
  CC    plic.o
  CC    uart.o
  CC    main.o
  CC    aclint.o
  CC    virtio-blk.o
  CC    virtio-net.o
  CC    netdev.o
  LD    semu
 DTC    minimal.dtb
Ready to launch Linux kernel. Please be patient.
failed to allocate TAP device: Operation not permitted
[    0.000000] Linux version 6.1.99 (jserv@node1) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2024.02.4) 12.3.0, GNU ld (GNU Binutils) 2.41) #1 SMP Thu Jul 18 13:04:10 CST 2024
[    0.000000] Machine model: semu
[    0.000000] earlycon: ns16550 at MMIO 0xf4000000 (options '')
[    0.000000] printk: bootconsole [ns16550] enabled
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x999 Version=0x1
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s11604 r8192 d21164 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130048
[    0.000000] Kernel command line: earlycon console=ttyS0
[    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] printk: log_buf_len total cpu_extra contributions: 126976 bytes
[    0.000000] printk: log_buf_len min size: 65536 bytes
[    0.000000] printk: log_buf_len: 262144 bytes
[    0.000000] printk: early log buf free: 63952(97%)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 504100K/524288K available (3578K kernel code, 345K rwdata, 873K rodata, 185K init, 140K bss, 20188K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] plic: interrupt-controller@0: mapped 31 interrupts with 32 handlers for 32 contexts.
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] riscv-timer: riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0xefdb196da, max_idle_ns: 440795204367 ns
[    0.000000] sched_clock: 64 bits at 65MHz, resolution 15ns, wraps every 2199023255550ns
[    0.006954] Console: colour dummy device 80x25
[    0.008940] Calibrating delay loop (skipped), value calculated using timer frequency.. 130.00 BogoMIPS (lpj=260000)
[    0.009934] pid_max: default: 32768 minimum: 301
[    0.014901] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.015894] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.040726] ASID allocator disabled (0 bits)
[    0.042712] rcu: Hierarchical SRCU implementation.
[    0.042712] rcu:     Max phase no-delay instances is 1000.
[    0.059595] smp: Bringing up secondary CPUs ...
[    0.510227] smp: Brought up 1 node, 32 CPUs
[    0.577710] devtmpfs: initialized
[    0.738451] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.740435] futex hash table entries: 8192 (order: 7, 524288 bytes, linear)
[    0.801942] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    1.015202] platform soc@F0000000: Fixed dependency cycle(s) with /soc@F0000000/interrupt-controller@0
[    1.502151] clocksource: Switched to clocksource riscv_clocksource
[    1.887699] NET: Registered PF_INET protocol family
[    1.900829] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    2.030783] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    2.032790] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    2.034797] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    2.040818] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    2.047842] TCP: Hash tables configured (established 4096 bind 4096)
[    2.055870] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    2.057877] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    2.070922] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    2.095005] Unpacking initramfs...
[    2.187237] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[    4.908858] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    4.971514] printk: console [ttyS0] disabled
[    4.972524] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    4.975553] printk: console [ttyS0] enabled
[    4.975553] printk: console [ttyS0] enabled
[    4.978582] printk: bootconsole [ns16550] disabled
[    4.978582] printk: bootconsole [ns16550] disabled
[    5.065338] virtio_blk virtio1: 1/0/0 default/read/poll queues
[    5.132830] virtio_blk virtio1: [vda] 4800 512-byte logical blocks (2.46 MB/2.34 MiB)
[    5.178077] virtio_net virtio0: Assigned random MAC address 6e:6c:05:99:21:82
[    5.251437] clk: Disabling unused clocks
[   15.876369] Freeing initrd memory: 8188K
[   15.901243] Freeing unused kernel image (initmem) memory: 180K
[   15.903233] Kernel memory protection not selected by kernel config.
[   15.905222] Run /init as init process
[SEMU LOG]: Switch to real time
Starting syslogd: [   16.343392] hrtimer: interrupt took 61000570 ns
OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK

Welcome to Buildroot
buildroot login:

In my environment, even with varying scale factors, the hrtimer warning consistently appeared at approximately 60000000 ns. I think this observation supports my hypothesis.

Mes0903 · 2025-01-03T12:37:32Z

Here is a summary of two potential approaches to mitigate RCU CPU stalls under the current sequentially-emulation scenario, we have two methods now

Methods

Scale Frequency
Manually Maintain Increment of nsec

1. Scale Frequency

This method involves calling clock_gettime during the boot process to obtain the time difference and then scaling the frequency to slow down the system. An example implementation was provided in earlier discussions of this issue.

Pros

The boot process can use logs to approximate real-world time very easily.
By scaling, RCU CPU stall warnings can be significantly reduced. As long as the scaling factor is large enough, no matter how many harts are involved, there will be no RCU CPU stall warnings during the boot process.

Cons

Requires calling clock_gettime, resulting in a longer execution time.

2. Manually Increment `nsec`

This method uses statistical estimations to determine an empirical value representing the boot process period. Based on this value, the increment of nsec is designed. A rough estimate of the relationship is provided below:

The orange segments in the diagram represent calls to semu_timer_clocksource. Each call adds a predefined value (blue arrows) regardless of the interval between calls.

Pros

Does not require clock_gettime, so the boot process executes faster.

Cons

During the boot process, semu_timer_clocksource no longer functions as a timer period calculator but acts like a counter. Therefore, the logs cannot be used to approximate real-world time.
Since nsec increments are based on a designed value, RCU CPU stalls may still occur under different environments.

Synchronization and Time Rebase

Regardless of the chosen method, the emulator's internal time and real-world execution time will not synchronize during the boot process. Thus, when switch back to real-time timer, an big time difference would appear. However, this time discrepancy can be resolved by rebasing real-time with emulation time when switching to U mode.

Additionally, hrtimer warnings during the boot process are unrelated to the timers being used. This issue can be revisited during the development of multi-threaded emulation to check if it can be mitigated.

Analysis of Method 2: Empirical Estimation

Below is the data from tests conducted on my computer with freq/10, showing consistency in results across multiple trials:

SMP	times call `semu_timer_clocksource`	time(sec) of boot process	hrtimer warning
1	223,992,364	3.40001
2	382,486,686	8.01002
3	577,491,593	13.44003
4	774,125,110	17.85185
5	973,274,729	22.94007
6	1,174,038,398	27.11009
7	1,377,244,622	31.80010
8	1,605,001,986	37.52011
9	1,793,136,295	41.41014
10	2,005,988,752	45.53015
11	2,220,126,569	51.66018
12	2,440,897,255	56.13018
13	2,651,860,790	60.71019
14	2,882,701,067	65.92020
15	3,103,978,838	70.30022
16	3,343,030,072	76.31025
17	3,566,365,881	80.24026
18	3,800,214,669	86.59028
19	4,031,961,176	92.00030
20	4,280,331,336	94.47030
21	4,516,731,902	101.68033
22	4,883,959,327	104.95035
23	5,143,022,258	110.69036
24	5,260,058,753	118.59098
25	5,526,277,854	125.30041
26	5,790,681,086	132.98045	50000184 ns
27	6,044,658,240	140.04046	80000307 ns
28	6,328,119,424	146.18047	60000231 ns
29	6,598,156,499	154.15050	80000261 ns
30	6,868,480,625	159.83052	90000308 ns
31	7,129,979,196	163.82054	50000169 ns
32	7,410,129,712	170.80054	80000508 ns

Tests were also conducted on my workstation:

SMP	times call `semu_timer_clocksource`	time(sec) of boot process	hrtimer warning
1	223,450,834	15.21302
2	388,551,174	31.45406
3	586,279,749	48.33009
4	791,644,232	68.00714
5	1,003,639,012	83.64418
6	1,216,761,778	99.95122	12000031 ns
7	1,438,276,507	120.21144	14000047 ns
8	1,704,344,789	122.50440	11000030 ns
9	1,900,605,464	156.91848	10000031 ns
10	2,140,147,966	176.43249	11000031 ns
11	2,451,031,756	179.20599	12000062 ns
12	2,633,717,918	217.70393	14000046 ns
13	2,993,790,985	216.13076	15000046 ns
14	3,165,383,012	262.75081	14000046 ns
15	3,437,855,090	286.43180	15000015 ns

Since the workstation was slow, the execution time was long. Thus I just statistics until SMP=15.

Target Time Configuration

To use the second method, a target time need to be determined. If a target boot time of 10 seconds is set, nsec increment values can be calculated based on the SMP parameter.

For example, with SMP=4 and a target time of 10 seconds ($10^{10}$ ns), each call to semu_timer_clocksource adds approximately:

$$ \frac{10^{10}}{774125110} \approx 13 \text{ ns} $$

to nsec.

However, this method may introduce timing discrepancies across different environments. For instance, with SMP=1, the boot process takes approximately 3 seconds on my personal computer but 18 seconds on my workstation, resulting in a sixfold difference.

This leads to an implicit problem: if adding a core increases the number of semu_timer_clocksource calls by an approximate number $2 \times 10^8$, and the target time is set to 10 seconds, then each call to semu_timer_clocksource will increment nsec by approximately:

$$ \frac{10^{10}}{SMPs \times 2 \times 10^8} = \frac{50}{SMPs} \text{ ns} $$

where SMPs represents the number followed by SMP parameter.

Under this method, the time in emulator during boot process is calculated as:

$$ \text{Number of Calls} \times \frac{10^{10}}{SMPs \times 2 \times 10^8} $$

If the assumption of $2 \times 10^8$ calls per core is incorrect, the emulation time will deviate from real-world time. For instance, if the actual number of semu_timer_clocksource calls exceeds $2 \times 10^8$, the boot process will take much longer than the target time, potentially triggering RCU CPU stall warnings.

The value $2 \times 10^8$ was derived from tests on my personal computer and workstation. Despite the sixfold difference in execution times, the number of semu_timer_clocksource calls was remarkably consistent, leading to this assumption. The corresponding numbers could be checked in the tables above: for each increment of the SMP parameter, the number of calls to semu_timer_clocksource roughly increases by $2 \times 10^8$.

If we still want a coarse-grained timer during the boot process to roughly approximate real-world time, clock_gettime could be called at specific intervals (e.g., every $10^n$ calls). A pseudo-random number generator (PRNG) or the earlier mentioned relationship could then provide time displacement values, allowing approximate synchronization with real-world time during the boot process.

However, if the actual number of semu_timer_clocksource calls exceeds $2 \times 10^8$, time recalibration using clock_gettime may lead to time regression if real-world time is less than emulation time.

In contrast, if the number of semu_timer_clocksource calls is too low, time will continue to increment, leading only to a deviation that can be corrected via rebase.

Although manually incrementing nsec avoids calling clock_gettime, differences in execution time across environments reduce the stability of RCU CPU stall warning mitigation. This also eliminates the ability to correlate boot process logs with real-world time.

Nonetheless, since boot process timing may not be critical, meanwhile, as the number of harts increment, we can easily notice that the execution time of boot process is getting longer and longer, so I think the benefits of avoiding the call of clock_gettime still remain attractive.

In my opinion,

If boot time accuracy matters: Use scaled frequency and continue relying on clock_gettime to update nsec. There is an simple example code in the previous discussion of this issue.
If boot time accuracy does not matter: Manually update nsec without scaling frequency. Increment nsec by the method mentioned above.
- If the boot process timing is completely irrelevant, I think even just update nsec by an really small PRNG number like 1~3 is okay

Maybe we can discuss which method to adopt or any better modifications here. Once decided, I think I can start to submit a PR.

jserv · 2025-01-03T14:42:34Z

@chiangkd and @RinHizakura, please comment the above.

chiangkd · 2025-01-03T17:22:36Z

I tend to prefer using "scaled frequency", based on your analysis.

Pros

The boot process can use logs to approximate real-world time very easily.

By scaling, RCU CPU stall warnings can be significantly reduced. As long as the scaling factor is large enough, no matter how >many harts are involved, there will be no RCU CPU stall warnings during the boot process.

Cons

Requires calling clock_gettime, resulting in a longer execution time.

In my opinion, using real-world time offers valuable benefits for developers. It aids in analyzing and identifying potential improvements to accelerate the booting process (e.g., multi-threaded simulations).

RinHizakura · 2025-01-04T04:15:20Z

If we just consider the description, scaling frequency seems to be better for me due to the timing accuracy.

However, I am still hesitating after thinking further about it. For example, considering debug mode(yes we didn't implement this for semu, but the possibility still exists) the situation can be more complicated: The guest can be stopped, but the host timer will keep going. The clock_gettime() method may lead to a more complex in this case. On the other, most of the time we don't evaluate the software(Linux) performance on the emulator, because there are too many biases on the emulation even if we can solve this timer issue.

In summary, do we have any real experience in evaluating the boot time accuracy on the emulator(e.g. qemu)? If we never did this(in my experience this is true), the time accuracy does not matter. I'll support just using the increment counter-based method.

Mes0903 · 2025-01-05T08:43:00Z

In summary, do we have any real experience in evaluating the boot time accuracy on the emulator(e.g. qemu)?

If I understand correctly, QEMU by default provides a rough(coarse) mapping to the host real-time. This is sufficient for the Guest OS to perceive "time passing," but it does not guarantee accurate or stable synchronization. At the same time, QEMU offers another clock type(QEMU_CLOCK_VIRTUAL_RT), the "increment-based" method(icount). In this mode, the emulator updates time based on the number of executed instructions(or events). This ensures functional correctness and repeatability, unaffected by fluctuations in host speed.

For semu, since we aim to allow user programs to leverage benchmarking tools like Dhrystone or CoreMark for performance testing, it is essential to read accurate time. However, reading mtime is a privileged operation, so such user programs will not have as frequent system calls as during the boot process.

Thus, as mentioned earlier, I think time during the boot process is relatively less critical. During this period, simply ensuring that the nsec value increments is sufficient, and timer functionality can begin after mode switching.

The guest can be stopped, but the host timer will keep going. The clock_gettime() method may lead to more complexity in this case.

At this stage, I believe we should focus solely on addressing the RCU CPU stall warning issue. Our current implementation is relatively simplified. Though more complex debugging scenarios could be resolved (e.g., using rebase to adjust time offsets), doing so would require enumerating and handling different cases within our simplified implementation, leading to code clutter.

In my opinion, if we want to account for more complex factors, a more comprehensive timer framework would be needed. For example, adopting a cycle-accurate simulation approach with a dedicated thread to update timer values and signaling this thread to obtain monotonic time would be a viable solution. However, I don't think this is very necessary currently.

RinHizakura · 2025-01-05T10:26:04Z

Thus, as mentioned earlier, I think time during the boot process is relatively less critical. During this period, simply ensuring that the nsec value increments is sufficient, and timer functionality can begin after mode switching.

Agree. If there is no case where the accuracy of boot time is important, it would be more attractive if the emulator could boot quickly. For example, when we use it as a CI/CD test platform. So I still prefer the counter-based method. My experience may be too one-sided, so feel free to make any exceptions.

At this stage, I believe we should focus solely on addressing the RCU CPU stall warning issue.

Right, I may overthink. We can ignore the extended features as of now.

chiangkd · 2025-01-05T15:12:39Z

I tend to prefer using "scaled frequency", based on your analysis.

Sorry I'd like to revise my previous opinion. I initially preferred using a "scaled frequency" approach based on your analysis. I agree with @RinHizakura and your proposed Method 2—simply incrementing the counter during mode switching.

Thus, as mentioned earlier, I think time during the boot process is relatively less critical. During this period, simply ensuring that the nsec value increments is sufficient, and timer functionality can begin after mode switching.

I agree that precise timekeeping during the boot process is relatively less critical, and frequent clock_gettime() calls are unnecessary at this stage. During this phase, it is sufficient to ensure the nsec value increments consistently, with full timer functionality initialized after the mode switch.

I’ve learned a lot from this discussion—thank you!

Mes0903 · 2025-01-06T15:48:52Z

TL;DR

After implementing the method of incrementing nsec, I encountered some issues. Switching to maintaining the increment of ticks resolved the problems, although I am still unsure why these issues occurred. For now, everything appears to be working well.

Detailed Explanation

Initial Attempt: Incrementing `nsec`

During the implementation of the increment method, I noticed that the time displayed by the emulator during the boot process did not increase proportionally with the number of harts. Instead, it oscillated significantly, albeit with an acceleration effect.

Here is the initial implementation:

#include <math.h>
#include <stdio.h>
#include <time.h>

#include "utils.h"

#if defined(__APPLE__)
#define HAVE_MACH_TIMER
#include <mach/mach_time.h>
#elif !defined(_WIN32) && !defined(_WIN64)
#define HAVE_POSIX_TIMER

/*
 * Use a faster but less precise clock source because we need quick
 * timestamps rather than fine-grained precision.
 */
#ifdef CLOCK_MONOTONIC_COARSE
#define CLOCKID CLOCK_MONOTONIC_COARSE
#else
#define CLOCKID CLOCK_REALTIME_COARSE
#endif
#endif

/* To simplify the example, just hardcode the configuration of SMP here. */
#ifndef SMP
#define SMP 32
#endif

bool boot_complete = false;

/* We want the guest to see 10 seconds by the end of boot.
 *  => 10 seconds = 1e10 ns
 * We'll guess there are about (SMP * 2e8) total calls to
 * semu_timer_total_ticks.
 */
static const double TARGET_BOOT_TIME_NS = 1.0e10; /* 10 seconds in ns */
static const double CALLS_PER_SMP = 2.0e8;        /* from table's assumption */

/* This is the increment for each call, in nanoseconds. */
static double nsec_increment = 0.0;
static struct timespec emulator_time;
static uint64_t offset = 0;

/* For testing how long did the boot process take. */
static uint64_t count = 0;
static struct timespec boot_start;

/* Calculate "x * n / d" without unnecessary overflow or loss of precision.
 *
 * Reference:
 * https://elixir.bootlin.com/linux/v6.10.7/source/include/linux/math.h#L121
 */
static inline uint64_t mult_frac(uint64_t x, uint64_t n, uint64_t d)
{
    const uint64_t q = x / d;
    const uint64_t r = x % d;

    return q * n + r * n / d;
}

static inline uint64_t get_ticks(struct timespec *ts, uint64_t freq)
{
    return ts->tv_sec * freq + mult_frac(ts->tv_nsec, freq, 1e9);
}

void semu_timer_init(semu_timer_t *timer, uint64_t freq)
{
    timer->freq = freq;

    /* From example:
     *   calls = SMP * 2e8
     *   nsec_increment = 1e10 / calls
     */
    double calls = SMP * CALLS_PER_SMP;
    nsec_increment = TARGET_BOOT_TIME_NS / calls;

    /* rebase by emulator_time */
    clock_gettime(CLOCKID, &emulator_time);
    timer->begin = get_ticks(&emulator_time, timer->freq);

    /* For measuring how long the boot process takes */
    boot_start = emulator_time;
}

static uint64_t semu_timer_total_ticks(semu_timer_t *timer)
{
#if defined(HAVE_POSIX_TIMER)
    if (!boot_complete) {
        ++count;
        emulator_time.tv_nsec += nsec_increment;

        /* If overflow 1 second, update sec part */
        if (emulator_time.tv_nsec >= 1.0e9) {
            uint64_t overflow_sec = floor(emulator_time.tv_nsec / 1.0e9);
            emulator_time.tv_sec += overflow_sec;
            emulator_time.tv_nsec = fmod(emulator_time.tv_nsec, 1.0e9);
        }

        return get_ticks(&emulator_time, timer->freq);
    } else {
        static bool first = true;
        if (first) {
            first = false;

            struct timespec rt_boot;
            clock_gettime(CLOCKID, &rt_boot);

            uint64_t rt_base = get_ticks(&rt_boot, timer->freq);
            uint64_t et_base = get_ticks(&emulator_time, timer->freq);

            offset = rt_base - et_base;

            /* print the testing info */
            printf(
                "\033[1;31m[SEMU LOG]: Boot complete, switch to real-time "
                "timer\033[0m\n");

            double boot_time = (rt_boot.tv_sec - boot_start.tv_sec) +
                               (rt_boot.tv_nsec - boot_start.tv_nsec) / 1.0e9;
            printf(
                "\033[1;31m[SEMU LOG]: Boot time: %.5f seconds, called %ld "
                "times semu_timer_clocksource\033[0m\n",
                boot_time, count);

            printf(
                "\033[1;31m[SEMU LOG]: timer->begin: %ld, "
                "rt_base: %ld, et_base: %ld, offset: %ld\033[0m\n",
                timer->begin, rt_base, et_base, offset);
        }

        clock_gettime(CLOCKID, &emulator_time);

        return get_ticks(&emulator_time, timer->freq) - offset;
    }
#elif defined(HAVE_MACH_TIMER)
    static mach_timebase_info_data_t t;
    if (t.denom == 0)
        (void) mach_timebase_info(&t);
    return mult_frac(mult_frac(mach_absolute_time(), t.numer, t.denom), freq,
                     1e9);
#else
    return time(0) * freq;
#endif
}

uint64_t semu_timer_get(semu_timer_t *timer)
{
    return semu_timer_total_ticks(timer) - timer->begin;
}

void semu_timer_rebase(semu_timer_t *timer, uint64_t time)
{
    timer->begin = semu_timer_total_ticks(timer) - time;
}

The output:

mes@DESKTOP-HLQ9F6A:~/semu$ make check SMP=32
  CC    riscv.o
  CC    ram.o
  CC    utils.o
  CC    plic.o
  CC    uart.o
  CC    main.o
  CC    aclint.o
  CC    virtio-blk.o
  CC    virtio-net.o
  CC    netdev.o
  LD    semu
 DTC    minimal.dtb
Ready to launch Linux kernel. Please be patient.
failed to allocate TAP device: Operation not permitted
[    0.000000] Linux version 6.1.99 (jserv@node1) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2024.02.4) 12.3.0, GNU ld (GNU Binutils) 2.41) #1 SMP Thu Jul 18 13:04:10 CST 2024
[    0.000000] Machine model: semu
[    0.000000] earlycon: ns16550 at MMIO 0xf4000000 (options '')
[    0.000000] printk: bootconsole [ns16550] enabled
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x999 Version=0x1
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s11604 r8192 d21164 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130048
[    0.000000] Kernel command line: earlycon console=ttyS0
[    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] printk: log_buf_len total cpu_extra contributions: 126976 bytes
[    0.000000] printk: log_buf_len min size: 65536 bytes
[    0.000000] printk: log_buf_len: 262144 bytes
[    0.000000] printk: early log buf free: 63952(97%)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 504100K/524288K available (3578K kernel code, 345K rwdata, 873K rodata, 185K init, 140K bss, 20188K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] plic: interrupt-controller@0: mapped 31 interrupts with 32 handlers for 32 contexts.
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] riscv-timer: riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0xefdb196da, max_idle_ns: 440795204367 ns
[    0.000013] sched_clock: 64 bits at 65MHz, resolution 15ns, wraps every 2199023255550ns
[    0.007380] Console: colour dummy device 80x25
[    0.008748] Calibrating delay loop (skipped), value calculated using timer frequency.. 130.00 BogoMIPS (lpj=260000)
[    0.009633] pid_max: default: 32768 minimum: 301
[    0.014808] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.015438] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.040353] ASID allocator disabled (0 bits)
[    0.042148] rcu: Hierarchical SRCU implementation.
[    0.042493] rcu:     Max phase no-delay instances is 1000.
[    0.060750] smp: Bringing up secondary CPUs ...
[    0.315316] smp: Brought up 1 node, 32 CPUs
[    0.341302] devtmpfs: initialized
[    0.404967] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.405788] futex hash table entries: 8192 (order: 7, 524288 bytes, linear)
[    0.430439] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.513758] platform soc@F0000000: Fixed dependency cycle(s) with /soc@F0000000/interrupt-controller@0
[    0.696968] clocksource: Switched to clocksource riscv_clocksource
[    0.848966] NET: Registered PF_INET protocol family
[    0.853929] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.903390] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.904286] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.905154] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.907162] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    0.910137] TCP: Hash tables configured (established 4096 bind 4096)
[    0.913478] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.914286] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.919500] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.930174] Unpacking initramfs...
[    0.969899] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[    2.029020] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    2.051915] printk: console [ttyS0] disabled
[    2.052836] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    2.053785] printk: console [ttyS0] enabled
[    2.053785] printk: console [ttyS0] enabled
[    2.054503] printk: bootconsole [ns16550] disabled
[    2.054503] printk: bootconsole [ns16550] disabled
[    2.088792] virtio_blk virtio1: 1/0/0 default/read/poll queues
[    2.114963] virtio_blk virtio1: [vda] 4800 512-byte logical blocks (2.46 MB/2.34 MiB)
[    2.131372] virtio_net virtio0: Assigned random MAC address 2a:34:a1:24:85:37
[    2.160763] clk: Disabling unused clocks
[    6.129851] Freeing initrd memory: 8188K
[    6.138502] Freeing unused kernel image (initmem) memory: 180K
[    6.139086] Kernel memory protection not selected by kernel config.
[    6.139707] Run /init as init process
[SEMU LOG]: Boot complete, switch to real-time timer
[SEMU LOG]: Boot time: 146.10029 seconds, called 7057127651 times semu_timer_clocksource
[SEMU LOG]: timer->begin: 7922787382082, rt_base: 7932283900770, et_base: 7923246095379, offset: 9037805391
[   26.228166] hrtimer: interrupt took 180000431 ns
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK

Welcome to Buildroot
buildroot login:

Below is the graphs generated based on the data from the tables at the end of this comment:

The results indicated a truncation issue due to timespec using a non-floating-point data type (e.g., long) for storing time. This explains why the emulator’s time did not accurately increment to the target time.

However, it does not explain why emulator time decreased significantly as the number of harts increased. I haven't recognize why the issues occurred.

Transition to Maintaining Ticks

Since the emulator ultimately requires ticks rather than time, time is merely a tool to derive ticks. Thus, I changed the implementation to directly maintain ticks.

Implementation Details

To begin, I renamed semu_timer_clocksource to semu_timer_total_ticks. Based on the previous assumptions, by the time the boot process completes, approximately $SMPs \times 2 \times 10^8$ calls to semu_timer_total_ticks will have occurred.

With a target time set to 10 seconds, this translates to $10 \times SMPs \times 2 \times 10^8$ calls within 10 seconds. Additionally, we have set CLOCK_FREQ to 65,000,000, resulting in 650,000,000 ticks over 10 seconds.

Combining these two points, a simple calculation reveals that each call to semu_timer_total_ticks must increment by:

$$ \frac{10 \times 65,000,000}{SMPs \times 2 \times 10^8} $$

ticks.

Since this result is typically a floating-point number, I used a floating-point type for the accumulation variable. The calculated value is converted to uint64_t before being returned within semu_timer_total_ticks.

During the rebase process, I discovered that directly using semu_timer_rebase(timer, 0) is not feasible. The desired outcome is to align the emulator’s time with real-time, adding subsequent increments to the emulator's existing time. However, using semu_timer_rebase as-is does not achieve this goal.

If we use real-time to rebase, it will disregard the emulator’s previous time, resetting it to 0 and beginning accumulation anew. This leads to time regression.

If we use emulator-time to rebase, it will subtract the emulator’s time from the real-time value, which does not address the RCU CPU stall warning issue.

The proper method is to calculate the offset between real-time and emulator time. This offset, measured in ticks, is then subtracted from the tick count obtained after switching to the real-time timer using clock_gettime. This adjustment ensures accurate alignment of times:

Take the output of SMP=32 below as example, the start ticks is 8188341156344. When switching to real-time timer, the ticks of real-time timer is 8197945583709, the ticks of emulator time is 8189070441181. Thus, the offset is 8875142528:

Here is the preliminary implementation:

#include <math.h>
#include <stdio.h>
#include <time.h>

#include "utils.h"

#if defined(__APPLE__)
#define HAVE_MACH_TIMER
#include <mach/mach_time.h>
#elif !defined(_WIN32) && !defined(_WIN64)
#define HAVE_POSIX_TIMER

/*
 * Use a faster but less precise clock source because we need quick
 * timestamps rather than fine-grained precision.
 */
#ifdef CLOCK_MONOTONIC_COARSE
#define CLOCKID CLOCK_MONOTONIC_COARSE
#else
#define CLOCKID CLOCK_REALTIME_COARSE
#endif
#endif

/* To simplify the example, just hardcode the configuration of SMP here. */
#ifndef SMP
#define SMP 32
#endif

bool boot_complete = false;

/* We want the guest to see 10 seconds by the end of boot.
 *  => 10 seconds = 'SMP * 2e8' total calls to semu_timer_total_ticks.
 * also, 1 second = 'freq' ticks.
 * Thus, per call, we should increment by '10 * freq / (SMP * 2e8)' ticks.
 */
static const uint8_t TARGET_BOOT_TIME_SEC = 10;
static const double ticks_increment =
    (TARGET_BOOT_TIME_SEC * CLOCK_FREQ) / (2.0e8 * SMP);

/* This is the increment for each call, in nanoseconds. */
static struct timespec emulator_time;
static int64_t offset = 0;

/* For testing how long did the boot process take. */
static uint64_t count = 0;
static struct timespec boot_start;
static uint64_t boot_start_ticks;
static double boot_ticks;

/* Calculate "x * n / d" without unnecessary overflow or loss of precision.
 *
 * Reference:
 * https://elixir.bootlin.com/linux/v6.10.7/source/include/linux/math.h#L121
 */
static inline uint64_t mult_frac(uint64_t x, uint64_t n, uint64_t d)
{
    const uint64_t q = x / d;
    const uint64_t r = x % d;

    return q * n + r * n / d;
}

static inline uint64_t get_ticks(struct timespec *ts, uint64_t freq)
{
    return ts->tv_sec * freq + mult_frac(ts->tv_nsec, freq, 1e9);
}

void semu_timer_init(semu_timer_t *timer, uint64_t freq)
{
    timer->freq = freq;

    /* rebase by emulator_time */
    clock_gettime(CLOCKID, &emulator_time);
    boot_start_ticks = get_ticks(&emulator_time, timer->freq);
    timer->begin = boot_start_ticks;
    boot_ticks = boot_start_ticks;

    printf("\033[1;31m[SEMU LOG]: Start ticks: %lu\033[0m\n", boot_start_ticks);
    printf("\033[1;31m[SEMU LOG]: ticks_increment: %f\033[0m\n",
           ticks_increment);

    /* For measuring how long the boot process takes */
    boot_start = emulator_time;
}

static uint64_t semu_timer_total_ticks(semu_timer_t *timer)
{
#if defined(HAVE_POSIX_TIMER)
    if (!boot_complete) {
        ++count;
        boot_ticks += ticks_increment;
        return (uint64_t) boot_ticks;
    } else {
        static bool first = true;
        if (first) {
            first = false;

            struct timespec rt_boot;
            clock_gettime(CLOCKID, &rt_boot);

            uint64_t rt_base = get_ticks(&rt_boot, timer->freq);
            uint64_t et_base = (uint64_t) boot_ticks;

            offset = rt_base - et_base;

            /* print the testing info */
            printf(
                "\033[1;31m[SEMU LOG]: Boot complete, switch to real-time "
                "timer\033[0m\n");

            double boot_time = (rt_boot.tv_sec - boot_start.tv_sec) +
                               (rt_boot.tv_nsec - boot_start.tv_nsec) / 1.0e9;
            printf(
                "\033[1;31m[SEMU LOG]: Boot time: %.5f seconds, called %ld "
                "times semu_timer_total_ticks\033[0m\n",
                boot_time, count);

            printf(
                "\033[1;31m[SEMU LOG]: timer->begin: %lu, "
                "rt_base: %lu, et_base: %lu, offset: %ld\033[0m\n",
                timer->begin, rt_base, et_base, offset);
        }

        clock_gettime(CLOCKID, &emulator_time);
        return get_ticks(&emulator_time, timer->freq) - offset;
    }
#elif defined(HAVE_MACH_TIMER)
    static mach_timebase_info_data_t t;
    if (t.denom == 0)
        (void) mach_timebase_info(&t);
    return mult_frac(mult_frac(mach_absolute_time(), t.numer, t.denom), freq,
                     1e9);
#else
    return time(0) * freq;
#endif
}

uint64_t semu_timer_get(semu_timer_t *timer)
{
    return semu_timer_total_ticks(timer) - timer->begin;
}

void semu_timer_rebase(semu_timer_t *timer, uint64_t time)
{
    timer->begin = semu_timer_total_ticks(timer) - time;
}

The output:

mes@DESKTOP-HLQ9F6A:~/semu$ make check SMP=32
  CC    riscv.o
  CC    ram.o
  CC    utils.o
  CC    plic.o
  CC    uart.o
  CC    main.o
  CC    aclint.o
  CC    virtio-blk.o
  CC    virtio-net.o
  CC    netdev.o
  LD    semu
 DTC    minimal.dtb
Ready to launch Linux kernel. Please be patient.
failed to allocate TAP device: Operation not permitted
[SEMU LOG]: Start ticks: 8188341156344
[SEMU LOG]: ticks_increment: 0.101562
[    0.000000] Linux version 6.1.99 (jserv@node1) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2024.02.4) 12.3.0, GNU ld (GNU Binutils) 2.41) #1 SMP Thu Jul 18 13:04:10 CST 2024
[    0.000000] Machine model: semu
[    0.000000] earlycon: ns16550 at MMIO 0xf4000000 (options '')
[    0.000000] printk: bootconsole [ns16550] enabled
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x999 Version=0x1
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s11604 r8192 d21164 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130048
[    0.000000] Kernel command line: earlycon console=ttyS0
[    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] printk: log_buf_len total cpu_extra contributions: 126976 bytes
[    0.000000] printk: log_buf_len min size: 65536 bytes
[    0.000000] printk: log_buf_len: 262144 bytes
[    0.000000] printk: early log buf free: 63952(97%)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 504100K/524288K available (3578K kernel code, 345K rwdata, 873K rodata, 185K init, 140K bss, 20188K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] plic: interrupt-controller@0: mapped 31 interrupts with 32 handlers for 32 contexts.
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] riscv-timer: riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0xefdb196da, max_idle_ns: 440795204367 ns
[    0.000021] sched_clock: 64 bits at 65MHz, resolution 15ns, wraps every 2199023255550ns
[    0.011532] Console: colour dummy device 80x25
[    0.013860] Calibrating delay loop (skipped), value calculated using timer frequency.. 130.00 BogoMIPS (lpj=260000)
[    0.015045] pid_max: default: 32768 minimum: 301
[    0.023329] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.024314] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.064384] ASID allocator disabled (0 bits)
[    0.067303] rcu: Hierarchical SRCU implementation.
[    0.067842] rcu:     Max phase no-delay instances is 1000.
[    0.097174] smp: Bringing up secondary CPUs ...
[    0.505388] smp: Brought up 1 node, 32 CPUs
[    0.546817] devtmpfs: initialized
[    0.648255] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.649787] futex hash table entries: 8192 (order: 7, 524288 bytes, linear)
[    0.688998] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.821427] platform soc@F0000000: Fixed dependency cycle(s) with /soc@F0000000/interrupt-controller@0
[    1.117924] clocksource: Switched to clocksource riscv_clocksource
[    1.360409] NET: Registered PF_INET protocol family
[    1.368445] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    1.447862] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    1.449541] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    1.450612] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    1.454023] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    1.458664] TCP: Hash tables configured (established 4096 bind 4096)
[    1.463880] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    1.465421] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    1.473833] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    1.488511] Unpacking initramfs...
[    1.540352] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[    3.236442] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    3.273553] printk: console [ttyS0] disabled
[    3.274724] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    3.276207] printk: console [ttyS0] enabled
[    3.276207] printk: console [ttyS0] enabled
[    3.277608] printk: bootconsole [ns16550] disabled
[    3.277608] printk: bootconsole [ns16550] disabled
[    3.332378] virtio_blk virtio1: 1/0/0 default/read/poll queues
[    3.374667] virtio_blk virtio1: [vda] 4800 512-byte logical blocks (2.46 MB/2.34 MiB)
[    3.400856] virtio_net virtio0: Assigned random MAC address 2a:2b:84:52:fe:ab
[    3.445666] clk: Disabling unused clocks
[    9.770108] Freeing initrd memory: 8188K
[    9.784206] Freeing unused kernel image (initmem) memory: 180K
[    9.785467] Kernel memory protection not selected by kernel config.
[    9.786437] Run /init as init process
[SEMU LOG]: Boot complete, switch to real-time timer
[SEMU LOG]: Boot time: 147.76042 seconds, called 7180650711 times semu_timer_total_ticks
[SEMU LOG]: timer->begin: 8188341156344, rt_base: 8197945583709, et_base: 8189070441181, offset: 8875142528
[   33.089529] hrtimer: interrupt took 70000215 ns
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK

Welcome to Buildroot
buildroot login:

Output and Results Comparison

Below are the results comparing the three methods: scale freq by 10, increment nsec, and increment ticks.

Times Called semu_timer_clocksource:

SMP	scale freq by `10`	increment `nsec`	increment ticks
1	223,992,364	239,936,190	239,937,385
2	382,486,686	410,350,413	410,377,969
3	577,491,593	597,403,029	599,377,578
4	774,125,110	800,859,292	825,078,230
5	973,274,729	1,006,497,972	1,005,718,212
6	1,174,038,398	1,211,971,435	1,213,769,036
7	1,377,244,622	1,417,808,155	1,422,112,653
8	1,605,001,986	1,622,327,309	1,627,353,803
9	1,793,136,295	1,832,897,356	1,841,933,374
10	2,005,988,752	2,053,838,405	2,056,838,991
11	2,220,126,569	2,258,511,835	2,266,359,688
12	2,440,897,255	2,545,025,764	2,483,505,990
13	2,651,860,790	2,732,577,418	2,703,809,852
14	2,882,701,067	2,898,655,823	2,917,853,568
15	3,103,978,838	3,122,944,650	3,141,405,055
16	3,343,030,072	3,355,869,282	3,450,041,050
17	3,566,365,881	3,534,461,625	3,587,717,644
18	3,800,214,669	3,770,840,706	3,819,287,621
19	4,031,961,176	3,997,057,349	4,047,732,234
20	4,280,331,336	4,240,098,183	4,277,291,251
21	4,516,731,902	4,470,467,091	4,506,903,787
22	4,883,959,327	4,822,792,362	4,731,616,330
23	5,143,022,258	4,954,544,816	4,975,643,903
24	5,260,058,753	5,198,275,343	5,208,915,523
25	5,526,277,854	5,592,128,390	5,601,777,728
26	5,790,681,086	5,566,017,777	5,696,829,697
27	6,044,658,240	5,813,936,910	5,932,303,468
28	6,328,119,424	6,059,320,278	6,189,134,088
29	6,598,156,499	6,304,558,565	6,425,892,515
30	6,868,480,625	6,553,948,238	6,682,215,118
31	7,129,979,196	6,808,544,491	6,935,364,394
32	7,410,129,712	7,067,744,976	7,180,650,711

Boot Process Time (seconds):

SMP	scale freq by `10`	increment `nsec`	increment ticks
1	3.40001	3.43001	3.37001
2	8.01002	7.82002	7.60002
3	13.44003	12.97003	13.00003
4	17.85185	17.59004	16.74004
5	22.94007	21.54004	21.20004
6	27.11009	26.15005	25.74005
7	31.80010	30.50006	30.52006
8	37.52011	34.66007	34.33017
9	41.41014	39.05009	38.61007
10	45.53015	42.95009	42.74009
11	51.66018	47.63010	46.40010
12	56.13018	50.37010	51.27010
13	60.71019	53.60011	55.89012
14	65.92020	60.54012	60.16012
15	70.30022	65.15013	65.81013
16	76.31025	70.09014	67.95014
17	80.24026	73.90014	73.29030
18	86.59028	78.92016	78.91032
19	92.00030	82.91016	84.04035
20	94.47030	88.05019	88.06036
21	101.68033	92.53019	92.11038
22	104.95035	94.37019	96.46040
23	110.69036	103.57021	101.64042
24	118.59098	107.63022	106.39043
25	125.30041	109.88022	108.33045
26	132.98045	114.10023	116.97026
27	140.04046	120.04024	121.35049
28	146.18047	124.85025	128.63053
29	154.15050	129.38026	131.42054
30	159.83052	135.39027	136.87057
31	163.82054	140.07028	142.77042
32	170.80054	146.00028	147.76042

Emulator Time:

SMP	emulator time (`nsec`)	emulator time (ticks)
1	11.086913	11.087134
2	9.344152	9.334738
3	8.656941	9.047707
4	8.694160	9.359388
5	9.095570	9.093272
6	8.750582	9.135809
7	8.944581	9.148663
8	8.759224	9.156270
9	8.237356	9.208044
10	9.205089	9.225684
11	8.083883	9.235738
12	9.128268	9.241278
13	7.331282	9.288196
14	7.742981	9.305096
15	8.340038	9.334116
16	8.955129	9.622751
17	6.269885	9.388393
18	6.666563	9.411124
19	7.066534	9.423658
20	7.491480	9.434454
21	7.888990	9.449546
22	8.529552	9.467739
23	8.726807	9.551230
24	9.146896	9.573380
25	9.866130	9.875079
26	4.871893	9.620185
27	5.083991	9.613555
28	5.293480	9.695915
29	5.501661	9.714826
30	5.713393	9.741353
31	5.929234	9.735223
32	6.150587	9.786437

Due to the slower environment of my workstation, I tested only SMP=4 and SMP=32 on my workstation:

SMP	Scale Freq by 10	Increment Ticks
4	70.87429	56.76912
32	634.91923	578.99135

It can show that this approach saved nearly one minute on the slower environment.

Remarks

Maintaining ticks resolved the inconsistency issues encountered with nsec increments. When using the nsec increment method, each boot process execution yield slight differences in the number of semu_timer_total_ticks calls and the emulator's internal time. These variations result in inconsistent internal emulator time across runs.

However, the ticks increment method consistently produces identical results for every run. I think this is because the value added to the ticks during each call to semu_timer_total_ticks is fixed, resulting in the emulator time being the same during each boot process.

But there is an fluctuation when SMP is set to 4, 16 and 26. I am uncertain whether this is entirely expected. Aside from this, there is no operational problems been observed in the emulator's functionality so far.

Currently, the number of SMPs is hardcoded in the code as a macro. Therefore, when testing, adjustments must be made according to the SMP count specified at compile time.

Mes0903 · 2025-01-06T16:20:59Z

Comparatively, the scaling method appears more stable (as it only scales time). Also, there are still unexplained results when using the ticks increment method. Therefore, I am unsure which method is better.

Based on these results, I’d like to ask for your opinions. Currently, the advantages and disadvantages of the scaling method are as follows:

Advantages:
- The results are highly interpretable.
- It is easy to calculate the correspondence between emulator time and real time during the boot process.
Disadvantages:
- A larger scaling parameter is required to ensure the elimination of the RCU CPU stall warning.
  - I believe a value of 1000 is ideal—it is large enough and allows real time to be easily understood at a glance.
- Longer execution time:
  - On my workstation (a slower environment), the difference between the two methods for SMP=32 is nearly one minute.

Advantages of the Ticks Increment Method:

It provides highly consistent boot times, ensuring the elimination of the RCU CPU stall warning.
Faster execution during the boot process.

Disadvantages:

Some results remain unexplained.
The displayed time during the boot process is difficult to map back to real time (there might be a way, but I am unaware of it).

As everything is confirmed, I think I can start to submit a PR.

BTW, In the sample code above, I’ve added a debug message showing the boot process duration at startup for easier debugging and performance testing. Should we include this message in the final PR submission?

jserv · 2025-01-06T16:36:16Z

BTW, In the sample code above, I’ve added a debug message showing the boot process duration at startup for easier debugging and performance testing. Should we include this message in the final PR submission?

Don't do this.

Mes0903 · 2025-01-07T17:36:18Z

However, the ticks increment method consistently produces identical results for every run. I think this is because the value added to the ticks during each call to semu_timer_total_ticks is fixed, resulting in the emulator time being the same during each boot process.

I noticed that the results (the times call semu_timer_total_ticks) are not always consistent. After retesting the second method a day later, I observed that the values had changed. However, after a few hours, the results could sometimes revert to being identical to the previous values. In contrast, when conducting multiple tests within a short period under the same SMP configuration, the results remained the same.

semu_timer_total_ticks is called in two locations: semu_timer_get and semu_timer_rebase. In other words, semu_timer_total_ticks reflects the number of times the Guest OS reads or writes to the timer.

I am unsure why the results remain consistent over short intervals but fluctuate when retested after a day. Additionally, I am not certain whether these results are fully analyzable. I speculate that this behavior is closely related to the state of the Host OS.

At the very least, the consistency of the short-term results is good. The table below shows the values I just measured. Some values are identical to those from yesterday, while most are different. However, they still exhibit a similar distribution.

SMP	times call `semu_timer_total_ticks`	deviation	increment of ticks on each call	total ticks(emulator time)
1	239,937,385	16.64%	3.250000	779,796,501.25
2	410,377,969	2.52%	1.625000	666,864,199.625
3	600,190,253	0.03%	1.083333	650,205,907.353249
4	825,078,230	3.03%	0.812500	670,376,061.875
5	1,007,098,511	0.70%	0.650000	654,614,032.15
6	1,213,304,632	1.09%	0.541667	657,207,080.101544
7	1,419,857,500	1.39%	0.464286	659,219,959.245
8	1,627,353,803	1.68%	0.406250	661,112,482.46875
9	1,835,991,887	1.96%	0.361111	662,996,866.306457
10	2,056,837,458	2.76%	0.325000	668,472,173.85
11	2,269,651,319	3.06%	0.295455	670,579,830.455145
12	2,485,299,111	3.43%	0.270833	673,101,014.129463
13	2,703,809,852	3.83%	0.250000	675,952,463
14	2,917,880,774	4.03%	0.232143	677,365,596.518682
15	3,134,960,385	4.30%	0.216667	679,242,461.736795
16	3,450,041,050	7.24%	0.203125	700,789,588.28125
17	3,587,880,492	5.23%	0.191176	685,916,640.938592
18	3,810,315,132	5.51%	0.180556	687,975,258.973392
19	4,052,935,937	6.24%	0.171053	693,266,850.831661
20	4,274,307,156	6.41%	0.162500	694,574,912.85
21	4,503,423,577	6.73%	0.154762	696,958,839.623674
22	4,742,850,891	7.22%	0.147727	700,647,133.574757
23	4,973,145,039	7.50%	0.141304	702,725,286.590856
24	5,213,953,075	7.93%	0.135417	706,057,883.557275
25	5,597,276,263	10.67%	0.130000	727,645,914.19
26	5,696,829,697	8.72%	0.125000	712,103,712.125
27	5,935,553,504	9.02%	0.120370	714,462,575.27648
28	6,173,571,960	9.29%	0.116071	716,572,670.96916
29	6,620,649,951	12.39%	0.112069	741,969,619.358619
30	6,683,572,727	10.22%	0.108333	724,051,484.234091
31	6,942,616,048	10.69%	0.104839	727,856,923.856272
32	7,180,650,711	10.87%	0.101562	729,281,247.510582

This table helps explain why there is oscillation in the results. We assume that for every increment of 1 in SMP, the number of times semu_timer_total_ticks is called should increase by $2 \times 10^8$. However, in reality, there is always some deviation. This deviation currently shows an upward trend as SMP increases, and in certain configurations (e.g., SMP set to 1, 2, 4, 16, 25, or 29), the deviation becomes more pronounced. These points also exhibit significant increments in the plotted graphs.

Additionally, it is noteworthy that the total ticks are always greater than the ticks corresponding to 10 seconds (650,000,000). However, the time displayed during the boot process is almost always less than 10 seconds. Dividing the total ticks by 650,000,000 yields the target boot time, and subtracting this from the displayed time results in a value that generally falls between 0.9 and 1.5 seconds.

In summary, there are still several unexplained aspects of this method's results:

The number of times semu_timer_total_ticks is called occasionally varies.
In certain SMP configurations, the time deviation becomes larger.
Even though the total ticks exceed the target ticks, the emulator time is still less than the target time (10 seconds), with the gap ranging between 0.9 and 1.5 seconds.

Nonetheless, this method reduces the execution time of the boot process. Although the resulting values are less interpretable, the deviations remain within an acceptable range. Therefore, I think the second method is still preferable.

Mes0903 · 2025-01-10T21:16:43Z

After several days of analysis and discussion, I think the acceleration effect of the increment method is limited. However, since this method requires complete management of the tick count, it results in increased program complexity. At the same time, it has led to some currently unexplained results, but it remains an interesting experiment that I believe will be helpful for future analysis..

Therefore, I revisited the first scale method. After designing the scale parameter, it is now capable of setting the target boot time. The only remaining drawback is that the boot process is slower. Consequently, I plan to submit a PR using the scale method.

Below is an explanation of the sample code:

Using gprof, I found that semu_timer_clocksource accounts for approximately 10% to 15% of the boot process time:

CPU: 13th Gen Intel(R) Core(TM) i7-13700

 mes@DESKTOP-HLQ9F6A:~/semu$ gprof ./semu
 Flat profile:
 
 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total           
  time   seconds   seconds    calls  Ts/call  Ts/call  name    
  39.91     55.19    55.19                             vm_step
  10.55     69.78    14.59                             semu_timer_clocksource
   9.27     82.60    12.82                             semu_start
   6.49     91.57     8.97                             aclint_mswi_update_interrupts
   6.19    100.13     8.56                             mmu_store
   5.73    108.05     7.92                             mmu_translate
   3.98    113.56     5.51                             semu_timer_get
   3.33    118.16     4.60                             op_rv32i
   2.92    122.20     4.04                             aclint_mtimer_update_interrupts
   2.76    126.02     3.82                             aclint_sswi_update_interrupts
   2.10    128.93     2.91                             ram_read
   1.59    131.13     2.20                             mmu_load
   1.27    132.89     1.76                             mem_store
   1.19    134.54     1.65                             _init
   0.62    135.40     0.86                             mem_load
   0.46    136.04     0.64                             u8250_check_ready
   0.41    136.61     0.57                             ram_write
   0.39    137.15     0.54                             csr_read
   0.37    137.66     0.51                             virtio_net_refresh_queue
   0.28    138.05     0.39                             mem_fetch
   0.17    138.28     0.23                             op_csr_cs

CPU: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

 mes@Mes:~/semu$ gprof ./semu
 Flat profile:
 
 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total           
  time   seconds   seconds    calls  Ts/call  Ts/call  name    
  32.96     77.77    77.77                             vm_step
  15.96    115.43    37.66                             semu_timer_clocksource
  13.95    148.34    32.91                             semu_start
   6.10    162.74    14.40                             aclint_mswi_update_interrupts
   5.95    176.77    14.03                             mmu_store
   5.35    189.39    12.62                             mmu_translate
   3.85    198.47     9.08                             aclint_mtimer_update_interrupts
   2.73    204.91     6.44                             op_rv32i
   2.46    210.72     5.81                             ram_read
   2.11    215.70     4.98                             aclint_sswi_update_interrupts
   1.74    219.81     4.11                             mmu_load
   1.74    223.91     4.10                             semu_timer_get
   1.30    226.98     3.07                             mem_store
   0.92    229.14     2.16                             ram_write
   0.81    231.05     1.91                             _init
   0.61    232.48     1.43                             mem_load
   0.55    233.78     1.30                             csr_read
   0.35    234.60     0.82                             u8250_check_ready
   0.22    235.11     0.51                             op_csr_cs
   0.21    235.61     0.50                             virtio_net_refresh_queue
   0.14    235.95     0.34                             mem_fetch
   0.00    235.96     0.01                             semu_timer_init

The logic inside semu_timer_clocksource is relatively simple. Thus, I assume its execution time is almost equal to clock_gettime. Based on the previous discussions, where we estimated that semu_timer_clocksource is called approximately $(2 \times 10^8)$ times, we can roughly predict the boot process will take

$$ \text{SecPerCall} \times \text{SMPs} \times 2 \times 10^8 \times \frac{100}{10\sim15} $$

seconds.

And for example, if this calculation results in 200 seconds, we can use the target time to set the scale factor. Assuming the target time is 10 seconds, the scaling factor would be $(10 / 200)$. It makes us still can set the target time for boot process.

Below is the sample code:

Bogomips:

 #ifndef SEMU_SMP
 #define SEMU_SMP 1
 #endif
 
 #ifndef SEMU_BOOT_TARGET_TIME
 #define SEMU_BOOT_TARGET_TIME 10
 #endif
 
 bool boot_complete = false;
 static double scale_factor;
 
 static double measure_bogomips_ns(uint64_t target_loop)
 {
     struct timespec start, end;
     clock_gettime(CLOCKID, &start);
 
     for (uint64_t loops = 0; loops < target_loop; loops++)
         clock_gettime(CLOCKID, &end);
 
     int64_t sec_diff = end.tv_sec - start.tv_sec;
     int64_t nsec_diff = end.tv_nsec - start.tv_nsec;
     double ns_per_call = (sec_diff * 1e9 + nsec_diff) / target_loop;
     double predict_sec = ns_per_call * SEMU_SMP * 2;
     scale_factor = SEMU_BOOT_TARGET_TIME / predict_sec;
 
     return ns_per_call;
 }

This function is called at the beginning of semu_timer_init, and the two macros are currently configured directly in the Makefile:

 //...
 SMP ?= 1
 CFLAGS += -D SEMU_SMP=$(SMP)
 CFLAGS += -D SEMU_BOOT_TARGET_TIME=10
 //...

semu_timer_clocksource

 static uint64_t semu_timer_clocksource(semu_timer_t *timer)
 {
 #if defined(HAVE_POSIX_TIMER)
     static int64_t offset = 0;
 
     struct timespec emulator_time;
     clock_gettime(CLOCKID, &emulator_time);
 
     if (!boot_complete) {
         return get_ticks(&emulator_time, timer->freq * scale_factor);
     } else {
         static bool first = true;
         if (first) {
             first = false;
             offset = get_ticks(&emulator_time, timer->freq) -
                      get_ticks(&emulator_time, timer->freq * scale_factor);
         }
 
         return get_ticks(&emulator_time, timer->freq) - offset;
     }
 #elif defined(HAVE_MACH_TIMER)
     static mach_timebase_info_data_t emulator_time;
     if (emulator_time.denom == 0)
         (void) mach_timebase_info(&emulator_time);
     return mult_frac(mult_frac(mach_absolute_time(), emulator_time.numer,
                                emulator_time.denom),
                      freq, 1e9);
 #else
     return time(0) * freq;
 #endif
 }

Currently, only the HAVE_POSIX_TIMER method has been implemented. I am not familiar with macOS, so I may need to investigate further. I plan to complete it and submit the PR (#67) tomorrow.

jserv assigned Mes0903 Dec 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RCU CPU stall warning in a multi-core system simulation #51

RCU CPU stall warning in a multi-core system simulation #51

chiangkd commented Aug 4, 2024

Mes0903 commented Dec 25, 2024 •

edited by jserv

Loading

chiangkd commented Dec 25, 2024

Mes0903 commented Dec 26, 2024

Mes0903 commented Dec 26, 2024 •

edited

Loading

chiangkd commented Dec 26, 2024

Mes0903 commented Dec 26, 2024

jserv commented Dec 26, 2024

jserv commented Dec 26, 2024

Mes0903 commented Dec 29, 2024 •

edited

Loading

Mes0903 commented Jan 3, 2025 •

edited

Loading

jserv commented Jan 3, 2025

chiangkd commented Jan 3, 2025

RinHizakura commented Jan 4, 2025 •

edited

Loading

Mes0903 commented Jan 5, 2025

RinHizakura commented Jan 5, 2025

chiangkd commented Jan 5, 2025

Mes0903 commented Jan 6, 2025 •

edited

Loading

Mes0903 commented Jan 6, 2025

jserv commented Jan 6, 2025

Mes0903 commented Jan 7, 2025

Mes0903 commented Jan 10, 2025 •

edited

Loading

RCU CPU stall warning in a multi-core system simulation #51

RCU CPU stall warning in a multi-core system simulation #51

Comments

chiangkd commented Aug 4, 2024

Mes0903 commented Dec 25, 2024 • edited by jserv Loading

chiangkd commented Dec 25, 2024

Mes0903 commented Dec 26, 2024

Mes0903 commented Dec 26, 2024 • edited Loading

chiangkd commented Dec 26, 2024

Mes0903 commented Dec 26, 2024

jserv commented Dec 26, 2024

jserv commented Dec 26, 2024

Mes0903 commented Dec 29, 2024 • edited Loading

Mes0903 commented Jan 3, 2025 • edited Loading

Methods

1. Scale Frequency

Pros

Cons

2. Manually Increment nsec

Pros

Cons

Synchronization and Time Rebase

Analysis of Method 2: Empirical Estimation

Target Time Configuration

jserv commented Jan 3, 2025

chiangkd commented Jan 3, 2025

RinHizakura commented Jan 4, 2025 • edited Loading

Mes0903 commented Jan 5, 2025

RinHizakura commented Jan 5, 2025

chiangkd commented Jan 5, 2025

Mes0903 commented Jan 6, 2025 • edited Loading

TL;DR

Detailed Explanation

Initial Attempt: Incrementing nsec

Transition to Maintaining Ticks

Implementation Details

Output and Results Comparison

Remarks

Mes0903 commented Jan 6, 2025

jserv commented Jan 6, 2025

Mes0903 commented Jan 7, 2025

Mes0903 commented Jan 10, 2025 • edited Loading

Mes0903 commented Dec 25, 2024 •

edited by jserv

Loading

Mes0903 commented Dec 26, 2024 •

edited

Loading

Mes0903 commented Dec 29, 2024 •

edited

Loading

Mes0903 commented Jan 3, 2025 •

edited

Loading

2. Manually Increment `nsec`

RinHizakura commented Jan 4, 2025 •

edited

Loading

Mes0903 commented Jan 6, 2025 •

edited

Loading

Initial Attempt: Incrementing `nsec`

Mes0903 commented Jan 10, 2025 •

edited

Loading