Why is the CPU stalling on Linux during multithreaded memory writes?

0 votes
1 answer
118 views
HW Specs:
- CPU: 64 Cores, 128 Threads, AMD Ryzen Threadripper Pro 5995WX
- RAM: 512 GB, Manufacturer unknown, will try to provide if needed

Linux Specs:
- OS: Ubuntu 22.04.4 LTS
- Linux Kernel: 5.15.0-119-generic

I'm trying to get model training to work with PyTorch on a Linux server, where I have observed a performance degradation by a factor of ~10 after letting a resource-intensive training task run for a couple of minutes (training on 4 GPUs, each with a multithreaded DataLoader).

To isolate the root cause, I have come up with a minimal test in Python that reproduces the issue by continuously writing 1 GB of data to RAM. Running this with 32 threads in parallel (the CPU has 128 threads available), the CPU stalls after 0%) until giving it some cooldown time of approx. 1 min.

I have run the test on another server (48 CPU threads, 160 GB RAM) for 10 minutes without any problems (on this server, multi-GPU training also runs without any performance degradation).

In contrast to the self-implemented memory write test, I have also tried a benchmark using sysbench, writing 10 TB of data with up to 96 threads without any problem. This is where I don't really understand the difference: does sysbench write the data only to some sort of buffer without really allocating any RAM? I ran the test with the following command:

sysbench --threads=96 --time=0 --memory-block-size=128K --memory-total-size=10T --report-interval=1 --memory-oper=write memory run

The main observable difference between sysbench and my Python test script was in htop: sysbench had all threads running as normal-priority/user threads (green bars), while my Python script spent a large portion in kernel time (red bars), which in my understanding is caused by a lot of wait time.

My question now is: does this diagnostic give some indication of why the system is stalling? Might there be a hardware issue with the RAM, or could this be an issue with the OS? Or what further tests could I do to isolate the root cause?

---

Edit: Below is the minimal Python script:
import time
import numpy as np
import threading

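# 1 GiB source array shared by all threads (1024 * 1024 * 1024 bytes of uint8).
data = np.zeros((1024, 1024, 1024, 1), dtype=np.uint8)

def allocate_memory():
    while True:
        start_time = time.time()
        # Each multiplication allocates and fills a new 1 GiB result array.
        _ = data * 0
        end_time = time.time()
        print(f"Time: {end_time - start_time:.3f} s")

    # Never reached, because the loop above does not terminate.
    print(data.shape)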

def run_in_threads(num_threads):
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=allocate_memory)
        thread.start()
        threads.append(thread)
    
    for thread in threads:
        thread.join()

if __name__ == "__main__":
    num_threads = 32
    run_in_threads(num_threads)
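
A possible further test, sketched here under assumptions rather than as a confirmed diagnosis: `data * 0` creates a new result array on every iteration, so each pass presumably has to obtain fresh pages from the kernel, whereas sysbench may keep rewriting a block that is already mapped. The sketch below compares a fresh allocation per pass against writing into a reused buffer via `np.multiply(..., out=...)`, and reports user versus system CPU time through `resource.getrusage`. The helper names (`cpu_times`, `measure`, `compare`) and the 64 MiB default size are made up for illustration.

```python
import resource
import time

import numpy as np


def cpu_times():
    """Return (user, system) CPU seconds consumed by this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime, usage.ru_stime


def measure(fn, repeats):
    """Call fn() `repeats` times and return wall, user and system seconds."""
    user0, sys0 = cpu_times()
    wall0 = time.perf_counter()
    for _ in range(repeats):
        fn()
    wall = time.perf_counter() - wall0
    user1, sys1 = cpu_times()
    return {"wall": wall, "user": user1 - user0, "system": sys1 - sys0}


def compare(size_bytes=64 * 1024 * 1024, repeats=20):
    """Time a fresh result array per pass against a reused output buffer."""
    data = np.zeros(size_bytes, dtype=np.uint8)
    out = np.empty_like(data)  # destination buffer reused across passes

    # Variant 1: same operation as the original script, new array each pass.
    fresh = measure(lambda: data * 0, repeats)
    # Variant 2: same arithmetic, but the result is written into `out`.
    reused = measure(lambda: np.multiply(data, 0, out=out), repeats)
    return fresh, reused


if __name__ == "__main__":
    fresh, reused = compare()
    for name, result in (("fresh allocation", fresh), ("reused buffer", reused)):
        print(
            f"{name}: wall={result['wall']:.3f}s "
            f"user={result['user']:.3f}s system={result['system']:.3f}s"
        )
```

If the reused-buffer variant shows far less system time, that would hint at page allocation in the kernel rather than at the RAM itself. Running both variants from 32 threads, as in the script above, would be the natural next step.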
Asked by m4fr1699 (1 rep)
Nov 5, 2024, 08:27 AM
Last activity: Nov 7, 2024, 08:58 AM