Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes

0 answers

65 views

Need help to understand a kernel "deadlock" in armhf kernel 5.4

I'm struggling to understand a condition that occurs rarely (MTBF measured in weeks to months), but results in a fatal kernel lockup when it does.

The environment is kernel 5.4.0, running on a dual-core ARM A9 32-bit (armhf) processor in an AMD/Xilinx Zynq 7020 SoC.
I've been able to examine stack traces on two failed devices using the Xilinx Vitis system debugger. In both cases, both cores were waiting for (attempting to lock) the same spinlock. The spinlock is the pi_lock member of the struct task_struct structure associated with a threaded IRQ handler.

Core #0 is running the idle process, responding to an interrupt, and attempting to wake up the IRQ handler thread. Core #1 is reading the /proc/<pid>/stat procfs pseudofile associated with the IRQ handler thread (while reading every pseudofile of the form /proc/<pid>/stat on behalf of pidof, probably at the request of sessionclean, invoked by apache2).
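
For context, here is a rough Python sketch of the kind of scan described above: reading the stat pseudofile of every numeric entry under /proc. It is an illustration only, not pidof's actual implementation; the helper names `read_stat` and `scan_proc` are made up for this example. The utime and stime fields it extracts are the CPU-time values that the kernel computes while servicing the read (the `thread_group_cputime_adjusted()` frame in the core #1 trace).

```python
import os

def read_stat(pid):
    """Return (comm, utime_ticks, stime_ticks) parsed from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat", "rb") as f:
        data = f.read().decode("ascii", "replace")
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    start, end = data.index("("), data.rindex(")")
    comm = data[start + 1:end]
    rest = data[end + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(rest[11]), int(rest[12])

def scan_proc():
    """Read the stat file of every numeric /proc entry (hypothetical pidof-style scan)."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            result[int(entry)] = read_stat(entry)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited mid-scan, or access was denied
    return result

stats = scan_proc()
```

Kernel threads such as the IRQ handler threads have their own /proc entries, so a scan like this touches them as well.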

The interrupt being handled on one device is for the BQ25890 battery charger (asserted roughly 0 to 10 times per second), and on the other device, for a custom kernel module (~100/sec). Both interrupts use the default primary IRQ handler.

The current hypothesis is that core #0 got stuck first, followed some time later by core #1, rather than both occurring simultaneously, but this is speculative.

Stack traces for the two cores and threaded IRQ handler source for one of the devices are below.

**Questions**

Is there enough information in the stack traces to provide a clue as to what went wrong? If so, what went wrong? If not, what additional information is needed? One of the devices is still available, and some (not all) variables can be interrogated.
Is the fact that interrupts are disabled (cpsr.i == 1 for both cores) significant?

Please note that because of the extremely long time between failures, black-box testing techniques - that is, making changes in order to observe the result - are not practical.

**ARM Cortex-A9 MPCore #0 (Suspended)**

    0x80165d68: ./include/linux/compiler.h, line 199
    0x80165d68 arch_spin_lock(): .../arch/arm/include/asm/spinlock.h, line 75
    0x8076f774: ./include/linux/spinlock.h, line 193
    0x8076f774: ...include/linux/spinlock_api_smp.h, line 119
    0x8076f774 _raw_spin_lock_irqsave(): kernel/locking/spinlock.c, line 159
    0x8014a394 try_to_wake_up(): kernel/sched/core.c, line 2551
    0x8014a724 wake_up_process(): kernel/sched/core.c, line 2667
    0x8016f600 __irq_wake_thread(): kernel/irq/handle.c, line 134
    0x8016f820 __handle_irq_event_percpu(): kernel/irq/handle.c, line 167
    0x8016f888 handle_irq_event_percpu(): kernel/irq/handle.c, line 189
    0x8016f924 handle_irq_event(): kernel/irq/handle.c, line 206
    0x80174230 handle_level_irq(): kernel/irq/chip.c, line 650
    0x8016e808: ./include/linux/irqdesc.h, line 156
    0x8016e808 generic_handle_irq(): kernel/irq/irqdesc.c, line 644
    0x80447a20: drivers/gpio/gpio-zynq.c, line 632
    0x80447a20 zynq_gpio_irqhandler(): drivers/gpio/gpio-zynq.c, line 661
    0x8016e808: ./include/linux/irqdesc.h, line 156
    0x8016e808 generic_handle_irq(): kernel/irq/irqdesc.c, line 644
    0x8016eec4 __handle_domain_irq(): kernel/irq/irqdesc.c, line 681
    0x80102300: ./include/linux/irqdesc.h, line 174
    0x80102300 gic_handle_irq(): drivers/irqchip/irq-gic.c, line 383
    0x80101a70 __irq_svc(): arch/arm/kernel/entry-armv.S, line 212
    0x805827c0 cpuidle_enter_state(): .../arch/arm/include/asm/irqflags.h, line 39
    0x805829d8 cpuidle_enter(): drivers/cpuidle/cpuidle.c, line 344
    0x8014ecf0: kernel/sched/idle.c, line 117
    0x8014ecf0: kernel/sched/idle.c, line 201
    0x8014ecf0 do_idle(): kernel/sched/idle.c, line 263
    0x8014eeac cpu_startup_entry(): kernel/sched/idle.c, line 355
    0x80769828 rest_init(): init/main.c, line 451
    0x80c00b30 arch_call_rest_init(): init/main.c, line 572
    0x80c00f78 start_kernel(): init/main.c, line 784
    0x00000000
    0x00000000

**The offending lock**

    lock	arch_spinlock_t *	{{slock=0x94449440, tickets={owner=0x9440, next=0x9444}}}	
    	slock	u32	0x94449440	
    	tickets	struct __raw_tickets	{owner=0x9440, next=0x9444}	
    		owner	u16	0x9440	
    		next	u16	0x9444	
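
As a reading aid, the lock word can be split by hand. This is a minimal sketch, assuming the little-endian ARM ticket-lock layout implied by the debugger view above (owner in the low 16 bits, next in the high 16 bits); the function name `decode_ticket_lock` is made up for this example.

```python
def decode_ticket_lock(slock):
    """Split a 32-bit ticket-lock word into owner, next and outstanding tickets."""
    owner = slock & 0xFFFF                 # ticket currently being served
    next_ticket = (slock >> 16) & 0xFFFF   # next ticket to be handed out
    # Tickets taken but not yet released; the counters wrap modulo 2**16.
    outstanding = (next_ticket - owner) & 0xFFFF
    return owner, next_ticket, outstanding

owner, next_ticket, outstanding = decode_ticket_lock(0x94449440)
# prints: owner=0x9440 next=0x9444 outstanding=4
print(f"owner={owner:#06x} next={next_ticket:#06x} outstanding={outstanding}")
```

A ticket lock is free when owner equals next. Here the two differ by 4. Whether four outstanding tickets is plausible on a dual-core system is a question for the analysis; this sketch only does the arithmetic.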

**Threaded IRQ handler source**

Note that this code does not appear in the call stacks, but is the code for the thread that core #0 is attempting to wake up.

On one of the devices:

    static irqreturn_t bq25890_irq_handler_thread(int irq, void *private)
    {
    	struct bq25890_device *bq = private;
    	int ret;
    	struct bq25890_state state;
    
    	ret = bq25890_get_chip_state(bq, &state);
    	if (ret < 0)
    		goto handled;
    
    	/* ... intervening lines lost in extraction ... */
    
    	mutex_lock(&bq->lock);
    	bq->state = state;
    	mutex_unlock(&bq->lock);
    
    	power_supply_changed(bq->charger);
    
    handled:
    	return IRQ_HANDLED;
    }

On the other device:

    static irqreturn_t core100_irq_handler( int irq, void* data )
    {
    	struct core100_drvdata* p_info;
    	struct core100_block_header* next;
    	unsigned long iflags;
    	p_info = ( struct core100_drvdata* ) data;
    	if ( ( p_info->state != state_reading ) && ( p_info->state != state_writing ) )
    	{
    		return IRQ_HANDLED;
    	}
    	next = &p_info->current_block->next->header;
    	next->count_at_pps = read_register( p_info->count_pps_register );
    	next->frc_at_pps = read_register( p_info->frc_pps_register );
    	next->count = adc_counter();
    	next->frc = free_running_counter();
    	spin_lock_irqsave( &p_info->state_lock, iflags );
    	p_info->dma_head += bytes_per_block( p_info );
    	if ( p_info->dma_head - p_info->dma_tail > bytes_per_buffer( p_info ) )
    	{
    		p_info->dma_tail = p_info->dma_head - bytes_per_buffer( p_info );
    	}
    	p_info->current_block = p_info->current_block->next;
    	wake_up_interruptible( &p_info->wq );
    	spin_unlock_irqrestore( &p_info->state_lock, iflags );
    	return IRQ_HANDLED;
    }

Kernel source: 
[struct task_struct]


**ARM Cortex-A9 MPCore #1 (Suspended)**

    0x80165d68: ./include/linux/compiler.h, line 199
    0x80165d68 arch_spin_lock(): .../arch/arm/include/asm/spinlock.h, line 75
    0x8076f774: ./include/linux/spinlock.h, line 193
    0x8076f774: ...include/linux/spinlock_api_smp.h, line 119
    0x8076f774 _raw_spin_lock_irqsave(): kernel/locking/spinlock.c, line 159
    0x80148914 task_rq_lock(): kernel/sched/core.c, line 109
    0x8014e22c: kernel/sched/cputime.c, line 281
    0x8014e22c thread_group_cputime(): kernel/sched/cputime.c, line 326
    0x8014e650 thread_group_cputime_adjusted(): kernel/sched/cputime.c, line 678
    0x802ccb5c do_task_stat(): fs/proc/array.c, line 510
    0x802cdaf4 proc_tgid_stat(): fs/proc/array.c, line 632
    0x802c7c0c proc_single_show(): fs/proc/base.c, line 756
    0x80281b18 seq_read(): fs/seq_file.c, line 229
    0x8025dd1c __vfs_read(): fs/read_write.c, line 425
    0x8025de60 vfs_read(): fs/read_write.c, line 461
    0x8025e088 ksys_read(): fs/read_write.c, line 587
    0x8025e0ec: fs/read_write.c, line 597
    0x8025e0ec __se_sys_read(): fs/read_write.c, line 595
    0x80101000 __idmap_text_end()
    0x76e627e6
    0x76edc616

Here's an alternative view, showing the call stacks associated with their calling processes. "Swapper" is an archaic term for the idle process:

    Kernel	
    	0 swapper/0        (Suspended), ARM Cortex-A9 MPCore #0	
    		0x80165d68: ./include/linux/compiler.h, line 199	
    		0x80165d68 arch_spin_lock(): .../arch/arm/include/asm/spinlock.h, line 75	
    		0x8076f774: ./include/linux/spinlock.h, line 193	
    		0x8076f774: ...include/linux/spinlock_api_smp.h, line 119	
    		0x8076f774 _raw_spin_lock_irqsave(): kernel/locking/spinlock.c, line 159	
    
    11950 pidof	
    	11950 (Suspended), ARM Cortex-A9 MPCore #1	
    		0x80165d68: ./include/linux/compiler.h, line 199	
    		0x80165d68 arch_spin_lock(): .../arch/arm/include/asm/spinlock.h, line 75	
    		0x8076f774: ./include/linux/spinlock.h, line 193	
    		0x8076f774: ...include/linux/spinlock_api_smp.h, line 119	
    		0x8076f774 _raw_spin_lock_irqsave(): kernel/locking/spinlock.c, line 159	

**RCU Configuration**

    # RCU Subsystem
    CONFIG_PREEMPT_RCU=y
    # CONFIG_RCU_EXPERT is not set
    CONFIG_SRCU=y
    CONFIG_TREE_SRCU=y
    CONFIG_TASKS_RCU=y
    CONFIG_RCU_STALL_COMMON=y
    CONFIG_RCU_NEED_SEGCBLIST=y
    # end of RCU Subsystem
    # RCU Debugging
    # CONFIG_RCU_PERF_TEST is not set
    # CONFIG_RCU_TORTURE_TEST is not set
    CONFIG_RCU_CPU_STALL_TIMEOUT=21
    # CONFIG_RCU_TRACE is not set
    # CONFIG_RCU_EQS_DEBUG is not set
    # end of RCU Debugging

**Kernel Thread Priorities**

    # ps -eo pid,tid,class,rtprio,ni,pri,wchan:14,comm | grep -E '(PID|bq25890|raw_dma)'
      PID   TID CLS RTPRIO  NI PRI WCHAN          COMMAND
      214   214 FF      50   -  90 irq_thread     irq/59-bq25890_
      229   229 FF      50   -  90 irq_thread     irq/60-raw_dma

**Processor Status Registers**
On at least one failed system, the interrupt disable bit is set on both processors. That seems wrong, although the fast interrupt disable bit is not set. If I set a breakpoint in arch_spin_lock() on a working system, the interrupt disable bit is not set.
It would seem that the only way for the spinlock to be released is in response to an interrupt, since the active threads on both cores aren't going to do it.

    Core #0
    cpsr	200f0193	537854355	
    	n	0	0	Negative condition code flag	
    	z	0	0	Zero condition code flag	
    	c	1	1	Carry condition code flag	
    	v	0	0	Overflow condition code flag	
    	q	0	0	Cumulative saturation flag	
    	it	00	0	If-Then execution state bits	
    	j	0	0	Jazelle bit	
    	ge	f	15	SIMD Greater than or Equal flags	
    	e	0	0	Endianness execution state bit	
    	a	1	1	Asynchronous abort disable bit	
    	i	1	1	Interrupt disable bit	
    	f	0	0	Fast interrupt disable bit	
    	t	0	0	Thumb execution state bit	
    	m	13	19	Mode field	
    irq	
    	sp	80d94400	2161722368	
    	lr	80101e20	2148539936	
    	spsr	200f0193	537854355	
    		n	0	0	Negative condition code flag	
    		z	0	0	Zero condition code flag	
    		c	1	1	Carry condition code flag	
    		v	0	0	Overflow condition code flag	
    		q	0	0	Cumulative saturation flag	
    		it	00	0	If-Then execution state bits	
    		j	0	0	Jazelle bit	
    		ge	f	15	SIMD Greater than or Equal flags	
    		e	0	0	Endianness execution state bit	
    		a	1	1	Asynchronous abort disable bit	
    		i	1	1	Interrupt disable bit	
    		f	0	0	Fast interrupt disable bit	
    		t	0	0	Thumb execution state bit	
    		m	13	19	Mode field	

    Core #1
    cpsr	200f0093	537854099	
    	n	0	0	Negative condition code flag	
    	z	0	0	Zero condition code flag	
    	c	1	1	Carry condition code flag	
    	v	0	0	Overflow condition code flag	
    	q	0	0	Cumulative saturation flag	
    	it	00	0	If-Then execution state bits	
    	j	0	0	Jazelle bit	
    	ge	f	15	SIMD Greater than or Equal flags	
    	e	0	0	Endianness execution state bit	
    	a	0	0	Asynchronous abort disable bit	
    	i	1	1	Interrupt disable bit	
    	f	0	0	Fast interrupt disable bit	
    	t	0	0	Thumb execution state bit	
    	m	13	19	Mode field	
    irq	
    	sp	80d94440	2161722432	
    	lr	80101a00	2148538880	
    	spsr	60030193	1610809747	
    		n	0	0	Negative condition code flag	
    		z	1	1	Zero condition code flag	
    		c	1	1	Carry condition code flag	
    		v	0	0	Overflow condition code flag	
    		q	0	0	Cumulative saturation flag	
    		it	00	0	If-Then execution state bits	
    		j	0	0	Jazelle bit	
    		ge	3	3	SIMD Greater than or Equal flags	
    		e	0	0	Endianness execution state bit	
    		a	1	1	Asynchronous abort disable bit	
    		i	1	1	Interrupt disable bit	
    		f	0	0	Fast interrupt disable bit	
    		t	0	0	Thumb execution state bit	
    		m	13	19	Mode field	
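
The register dumps above can be cross-checked by decoding the raw values directly. The following is a small sketch based on the ARMv7-A CPSR bit layout; the function name `decode_cpsr` and the mode table (which lists only the common modes) are my own additions.

```python
# Common ARMv7-A processor modes, keyed by the CPSR mode field (bits 4:0).
CPSR_MODES = {0x10: "usr", 0x11: "fiq", 0x12: "irq", 0x13: "svc",
              0x17: "abt", 0x1B: "und", 0x1F: "sys"}

def decode_cpsr(value):
    """Extract the flag bits and mode from a 32-bit CPSR/SPSR value."""
    mode = value & 0x1F
    return {
        "n": (value >> 31) & 1,
        "z": (value >> 30) & 1,
        "c": (value >> 29) & 1,
        "v": (value >> 28) & 1,
        "q": (value >> 27) & 1,
        "ge": (value >> 16) & 0xF,
        "e": (value >> 9) & 1,
        "a": (value >> 8) & 1,   # asynchronous abort disable
        "i": (value >> 7) & 1,   # IRQ disable
        "f": (value >> 6) & 1,   # FIQ disable
        "t": (value >> 5) & 1,
        "mode": CPSR_MODES.get(mode, hex(mode)),
    }

for name, raw in [("core0 cpsr", 0x200F0193),
                  ("core1 cpsr", 0x200F0093),
                  ("core1 irq spsr", 0x60030193)]:
    print(name, decode_cpsr(raw))
```

All three values decode to Supervisor mode (0x13) with IRQs masked and FIQs enabled. Since both cores are stopped inside `_raw_spin_lock_irqsave()`, which masks local interrupts before it starts spinning, i == 1 at this point is probably expected rather than a symptom in itself.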


                                

Omnia tu scis quod malum (1 rep)

Nov 24, 2023, 10:59 PM • Last activity: Nov 29, 2023, 05:49 PM

12 votes

2 answers

30815 views

Compiled kernel 4.19 will not boot: "Kernel panic not syncing : System is deadlocked on memory"

ubuntu kernel compiling memory deadlock

I am compiling kernel 4.19 on Ubuntu 14.04 because I have an assignment to add a system call, but when I try to boot this kernel there's an error:

    Kernel panic - not syncing: System is deadlocked on memory
Nothing happens after this message appears



1. I have compiled my kernel several times and installed the modules. No errors were shown in the old terminal.  
2. I used GParted to enlarge my /dev/sda1 and I have set up a swap area, all done.  
3. The stock Ubuntu 14.04 kernel boots fine. I can log in and use it without problems.
4. Commands I used:

        sudo cp /boot/config-**** .config
        sudo make menuconfig               # I did not change anything here
        sudo make -j4
        sudo make modules_install
        sudo make install
        reboot
5. For the syscall, I just added a very simple hello-world function to sys.c:

        asmlinkage int sys_mysyscall(int arg){printk("hello %d\n",arg);return 0;}
and I have added it to syscalls.h and syscall_64.tbl.
                                

wjrforcyber (371 rep)

Jan 5, 2019, 02:26 PM • Last activity: Aug 2, 2023, 11:31 AM

0 votes

0 answers

542 views

linux kernel freeze

linux-kernel freeze interrupt hang deadlock

I am using Linux kernel 4.19 on an Intel (Ice Lake) based server. Sometimes, immediately after Linux boots (at the login prompt), it gets stuck (about once in 50 reboots). It does not respond to anything on the serial terminal or monitor. When it is stuck, the keyboard Caps Lock light does not toggle, and it does not even accept SysRq requests. The only way to recover is to reset the server.

I tried deadlock debugging, but it did not help. The issue occurs randomly, after 10 or sometimes 50 reboots. There is no backtrace or oops message. In what kind of situation can this happen? Is it possible that it deadlocks in interrupt context? If so, what is the best way to debug this? Or could it be some kind of hardware issue? What else can I do to debug this further?

saurin (39 rep)

Apr 12, 2022, 04:37 AM • Last activity: Apr 12, 2022, 04:43 AM

2 votes

0 answers

4647 views

System is deadlocked on memory

kernel boot upgrade kernel-panic deadlock

I've upgraded my kernel on LinuxLite 4.8 from 4.15.0-74-generic to 5.4.30 (unpatched) using the commands:

    sudo cp /boot/config-$(uname -r) .config
    sudo make menuconfig
    sudo make
    sudo make modules_install
    sudo make install
    reboot

I just saved the standard settings in menuconfig without changing anything. After rebooting I faced the following error:

> end Kernel panic - not syncing: System is deadlocked on memory

Would anyone know how to solve this? I already found https://unix.stackexchange.com/questions/492667/compiled-kernel-4-19-will-not-boot-kernel-panic-not-syncing-system-is-deadlo question, but I cannot afford to use 4 GB of RAM as the motherboard only supports a maximum of 2 GB (which is what I am using currently). I can still boot using the older kernel version.

Thanks in advance!

Daniel (21 rep)

Apr 7, 2020, 01:44 PM • Last activity: Apr 7, 2020, 01:50 PM

2 votes

2 answers

280 views

Bash: join pipes without deadlock

bash pipe fifo deadlock

I want to list a bunch of filenames via find, pipe them through a utility (let's call this util) which outputs a new name for each input name, and then rename each file from the old name to the new.

The most basic solution would be this:

    find . -print0 | while IFS= read -d '' -r old_name; do
        new_name="$(echo "$old_name" | util)"
        mv "$old_name" "$new_name"
    done

The problem with this approach is that util is too slow to fire up for each filename separately. So the solution is to launch util only once and pipe all the filenames through this single process:

    find . -print0 >old_names
    util <old_names >new_names
    
    exec {old_fd}new_names &

    exec {old_fd}util does input/output buffering...

So my question is: what's the proper way of doing this in bash?
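
For comparison, here is how the same single-process idea might look outside bash. This is a Python sketch, not a bash answer, and it rests on assumptions: `UTIL_CMD` is a hypothetical stand-in for util (it simply upper-cases its input), and `map_names` is a made-up helper name. The point it illustrates is that `subprocess.run(..., input=...)` writes to the child's stdin and drains its stdout concurrently, so buffering inside the utility cannot stall both ends at once.

```python
import subprocess
import sys

# Hypothetical stand-in for `util`: upper-cases every NUL-terminated name.
UTIL_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]

def map_names(old_names, util_cmd=UTIL_CMD):
    """Pipe all names through ONE util process; return (old, new) pairs."""
    payload = b"".join(name + b"\0" for name in old_names)
    # run() with input= uses communicate() internally, which feeds stdin
    # and reads stdout at the same time, avoiding a full-pipe deadlock.
    result = subprocess.run(util_cmd, input=payload,
                            stdout=subprocess.PIPE, check=True)
    new_names = result.stdout.split(b"\0")[:-1]  # drop the empty tail
    if len(new_names) != len(old_names):
        raise RuntimeError("util returned a different number of names")
    return list(zip(old_names, new_names))

pairs = map_names([b"a.txt", b"dir/b.txt"])
for old, new in pairs:
    print(old, "->", new)  # a real script would rename here instead
```

The trade-off is that all names are held in memory at once, which is usually acceptable for file lists but is a design choice worth stating.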
                                

Tamás Zahola (143 rep)

Feb 2, 2019, 06:43 PM • Last activity: Feb 3, 2019, 12:29 AM

0 votes

0 answers

57 views

How to view pending SCSI requests?

dvd scsi deadlock optical-media

If the PC sends a request to read a sector on an optical disc and the sector turns out to be unreadable, the drive leaves the program blocked and unresponsive until it returns either the sector or an error.

While the program is blocked waiting on the drive, how can I see the pending SCSI request, with details such as the read retry count and the requested LBA?

neverMind9 (1720 rep)

Apr 4, 2018, 01:38 PM • Last activity: Apr 4, 2018, 01:43 PM

4 votes

0 answers

1234 views

Why is Firefox stuck, and how might I get it unstuck?

firefox strace multithreading deadlock

Firefox is hung.  Usually I kill it with -1, -9 or -15, but this time, dang it, I want to find out why.  I have an hour or so to waste on this, and seek education above actually solving the problem.

Goal: get this instance of firefox running again, without killing it and restarting.  

Firefox was working fine, though sluggish due to heavy memory usage.  I had Avogadro's number of tabs open in about a thousand browser windows.  (Some exaggeration may be involved.)  I was closing some tabs, when at one point the browser window stopped repainting their contents.   Dragging unrelated windows over causes remnants of those windows to sit there forever.  Whatever is the X11 equivalent of WM_PAINT, it's being ignored.   Attempts to fire up a new Firefox by clicking on URLs in emails bring up only an error popup saying "Firefox is already running, but is not responding."

Firefox has process ID 9297.  
Process 526 is the window manager.

=> ps axl | grep 9297

    0  1000  9297   526  13  -7 6080428 4465776 poll_s S

(I was playing with renice to see if priority had anything to do with firefox hanging. Nope. But I left it at -7.)

=> ps -efL | grep 9297

    darenw    9297   526  9297 57   61 Sep22 ?        6-23:10:34 firefox 
    darenw    9297   526  9300  0   61 Sep22 ?        00:00:00 firefox
    darenw    9297   526  9301  0   61 Sep22 ?        00:00:00 firefox 
    ... dozens like these ...
    darenw    9297   526  7607  0   61 16:17 ?        00:00:00 firefox
    darenw    9297   526  7657  0   61 16:17 ?        00:00:00 firefox
    darenw   20000  9297 20000  1    1 Sep23 ?        03:34:36 [plugin-containe] 

=> ps axl  | grep 20000

    0  1000 20000  9297  20   0      0     0 exit   Z    ?        216:31 [plugin-containe] 

This one lingers because the main thread 9297 hasn't yet finished off that thread by obtaining its exit code. At least, that's my understanding of "defunct" processes.  I'm not sure how to investigate this detail further, or how to determine if this is why Firefox is hung.

=> strace -f -p 9297

    ...
    [pid 13945]  )      = 0 (Timeout)                  
    [pid 13945] select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)                                                                                                  
    [pid 13945] select(0, NULL, NULL, NULL, {0, 10000}                                                                                                       
    [pid  9356]  )       = -1 ETIMEDOUT (Connection timed out)                                                                                            
    [pid  9356] futex(0x7f28b7df8be8, FUTEX_WAKE_PRIVATE, 1) = 0                                                                                                             
    [pid  9356] futex(0x7f28b7df8c14, FUTEX_WAIT_BITSET_PRIVATE, 1, {6910910, 806929252}, ffffffff            
    ...

Same six lines repeat for as long as I let strace run, though sometimes there's only two instead of three lines for either PID.
I didn't see any mention of other pids, besides these two.

=> ps axl  | grep 13945

    4     0  4985 23253  20   0  11064  2236 pipe_w S+   pts/19     0:00 grep 13945

=> ps axl  | grep 9356

    4     0  5111 23253  20   0  11064  2232 pipe_w S+   pts/19     0:00 grep 9356

So what are these two processes, 13945 and 9356?  

=> ps -eL | grep 9297 | grep 9356

    9297  9356 ?        02:02:50 SoftwareVsyncTh
    9297 13945 ?        01:04:15 firefox

So what is SoftwareVsyncTh?   Google does not help much. This symbol appears in 'ps' output and other listings of processes and threads, but not in any online source code, or Q&A forums mentioning it in a specific way.  For all I know, it, and these two processes, have nothing to do with Firefox being stuck and not painting its windows.
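
As the `ps -eL` output shows, 13945 and 9356 are thread IDs inside process 9297, which is why grepping a process-only listing for them finds nothing. Here is a small Python sketch for listing a process's threads by name straight from /proc (the helper name `list_threads` is made up for this example). Note that the kernel stores thread names in a 16-byte field, so names are cut to 15 characters; "SoftwareVsyncTh" and "plugin-containe" are presumably truncations of longer names.

```python
import os

def list_threads(pid):
    """Map each thread ID of `pid` to its name from /proc/<pid>/task/<tid>/comm."""
    threads = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/comm") as f:
                threads[int(tid)] = f.read().strip()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
    return threads

# Demonstrated on this script's own process; for the case above one would
# call list_threads(9297) instead.
threads = list_threads(os.getpid())
for tid, name in sorted(threads.items()):
    print(tid, name)
```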

What further commands could I use to uncover more clues?  Is there a way to get a list of windows and tabs with their urls, and close one (the most suspicious/spammy looking url) from a shell command line?

Well, whatever is going on with those, I find I can get some info on the main thread:

=> strace -p 9297

    --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TKILL, si_pid=9297, si_uid=1000} ---
    rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
    poll([{fd=23, events=POLLPRI|POLLOUT}], 1, 5000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) 

which repeats over and over with no changes.

I am not a genius at threads, processes, mutexes and futexes and all that, but maybe by solving this mystery, I'll become one!   I just need to know more commands for investigating further, and understanding of what exactly is going on in the strace results which may relate to the hangedness of Firefox.

Are there any commands I could try, to kick Firefox back into action?
      
System specs:   quad core Intel something, 16GB, Arch Linux last updated about a month or two ago. Using icewm, multiple workspaces and many text editors, PDF viewers, browsers and whatever open.  Running conky to show RAM and swap usage in upper left corner of screen. I'm usually on the verge of going into swap.

DarenW (3532 rep)

Oct 5, 2016, 08:37 AM

2 votes

0 answers

122 views

What does Linux wait for when switches from Xorg to text console?

xorg linux-kernel console graphics deadlock

Sometimes my Xorg hangs and locks the console dead, with a “stale frame” of the GUI on the monitor. Hardware and driver-related problems are likely (see here for a description of the particular configuration), but this question is focused on a particularly annoying effect: the inability to switch to text mode. Otherwise the system is fully functional. The box serves a LAN; I can log in and even run things like chvt, to no avail. Of course, I can terminate the Xorg process to free the console… but that is so rude. This is not the same situation as in that question, where Ctrl+Alt+F1 failed consistently; in my case, it fails *sometimes*, and only when X is unable to work anyway.

So, the question: **Exactly which step of console switching is vulnerable to deadlocks?** Modern Linux kernels (3.x) are assumed. Many years ago (in the age of Linux 2.4) I had a reasonably good understanding of virtual-console internals and even wrote some notes about them, but modern vc_screen.c has changed considerably from the versions of 10 years ago, with new calls such as console_lock() and console_unlock() appearing in it.

Incnis Mrsi (2076 rep)

Aug 25, 2015, 11:53 AM • Last activity: Sep 9, 2015, 06:10 PM

1 votes

1 answers

1295 views

Deadlock on read/wait

pipe read ipc deadlock

My process deadlocks. master looks like this:

    p=Popen(cmd, stdin=PIPE, stdout=PIPE)
    for ....: # a few million
        p.stdin.write(...)
    p.stdin.close()
    out = p.stdout.read()
    p.stdout.close()
    exitcode = p.wait()

child looks something like this:

    l = list()
    for line in sys.stdin:
       l.append(line)
    sys.stdout.write(str(len(l)))

* strace -p PID_master shows that master is stuck in wait4(PID_child,...).
* strace -p PID_child shows that child is stuck in read(0,...).

How can that be?! 
**I did close the stdin, why is child still reading from it?!**
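
One mechanism that can produce exactly these symptoms (offered as an assumption; nothing in the question confirms it applies here): a reader only sees end-of-file once every descriptor that refers to the pipe's write end has been closed, including copies inherited by other child processes. The sketch below reproduces that behaviour in a single process, with `os.dup` standing in for an inherited descriptor.

```python
import os

r, w = os.pipe()
w_copy = os.dup(w)          # stands in for a write end inherited elsewhere

os.write(w, b"data")
os.close(w)                 # the "master" closes its own write end

os.set_blocking(r, False)   # so the demo cannot hang
first = os.read(r, 4096)    # the buffered data is still delivered

try:
    saw_eof_early = os.read(r, 4096) == b""
except BlockingIOError:
    # No data and no EOF: a blocking reader would be stuck right here.
    saw_eof_early = False

os.close(w_copy)            # the last write end goes away
saw_eof_late = os.read(r, 4096) == b""   # now read() reports EOF
os.close(r)

print(first, saw_eof_early, saw_eof_late)
```

If something similar is happening, checking which other processes hold the pipe open (for example by inspecting /proc/PID/fd) would be a reasonable next step, and passing close_fds=True to Popen on Python versions where it is not the default keeps unrelated children from inheriting the descriptor.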


                                

sds (1726 rep)

Aug 3, 2015, 01:12 PM • Last activity: Aug 3, 2015, 03:41 PM

12 votes

2 answers

8970 views

Solving Ethernet watchdog timer deadlocks

debian ethernet deadlock

I have a Debian Linux box (Debian Squeeze) that deadlocks every few hours if I run a Python script that sniffs an interface...

The stack trace is attached to the bottom of this question.  Essentially, I have a Broadcom ethernet interface (bnx2 driver) that seems to die when I start a sniffing session and then it tries to transmit a frame out the same interface.

From what I can tell, a kernel watchdog timer is tripping...

    NETDEV WATCHDOG: eth3 (bnx2): transmit queue 0 timed out

I think there is a way to control watchdog timers with ioctl (ref: [EmbeddedFreak: How to use linux watchdog](http://embeddedfreak.wordpress.com/2010/08/23/howto-use-linux-watchdog/)).

Questions (Original):
---------------------

How can I find which watchdog timer(s) is controlling eth3?  Bonus points if you can tell me how to change the timer or even disable the watchdog...

Questions (Revised):
---------------------

How can I prevent the ethernet watchdog timer from causing problems?




Stack trace
-----------

    Apr 30 08:38:44 Hotcoffee kernel: [275460.837147] ------------[ cut here ]------------
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837166] WARNING: at /build/buildd-linux-2.6_2.6.32-41squeeze2-amd64-NDo8b7/linux-2.6-2.6.32/debian/build/source_amd64_none/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837169] Hardware name: PowerEdge R710
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837171] NETDEV WATCHDOG: eth3 (bnx2): transmit queue 0 timed out
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837172] Modules linked in: 8021q garp stp parport_pc ppdev lp parport pci_stub vboxpci vboxnetadp vboxnetflt vboxdrv ext2 loop psmouse power_meter button dcdbas evdev pcspkr processor serio_raw ext4 mbcache jbd2 crc16 sg sr_mod cdrom ses ata_generic sd_mod usbhid hid crc_t10dif enclosure uhci_hcd ehci_hcd megaraid_sas ata_piix thermal libata usbcore nls_base scsi_mod bnx2 thermal_sys [last unloaded: scsi_wait_scan]
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837202] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837204] Call Trace:
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837206]    [] ? dev_watchdog+0xe2/0x194
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837211]  [] ? dev_watchdog+0xe2/0x194
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837217]  [] ? warn_slowpath_common+0x77/0xa3
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837220]  [] ? dev_watchdog+0x0/0x194
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837223]  [] ? warn_slowpath_fmt+0x51/0x59
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837228]  [] ? try_to_wake_up+0x289/0x29b
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837231]  [] ? netif_tx_lock+0x3d/0x69
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837237]  [] ? netdev_drivername+0x3b/0x40
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837240]  [] ? dev_watchdog+0xe2/0x194
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837242]  [] ? __wake_up+0x30/0x44
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837249]  [] ? run_timer_softirq+0x1c9/0x268
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837252]  [] ? __do_softirq+0xdd/0x1a6
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837257]  [] ? lapic_next_event+0x18/0x1d
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837262]  [] ? call_softirq+0x1c/0x30
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837265]  [] ? do_softirq+0x3f/0x7c
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837267]  [] ? irq_exit+0x36/0x76
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837270]  [] ? smp_apic_timer_interrupt+0x87/0x95
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837273]  [] ? apic_timer_interrupt+0x13/0x20
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837274]    [] ? acpi_idle_enter_bm+0x27d/0x2af [processor]
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837283]  [] ? acpi_idle_enter_bm+0x276/0x2af [processor]
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837289]  [] ? cpuidle_idle_call+0x94/0xee
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837293]  [] ? cpu_idle+0xa2/0xda
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837297]  [] ? early_idt_handler+0x0/0x71
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837301]  [] ? start_kernel+0x3dc/0x3e8
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837304]  [] ? x86_64_start_kernel+0xf9/0x106
    Apr 30 08:38:44 Hotcoffee kernel: [275460.837306] ---[ end trace 92c65e52c9e327ec ]---
                                

Mike Pennington (2521 rep)

May 1, 2012, 06:57 PM • Last activity: May 4, 2015, 06:46 PM

1 votes

3 answers

6940 views

System freezes when browsing internet

ubuntu logs hardware freeze deadlock

About a week ago my system began to freeze from time to time (a complete deadlock; only a manual shutdown with the laptop's power button helped).

The frequency of the freezes is increasing. Today there were about 10 of them while I was only browsing the internet and using a terminal emulator and vim. It usually happens when loading a new URL, and then the whole system is dead. It is getting very annoying as it happens more and more often.

My system is Lubuntu 11.10 with xmonad window manager.

Could you help me find the cause of this problem? Are there any logs associated with such freezes? Could it be a hardware problem?

(The system froze twice while I was writing this question, too.)

Thank you.

xralf (15189 rep)

Jan 6, 2012, 05:17 PM • Last activity: Jan 28, 2012, 05:17 AM
