Sample Header Ad - 728x90

PostgreSQL - Replica Recovery Process Stuck frequently (epoll_wait taking too much time)

1 vote
1 answer
1378 views
We have Master Slave Setup with Asynchronous Streaming Replication (Replication Slots). Frequently our Replica's entire Postgres processes seems to be doing nothing (or stuck?). Since replication_slots is used for replication, this has been affecting the master with pg_xlog getting backlogged. postgres=# select version(); version -------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 9.6.11 on x86_64-pc-linux-gnu (Ubuntu 9.6.11-1.pgdg18.04+1), compiled by gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit (1 row) As per Slave postgres log, last received wal is processed and the next restartpoint is not starting. No logs after this. Just simply stuck. After restarting, replica starts to catch up, but it again gets stuck after some time. 2018-11-23 12:47:26.625 UTC LOG: restartpoint starting: xlog 2018-11-23 12:49:48.258 UTC LOG: restartpoint complete: wrote 10057 buffers (10.3%); 0 transaction log file(s) added, 198 removed, 0 recycled; write=130.568 s, sync=0.702 s, total=141.633 s; sync files=1874, longest=0.205 s, average=0.000 s; distance=5435211 kB, estimate=5435211 kB 2018-11-23 12:49:48.258 UTC LOG: recovery restart point at 1143/8B16D800 2018-11-23 12:49:48.258 UTC DETAIL: last completed transaction was at log time 2018-11-23 07:00:19.510771+00 2018-11-23 12:51:04.092 UTC LOG: restartpoint starting: xlog 2018-11-23 12:55:34.744 UTC LOG: restartpoint complete: wrote 3086 buffers (3.2%); 0 transaction log file(s) added, 120 removed, 212 recycled; write=269.875 s, sync=0.029 s, total=270.652 s; sync files=4955, longest=0.000 s, average=0.000 s; distance=3238852 kB, estimate=5215575 kB 2018-11-23 12:55:34.744 UTC LOG: recovery restart point at 1144/50C5EA00 2018-11-23 12:55:34.744 UTC DETAIL: last completed transaction was at log time 2018-11-23 07:04:51.819643+00 I analysed trace of Slave Postgres Recovery process during both active and stuck mode, and seems that during stuck mode, recovery process is doing very minimal activity (or stuck). **perf trace -p pid of Slave Recovery Process** ? ( ? ): ... [continued]: epoll_wait()) = 0 0.048 ( 0.023 ms): close(fd: 992 ) = 0 0.100 ( 0.022 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 0.146 ( 0.024 ms): epoll_create1(flags: 524288 ) = 992 0.192 ( 0.025 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 0.240 ( 0.023 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 0.284 ( 0.020 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 0.323 (5003.804 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 5004.184 ( 0.023 ms): close(fd: 992 ) = 0 5004.233 ( 0.023 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 5004.281 ( 0.025 ms): epoll_create1(flags: 524288 ) = 992 5004.329 ( 0.025 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 5004.374 ( 0.021 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 5004.416 ( 0.033 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 5004.465 (5005.223 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 10009.736 ( 0.025 ms): close(fd: 992 ) = 0 10009.788 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 10009.838 ( 0.025 ms): epoll_create1(flags: 524288 ) = 992 10009.887 ( 0.075 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 10010.004 ( 0.025 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 10010.114 ( 0.032 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 10010.205 (5005.241 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 15015.499 ( 0.024 ms): close(fd: 992 ) = 0 15015.554 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 15015.604 ( 0.028 ms): epoll_create1(flags: 524288 ) = 992 15015.692 ( 0.104 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 15015.824 ( 0.023 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 15015.871 ( 0.025 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 15015.919 (5000.618 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 20016.588 ( 0.022 ms): close(fd: 992 ) = 0 20016.641 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 20016.691 ( 0.027 ms): epoll_create1(flags: 524288 ) = 992 20016.744 ( 0.022 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 20016.787 ( 0.021 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 20016.831 ( 0.020 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 20016.867 (5005.231 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 25022.142 ( 0.023 ms): close(fd: 992 ) = 0 25022.196 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 25022.246 ( 0.028 ms): epoll_create1(flags: 524288 ) = 992 25022.296 ( 0.023 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 25022.340 ( 0.023 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 25022.388 ( 0.020 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 25022.428 (5005.228 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 30027.705 ( 0.025 ms): close(fd: 992 ) = 0 30027.761 ( 0.025 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 30027.811 ( 0.028 ms): epoll_create1(flags: 524288 ) = 992 30027.864 ( 0.022 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 30027.907 ( 0.020 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 30027.949 ( 0.022 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 30027.988 (5005.233 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 35033.275 ( 0.025 ms): close(fd: 992 ) = 0 35033.333 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 35033.382 ( 0.029 ms): epoll_create1(flags: 524288 ) = 992 35033.435 ( 0.077 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 35033.588 ( 0.046 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 35033.663 ( 0.023 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 35033.762 (5005.255 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 40039.070 ( 0.026 ms): close(fd: 992 ) = 0 40039.129 ( 0.025 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 40039.179 ( 0.031 ms): epoll_create1(flags: 524288 ) = 992 40039.233 ( 0.078 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 40039.384 ( 0.047 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 40039.464 ( 0.023 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 40039.562 (5005.258 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 45044.876 ( 0.025 ms): close(fd: 992 ) = 0 45044.933 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 45044.984 ( 0.029 ms): epoll_create1(flags: 524288 ) = 992 45045.078 ( 0.118 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 45045.225 ( 0.091 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 45045.350 ( 0.027 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 45045.441 (3034.775 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = -1 EINTR Interrupted system call 48080.277 ( 0.028 ms): write(fd: 13, buf: 0x7fff2ec83737, count: 1 ) = 1 48080.369 ( 0.021 ms): gettimeofday(tv: 0x7fff2ec83d60 ) = 0 48080.413 ( 0.024 ms): close(fd: 992 ) = 0 48080.468 ( 0.023 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48080.515 ( 0.023 ms): gettimeofday(tv: 0x7fff2ec83e00 ) = 0 48080.561 ( 0.024 ms): lseek(fd: 3, offset: 2490368, whence: SET) = 2490368 48080.607 ( 0.036 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48080.675 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48080.729 ( 0.045 ms): lseek(fd: 652, whence: END) = 300539904 48080.798 ( 0.025 ms): lseek(fd: 652, whence: END) = 300539904 48080.895 ( 0.053 ms): kill(pid: 11330 (postgres), sig: USR1 ) = 0 48081.211 ( 0.132 ms): write(fd: 652, buf: 0x7f5d2e0cc500, count: 8192) = 8192 48081.377 ( 0.025 ms): lseek(fd: 3, offset: 2498560, whence: SET) = 2498560 48081.425 ( 0.032 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48081.498 ( 0.043 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48081.576 ( 0.024 ms): lseek(fd: 652, whence: END) = 300548096 48081.624 ( 0.024 ms): lseek(fd: 652, whence: END) = 300548096 48081.687 ( 0.122 ms): write(fd: 652, buf: 0x7f5d2e0ce500, count: 8192) = 8192 48081.853 ( 0.025 ms): lseek(fd: 3, offset: 2506752, whence: SET) = 2506752 48081.902 ( 0.041 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48081.985 ( 0.025 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48082.036 ( 0.049 ms): lseek(fd: 652, whence: END) = 300556288 48082.117 ( 0.038 ms): lseek(fd: 652, whence: END) = 300556288 48082.193 ( 0.064 ms): write(fd: 652, buf: 0x7f5d2e0d0500, count: 8192) = 8192 48082.286 ( 0.024 ms): lseek(fd: 3, offset: 2514944, whence: SET) = 2514944 48082.353 ( 0.039 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48082.419 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48082.467 ( 0.024 ms): lseek(fd: 652, whence: END) = 300564480 48082.540 ( 0.034 ms): lseek(fd: 652, whence: END) = 300564480 48082.613 ( 0.057 ms): write(fd: 652, buf: 0x7f5d2e0d2500, count: 8192) = 8192 48082.723 ( 0.044 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48082.801 ( 0.040 ms): lseek(fd: 652, whence: END) = 300572672 48082.874 ( 0.025 ms): lseek(fd: 652, whence: END) = 300572672 48082.936 ( 0.061 ms): write(fd: 652, buf: 0x7f5d2e0d4500, count: 8192) = 8192 48083.026 ( 0.024 ms): lseek(fd: 3, offset: 2523136, whence: SET) = 2523136 48083.073 ( 0.054 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48083.156 ( 0.040 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48083.229 ( 0.033 ms): lseek(fd: 652, whence: END) = 300580864 48083.295 ( 0.038 ms): lseek(fd: 652, whence: END) = 300580864 48083.385 ( 0.085 ms): write(fd: 652, buf: 0x7f5d2e0d6500, count: 8192) = 8192 48083.507 ( 0.045 ms): lseek(fd: 3, offset: 2531328, whence: SET) = 2531328 48083.597 ( 0.039 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48083.674 ( 0.033 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48083.754 ( 0.041 ms): lseek(fd: 652, whence: END) = 300589056 48083.839 ( 0.048 ms): lseek(fd: 652, whence: END) = 300589056 48083.955 ( 0.086 ms): write(fd: 652, buf: 0x7f5d2e0d8500, count: 8192) = 8192 48084.078 ( 0.096 ms): lseek(fd: 3, offset: 2539520, whence: SET) = 2539520 48084.213 ( 0.071 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48084.355 ( 0.041 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48084.443 ( 0.052 ms): lseek(fd: 652, whence: END) = 300597248 48084.528 ( 0.056 ms): lseek(fd: 652, whence: END) = 300597248 48084.642 ( 0.103 ms): write(fd: 652, buf: 0x7f5d2e0da500, count: 8192) = 8192 48084.809 ( 0.048 ms): lseek(fd: 3, offset: 2547712, whence: SET) = 2547712 48084.901 ( 0.048 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48085.015 ( 0.036 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48085.095 ( 0.032 ms): lseek(fd: 652, whence: END) = 300605440 48085.164 ( 0.041 ms): lseek(fd: 652, whence: END) = 300605440 48085.266 ( 0.103 ms): write(fd: 652, buf: 0x7f5d2e0dc500, count: 8192) = 8192 48085.419 ( 0.047 ms): lseek(fd: 3, offset: 2555904, whence: SET) = 2555904 48085.505 ( 0.064 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48085.612 ( 0.036 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48085.707 ( 0.055 ms): lseek(fd: 652, whence: END) = 300613632 48085.805 ( 0.052 ms): lseek(fd: 652, whence: END) = 300613632 48085.893 ( 0.093 ms): write(fd: 652, buf: 0x7f5d2e0de500, count: 8192) = 8192 48086.027 ( 0.048 ms): lseek(fd: 3, offset: 2564096, whence: SET) = 2564096 48086.120 ( 0.071 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48086.247 ( 0.048 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48086.342 ( 0.040 ms): lseek(fd: 652, whence: END) = 300621824 48086.421 ( 0.041 ms): lseek(fd: 652, whence: END) = 300621824 48086.499 ( 0.103 ms): write(fd: 652, buf: 0x7f5d2e0e0500, count: 8192) = 8192 48086.641 ( 0.053 ms): lseek(fd: 3, offset: 2572288, whence: SET) = 2572288 48086.740 ( 0.060 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48086.871 ( 0.041 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48086.966 ( 0.026 ms): lseek(fd: 652, whence: END) = 300630016 48087.025 ( 0.031 ms): lseek(fd: 652, whence: END) = 300630016 48087.135 ( 0.084 ms): write(fd: 652, buf: 0x7f5d2e0e2500, count: 8192) = 8192 48087.281 ( 0.055 ms): lseek(fd: 3, offset: 2580480, whence: SET) = 2580480 48087.392 ( 0.061 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48087.516 ( 0.040 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48087.594 ( 0.043 ms): lseek(fd: 652, whence: END) = 300638208 48087.669 ( 0.048 ms): lseek(fd: 652, whence: END) = 300638208 48087.768 ( 0.102 ms): write(fd: 652, buf: 0x7f5d2e0e4500, count: 8192) = 8192 48087.915 ( 0.057 ms): lseek(fd: 3, offset: 2588672, whence: SET) = 2588672 48088.026 ( 0.059 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48088.156 ( 0.041 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48088.244 ( 0.032 ms): lseek(fd: 652, whence: END) = 300646400 48088.321 ( 0.054 ms): lseek(fd: 652, whence: END) = 300646400 48088.413 ( 0.102 ms): write(fd: 652, buf: 0x7f5d2e0e6500, count: 8192) = 8192 48088.554 ( 0.046 ms): lseek(fd: 3, offset: 2596864, whence: SET) = 2596864 48088.655 ( 0.061 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48088.779 ( 0.041 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48088.865 ( 0.056 ms): lseek(fd: 652, whence: END) = 300654592 48088.954 ( 0.036 ms): lseek(fd: 652, whence: END) = 300654592 48089.049 ( 0.050 ms): write(fd: 652, buf: 0x7f5d2e0e8500, count: 8192) = 8192 48089.153 ( 0.037 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48089.261 ( 0.049 ms): lseek(fd: 652, whence: END) = 300662784 48089.355 ( 0.071 ms): lseek(fd: 652, whence: END) = 300662784 48089.463 ( 0.083 ms): write(fd: 652, buf: 0x7f5d2e0ea500, count: 8192) = 8192 48089.616 ( 0.054 ms): lseek(fd: 3, offset: 2605056, whence: SET) = 2605056 48089.724 ( 0.064 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48089.847 ( 0.044 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48089.951 ( 0.051 ms): lseek(fd: 652, whence: END) = 300670976 48090.046 ( 0.044 ms): lseek(fd: 652, whence: END) = 300670976 48090.150 ( 0.089 ms): write(fd: 652, buf: 0x7f5d2e0ec500, count: 8192) = 8192 48090.297 ( 0.045 ms): lseek(fd: 3, offset: 2613248, whence: SET) = 2613248 48090.393 ( 0.039 ms): read(fd: 3, buf: 0x55fb9371ca98, count: 8192) = 8192 48090.483 ( 0.038 ms): read(fd: 8, buf: 0x7fff2ec84457, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 48090.574 ( 0.056 ms): lseek(fd: 652, whence: END) = 300679168 48090.672 ( 0.046 ms): lseek(fd: 652, whence: END) = 300679168 48090.786 ( 0.182 ms): write(fd: 652, buf: 0x7f5d2e0ee500, count: 8192) = 8192 48091.030 ( 0.059 ms): epoll_create1(flags: 524288 ) = 992 48091.135 ( 0.056 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 48091.235 ( 0.056 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 48091.336 ( 0.051 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 48091.431 ( 0.055 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 1 48091.542 ( 0.034 ms): read(fd: 12, buf: 0x7fff2ec83d70, count: 16 ) = 1 48091.629 ( 0.044 ms): gettimeofday(tv: 0x7fff2ec83d60 ) = 0 48091.717 (5005.261 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 53097.027 ( 0.023 ms): close(fd: 992 ) = 0 53097.079 ( 0.024 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 53097.127 ( 0.026 ms): epoll_create1(flags: 524288 ) = 992 53097.177 ( 0.025 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 53097.222 ( 0.021 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 53097.264 ( 0.033 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 53097.313 (5005.226 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 58102.587 ( 0.022 ms): close(fd: 992 ) = 0 58102.638 ( 0.030 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 58102.690 ( 0.024 ms): epoll_create1(flags: 524288 ) = 992 58102.746 ( 0.025 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 58102.795 ( 0.033 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 58102.850 ( 0.022 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 58102.905 (5005.237 ms): epoll_wait(epfd: 992, events: 0x55fb9372ae08, maxevents: 1, timeout: 5000) = 0 63108.190 ( 0.023 ms): close(fd: 992 ) = 0 63108.243 ( 0.025 ms): read(fd: 8, buf: 0x7fff2ec83e07, count: 1 ) = -1 EAGAIN Resource temporarily unavailable 63108.293 ( 0.026 ms): epoll_create1(flags: 524288 ) = 992 63108.342 ( 0.027 ms): epoll_ctl(epfd: 992, op: ADD, fd: 12, event: 0x7fff2ec83d7c) = 0 63108.389 ( 0.021 ms): epoll_ctl(epfd: 992, op: ADD, fd: 8, event: 0x7fff2ec83d7c) = 0 63108.431 ( 0.032 ms): gettimeofday(tv: 0x7fff2ec83d50 ) = 0 **Seems like epoll_wait is taking too much time of each operation**. Is there anything needs to be done to address that? Or any other detailed stacktrace needs to be analysed? **Infrastructure** - AWS EC2 r4.large - ZFS FileSystem - RAID0 - 8 x 17GB - Number of databases - 130+ - Heavy ETL Process
Asked by The Coder (299 rep)
Nov 23, 2018, 12:42 PM
Last activity: Jun 18, 2025, 02:01 AM