How can I make the Exim mail server retry faster after a temporary DNS failure?
We are running the Exim4 mail server, version 4.90.1, on Ubuntu Server 18.04.1 LTS. It sits in the DMZ and relays mail from the LAN out to the internet.
Occasionally (perhaps a couple of mails per 24 hours, out of thousands sent successfully),
Exim hits a temporary DNS problem and defers the mail:
> defer (-1): host lookup did not complete
But when I run a manual DNS test (e.g. with dig), the MX record for the recipient domain is fetched without any problem.
Exim defers the mail, and on the next attempt (20 or 30 minutes later) the mail is sent out fine, with no DNS problem anymore.
We have not found out why these brief DNS problems occur for a few rare mails out of thousands. But I think software (in this case Exim) should be robust enough to handle a DNS timeout of a few seconds.
In the example below, our BIND DNS server logged the following entry in its query-errors log:
22-Aug-2023 17:20:28.639 query-errors: debug 1: client @0x7f56f74b2050 195.xxx.xxx.87#45395 (examplecustom.com): query failed (SERVFAIL) for examplecustom.com/IN/MX at ../../../bin/named/query.c:8402
Below is an example case in which, because of the above error, delivery of the mail completed with a delay of 20 minutes.
My question is:
How can we configure Exim so that, after a DNS lookup error, it retries every minute for 20 minutes?
In the best case, when a DNS problem lasts only a few seconds, the mail would then be sent after one minute, once the retry succeeds.
Currently the delay can be up to 30 minutes, and that is not acceptable.
I have read https://www.exim.org/exim-html-current/doc/html/spec_html/ch-retry_configuration.html and tried the following line in /etc/exim4.conf.template:
# DNS(Lookup) retry every minute for 20 minutes, first
* lookup F,20m,1m; G,16h,1h,1.5; F,4d,6h
# This is Exim default rule
* * F,2h,15m; G,16h,1h,1.5; F,4d,6h
Unfortunately, that did not change anything: the example below occurred with the extra rule above active.
We have QUEUEINTERVAL='10m' set in /etc/default/exim4.
In this example I would have wanted the mail to be sent at 17:28:47, but instead Exim reports "retry time not reached"
and we lose another 12 minutes. I would like the retry time to have already been reached by then. This aggressive retry setting should apply only to DNS lookup errors.
2023-08-22 17:20:19 1qBTYB5-0006NM-Nv john.doe@examplecustom.com R=dnslookup T=remote_smtp H=mx01.hornetsecurity.com [94.100.132.8] X=TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256 CV=yes C="250 2.0.0 OK accept as AC027FA06BA:4f234742f62fac6098baf99d16db6d79 by mx-gate97-hz1"
2023-08-22 17:40:59 1qBTYB5-0006NM-Nv Completed
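To make the timing I am hoping for concrete, here is a rough Python sketch of my understanding. It assumes that a queued message is only retried when a queue runner starts at or after the message's retry time. This is a simplified model and not how Exim computes anything internally: the function name `next_attempt` is made up for illustration, and the 17:18:47 queue-run time is an assumption obtained by stepping back 10 minutes from 17:28:47.

```python
from datetime import datetime, timedelta


def next_attempt(failure, retry_interval, first_queue_run, queue_interval):
    """Return the first queue-run start at or after failure + retry_interval.

    Simplified model: ignores how long a queue run takes to reach a message.
    """
    earliest = failure + retry_interval
    if earliest <= first_queue_run:
        return first_queue_run
    elapsed = earliest - first_queue_run
    # Ceiling division: timedelta // timedelta yields an int.
    runs = -(-elapsed // queue_interval)
    return first_queue_run + runs * queue_interval


failure = datetime(2023, 8, 22, 17, 20, 28)    # SERVFAIL time from the BIND log
queue_run = datetime(2023, 8, 22, 17, 18, 47)  # assumed: 10 minutes before 17:28:47

for queue_minutes in (10, 1):
    when = next_attempt(failure, timedelta(minutes=1), queue_run,
                        timedelta(minutes=queue_minutes))
    print(f"queue interval {queue_minutes:>2}m -> next attempt at {when:%H:%M:%S}")
```

Under these assumptions, a 1-minute retry rule combined with the 10-minute queue interval gives 17:28:47, which is the run on which I would like the mail to go out, while a 1-minute queue interval would give 17:21:47.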
Asked by user319783
Aug 25, 2023, 01:15 PM
Last activity: Aug 25, 2023, 01:32 PM