Having a job scheduler without notifications makes little sense, because you never know what’s going on with it or whether it is running correctly. Luckily, Chronos has built-in support for sending emails when a job fails or is interrupted, so all we need is a mail server that can send them out.
My solution is to set up the mail agent myself instead of using an existing mail service like Gmail. My reasons are:
- I don’t want to risk leaking my account password.
- I want to have a custom domain in the sender’s email address.
- Setting up a mail transfer agent (MTA) is not that hard.
This is where Postfix comes in: a very popular free and open-source MTA. It’s estimated that about 25% of public mail servers use it.
I use an Ansible playbook to install and configure Postfix; the environment is the Amazon Linux AMI (based on CentOS 6).
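For reference, installing the package itself can be a single task. This is a minimal sketch, assuming the yum module (the Amazon Linux AMI is a yum-based distribution); the task name is my own.

```yaml
# Sketch: install Postfix from the distribution repositories (yum-based host assumed).
- name: install postfix
  yum: name=postfix state=present
```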
On Amazon Linux AMI instances, the sendmail service is normally enabled and started by default. Because it binds to port 25, we need to stop and disable it first so that the port is free for Postfix.
- name: make sure sendmail service is not running and disabled to free port 25
  service: name=sendmail state=stopped enabled=no
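To fail fast if something still holds the port after sendmail is stopped, a check like the following could run next. It is a sketch, assuming the listener to worry about is on 127.0.0.1, which is what wait_for probes when no host is given.

```yaml
# Sketch: wait up to 30 seconds for port 25 on 127.0.0.1 to be closed,
# and fail the play if something is still listening on it.
- name: make sure port 25 is free before postfix is started
  wait_for: port=25 state=stopped timeout=30
```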
Configuring Postfix is the most complicated and most important part of getting it to work properly. The following is what I have in the configuration file.
soft_bounce = no
You have to set myhostname and mydomain according to the domain you want to use, and set relay_domains properly so that emails can reach their recipients; e.g. if you want to be able to send email to a Gmail user, you need to add gmail.com to relay_domains.
In addition, if you’re running Chronos in Docker containers using Marathon, make sure that reject_invalid_hostname and reject_non_fqdn_hostname are not present in smtpd_recipient_restrictions. Chronos’s built-in email client doesn’t use a fully qualified domain name (FQDN) as its hostname; inside a container, the hostname is the container’s short ID, so these checks would reject its mail.
Finally, make sure mynetworks contains the CIDR block of your senders’ addresses. For example, if you run Chronos in Docker with the default networking configuration, add the Docker bridge network (172.17.0.0/16 with Docker’s default settings) to mynetworks; otherwise the sending requests from Chronos will be blocked.
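If you prefer to change only the parameters discussed above on top of the stock main.cf, rather than shipping a whole file, lineinfile is one way to do it. This is a sketch, not my actual configuration: every value is a hypothetical placeholder, and 172.17.0.0/16 assumes Docker’s default bridge network.

```yaml
# Sketch only: all values below are hypothetical placeholders.
- name: set the postfix parameters discussed above
  lineinfile:
    dest: /etc/postfix/main.cf
    # replace an existing "name = ..." line, or append one if none matches
    regexp: '^{{ item.key }}\s*='
    line: '{{ item.key }} = {{ item.value }}'
  with_dict:
    myhostname: mail.example.com
    mydomain: example.com
    relay_domains: gmail.com
    # loopback plus Docker's default bridge network; adjust if yours differs
    mynetworks: 127.0.0.0/8, 172.17.0.0/16
    # deliberately no reject_invalid_hostname / reject_non_fqdn_hostname
    smtpd_recipient_restrictions: permit_mynetworks, reject_unauth_destination
```

Because the regular expression ends in `\s*=`, a parameter such as mynetworks_style is left alone, and a commented-out `#myhostname` line does not match, so a new line is appended instead.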
Because we set “virtual_alias_maps = hash:/etc/postfix/virtual” in the configuration file, we need to prepare the virtual file with the proper usernames and domains.
Once the file is in place, run postmap to turn it into a lookup table (for a hash: table, postmap writes /etc/postfix/virtual.db).
- name: set up user database
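The two steps above could be written roughly as follows. This is a minimal sketch: the alias line (chronos@example.com delivered to the local root user) is a hypothetical placeholder, and register/when is used so that postmap only runs when the file actually changes.

```yaml
# Sketch only: the address and the local user are hypothetical placeholders.
- name: write the virtual alias map
  copy:
    dest: /etc/postfix/virtual
    content: |
      chronos@example.com    root
  register: virtual_map

# hash: tables must be compiled; postmap writes /etc/postfix/virtual.db
- name: compile the virtual alias map into a lookup table
  command: postmap /etc/postfix/virtual
  when: virtual_map.changed
```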
Because we set “mime_header_checks = pcre:/etc/postfix/body_checks” in the configuration file, we also need to put that file in place.
- name: set up body checks file
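Unlike the hash: table, a pcre: table is read by Postfix directly, so there is no postmap step; the Postfix build does, however, need pcre support, which `postconf -m` lists. A sketch, assuming the file is kept next to the playbook as body_checks (a hypothetical path):

```yaml
# Sketch: confirm pcre map support, then copy the (hypothetical) local body_checks file.
- name: check that this postfix build supports pcre tables
  command: postconf -m
  register: postfix_map_types
  changed_when: false

- name: stop early if pcre support is missing
  assert:
    that:
      - "'pcre' in postfix_map_types.stdout_lines"

# pcre: tables need no postmap, the file is used as-is
- name: copy the body checks file into place
  copy: src=body_checks dest=/etc/postfix/body_checks
```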
- name: start postfix and enable it
  service: name=postfix state=started enabled=yes
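As a final check, the play can wait until Postfix is actually listening. A sketch that again probes 127.0.0.1 by default; adding a host argument with your Docker bridge address would instead confirm that containers can reach it.

```yaml
# Sketch: fail the play if nothing accepts connections on port 25 within 30 seconds.
- name: wait until postfix listens on port 25
  wait_for: port=25 state=started timeout=30
```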
The Chronos parameters related to failure notifications are --mail_from, --mail_user, --mail_password, --mail_server and --mail_ssl. In our case, we don’t have to set --mail_password or --mail_ssl, because Postfix accepts plain, unauthenticated mail from hosts in mynetworks. Set the following parameters based on your domain name:
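As an illustration only (these are placeholders, not my values), the flags could be kept in an Ansible variable and passed as the arguments of the Chronos app in Marathon. The variable name and all values are hypothetical; --mail_server is assumed to take a host:port pair, and 172.17.0.1 assumes Postfix is reached through the default docker0 gateway. That also assumes Postfix listens on that interface; on RHEL-family systems the stock main.cf typically sets inet_interfaces = localhost.

```yaml
# Hypothetical variable: Chronos mail flags, e.g. for the args of the Chronos app in Marathon.
# --mail_password and --mail_ssl are deliberately left out, as described above.
chronos_mail_args:
  - "--mail_server"
  - "172.17.0.1:25"        # assumption: Postfix reached via the default docker0 gateway
  - "--mail_from"
  - "chronos@example.com"  # hypothetical sender address on your own domain
  - "--mail_user"
  - "chronos"              # hypothetical user name
```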
In addition, add an owner field to each Chronos job definition file:
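A job definition with the owner field could also be created through the Chronos REST API, for example with Ansible’s uri module. This sketch assumes a Chronos 2.x-style endpoint (/scheduler/iso8601; check whether your version expects a /v1 prefix), a hypothetical chronos_url, and a command that always fails, so the job can double as the test job described next.

```yaml
# Sketch: create a job that always fails; failure emails go to its owner.
- name: create a deliberately failing test job
  uri:
    url: "{{ chronos_url }}/scheduler/iso8601"
    method: POST
    body_format: json
    body:
      name: failing-test-job                  # hypothetical job name
      command: "exit 1"                       # always exits with an error
      schedule: "R/2030-01-01T00:00:00Z/P1D"  # far-future start, so it only runs when triggered
      epsilon: PT30M
      owner: you@example.com                  # hypothetical recipient of failure emails
    status_code: [200, 204]
  vars:
    chronos_url: http://chronos.example.com:4400   # hypothetical Chronos address
```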
At this point, we can test whether everything works. Set up a job that fails on purpose and run it manually through the API or the Chronos UI. You should receive an email like this:
'2016-06-17T04:41:19.171Z'. Retries attempted: 2.
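If you’d rather trigger the test run from the API than from the UI, Chronos 2.x starts a job manually on a PUT to /scheduler/job/<job name>. A sketch, assuming a test job named failing-test-job and a Chronos instance at a hypothetical chronos_url:

```yaml
# Sketch: manually start one run of the (hypothetical) failing test job.
- name: trigger the failing test job once
  uri:
    url: "{{ chronos_url }}/scheduler/job/failing-test-job"
    method: PUT
    status_code: [200, 204]
  vars:
    chronos_url: http://chronos.example.com:4400   # hypothetical Chronos address
```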
With that, we’re all set to have Chronos report errors. The full Ansible code can be found at https://github.com/WUMUXIAN/microservices-infra/tree/master/aws. Now let’s hope all our jobs run well and we never receive this kind of email after testing. :)