ELK is perfect for small businesses that need to monitor the logs of their infrastructure. Coming from a Splunk setup, I can say that even though ELK lacks a few minor features, it will convince 90% of companies with its price (free! ;) and its community support & content!
Here are some final production dashboards. All of this could run on one big screen.
Here we can monitor all logs with errors, by VM or container. Over 24 hours we can detect a failing app, database errors, high incoming traffic, or a node that stops syncing.
All SSH connections, successful or failed, are displayed, so it is impossible to miss an attack (hopefully ;-).
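To give an idea of what sits behind such an SSH panel, here is a minimal Python sketch that classifies sshd log lines as success or failure and counts failures per source IP. It assumes the usual OpenSSH "Accepted ... for USER from IP port N" / "Failed ... for USER from IP port N" message format; the function names and output field names are made up for illustration, and in a real ELK stack this parsing would more likely live in the ingest pipeline than in a script.

```python
import re
from collections import Counter

# Matches the common OpenSSH auth messages, e.g.
#   "Accepted publickey for greg from 10.0.0.5 port 51234 ssh2"
#   "Failed password for invalid user admin from 203.0.113.7 port 40022 ssh2"
SSH_EVENT = re.compile(
    r"(?P<result>Accepted|Failed) (?P<method>\S+) for (?:invalid user )?"
    r"(?P<user>\S+) from (?P<ip>\S+) port (?P<port>\d+)"
)


def parse_ssh_line(line):
    """Return a dict describing the SSH auth event, or None if the line is unrelated."""
    match = SSH_EVENT.search(line)
    if match is None:
        return None
    return {
        # Field names are hypothetical, chosen for this sketch only.
        "outcome": "success" if match.group("result") == "Accepted" else "failure",
        "method": match.group("method"),
        "user": match.group("user"),
        "ip": match.group("ip"),
        "port": int(match.group("port")),
    }


def failures_by_ip(lines):
    """Count failed SSH logins per source IP across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        event = parse_ssh_line(line)
        if event is not None and event["outcome"] == "failure":
            counts[event["ip"]] += 1
    return counts
```

Calling `failures_by_ip(open("/var/log/auth.log"))` on a Debian-style host (the path is an assumption, it differs per distribution) gives a quick list of the noisiest IPs, which is roughly what the dashboard shows at a glance.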
Below is the performance of the servers. Basically, we run "top" every minute on all servers to collect metrics. We display only the 6 servers with the highest CPU or disk usage, which is very powerful because in half a screen we can monitor 20+ servers.
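The "top 6 servers" idea can be sketched in a few lines of Python: keep the latest sample per host, sort by one metric, and cut the list at 6. This is only an illustration of the ranking logic, not the dashboard's actual query; the sample layout and the field names (`host`, `ts`, `cpu_pct`) are assumptions made for this example.

```python
def top_servers(samples, metric, n=6):
    """Return (host, value) pairs for the n hosts with the highest latest `metric`.

    `samples` is an iterable of dicts with hypothetical keys:
    "host" (str), "ts" (sortable timestamp) and the metric itself (number).
    """
    latest = {}
    for sample in samples:
        previous = latest.get(sample["host"])
        # Keep only the most recent sample seen for each host.
        if previous is None or sample["ts"] >= previous["ts"]:
            latest[sample["host"]] = sample
    ranked = sorted(latest.values(), key=lambda s: s[metric], reverse=True)
    return [(s["host"], s[metric]) for s in ranked[:n]]


if __name__ == "__main__":
    # Made-up samples, one per collection run.
    demo = [
        {"host": "web1", "ts": 1, "cpu_pct": 12.0},
        {"host": "web1", "ts": 2, "cpu_pct": 88.5},
        {"host": "db1", "ts": 2, "cpu_pct": 41.0},
    ]
    for host, value in top_servers(demo, "cpu_pct"):
        print(host, value)
```

Showing only the top few hosts is what keeps the panel readable: with 20+ servers, the ones that are idle simply never make it onto the screen.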
Here we can see that VM ethworker is swapping badly... :-O
Follow the GitHub repo to get a full ELK stack running.
If you are interested in forwarding container performance logs from cAdvisor to ELK, please have a look at this repo (careful: it uses an older version of ELK).
Thank you for reading :-) See you in the next post! Greg