How to add robustness to cheap headless computers

With the advent of cheap tiny computers based on ARM processors, there are many new opportunities for inexpensive home and industrial automation. But there is a hiccup: these cheap tiny PCs often show reliability problems, to the point of being unusable for fully headless operation. Today, we present a product that addresses this problem: the Yocto-WatchdogDC.




Let's assume that you have set up your DIY weather station on a desert island. You don't want it to crash once you are back home, since there won't be anyone there to reboot it.

You don't want to go there every two days to restart your not-so-automatic weather station


Ideally, the safest approach would be to fix all possible causes of instability. Unfortunately, this is not always possible. System instabilities are often caused by complex combinations of circumstances that are not easily reproducible. They are often related to hardware design issues that cannot be fixed. In most cases, technical support for these cheap computers is very limited: sometimes nobody knows who the real manufacturer of the machine is, and sometimes the manufacturer provides only the bare minimum of software support. In the best case, it is up to the user community to handle support issues and find the best possible workarounds. The Raspberry Pi is a typical example of this model, with a large user community working around the system's technical limitations.

So when an instability problem cannot be avoided, or simply when you would rather be safe than sorry, one solution is to add a watchdog timer (also known as a COP timer, which stands for Computer Operating Properly timer) to your system.

connection scheme for the Yocto-WatchdogDC



The principle is simple: a watchdog is a kind of timer connected between the power supply and the computer itself. As long as the computer works properly, a software command resets the timer periodically. If for some reason the software gets stuck (or the PC crashes), the timer is no longer reset and the safety trigger fires: the watchdog power-cycles the computer.
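As a rough illustration of this principle, the toy bash function below simulates the timer: given a trigger delay and the times at which the software reset it, it reports when the watchdog would fire. It is a hypothetical model written for this explanation (the name `first_trigger_time` is ours), not a description of how the Yocto-WatchdogDC firmware is implemented.

```shell
#!/bin/bash
# Toy model of a watchdog timer (illustration only, not the device firmware).
# Usage: first_trigger_time DELAY END RESET_TIME...
#   DELAY       trigger delay, in seconds
#   END         end of the simulated period, in seconds
#   RESET_TIME  ascending times (seconds) at which the software resets the timer
# Prints the time at which the watchdog would power-cycle the computer,
# or "none" if it never fires during the simulated period.
first_trigger_time() {
    local delay=$1 end=$2
    shift 2
    local last_reset=0 t
    for t in "$@"; do
        # The deadline passed before this reset arrived: the watchdog fired.
        if (( t - last_reset >= delay )); then
            echo $(( last_reset + delay ))
            return 0
        fi
        last_reset=$t
    done
    # No more resets: does the deadline fall inside the simulated period?
    if (( end - last_reset >= delay )); then
        echo $(( last_reset + delay ))
    else
        echo none
    fi
}
```

For example, `first_trigger_time 120 600 60 120 300` prints 240: after the reset at t=120 s, the next one only arrives at t=300 s, so the timer expires at t=240 s.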

The Yocto-Watchdog can save your headless computer



Of course, such a drastic recovery method must be a last resort, as shutting down a computer abruptly can seriously damage its filesystem. For this reason, the monitoring software should first try to restart the computer using software commands. But when the WiFi or the USB stack stops working, for instance, power cycling may be the only way to get things back up and running.
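One way to follow this advice is an escalation policy: on the first failure, try a clean software reboot; if the problem is still there after that reboot, halt the system so that the keep-alive signal stops and the watchdog performs a real power cycle. The sketch below assumes a marker file on storage that survives a reboot; its path and the function names `recover` and `recovery_ok` are hypothetical.

```shell
#!/bin/bash
# Hypothetical escalation policy (sketch): soft reboot first, then let the
# watchdog power-cycle the machine if the problem survives the reboot.
# MARKER must be on storage that persists across reboots (assumed path).
MARKER=${MARKER:-/var/lib/keep-alive/soft-reboot-tried}

# Full paths, since cron's default PATH usually does not include /sbin.
do_reboot() { /sbin/shutdown -r now; }   # clean software restart
do_halt()   { /sbin/shutdown -h now; }   # stop; the watchdog power-cycles later

# Call after a failed health check.
recover() {
    if [[ -e $MARKER ]]; then
        # A software reboot was already tried and did not help: halt, so the
        # keep-alive signal stops and the watchdog does a real power cycle.
        do_halt
    else
        # First failure: remember the attempt, then reboot cleanly.
        mkdir -p "$(dirname "$MARKER")" && touch "$MARKER"
        do_reboot
    fi
}

# Call after a successful health check to forget previous attempts.
recovery_ok() { rm -f "$MARKER"; }
```

Until a health check succeeds again and `recovery_ok` clears the marker, every further failure halts the machine and leaves the recovery to the watchdog.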

An example

Let's see, for instance, how we can add some robustness to an MK805 (also known as Mini-X), which is well known for:
1. not always booting properly when powered on;
2. losing wireless connectivity from time to time.

Here is a simple solution that does not require writing any custom program.

Among the Yoctopuce libraries, you will find a command-line version. Its source code is available, but you will also find ready-to-use binaries for most common operating systems, including Linux/ARM (armel architecture), which we use on the MK805. One of these binaries, named YWatchdog, is designed to drive the Yocto-WatchdogDC.

The following commands start the watchdog monitoring, with an automatic power cycle after 120 seconds without any sign of life (the trigger delay is given in milliseconds):


./YWatchdog any set_triggerDelay 120000
./YWatchdog any set_running ON
 


(NB: if you prefer, this configuration can also be done through a graphical user interface, using the VirtualHub.)
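Since `set_triggerDelay` takes milliseconds and the delay must comfortably exceed the interval between keep-alive signals, it can help to wrap these two commands. This is only a sketch: the `configure_watchdog` function, the `YWATCHDOG` variable and the two-period safety margin are our own assumptions, and the only YWatchdog commands used are the two shown above.

```shell
#!/bin/bash
# Hypothetical helper: arm the Yocto-WatchdogDC with a trigger delay in seconds.
# Only the two YWatchdog commands from the article are used.
# YWATCHDOG points to the command-line binary (assumed default: ./YWatchdog).
YWATCHDOG=${YWATCHDOG:-./YWatchdog}

# Usage: configure_watchdog DELAY_SECONDS KEEPALIVE_PERIOD_SECONDS
configure_watchdog() {
    local delay_s=$1 keepalive_s=$2
    # Assumed rule of thumb: allow at least two keep-alive periods before
    # the watchdog fires (the article uses 60 s and 120 s).
    if (( delay_s < 2 * keepalive_s )); then
        echo "trigger delay ${delay_s}s is too short for a ${keepalive_s}s keep-alive period" >&2
        return 1
    fi
    # set_triggerDelay expects milliseconds (120000 ms = 120 s).
    "$YWATCHDOG" any set_triggerDelay $(( delay_s * 1000 )) || return 1
    "$YWATCHDOG" any set_running ON
}
```

`configure_watchdog 120 60` issues the same two commands as above: a 120000 ms trigger delay, for a keep-alive signal every 60 seconds.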

The following script, which we will save as /root/keep-alive.sh, signals that the computer is still operating properly:

#!/bin/bash
/root/YWatchdog any resetWatchdog



Then you only need to add an entry to the system crontab that calls the script above every minute, using the following command:


sudo crontab -e
 


and then adding this line in the crontab editor:

* * * * * /bin/bash /root/keep-alive.sh



From now on, the system will automatically confirm every minute that it is operating properly, and the watchdog will power-cycle it if this does not happen for two minutes.
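If you deploy the same setup on several boxes, editing the crontab interactively quickly gets tedious. A common alternative is to pipe the current crontab through a filter and reinstall it with `crontab -`. The hypothetical `add_cron_line` filter below makes sure the keep-alive entry appears exactly once, so running it twice does not create duplicate entries.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab that contains LINE exactly once.
# Reads the current crontab on stdin and writes the updated one on stdout,
# so it can sit between "crontab -l" and "crontab -".
add_cron_line() {
    local line=$1
    # Keep every existing entry except previous copies of LINE
    # (-x: whole-line match, -F: fixed string, so "*" is taken literally)...
    grep -vxF -- "$line"
    # ...then append LINE once at the end.
    printf '%s\n' "$line"
}

# Example, run as root:
# crontab -l 2>/dev/null | add_cron_line '* * * * * /bin/bash /root/keep-alive.sh' | crontab -
```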

Now, if we also want to check that the WiFi connection is still working, and power cycle the machine if needed, we only need to add this check to the keep-alive.sh script:

#!/bin/bash
# 1. We are alive, keep the watchdog happy
/root/YWatchdog any resetWatchdog
# 2. Make sure networking is working
SITE=http://www.google.com
/usr/bin/wget -q --tries=10 --timeout=5 "$SITE" -O /tmp/chk &> /dev/null
if [ ! -s /tmp/chk ]; then
    # Full path: cron's default PATH usually does not include /sbin
    /sbin/shutdown -h now
fi



The script shuts the computer down cleanly as soon as the connectivity check fails (wget makes up to 10 attempts before giving up). The watchdog, which is then no longer reset, performs a real power cycle to bring the computer back up, and repeats the power cycle as many times as needed until the MK805 power circuit really comes up. The MK805 should now work much more reliably, as long as it gets power, but that is another story...
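The script above halts the computer after a single failed check. If short network outages are expected, you may prefer to shut down only after several consecutive failed checks. The sketch below keeps a failure counter in a state file between cron runs; the function name `check_with_tolerance`, the state-file path and the threshold in the example are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: run a check command and count consecutive failures
# in a state file. Succeeds while the number of consecutive failures stays
# below THRESHOLD, fails once the threshold is reached.
# Usage: check_with_tolerance STATE_FILE THRESHOLD COMMAND [ARGS...]
check_with_tolerance() {
    local state=$1 threshold=$2
    shift 2
    local failures=0
    if "$@"; then
        # Success: reset the counter.
        echo 0 > "$state"
        return 0
    fi
    # Failure: read the previous count (0 if missing or invalid), increment it.
    if [[ -r $state ]]; then
        failures=$(< "$state")
    fi
    [[ $failures =~ ^[0-9]+$ ]] || failures=0
    failures=$(( failures + 1 ))
    echo "$failures" > "$state"
    (( failures < threshold ))
}

# Example in keep-alive.sh, after resetting the watchdog (5 failed checks,
# i.e. about 5 minutes with a one-minute cron period):
# check_with_tolerance /tmp/net-failures 5 /usr/bin/wget -q --spider --timeout=5 http://www.google.com || /sbin/shutdown -h now
```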

Of course, you can make all this more efficient and build more elaborate functionality by calling the watchdog function directly from your own code, using the appropriate Yoctopuce programming library. The principle remains the same: when things really fail, shut everything down and restart from scratch.
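As a shell-only illustration of this idea, the loop below keeps using the command-line tool but only resets the watchdog after the application's own work cycle succeeds, so a hung or failing application is treated like a crashed computer. `do_one_cycle`, `run_loop` and the 30-second default period are hypothetical placeholders; with the programming libraries, the reset would be issued from inside your application instead.

```shell
#!/bin/bash
# Hypothetical application loop: the watchdog is only kept alive when a
# complete work cycle succeeds. YWATCHDOG points to the command-line binary.
YWATCHDOG=${YWATCHDOG:-/root/YWatchdog}

# Placeholder for the real work (e.g. reading sensors, uploading data).
# Must return 0 on success and non-zero on failure.
do_one_cycle() {
    true
}

# Usage: run_loop [PERIOD_SECONDS] [MAX_CYCLES]   (MAX_CYCLES=0 runs forever)
run_loop() {
    local period=${1:-30} max=${2:-0} n=0
    while (( max == 0 || n < max )); do
        if do_one_cycle; then
            # Only a successful cycle counts as a sign of life.
            "$YWATCHDOG" any resetWatchdog
        fi
        n=$(( n + 1 ))
        sleep "$period"
    done
}
```

With a 30-second period and the 120-second trigger delay configured earlier, about four consecutive failed cycles are needed before the watchdog power-cycles the machine.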

Last but not least, one more piece of advice: if you really intend to set up a computer on a desert island, on top of a mountain, or on the moon, make sure that its operating system boots from read-only media and runs from a RAM disk. This makes it much more resilient to unexpected power cycles, whether they are caused by a power failure or by the watchdog.

Yoctopuce, get your stuff connected.