People often think that publishing is the last step in the long, sometimes sophisticated “development” game. The real challenge, however, starts once the application is published. From this point on we start monitoring the application. This is a very challenging process, and we should start forming a strategy for it from the very beginning of project development. As developers we’ve experienced the following scenario time and time again: if there is any problem with the application, your client will come to you and say: “Guys, it doesn’t work, please check the error, fix, resolve…” Sounds familiar, doesn’t it?
Here’s a question: is every problem a bug in the source code? No, it’s not. Modern applications don’t exist in sandbox-like environments as they did in the Commodore or Atari days. Nowadays applications are highly connected and exchange data with other sources through APIs, interfaces and such. With that in mind, many potential problems with an application can come from other “parts of the puzzle”, not only from the app’s source code.
Typically we monitor all events connected with production apps. Such events usually fall into two categories: application crashes and important diagnostic information. For monitoring, Leaware uses its own in-house tool, “Leaware Logger”, which is connected with Xamarin Insights.
Thanks to the information from “Leaware Logger” we are able to quickly identify all events that are potentially dangerous to the application.
Problems are not always related to bugs in the app. There can be a few other reasons, such as:
Leaware runs recurring tests of all our production applications, on real devices and with real scenarios. That lets us see the performance of the whole solution and spot any abnormalities or bugs that shouldn’t happen. For this purpose we use Xamarin Test Cloud, where we can run applications in an automated way… on real devices!
Collecting all this information is pointless without a clear idea of how to use it, and that is quite a challenge. No one wishes to be flooded with exceptions, stack traces and so on. This is why we use nice and sophisticated tools here as well.
The main tool for collecting data from all applications is Zabbix (http://www.zabbix.com/). We have several Zabbix installations in different locations that collect data from all our clients’ applications.
Zabbix is a very sophisticated tool that allows you to set up a lot of nice logic, which helps us when something is not performing well.
When a production app is connected to a backend and the backend is unavailable, we start receiving a lot of exceptions from the app. At the same time, we also receive information from the web service monitoring tool telling us that the backend doesn’t work.
Zabbix is responsible for collecting all the data and then deciding whether someone should be informed that something is not performing as it should.
For example, if a web service stops working, Zabbix fires the corresponding trigger only when the last ten checks have all failed. Thanks to that we are not flooded with notifications.
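The "ten failed checks in a row" rule is easy to picture in a few lines of Python. This is only an illustrative sketch of the idea, not how Zabbix implements triggers internally; the class name ConsecutiveFailureTrigger and its methods are made up for this example.

```python
from collections import deque


class ConsecutiveFailureTrigger:
    """Fires only when the last `threshold` checks have all failed.

    Hypothetical helper for illustration; not part of Zabbix.
    """

    def __init__(self, threshold: int = 10) -> None:
        self.threshold = threshold
        # Keep only the most recent `threshold` results.
        self.results = deque(maxlen=threshold)

    def record(self, ok: bool) -> bool:
        """Store one check result and return True if an alert should fire."""
        self.results.append(ok)
        window_full = len(self.results) == self.threshold
        # Alert only when the window is full and contains no successful check.
        return window_full and not any(self.results)


if __name__ == "__main__":
    trigger = ConsecutiveFailureTrigger(threshold=10)
    for i in range(12):
        fired = trigger.record(ok=False)
        print(f"check {i + 1}: failed, alert={fired}")
```

A single successful check inside the window is enough to keep the trigger quiet, which is what filters out short glitches.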
We communicate internally with our technical staff through many different channels, but our favorite is Slack. Yes, we love Slack (www.slack.com).
Slack is connected with Zabbix. When Zabbix has “something interesting to say”, it sends a message to a dedicated Zabbix channel. Thanks to that, everyone involved learns about the problem very quickly and we can react as fast as possible.
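We don't describe here exactly how the two are wired together, but one common way to push a message into a Slack channel is an incoming webhook: an HTTP POST with a small JSON body. Below is a minimal sketch under that assumption. The function names and the webhook URL are placeholders for illustration, not our actual configuration.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the HTTP request for a Slack incoming webhook (assumed integration)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder URL: replace with a real webhook before calling send_alert().
    url = "https://hooks.slack.com/services/T000/B000/XXXX"
    request = build_slack_request(url, "Web service check failed 10 times in a row")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

An alert script called by the monitoring system could pass the trigger name and host as the message text, so the channel shows what failed and where.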