The Fault in Our System: Improving Event-Handling Response at Gemini Observatory


Submillimeter Array Hilo, Hawai'i


Gemini Observatory’s telescope is controlled by many computers, which record every “event” as a log message written to files: mechanism movement and positioning, temperature, and system errors, to name a few. The volume of recorded log messages is vast, and not all of them are important. A mechanism was therefore needed to extract and handle only the important messages.

Our solution was implemented as a three-way communication among a web application, a server, and client log watchers. The web application provides a user-friendly interface in which a user identifies an important log message by submitting a regular expression and its corresponding action. The server then forwards the regular expression and action from the web application to the appropriate log watcher. The log watcher reads each recorded log message and, if the message matches an expression, triggers the corresponding action, such as emailing the user that the expression was matched. This mechanism should reduce the difficulty of identifying and handling critical log messages.
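The log watcher's matching step can be illustrated with a minimal Python sketch. It is not the observatory's implementation: the names `Rule`, `LogWatcher`, and `notify_user`, the sample log lines, and the pattern are all hypothetical, and the email action is replaced by a stand-in that records the matched message in a list.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Rule:
    """A user-submitted regular expression paired with its action (hypothetical type)."""
    pattern: re.Pattern
    action: Callable[[str], None]


class LogWatcher:
    """Hypothetical log watcher: checks each log message against its rules."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add_rule(self, expression: str, action: Callable[[str], None]) -> None:
        # In the described system the server would deliver this pair;
        # here it is registered directly.
        self.rules.append(Rule(re.compile(expression), action))

    def process(self, lines: Iterable[str]) -> int:
        """Run every rule against every line; return how many actions fired."""
        triggered = 0
        for line in lines:
            for rule in self.rules:
                if rule.pattern.search(line):
                    rule.action(line)
                    triggered += 1
        return triggered


# Stand-in for "email the user": collect matched messages in memory.
notified: List[str] = []


def notify_user(message: str) -> None:
    notified.append(message)


watcher = LogWatcher()
watcher.add_rule(r"ERROR|FAULT", notify_user)

# Invented log lines, for illustration only.
sample_log = [
    "INFO dome temperature 3.2 C",
    "ERROR mirror cover failed to reach position",
    "INFO mechanism moved to position 4",
]
match_count = watcher.process(sample_log)
```

Separating the rule table from the action callables mirrors the description above: the expression decides which messages matter, and the action decides what happens when one is found, so new actions can be added without changing the matching loop.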

This solution, however, does not yet address the disk-storage issue posed by Gemini Observatory’s log file collections. The writing and reading of these collections could be improved, as files are currently written and maintained with no regard for finite disk capacity.