Task #9815
[closed] Merge Watchdog features in JSW
Start date:
05/05/2015
Due date:
% Done:
0%
Estimated time:
Description
Merge treqs Watchdog capabilities in JSW.
Currently, the watchdog script is run as root by cron on cctres[xrootd] and checks the status in the database:
see "/opt/jtreqs/bin/watchdog.sh" script on cctreqs
# watchdog
*/5 * * * * /opt/jtreqs/bin/watchdog.sh -db | tee -a /var/log/jtreqs/watchdog.log
Updated by Chambon Bernard about 10 years ago
- Status changed from Assigned to In progress
I looked at the following example (a restart requested by the watchdog on 30 April around 9 am).

In watchdog.log:
    2015-04-30_09:05:01 Stopping jTReqS: [FAILED]
    2015-04-30_09:05:31 Starting jTReqS: [ OK ]

In jtreqs.log:
    2015-04-30 09:00:42,847 [Activator] WARN f.i.c.s.t.c.activator.Activator - Activator Stopped
    2015-04-30 09:01:06,500 [tape-1430376533639-7-KT000700-2884669] INFO f.i.c.s.treqs.tools.Instantiator - Class to instantiate fr.in2p3.cc.storage.treqs.hsm.hpssJNI.HPSSJNIBridge
    2015-04-30 09:05:31,629 [main] INFO f.i.c.s.t.control.starter.Starter - Starting Server

On the console:
    2015-04-22 10:36:34,260 [tape-1429691794228-2-IT769700-2879475] WARN General error. Retrying /hpss/in2p3.fr/group/km3net/mc/prod/v4/JTE/km3_v4_anueNC_179.JTE.root
    2015-04-30 09:00:42,847 [Activator] WARN Activator Stopped
    2015-04-30 09:01:16,929 [main] ERROR No heartbeat, the application is dying.
    jTReqS-Server started
    2015-04-30 09:05:31,694 [main] ERROR Version: jTReqS Server 1.6.1-SNAPSHOT

As a reminder, the Java code loops on:
    while (cont) Watchdog.getInstance().heartBeat();
See the heartBeat() code ... in tools/Watchdog.java.

I would say that if the heartbeat is not updated, we end up in the else branch of "if (activator && dispatcher)", so the console shows the message "No heartbeat, the application is dying".

We could put this message in the JSW configuration to request a restart of TREQS, along these lines:
    # To restart the JVM when the following message occurs on stdout
    wrapper.filter.trigger.2=No heartbeat, the application is dying
    wrapper.filter.action.2=RESTART
    # This property controls the number of seconds to pause between
    # a JVM exiting for any reason, and a new JVM being launched.
    # The default value is "5 seconds".
    wrapper.restart.delay=30

To be tested!
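To make the heartbeat reasoning easier to follow, here is a minimal Java sketch of the kind of check described in this comment. It is not the actual code of tools/Watchdog.java: the class name HeartbeatSketch, the timestamp-based liveness rule, the maxSilenceMillis parameter and the activatorBeat/dispatcherBeat methods are all assumptions made for illustration. Only the "if (activator && dispatcher)" structure and the console message come from the notes above.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of a heartbeat check; NOT the real jTReqS Watchdog class.
 * Assumption: each component records a timestamp, and a component counts as
 * alive if its last beat is recent enough.
 */
class HeartbeatSketch {
    /** Message from the console log; a JSW filter trigger could match it on stdout. */
    static final String DYING_MESSAGE = "No heartbeat, the application is dying";

    private final long maxSilenceMillis;          // assumed tolerance, not from the source
    private final AtomicLong lastActivatorBeat;
    private final AtomicLong lastDispatcherBeat;

    HeartbeatSketch(long maxSilenceMillis, long now) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.lastActivatorBeat = new AtomicLong(now);
        this.lastDispatcherBeat = new AtomicLong(now);
    }

    /** Called by the (hypothetical) activator thread to signal it is alive. */
    void activatorBeat(long now) { lastActivatorBeat.set(now); }

    /** Called by the (hypothetical) dispatcher thread to signal it is alive. */
    void dispatcherBeat(long now) { lastDispatcherBeat.set(now); }

    /**
     * Returns true when both components beat recently; otherwise prints the
     * dying message on stdout and returns false.
     */
    boolean heartBeat(long now) {
        boolean activator = now - lastActivatorBeat.get() <= maxSilenceMillis;
        boolean dispatcher = now - lastDispatcherBeat.get() <= maxSilenceMillis;
        if (activator && dispatcher) {
            return true;
        } else {
            System.out.println(DYING_MESSAGE);
            return false;
        }
    }

    public static void main(String[] args) {
        HeartbeatSketch watchdog = new HeartbeatSketch(1000L, 0L);
        watchdog.heartBeat(500L);        // both beats are recent: nothing printed
        watchdog.dispatcherBeat(1500L);  // only the dispatcher keeps beating
        watchdog.heartBeat(1600L);       // activator is stale: prints the dying message
    }
}
```

In this sketch the message goes to stdout through System.out.println, which is where a wrapper.filter.trigger rule would look for it. In the real server the message appears as an ERROR line on the console, so whether JSW sees it depends on how the logger is wired to stdout, which remains to be checked.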
Updated by Chambon Bernard almost 10 years ago
- Status changed from In progress to Closed