Bug #9864

closed

The application is dying

Added by Chambon Bernard almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Start date: 05/13/2015
Due date:
% Done: 0%
Estimated time:

Description

The Dispatcher and Activator fail. After that, the application stops because of the heartbeat check: "No heartbeat, the application is dying".

Where are the Dispatcher and Activator stopped?

Where is the Dispatcher status set to STOPPING?

2015-05-13 12:52:16,476 [Dispatcher] INFO   f.i.c.s.t.c.p.AbstractProcess - Calling getProcessStatus return STOPPING
2015-05-13 12:52:16,476 [Dispatcher] INFO   f.i.c.s.t.c.p.AbstractProcess - Setting status to STOPPED 

Where is the Activator status set to STOPPING?

2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.p.AbstractProcess - >< getProcessStatus
2015-05-13 12:51:42,354 [Activator] INFO   f.i.c.s.t.c.p.AbstractProcess - Calling getProcessStatus return STOPPING
2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.p.AbstractProcess - < keepOn
2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.p.AbstractProcess - > keepOn
2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.p.AbstractProcess - >< getProcessStatus
2015-05-13 12:51:42,354 [Activator] INFO   f.i.c.s.t.c.p.AbstractProcess - Calling getProcessStatus return STOPPING
2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.p.AbstractProcess - < keepOn
2015-05-13 12:51:42,354 [Activator] WARN   f.i.c.s.t.c.a.Activator - Activator Stopped
2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.a.Activator - < toStart
2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.p.AbstractProcess - > setStatus
2015-05-13 12:51:42,354 [Activator] INFO   f.i.c.s.t.c.p.AbstractProcess - Calling setStatus to STOPPED
2015-05-13 12:51:42,354 [Activator] TRACE  f.i.c.s.t.c.p.AbstractProcess - >< getProcessStatus

Actions #1

Updated by Chambon Bernard almost 10 years ago

  • Description updated (diff)
Actions #2

Updated by Chambon Bernard almost 10 years ago

  • Description updated (diff)
Actions #3

Updated by Chambon Bernard almost 10 years ago

For the Dispatcher, one cause could be a failure in action > retrieveNewRequests():

2015-05-13 09:04:44,967 [Dispatcher] ERROR  action 250  f.i.c.s.t.c.dispatcher.Dispatcher - Unknown problem while retrieving new requests: null. Stopping.

This then triggers:
Starter.getInstance().toStop();

Note that this does NOT occur with 10 or 100 requests.
It appears with 1,000 requests, perhaps because new requests are inserted while the Dispatcher and Activator are running (is that a clue?).

Memo
> cd /afs/in2p3.fr/home/b/bchambon/TREQS/SERVER/Tests 
> ./trcp.pl -u bchambon -v -p input4trcp.1000

Actions #4

Updated by Chambon Bernard almost 10 years ago

Another case: a ConcurrentModificationException in StagersController and, as usual when an exception occurs, a call to Starter.getInstance().toStop();

2015-05-13 14:48:45,796 [tape-1431521312016-11-JTI57600-109] ERROR  f.i.c.s.t.c.s.Stager - Stopping
java.util.ConcurrentModificationException: null
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) ~[na:1.7.0_75]
    at java.util.ArrayList$Itr.next(ArrayList.java:831) ~[na:1.7.0_75]
    at fr.in2p3.cc.storage.treqs.control.controller.StagersController.getActiveStagersForQueue(StagersController.java:221) ~[treqs-java-1.0-SNAPSHOT.jar:na]
    at fr.in2p3.cc.storage.treqs.control.stager.Stager.stage(Stager.java:197) ~[treqs-java-1.0-SNAPSHOT.jar:na]
    at fr.in2p3.cc.storage.treqs.control.stager.Stager.action(Stager.java:118) ~[treqs-java-1.0-SNAPSHOT.jar:na]
    at fr.in2p3.cc.storage.treqs.control.stager.Stager.toStart(Stager.java:239) ~[treqs-java-1.0-SNAPSHOT.jar:na]
    at fr.in2p3.cc.storage.treqs.control.process.AbstractProcess.run(AbstractProcess.java:205) [treqs-java-1.0-SNAPSHOT.jar:na]
2015-05-13 14:48:45,797 [tape-1431521312016-11-JTI57600-109] TRACE  f.i.c.s.t.c.s.Stager - < toStart
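
The stack trace above shows an ArrayList iterator failing in getActiveStagersForQueue, which is what happens when the list is modified while it is being iterated. The following is a minimal sketch of that failure mode and of one possible remedy (CopyOnWriteArrayList), not the actual TREQS code: the class StagerListSketch, its methods, and the String elements are all hypothetical stand-ins. The modification is done from the same thread so that the failure is reproducible; in the real application it would presumably come from another Stager thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the list handling in StagersController.
public class StagerListSketch {

    // Counts the entries equal to 'queue' and adds one entry during the
    // first iteration, imitating a writer that changes the list mid-read.
    public static int countWhileModifying(List<String> stagers, String queue) {
        int count = 0;
        boolean added = false;
        for (String s : stagers) {
            if (s.equals(queue)) {
                count++;
            }
            if (!added) {
                stagers.add(queue); // structural modification during iteration
                added = true;
            }
        }
        return count;
    }

    // Returns true if a plain ArrayList throws ConcurrentModificationException.
    public static boolean failsWithArrayList() {
        List<String> stagers = new ArrayList<String>(Arrays.asList("q1", "q2", "q1"));
        try {
            countWhileModifying(stagers, "q1");
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayList fails: " + failsWithArrayList());
        // The CopyOnWriteArrayList iterator works on a snapshot, so the
        // entry added during iteration is not seen and nothing is thrown.
        List<String> safe = new CopyOnWriteArrayList<String>(Arrays.asList("q1", "q2", "q1"));
        System.out.println("Snapshot count: " + countWhileModifying(safe, "q1"));
    }
}
```

CopyOnWriteArrayList copies the backing array on every write, so it suits lists that are read far more often than they are modified. Synchronizing every access to the list on a common lock would be another option.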

Actions #5

Updated by Chambon Bernard almost 10 years ago

  • Status changed from New to Resolved
  • Not sure I have 'resolved' the problem in an elegant way, but:
    1. The Dispatcher and Activator forever loops are no longer based on ProcessStatus; each is now a while(true).
    2. The heartBeat method of the Watchdog class has been changed.
  • The Watchdog is now useless. An internal 'basic' monitoring has been added (see the MyObserver class).
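
To illustrate the loop change described above, here is a minimal sketch of a process loop that does not depend on a status flag and that survives a failing iteration instead of stopping the whole application. This is an assumption about one way to write such a loop, not the code committed for this ticket: LoopSketch, Step, and runLoop are hypothetical names, and the iteration bound exists only so that the example terminates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual Dispatcher or Activator code.
public class LoopSketch {

    // Stand-in for one pass of a process, e.g. something like action().
    public interface Step {
        void run() throws Exception;
    }

    // Runs the step maxIterations times. A failing iteration is recorded
    // and the loop continues. A long-running process would use
    // while (true) in place of the bounded for loop.
    public static List<String> runLoop(Step step, int maxIterations) {
        List<String> errors = new ArrayList<String>();
        for (int i = 0; i < maxIterations; i++) {
            try {
                step.run();
            } catch (Exception e) {
                // Record the problem instead of stopping everything.
                errors.add("iteration " + i + ": " + e);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        List<String> errors = runLoop(new Step() {
            public void run() throws Exception {
                calls[0]++;
                if (calls[0] == 2) {
                    throw new IllegalStateException("simulated failure");
                }
            }
        }, 3);
        System.out.println(calls[0] + " iterations, " + errors.size() + " error(s)");
    }
}
```

The trade-off is that a persistent fault no longer stops the process, so it has to be surfaced some other way, for example through monitoring.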
Actions #6

Updated by Chambon Bernard almost 10 years ago

  • Subject changed from the application is dying to The application is dying