
Task #10061

open

Testing 'stability-fix' branch

Added by Chambon Bernard over 9 years ago. Updated over 9 years ago.

Status: In progress
Priority: Normal
Assigned To: -
Category: -
Start date: 06/02/2015
Due date:
% Done: 0%
Estimated time:
#3

Updated by Chambon Bernard over 9 years ago

Monday 2015/06/01
  • Test_1000 : PASS
    2 files remained in state "Registered in Queue";
    after the automatic restart those 2 files were staged => OK
  • Test_5000 : FAIL
    4397 files OK
    237 FAIL: Registered in Queue
    366 FAIL: File locked or HSM is currently unavailable
    Several (3 or 4) Queue_ids contain both 'Staged' files and files still 'Registered in Queue'
    e.g. Queue_id 179: 330 files (also 330 files in queueMap), but only 162 'Staged'
    e.g. Queue_id 202: 145 files (also 145 files in queueMap), but only 83 'Staged'
    Is this due to an HSM problem?
    For "File locked or HSM is currently unavailable":
    check the following files to be sure...
    /hpss/in2p3.fr/group/ccin2p3/treqs/RUN01/ccwl0100.11781_000010Mb_0029.dat
    /hpss/in2p3.fr/group/ccin2p3/treqs/RUN01/ccwl0141.15044_000010Mb_0052.dat
    

    Done: those 2 files (as examples) were not locked (= staged successfully) => was it a temporary HPSS unavailability?
    Yes: it is confirmed that tape JS088200 had I/O failures and HPSS then locked it.
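The Test_5000 figures above can be cross-checked mechanically: the three outcome counts should add up to the 5000 submitted files, and a queue whose staged count is below its file count still has work pending. The sketch below is a hypothetical helper (`StagingReport` and its methods are not part of TReqS) fed with the numbers reported on 2015/06/01; it only does the bookkeeping and says nothing about why the files were not staged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical post-run bookkeeping for a staging test (not TReqS code):
 * checks that every submitted file has exactly one final outcome and lists
 * the queues that staged only part of their files.
 */
public class StagingReport {

    /** Sums the per-outcome file counts of one run. */
    public static int total(Map<String, Integer> outcomes) {
        int sum = 0;
        for (int n : outcomes.values()) {
            sum += n;
        }
        return sum;
    }

    /**
     * queues maps a Queue_id to {files in queue, files staged}.
     * Returns one line per queue with 0 < staged < files.
     */
    public static List<String> partialQueues(Map<Integer, int[]> queues) {
        List<String> partial = new ArrayList<>();
        for (Map.Entry<Integer, int[]> e : queues.entrySet()) {
            int files = e.getValue()[0];
            int staged = e.getValue()[1];
            if (staged > 0 && staged < files) {
                partial.add(e.getKey() + ": " + staged + "/" + files
                        + " staged, " + (files - staged) + " not staged");
            }
        }
        return partial;
    }

    public static void main(String[] args) {
        // Outcome counts reported for Test_5000 on 2015/06/01.
        Map<String, Integer> outcomes = new LinkedHashMap<>();
        outcomes.put("OK", 4397);
        outcomes.put("Registered in Queue", 237);
        outcomes.put("File locked or HSM is currently unavailable", 366);
        System.out.println("accounted for: " + total(outcomes) + " / 5000");

        // The two example queues from the same run.
        Map<Integer, int[]> queues = new LinkedHashMap<>();
        queues.put(179, new int[] {330, 162});
        queues.put(202, new int[] {145, 83});
        for (String line : partialQueues(queues)) {
            System.out.println(line);
        }
    }
}
```

With the reported figures every file is accounted for (4397 + 237 + 366 = 5000), and queues 179 and 202 are flagged with 168 and 62 files not staged respectively.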
#4

Updated by Chambon Bernard over 9 years ago

Wednesday 2015/06/03
  • Test_1000 : PASS
    2 files remained in state "Registered in Queue";
    after the automatic restart those 2 files were staged => OK
  • Test_5000 : PASS
Thursday 2015/06/04
  • Test_5000 : PASS, but ...
    1. 9 application restarts triggered by the message "No staging since a while although there are submitted requests" (normal restart)
    2. several ConcurrentModificationExceptions:
      INFO   | jvm 9    | 2015/06/04 16:37:44 | 2015-06-04 16:37:44,598 [Stager_Qn_JTI58200_QId102_stagerNo_1] ERROR  toStart 259  f.i.c.s.treqs.control.stager.Stager - Stopping
      INFO   | jvm 9    | 2015/06/04 16:37:44 | java.util.ConcurrentModificationException: null
      INFO   | jvm 9    | 2015/06/04 16:37:44 |     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) ~[na:1.7.0_75]
      INFO   | jvm 9    | 2015/06/04 16:37:44 |     at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) ~[na:1.7.0_75]
      INFO   | jvm 9    | 2015/06/04 16:37:44 |     at fr.in2p3.cc.storage.treqs.model.Queue.getNextReading(Queue.java:787) ~[treqs-java-1.0-SNAPSHOT.jar:na]
      INFO   | jvm 9    | 2015/06/04 16:37:44 |     at fr.in2p3.cc.storage.treqs.control.stager.Stager.stage(Stager.java:195) ~[treqs-java-1.0-SNAPSHOT.jar:na]
      INFO   | jvm 9    | 2015/06/04 16:37:44 |     at fr.in2p3.cc.storage.treqs.control.stager.Stager.action(Stager.java:122) ~[treqs-java-1.0-SNAPSHOT.jar:na]
      INFO   | jvm 9    | 2015/06/04 16:37:44 |     at fr.in2p3.cc.storage.treqs.control.stager.Stager.toStart(Stager.java:255) ~[treqs-java-1.0-SNAPSHOT.jar:na]
      INFO   | jvm 9    | 2015/06/04 16:37:44 |     at fr.in2p3.cc.storage.treqs.control.process.AbstractProcess.run(AbstractProcess.java:214) [treqs-java-1.0-SNAPSHOT.jar:na]
      DEBUG  | wrapperp | 2015/06/04 16:37:46 | send a packet PING : ping
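The trace shows `TreeMap$KeyIterator.next` failing inside `Queue.getNextReading`, i.e. the `TreeMap` was structurally modified while that method was iterating over its keys. `TreeMap` iterators are fail-fast, so the problem surfaces as a `ConcurrentModificationException` instead of silently corrupted iteration. The sketch below reproduces the mechanism in a single thread; which other thread modifies the map in TReqS is not visible in the log and is an assumption here. It also shows one common remedy, `ConcurrentSkipListMap`, whose iterators are weakly consistent and never throw this exception; this is not necessarily how the 'stability-fix' branch handles it.

```java
import java.util.ConcurrentModificationException;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Single-threaded reproduction of the failure in the trace above: a sorted
 * map is structurally modified while one of its key iterators is in use.
 * Keys and values are made up; in TReqS the two actions presumably happen
 * in different threads (an assumption), but the fail-fast check is the same.
 */
public class CmeDemo {

    /** Walks the keys and inserts one new key during the first step. */
    public static boolean iterateWhileAdding(NavigableMap<Integer, String> readings) {
        boolean added = false;
        try {
            for (Integer position : readings.keySet()) {
                if (!added) {
                    readings.put(position + 1000, "late request"); // structural change
                    added = true;
                }
            }
            return true;  // iteration completed
        } catch (ConcurrentModificationException e) {
            return false; // fail-fast iterator detected the change
        }
    }

    /** Fills a map with three pending readings. */
    public static NavigableMap<Integer, String> fill(NavigableMap<Integer, String> m) {
        m.put(1, "file A");
        m.put(2, "file B");
        m.put(3, "file C");
        return m;
    }

    public static void main(String[] args) {
        // TreeMap iterators are fail-fast: the next call to next() throws.
        System.out.println("TreeMap completed: "
                + iterateWhileAdding(fill(new TreeMap<Integer, String>())));
        // ConcurrentSkipListMap iterators are weakly consistent: no exception.
        System.out.println("ConcurrentSkipListMap completed: "
                + iterateWhileAdding(fill(new ConcurrentSkipListMap<Integer, String>())));
    }
}
```

This prints `TreeMap completed: false` then `ConcurrentSkipListMap completed: true`. Other options are to guard both the iteration and the updates with the same lock, or to iterate over a copy of the key set; which one fits depends on how `Queue` is shared between threads.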
      
#5

Updated by Chambon Bernard over 9 years ago

Thursday 2015/06/04
  • Test_100 : PASS (checking jtreqs.log and wrapper.log)
    No ConcurrentModificationException
    No restart
  • Test_1000 : PASS
    No ConcurrentModificationException
    3 or 4 application restarts
    All files staged (see ES, June 5th)
    file:///Users/bchambon/Documents/Logstash-Elasticsearch-Kibana/Kibana/Application4PreProd/index.html#dashboard/temp/ZWHOqkZyQTqBeM7fX7RlvQ
    Some queues have no activation time, which is strange!
#6

Updated by Chambon Bernard over 9 years ago

Monday 2015/06/08
  • Test_5000 : PASS
    ConcurrentModificationException: ?
    Application restarts: ?
    All files staged (see ES, June 8th)
    Some queues have neither activation_time nor end_time!
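Queues left without activation_time or end_time after a run in which all files were staged could be detected automatically. A minimal sketch, assuming the queue rows can be read with both timestamps as nullable values; `QueueTimestampCheck` and `QueueRow` are hypothetical and do not mirror the real TReqS schema beyond the two column names used in this task, and the timestamps in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical post-run check: every queue of a completed run should have
 * both an activation_time and an end_time, in that order.
 * QueueRow is illustrative; only the two column names come from this task.
 */
public class QueueTimestampCheck {

    public static final class QueueRow {
        final int id;
        final Long activationTime; // epoch seconds, null if never set
        final Long endTime;        // epoch seconds, null if never set

        public QueueRow(int id, Long activationTime, Long endTime) {
            this.id = id;
            this.activationTime = activationTime;
            this.endTime = endTime;
        }
    }

    /** Returns one message per missing or inconsistent timestamp. */
    public static List<String> problems(List<QueueRow> rows) {
        List<String> out = new ArrayList<>();
        for (QueueRow q : rows) {
            if (q.activationTime == null) {
                out.add(q.id + ": no activation_time");
            }
            if (q.endTime == null) {
                out.add(q.id + ": no end_time");
            }
            if (q.activationTime != null && q.endTime != null
                    && q.endTime < q.activationTime) {
                out.add(q.id + ": end_time before activation_time");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows: one complete queue and two incomplete ones.
        List<QueueRow> rows = Arrays.asList(
                new QueueRow(1, 1433779200L, 1433782800L), // complete
                new QueueRow(2, null, null),               // neither timestamp set
                new QueueRow(3, 1433779200L, null));       // activated, never closed
        for (String p : problems(rows)) {
            System.out.println(p);
        }
    }
}
```

Running such a check after each test would turn "some queues with no activation_time" into a concrete list of Queue_ids to inspect.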
#7

Updated by Chambon Bernard over 9 years ago

  • Status changed from In progress to Feedback
#8

Updated by Chambon Bernard over 9 years ago

Monday 2015/06/22
  • Test on the test instance (ccsvli10), run by PEB
  • Test_5000 : PASS
    Application restarts: apparently none, good
    All files staged (see ES, June 22nd)
    All queues have activation_time and end_time set correctly
#9

Updated by Chambon Bernard over 9 years ago

  • Status changed from Feedback to In progress