Project

General

Profile

Actions

Bug #10498

open

Mysql Deadlock

Added by Chambon Bernard over 8 years ago. Updated over 8 years ago.

Status:
In progress
Priority:
Normal
Assigned To:
-
Category:
-
Start date:
07/27/2015
Due date:
% Done:

0%

Estimated time:

Description

Found in /var/log/mysql.log

150721 12:19:36 [Note] Event Scheduler: [root@localhost].[treqsjobs.cleaning_jqueues] event execution failed.
150722 16:39:36 [ERROR] Event Scheduler: [root@localhost][treqsjobs.cleaning_jqueues] Deadlock found when trying to get lock; try restarting transaction
150722 16:39:36 [Note] Event Scheduler: [root@localhost].[treqsjobs.cleaning_jqueues] event execution failed.
150722 16:46:36 [ERROR] Event Scheduler: [root@localhost][treqsjobs.cleaning_jqueues] Deadlock found when trying to get lock; try restarting transaction
150722 16:46:36 [Note] Event Scheduler: [root@localhost].[treqsjobs.cleaning_jqueues] event execution failed.

Actions #1

Updated by Chambon Bernard over 8 years ago

Deadlock seems related to mysql event scheduler, mainly 'cleaning_jqueues' which interacts with jqueues and also jrequests tables (see mysql.log)
More details with SHOW ENGINE INNODB STATUS gives :

------------------------
LATEST DETECTED DEADLOCK
------------------------
150727  8:00:36
*** (1) TRANSACTION:
TRANSACTION 0 101886833, ACTIVE 0 sec, process no 12317, OS thread id 140274448369408 starting index read
mysql tables in use 2, locked 2
LOCK WAIT 5 lock struct(s), heap size 1216, 122 row lock(s)
MySQL thread id 8416867, query id 22431805 event_scheduler preparing
DELETE FROM jqueues WHERE 
   ( (end_time is not null) and 
     (TIMESTAMPDIFF(SECOND,end_time,now()) >  NAME_CONST('old_normally_ended_var',172800)) and
     (status=230) and 
     (id not in (SELECT queue_id FROM  jrequests where queue_id is not null))
   )
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 6096 n bits 1256 index `queue_id` of table `treqsjobs`.`jrequests` trx id 0 101886833 lock mode S waiting
Record lock, heap no 894 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 800f8a64; asc    d;; 1: len 4; hex 80474a5e; asc  GJ^;;

*** (2) TRANSACTION:
TRANSACTION 0 101886834, ACTIVE 0 sec, process no 12317, OS thread id 140274446771968 updating or deleting, thread declared inside InnoDB 476
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1216, 22 row lock(s), undo log entries 6
MySQL thread id 8416868, query id 22431802 event_scheduler updating
delete from jrequests where 
   ((end_time is not null) and 
    (TIMESTAMPDIFF(SECOND,end_time,now()) >  NAME_CONST('old_request_var',172800)) and
    ((status = 140) OR (status = 150) OR  ((status = 160) AND (tries >= 3)))
   )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 6096 n bits 1256 index `queue_id` of table `treqsjobs`.`jrequests` trx id 0 101886834 lock_mode X locks rec but not gap
Record lock, heap no 894 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 800f8a64; asc    d;; 1: len 4; hex 80474a5e; asc  GJ^;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 6096 n bits 1256 index `queue_id` of table `treqsjobs`.`jrequests` trx id 0 101886834 lock_mode X locks rec but not gap waiting
Record lock, heap no 884 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 800f8a60; asc    `;; 1: len 4; hex 80474a70; asc  GJp;;

*** WE ROLL BACK TRANSACTION (1)

Such deadlock can leads to sql exception in treqs app
o Case SQL exceptions for Dispatcher, dispatcher stops and app restarts with help of JSW, => ok
o Case SQL exceptions Watchdog, watchdog stops (see bug # 10490), this implies an NAGIOS alert ! (event is treqs still running correctly)

Possible actions
1) Try to ensure that adhoc indexes are available (see clause where, used in event scheduler requests).

2) Try to improve event scheduler requests with a) events not running at the same time, b) ensure serialization of execution

3) other ?

Actions #2

Updated by Chambon Bernard over 8 years ago

Checking on 07/28
  • No deadlock found on cctreqs[xrootd]
Checking on 07/29
  • New deadlock found on cctreqsxrootd for jrequest tables between :
    UPDATE jrequests SET status = 110, message = 'Registered in Queue 1042823', queue_id = 1042823, tape = 'JT113700', position = 459, position_offset = 0, level = 0, size = 644364039, errorcode = 0, submission_time = '2015-07-29 08:27:00' WHERE file = '/hpss/in2p3.fr/group/hess/Simul/Shower/gFixedEnergy_paris_0-8-8-8/KAS07010320145111.gz' AND status < 140
    
    and
    
    cleaning_jrequest event scheduler
    

    action :
    creating new index , checking usage via explain select ... => ok
    create index k2 on jrequests  (`end_time`, `status`, `tries` )
    show index from jrequests ; 
    
    

    Same operation made on jqueues table. Both operations on cctreqs en cctreqsxrootd
Checking on 07/30
  • No deadlock found on cctreqs[xrootd] (low activity)
Checking on 07/31; 08/01,02,03
  • No deadlock found on cctreqs[xrootd]

Checking on 08/04,05
* No deadlock found on cctreqs[xrootd] ?????

Actions #3

Updated by Chambon Bernard over 8 years ago

  • Status changed from New to In progress
Actions

Also available in: Atom PDF