Campaign 1 (2018-2019)

General DOMA information

DOMA_FR project

DOMA_FR tests

  • Global transfers from/to (same destination/source site excluded)
                 LAPP   LPSC   CC
    From         Link   Link   Link
    To           Link   Link   Link
  • Data transfer through LHCONE
                 South-East   CC
    Source       Link         Link
    Destination  Link         Link
  • Computing activity per job type (Running slots)
                   LAPP   LPSC   CC
    Running slots  Link   Link   Link
  • Data access monitoring as seen by site WNs
    Destination of access   LAPP   LPSC   CC
    Production download     Link   Link   Link
    Production upload       Link   Link   Link
    Production input        Link   Link   Link
    Production output       Link   Link   Link
    Analysis download       Link   Link   Link
    Analysis direct access  Link   Link   Link
  • Questions
    • Enabling direct access creates much more network usage -> is it useful for ATLAS? (e.g. to process urgent requests faster?)
    • Production_output can only be done to 1 site
    • If direct access to IN2P3-CC is done with xrootd, what is the size of the network portal (2 or 10 Gb/s for ATLAS)?
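A back-of-the-envelope check for the last question: whether a 2 or 10 Gb/s portal can sustain direct-access reads depends only on the number of concurrent jobs and the per-job read rate. The sketch below uses illustrative numbers (1000 jobs, 5 Mb/s per job), not measured values:

```python
def link_saturation(n_jobs, per_job_mbps, link_gbps):
    """Fraction of the link consumed by n_jobs each reading
    at per_job_mbps (megabits per second)."""
    demand_gbps = n_jobs * per_job_mbps / 1000.0
    return demand_gbps / link_gbps

# Illustrative numbers only: 1000 concurrent jobs reading 5 Mb/s each
print(link_saturation(1000, 5, 2))   # 2 Gb/s portal -> 2.5 (oversubscribed)
print(link_saturation(1000, 5, 10))  # 10 Gb/s portal -> 0.5 (half used)
```

Any value above 1.0 means the portal saturates and jobs stall on I/O.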
  • Issues
    • Asynchronous transfers of input files go to the closest/fastest site to the input site instead of the read_lan0 of the WN (fixing this would help reduce network occupancy between IN2P3-CC and LAPP/LPSC, since transfers are smoothed by FTS) -> request for a change made on 21 September (PanDA level)
    • Job brokering should take into account downtimes of remote SEs (issue with the IN2P3-CC downtime) -> request sent by Rod
    • The 10 Gb/s LAPP-CC connection (used for all LAPP WAN transfers to any site) can become saturated if a huge number of jobs start at the same time (no smoothing by PanDA) -> no suggestion yet
    • Production_output can only be done to 1 site (more destinations would be useful when the destination SE is in downtime while the local/remote WN is not) -> no request
    • Users can force analysis jobs to copy files instead of using direct access -> the whole file is transferred instead of a fraction
    • IN2P3-CC is running only 200-300 analysis jobs while the site runs 10k in total: the reason is the different analysis shares between T1 (5%) and T2 (25%?)
    • IN2P3-CC presents remote direct read access because read_wan0=srm
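The copy-vs-direct-access issue above can be quantified: copy mode always moves the full file over the WAN, while direct access only moves the bytes the job actually reads. A minimal sketch (the 4 GB file size and 10% read fraction are illustrative assumptions):

```python
def bytes_transferred(file_size_gb, read_fraction, direct_access):
    """WAN volume (GB) moved for one input file.
    Copy mode moves the full file; direct access only
    the fraction the job actually reads."""
    return file_size_gb * (read_fraction if direct_access else 1.0)

# Illustrative: a 4 GB input file of which the job reads 10%
print(bytes_transferred(4.0, 0.1, direct_access=True))   # 0.4 GB
print(bytes_transferred(4.0, 0.1, direct_access=False))  # 4.0 GB
```

For sparse-read analysis workloads this is the factor-of-ten difference behind the "whole file instead of a fraction" concern.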
  • Next steps
    • Deploy Rucio 1.17 to use the protocol priorities defined in AGIS (bug in 1.16). Fixing the bug means the most trusted protocol is used, which would help to gradually reduce srm usage
    • Monitor job efficiency vs RTT between SE and WN -> identify when the cache has no impact (assuming no network bandwidth limitation)
    • Understand the ATLAS job brokering algorithm
    • Get the typical transfer rate per job type (Johannes)
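The efficiency-vs-RTT monitoring step could start from something as simple as bucketing job records by SE-WN RTT and comparing mean CPU/walltime ratios per bucket. The sketch below uses fabricated records and assumed field names (rtt_ms, cpu_s, wall_s), not the real PanDA/monitoring schema:

```python
from collections import defaultdict

def efficiency_by_rtt(jobs, bucket_ms=10):
    """Mean CPU/walltime efficiency per RTT bucket (ms).
    jobs: iterable of dicts with 'rtt_ms', 'cpu_s', 'wall_s'."""
    buckets = defaultdict(list)
    for job in jobs:
        bucket = int(job["rtt_ms"] // bucket_ms) * bucket_ms
        buckets[bucket].append(job["cpu_s"] / job["wall_s"])
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

# Fabricated records: local reads (low RTT) vs remote reads (high RTT)
jobs = [
    {"rtt_ms": 1,  "cpu_s": 900, "wall_s": 1000},
    {"rtt_ms": 2,  "cpu_s": 880, "wall_s": 1000},
    {"rtt_ms": 25, "cpu_s": 600, "wall_s": 1000},
]
print(efficiency_by_rtt(jobs))
```

A flat efficiency curve across RTT buckets would indicate the cache (or direct access) has no impact for that job type; a steep drop at high RTT would point at remote-read latency as the bottleneck.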