Attending: John B, Sam, John H, Daniel, Lukasz, Marcus, Winnie, Pete, Steve, Brian, Brian, Jens
Notes by Sam + Jens
Apologies: Brian

0. Operational issues
- Oxford - capacity reporting to REBUS wrong due to a misadvertised site name; should be fixed, but awaiting confirmation.
- Glasgow - xroot redirector keeling over silently; it is fine when restarted. Needs a restarter, but there is more than one xrootd process in DPM...

1. An audience with Brian Bockelman from Nebraska, who happens to be at RAL today, partly to talk about CVMFS at yesterday's CVMFS workshop, but also to work today on linking GridFTP to xroot on CEPH.
- Nebraska also supports LIGO.
- Use of CVMFS to distribute not just code but also data, aiming to scale to ~1 PB, bypassing squids; aiming to have ~10 TB caches.
- LIGO is not keen on certificates; it would be useful to have OAuth support as well.
- Would like to have the local cache in CEPH (or Lustre), so that CVMFS provides POSIX access and a cache miss is resolved first against the local object store, only extending to the rest of the CVMFS infrastructure if that fails. Also buffering data on the WN.
- Expecting production by September.
- xrootd is used to provide https as well as, of course, xroot.
- Compare EOS = "xroot + namespace".

------------------------------------------------------------------------

LIGO support in OSG:

Started out with the "trivial" case of copying the LIGO dataset (around 5 to 10 TB of data at present) to the Nebraska CMS T2. Picked a relatively computationally intensive (== low I/O rate) LIGO application, Pegasus, and simply submitted jobs to the OSG grid [as HTCondor glide-ins] with the wrappers set to point (over the network) at the Nebraska copy of the data, using GridFTP for staging. This was successful, mainly because the choice of job was well impedance-matched.

Current phase: "standard" LIGO users are used to running locally on a POSIXy filesystem. Enhancements to CVMFS are being developed to allow it to work with very large datasets - moving it from being a software distribution service to a data distribution service - and also to support authenticated flows (auth using X509/VOMS; OAuth2 coming). Obviously, Secure CVMFS uses https and thus bypasses the usual site Squid; Brian opined that this was not a wholly bad thing, as squids are also not well suited to caching very large files. CVMFS 2.2 contains the "large files/data infrastructure" changes; 2.3 will add the Secure CVMFS bits for release to wider production.

Future plans: moving to a tiered caching model - a local WN cache/buffer (which should be small, to increase cache efficiency, since it only holds recently accessed chunks) and a site-level secure cache (replacing Squid for Secure CVMFS), for which Brian B would prefer something like an object store for efficiency (see the illustrative client-configuration sketch below). Performance has been good even without tiered caching, but it is also a function of matching the jobs' I/O requirements and the available infrastructure to the problems you can solve well.

Sam S asked why the decision was made to turn CVMFS into a secure, data-distributing cache, rather than moving xrootd (say) towards being more like a POSIX filesystem (global fs etc.). [Tiered read-only caches as an idea aren't new, but different implementations start from different positions.] Brian noted that, for example, EOS did make the xrootd-to-POSIX transition. He was of the opinion that the POSIX side of things was actually the more important aspect [and, in fact, the data movement for the http/s layer in this implementation uses http-over-xrootd, so xrootd is involved in data movement!].
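[Purely as an illustration of the squid-bypass and large-cache arrangement described above: a minimal sketch of CVMFS client settings, assuming a 2.2-era client. The repository name and cache path are placeholders, not anything Nebraska actually runs, and sites would need to check the exact parameters against their own client version.]

    # /etc/cvmfs/config.d/data.example.org.local  (hypothetical data repository)
    # Secure (https) CVMFS cannot be cached by the site squid, so go direct.
    CVMFS_HTTP_PROXY=DIRECT
    # Give the data repository its own large cache, kept separate from the
    # (small) software caches; the quota is a soft limit in MB, here ~10 TB.
    CVMFS_CACHE_BASE=/data/cvmfs-cache
    CVMFS_QUOTA_LIMIT=10000000

[In the tiered model above, the worker-node cache itself would stay small; the large cache sits at the site level.]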
Brian also noted that xrootd federations have issues if, for example, your local xrootd server drops out of the namespace. [Another advantage of CVMFS is that it already supports variant symlinks, so sites with local copies of the data can point directly at that local filesystem and bypass the CVMFS infrastructure entirely.]

Sam S noted that Secure CVMFS might also solve the use case of users who rely on proprietary or licensed software. Brian D noted that Secure CVMFS is also being investigated by Catalin at RAL for precisely this purpose [and will give updates on this in future].

Sam S asked for Brian B's thoughts on the applicability of the network-default models in the US/OSG grid evolution to European and UK models, where connectivity is slower (both backbone and site interfaces). Brian agreed that the OSG models are designed as they are *because* there are 100 Gbit interfaces, and a 300 Gbit backbone, to support fast dataflows in the USA. In a more constrained environment, you would want to concentrate more on caching and data-awareness.

[There was a discussion of the trade-off between making the user think harder about their requirements, versus making the infrastructure complex enough to cope with "flexible" loads - prestaging, as in the ARC caching model, does require that the job understands the data it will need when it runs (which Grid users are used to, but people used to working against a single filesystem mounted on their cluster and workstations are not).]

------------------------------------------------------------------------

Chat log:

Jens Jensen: (08/06/2016 10:01:01) Morning John & Sam & John. I haven't got Brian connected yet; I will go chase them.
Lukasz Kreczko: (10:05 AM) is anyone speaking?
Samuel Cadellin Skipsey: (10:06 AM) For those wondering, I believe that Jens is off finding Brian
Jens Jensen: (10:11 AM) http://indico.cern.ch/event/394783/