Thank you, and good afternoon, everybody. I'm Mario, and today I'd like to talk a bit about how we're working to implement an S3-fronted cold storage at CERN. Very quickly, a bit about me: I'm a computing engineer at CERN, I started just three months ago, and I joined the tape archive team. My first task in the team was to review a proof of concept that was done by a summer student last year, concerning putting a tape backend behind an S3 endpoint, and my ultimate goal is to design an S3 interface for our tape infrastructure.

Briefly, what we will discuss today: we'll go over the project goal and give some technical context that will shed more light on the decisions and discussions I will talk about. Then we'll analyze the proof of concept that was developed last year, and then we'll go over a brainstorming of the possible architectural solutions. At the end there will be some time for questions.

First of all, what is CERN? It's the world's biggest laboratory for particle physics; it mostly concerns itself with the study of fundamental physics, and it's most famous for the LHC, the Large Hadron Collider, a particle accelerator located beneath the Geneva area, spanning the border between Switzerland and France.

Why does CERN need tape in the first place? Because the experiments need to write a lot of data, and it needs to be stored somewhere; at CERN, tapes constitute a very efficient means of storage. In fact, just last December we reached one exabyte of stored experiment data in our tape libraries. But that's not the only use case: we also store user data, which can be for various reasons, compliance or disaster recovery, but that's another use case. As for how we store it: we have developed software internally to manage the tape libraries, called the CERN Tape Archive, or CTA, and it's open source software. The idea here is to have CTA as a tape backend to an S3-plus-Glacier API endpoint. Why? Because S3 is pretty much the industry standard, there's a lot of client support for it, and we also want to fit into the ecosystem that is already there around S3; we want to avoid reinventing the wheel by creating another protocol. So S3 constitutes a good way to interface with us. I should also mention that, for the time being, this project will mostly target the user data archive use case.
To give a brief visual: imagine this being the whole service that we want to build, a tape backup service for users. You have backup buckets, and there must be an S3 endpoint somewhere. But you can identify within this system a subset, which I would call the appliance, which has its own S3 endpoint, and this is the one responsible for the actual physical tape storage, regardless of how the system is internally built. This will be our main challenge to build.

First, some technical context about the CERN Tape Archive. Again, it's the software that provides physical access to the tape libraries, and the interesting thing about it is that you don't interact with it directly. CTA needs a disk buffer in front, an intermediary storage medium that acts as a buffer between the client and tape. This is because the tape medium has really high latency; it's not always ready to write, so you need this intermediary space to store data on. For this reason you can see CTA as a tape backend for the disk buffer. There are several supported flows for files: you can write files to the disk buffer and they will be archived, or you can request a file to be recalled and it will be written back to the disk buffer if it was not already there. There are more flows, but we'll mostly focus on these; there's not enough time, unfortunately. And it's free and open source software; it's on CERN's GitLab and on GitHub.

Just to be a bit more clear, this is an example of the archive flow. There will be some data producers, which dump data into the disk buffer, and the disk buffer, which knows that it is backed by CTA, will queue work into the archive queue. Eventually the tape drives, when they are free to perform work, will pick up some of the work that needs to be done, mount a cartridge, and start writing to tape. You can see there is a 400 megabytes per second average; there is a reason for that, and this is for each of the tape drives. The workflow works in reverse as well. One thing that I didn't mention here is that as soon as a file is written to tape, it's actually truncated in the disk buffer: the disk buffer retains the metadata, but the content of the file is no longer on the disk buffer; the data lives only on tape.
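To make the truncation semantics concrete, here is a toy sketch of the archive flow just described; this is not actual EOS or CTA code. The content is copied to a stand-in "tape" directory, then the buffer copy is truncated so that only a zero-length stub with its metadata remains. Paths and directory layout are invented for illustration.

```python
import os
import shutil

def archive_and_truncate(buffer_path: str, tape_dir: str) -> None:
    """Toy model of the archive flow: copy the file's content to the
    'tape' backend, then truncate the copy in the disk buffer so only a
    zero-length stub (i.e. the metadata) remains."""
    # 1. "Write to tape": here just a copy into a stand-in directory.
    os.makedirs(tape_dir, exist_ok=True)
    shutil.copy2(buffer_path, os.path.join(tape_dir, os.path.basename(buffer_path)))

    # 2. Truncate the buffer copy: the file entry (name, owner, attributes)
    #    survives, but the content now lives only on "tape" -- the file is offline.
    os.truncate(buffer_path, 0)

if __name__ == "__main__":
    os.makedirs("/tmp/buffer", exist_ok=True)
    with open("/tmp/buffer/report.dat", "wb") as f:   # pretend disk buffer
        f.write(b"detector data")
    archive_and_truncate("/tmp/buffer/report.dat", "/tmp/tape")
    print(os.path.getsize("/tmp/buffer/report.dat"))  # 0: only the stub remains
```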
Given this situation, you have the recall workflow to call the file back. There is a component that asks the disk buffer to restore the file from tape, and this goes again through a retrieve queue. Whenever a drive is ready and the conditions allow it, it will mount the cartridge that was requested and dump the content of the file back into the disk buffer, rehydrating the file: the metadata was already there, but now the content is also back, and the client can access it.

A bit more about the disk buffer that sits in front of CTA. It's the one holding the metadata; it has a file system hierarchy, so you write to it as you would to a file system. One thing that should be noted is that the metadata lives only here: whenever CTA reads from the disk buffer, it reads only the data and writes exclusively that to tape. EOS, actually — I didn't mention that, but this is the software that we use as the disk buffer; EOS stands for EOS Open Storage, and it's also an open source technology. The fact that the metadata is held only by EOS means that its persistence is critical: if you lose EOS, for example, you would still be able to retrieve the file content but not the metadata, so you wouldn't know what the files are. A little difference with a normal file system is that EOS is aware of files being online or offline. If the content is there, you can consider a file online and retrieve it right away; but if the content is only on tape, even though the metadata is there, the file is offline, because you need a retrieval operation to be able to read it back. EOS is also explicitly designed for large and stable throughput. This is important for tape, because tape actually has a minimum speed at which it should be written to and read from, and this is because of how it works: it's a linear tape, so it spins, and braking is not something you can do immediately. If there's no data, it cannot stop right away, so there are some constraints around speed.

From this technical context, I want you to take away the main points. First, the minimum speed for the tapes: whatever the solution is — you can see these as constraints or features — it needs to guarantee a stable and minimum speed. Then, the fact that the metadata lives on the disk buffer is also important: losing the disk buffer, whatever the implementation is, becomes critical because of that. Then, there's no object affinity logic in CTA as of today. By this I mean that whenever there is a bunch of work to do, a bunch of files to write to tape, CTA currently doesn't have any logic around which files to group; it will just do some of the work. But for some files it may make sense to live on the same tape, because they have a higher chance of being restored together, so you want to mount only one tape and not several of them. Also, one semantic currently present in CTA is that a file is considered safe only when it's fully on tape, and not if it's in between: if you write to the buffer only, it's not considered safe yet, and the clients are able to ask the disk buffer whether the file is on tape or not.
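As a sketch of that last point, the "is the file on tape?" question: as also comes up in the Q&A at the end, the link between a buffer file and its tape copy is an archive ID kept in an extended attribute. The attribute name below is hypothetical and the check is Linux-only (os.getxattr), but it illustrates how online/offline state can be derived from the stub.

```python
import os

ARCHIVE_ID_XATTR = b"user.archive_id"  # hypothetical attribute name, for illustration only

def file_state(path: str) -> str:
    """Classify a disk-buffer file the way an EOS-like buffer would:
    online if the content is present, offline if only the metadata stub
    (carrying a tape archive ID) remains."""
    try:
        os.getxattr(path, ARCHIVE_ID_XATTR)
        archived = True
    except OSError:
        archived = False

    if os.path.getsize(path) > 0:
        return "online"    # content readable right away
    if archived:
        return "offline"   # needs a recall from tape first
    return "empty"         # genuinely empty file, never archived
```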
There are also no modify-file semantics, only delete, which may be relevant if you intend to interact with an S3 bucket: some operations that S3 can do may not be doable as far as the disk buffer is concerned.

Speaking of which, the other half of the technical introduction that I want to do is about S3 and the Glacier API, really quickly. I think a lot of people are familiar with it, but some may not be. S3 is a product of AWS which offers object storage, and you talk to it using the S3 API, a REST interface over HTTP. Here on the left you can see an example call: the GetObject operation lets you retrieve the content and metadata of an object, and the most important parameters you specify are bucket and key. The bucket is a namespace, a domain of files; you can see it as having its own file system, and the key is the path within that file system. On the right you can see how it maps to an HTTP request: the bucket and the object key both appear in the URL path, and a GetObject operation is essentially a GET HTTP verb. This is the general idea behind S3: a REST interface. The kinds of operations you can do are at the object level — write, read, delete objects — and at the bucket level, among which you can of course create and enumerate buckets, but there's also the lifecycle configuration, which is really important; I touched upon it previously, but not extensively. You can configure the bucket for automated movement between storage classes, which are abstractions of which kind of storage medium will hold your data. Then you have metadata and other operations.

One point that is very relevant for this talk is the Glacier API subset of S3. It lets you do the archival and recall of files — just what CTA models — but using the S3 API. Regarding the archival operation, it's actually not an imperative operation: you cannot ask the S3 API to move a file to cold storage, unless you copy the whole object. What is usually done is that you set a lifecycle policy on the bucket, so you're able to say, for example, "after one year, move the file to tape", and it gets moved to a different storage class. As soon as this happens, the object gets truncated, and you get an InvalidObjectState reply if you try to get the content, because the file is offline; it's not there anymore. To get it online again, you do a recall operation. This one is imperative: you request a RestoreObject, and the system will do some work in the background to get your file from tape back into the bucket and rehydrate it. After the operation is completed, the file will be accessible again.
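To make this concrete, here is a hedged boto3 sketch of the whole Glacier flow: a lifecycle rule for the (declarative) archival, the InvalidObjectState error on an offline object, and the (imperative) RestoreObject recall. The endpoint, bucket and key are placeholders, not our service.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", endpoint_url="https://s3.example.org")  # placeholder endpoint

# Archival is not imperative: you declare a lifecycle rule and the service
# moves objects to the cold storage class on its own schedule.
s3.put_bucket_lifecycle_configuration(
    Bucket="backup-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "to-tape-after-a-year",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
        }]
    },
)

# Once transitioned, the object is offline and GetObject fails.
# On the wire this is: GET /backup-bucket/2024/report.dat HTTP/1.1
try:
    s3.get_object(Bucket="backup-bucket", Key="2024/report.dat")
except ClientError as e:
    if e.response["Error"]["Code"] == "InvalidObjectState":
        # Recall *is* imperative: ask for the object to be rehydrated.
        s3.restore_object(
            Bucket="backup-bucket",
            Key="2024/report.dat",
            RestoreRequest={"Days": 7},  # keep the restored copy for 7 days
        )

# head_object reports restore progress while the background work runs.
resp = s3.head_object(Bucket="backup-bucket", Key="2024/report.dat")
print(resp.get("Restore"))  # e.g. 'ongoing-request="true"' while rehydrating
```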
Most important point: Glacier is a user-facing API, so it doesn't specify any kind of interface to the tape infrastructure. You have no way to tell it how to transfer files to whatever tape system you use, because that's all an internal detail. The way it becomes doable is implementation specific, and the different S3 implementations among the open source ones offer different degrees of both compatibility with S3 and mechanisms to let you move files to tape.

With that intro, we can finally take a look at the proof of concept that was worked on last summer. The architecture that was picked was to take the current stack, the disk buffer in front of CTA, and add another layer on top. In this case NooBaa was picked; it's also open source software with commercial production usage, and it provides an S3 interface and a way to write files to tape. The whole of these three components forms the appliance in the schema that I showed you before. NooBaa is deployed on a Kubernetes cluster; there's an operator, so you deploy the custom resource and you automatically get an S3 endpoint in your cluster. Internally it stores the data on what's called NSFS, a namespace file system, which is essentially a directory: it maps objects to file system paths. In fact, in the POC we deployed NooBaa using a local file system: you mount that directory within the pod and declare that to be your storage, so it really maps objects to files. The way you can write to tape is by using the tape/cloud interface that they offer, into which we can dive a bit.

How does it work? NooBaa implements storage classes as AWS does, so there's the warm storage class and there's Glacier. Whenever you write to Glacier, be it because you do it explicitly or because a lifecycle rule does it for you, nothing actually happens: the request gets appended to a log, called the migrate log. The same is true for retrieval: whenever you call a RestoreObject action, your operation is written down to the recall log. These are all asynchronous. What will actually do the operation is a cron job or the operator calling the manage_nsfs script, which does some log rotation and lock management, and delegates the most important part of the work — actually moving data to tape — to another script, which you write. There's an exec interface to it: you can write the migrate and recall scripts, and the log file will be passed on to you. That log file essentially contains the files under the NSFS — the names of the files that should be moved to tape. You just move them one by one; that's up to your implementation, so you can do it however you'd like.
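To illustrate the exec interface, here is a sketch of what a migrate script could look like. I won't vouch for the exact log format here; the sketch assumes one NSFS path per line, and the xrdcp copy into an EOS namespace is just a stand-in for whatever data movement your implementation performs — the destination URL is invented.

```python
#!/usr/bin/env python3
"""Hypothetical NSFS 'migrate' script: the caller hands us a log file
listing the NSFS paths that should be moved to tape; we process them one
by one and exit non-zero if anything failed, so entries can be retried."""
import subprocess
import sys

def migrate(path: str) -> bool:
    # Stand-in for the real data movement, e.g. a copy into an
    # EOS+CTA-backed namespace. Assumed destination, for illustration only.
    dest = "root://eos.example.org//eos/archive/" + path.lstrip("/")
    return subprocess.run(["xrdcp", "-s", path, dest]).returncode == 0

def main(log_file: str) -> int:
    failures = 0
    with open(log_file) as log:
        for line in log:
            path = line.strip()
            if path and not migrate(path):
                failures += 1
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```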
So we tested it out, and what did we observe? The interface is indeed very flexible, because it leaves all of the heavy lifting to you. It's also smart enough to not recall files if, for example, the disk buffer is close to full: then you shouldn't put more stuff into the disk buffer, and the user will have to wait. The user will do a restore operation but will have to wait for it to happen, because the infrastructure is not ready. I think the documentation could use some improvement; for example, we found some outdated documentation about the log format, and some things remained unclear, like the boundaries of failure handling: where the responsibility lies when something fails during script execution. Then some features are missing: as far as I could understand, the storage-class lifecycle transition is not doable, so you can only write directly to Glacier. Also, we observed a failure at around 10k files during migration, but honestly this requires investigation; I didn't have the time to look into it, and it could very well be due to our implementation of the migration script, because it was done in batch, and it's a POC.

A few more words about the architecture. The duplication of the disk buffer is something we effectively implemented here: there's NooBaa and there's EOS, and each has its own metadata copy, which is not ideal; it could introduce synchronization problems, and the duplication of provisioned space is also not ideal. I call this the "one more layer" approach, and it made troubleshooting really painful, because there's a chain of heterogeneous technologies at work, one after the other in sequence. Also, the file content migration being initiated by NooBaa is not ideal, because it doesn't fit the CTA model: CTA is a long-running process that waits for a drive to be ready to do work. If you run a script and delegate the data movement to this script, it doesn't fit the CTA model, because CTA expects the object to be fully there for it. So if you want to link NooBaa with CTA, you need a disk buffer in the middle to hold the file, or a long-running implementation, but let's not go there. The last point may be very specific to CERN: there's an abundance of Ceph expertise, because there's a lot of production usage of Ceph, and so we also wanted to take a look at its implementation, the RADOS Gateway, before going on.

With that premise, let's go over a few solutions. But first, one last premise: there's one important feature of Ceph that I should mention, which is the Zipper project, the storage abstraction layer.
So, Ceph has internally split the way that it handles the S3 API from the storage driver, so you could theoretically implement a driver to write somewhere else other than RADOS, for example a POSIX file system, and you can also write filters — one example of which was mentioned during the previous talk. The other important feature is the cloud transition and cloud restore feature, in which you can declare a cold storage class that actually maps to another bucket, instead of an actual cold storage medium. In this way you can chain buckets and make them colder and colder. I mention this because, again, in the schema of the mental model that I worked with, I distinguished the appliance — which has the responsibility of moving data to tape — from anything else that comes before it, which may very well be another bucket, thanks to this feature. And there are a few benefits around it: there is isolation and control. For example, you isolate your appliance, and you can control the bandwidth between the buckets, because the appliance needs to output a stable throughput to the drives; and there are other advantages.

That said, our concern here, what I will show you in the solution space, is about the appliance. I grouped the options into solution families, because there are a few ways you could do it; these are just examples. First approach: "one more layer", what we did with the proof of concept. We saw the pros and cons of this. It's an already working system — EOS and CTA are one after the other, and EOS can already output the throughput that CTA needs — and you put a layer on top; it's easy to do. But again, there's the metadata duplication, you need to maintain your own storage driver and the interface between the two, and it's painful to debug because it's heterogeneous.

So let's take a look at what I call "one more thin layer". Another way to do it would be to have a thin S3 protocol translator, since S3 basically reproduces the semantics of EOS: put object, get object, restore object — those are all things that EOS does. In fact, one colleague of mine has created a project called EOS-S3, which is a module for the RADOS Gateway that does this exact translation. So this is another way to do it. You save yourself the metadata duplication, because there's a direct translation, so EOS is the only owner of the metadata; but the full emulation of the S3 API, for example, is on you.
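A sketch of the "one more thin layer" idea: because the Glacier verbs line up almost one-to-one with what the disk buffer already does, the translator can be little more than a verb mapping that owns no metadata of its own. The EOSBuffer stub below is hypothetical; a real module, like the EOS-S3 gateway module mentioned, would live inside the RADOS Gateway.

```python
class EOSBuffer:
    """Hypothetical stand-in for the disk buffer's native interface."""
    def read(self, path: str) -> bytes: ...
    def write(self, path: str, data: bytes) -> None: ...
    def is_online(self, path: str) -> bool: ...
    def stage_from_tape(self, path: str) -> None: ...  # asynchronous recall request

class InvalidObjectState(Exception):
    """What S3 answers when the object is in a cold storage class."""

class ThinTranslator:
    """Maps S3/Glacier verbs straight onto buffer operations; the buffer
    stays the only owner of the metadata."""
    def __init__(self, buffer: EOSBuffer):
        self.buffer = buffer

    def get_object(self, bucket: str, key: str) -> bytes:
        path = f"/{bucket}/{key}"
        if not self.buffer.is_online(path):
            raise InvalidObjectState(key)   # offline: content is on tape
        return self.buffer.read(path)

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self.buffer.write(f"/{bucket}/{key}", data)   # archival happens behind

    def restore_object(self, bucket: str, key: str) -> None:
        self.buffer.stage_from_tape(f"/{bucket}/{key}")  # queue the recall
```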
Another way you could do it is what I call "client-driven emulated Glacier", in which you reverse the relationship. With S3 being a disk-based technology, the question comes naturally: why not use that as the disk buffer? Maybe CTA could be a client for it. The elephant in the room on this question is the fact that the S3 API, Glacier in particular, does not provide you any mechanism to emulate migration to cold storage: you cannot really ask for a file to go offline, for example, like it's usually done in the backend of an S3 service. But you could emulate it. One way you could do it: CTA can write object metadata, so you could mark an object as offline through a tag, and a Lua script, whenever it finds the offline tag, will reply in the expected way; it will emulate Glacier by saying InvalidObjectState, this object is offline. Then CTA is free to work with the objects. That's one other way to do it.

Another way would be out-of-band data control. We said that the S3 API doesn't provide you a way, as a client, to move an object to cold storage, but perhaps there are internal ways to do this, out-of-band ways. One way could be to access the librados objects directly, to truncate and rehydrate objects. Another way could be to implement either a non-standard S3 API or some tooling through some other medium. These are just theoretical ideas; none of this is implemented, but these could be other ways in which it could be done. And of course the pros and cons are very much implementation dependent: for example, if you access librados objects directly, there's the risk of exposing internal details which are not guaranteed. I mean, they are in the documentation, but I think they reserve the right to change them; there's nothing set in stone there, and they could change in a new release, perhaps.

In the end, there are a few solutions which I won't spend many words on; I included them for completeness. One of them is CTA implementing the S3 API itself. I think there are enough S3 API providers which are open source, and in addition to reinventing the wheel, it also introduces a new functional scope for CTA, which is disk buffers. CTA is a part of a full solution; it doesn't necessarily need to be the full appliance here, but it can be the backend of an appliance. So there's scope creep here.
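Going back to the client-driven emulated Glacier family for a moment, here is a hedged boto3 sketch of the tagging side: CTA, acting as an S3 client, marks an object offline once it is safely on tape, and a server-side filter (a Lua script, in the RADOS Gateway case) would then answer InvalidObjectState for tagged objects and truncate or rehydrate the content out of band. The tag key and value are invented for illustration.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://rgw.example.org")  # placeholder endpoint

OFFLINE_TAG = {"Key": "cta-state", "Value": "offline"}  # hypothetical tag

def mark_offline(bucket: str, key: str) -> None:
    """What CTA could do after the file is safely on tape: tag the object
    so that a server-side (e.g. Lua) filter starts answering
    InvalidObjectState for it, emulating a Glacier transition."""
    s3.put_object_tagging(
        Bucket=bucket, Key=key,
        Tagging={"TagSet": [OFFLINE_TAG]},
    )

def is_offline(bucket: str, key: str) -> bool:
    """What the filter (or a curious client) would check."""
    tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    return OFFLINE_TAG in tags
```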
The same applies if you wanted to theoretically bring the disk buffer into CTA and then create a standard interface to integrate with whatever solution: here again, new functional scope for CTA, new problems; we are not considering that at the moment.

So the next step for us would be to test out some of these solutions and see what works best. But I'm also interested, if any of you has implemented this already, to hear about your thoughts, your pains and your joys with your approach. I'm available for the rest of FOSDEM, so if you want to talk, I'd be happy to. Just quickly, acknowledging some of my colleagues — Michael, Vladimir and Niels — for providing technical feedback and graphs, as well as the Ceph website and Mermaid for the graphics. And that would be all.

[Audience] Is this an alternative to the Tape REST API interfaces? And a second question: are there some checks, checksums, on the files when there is a retrieval from tape — on the data and on the metadata of EOS?

Sure. So the first question is whether this is an alternative to the Tape REST API — this is about Ceph, no? Is there a Tape REST API?

[Audience] The Tape REST API, the protocol used by WLCG for recalling data from tape: there is a WLCG standard, the Tape REST API, to recall files from tape.

Okay. No, this is mostly about — so, basically, S3 doesn't provide such an interface; will Ceph implement that interface? I don't know, I'm not part of the Ceph project, but that would be an interesting thing. Actually, I wasn't aware of this, so thank you for that. And the second question was about checksumming whenever a cold object is retrieved, right? So that the object cannot change, in that case, is what you mean? If the object changes on the second bucket, then the first bucket will return an error — is this what you mean?

[Audience] If there is an error when recalling the file from tape to disk: are there checks that the recall is done correctly and that the data is read correctly from tape, or do we assume that reading from tape to disk is always successful?

Ah, no: as far as I remember — I'm not a CTA developer — I think that CTA does that on its own.

Thank you.
[Audience] Is all data stored on tape in a single copy, or in multiple copies?

So, to repeat the question: is there only one copy of the file on tape, or more than one? CTA allows you to configure that: you can configure CTA storage classes, which define how many copies of the data you would like.

[Audience] If you were to retrieve a tape from a library, would you be able to work out from EOS which files are on that tape?

Ah, from EOS, which files are on this tape? This information is held — actually, yes, but you need CTA; this is the thing. EOS by itself doesn't, because of the way that CTA and EOS agree on where a file is. How can you track a file? On EOS there is the concept of a file path, and on CTA there is no such concept. So how does it happen? CTA, whenever there is a file, will try to figure out whether that file is already on tape, and it does that by looking at the extended attributes of the object. If there's no extended attribute with the archive ID, it will create a new one — an identifier internal to CTA — and write it as an extended attribute. From then on, that file on EOS is trackable on tape, but only through CTA: CTA tracks which tapes the archive IDs are on and where, and EOS only cares about knowing the file path and the archive ID. So this is the chain.

Okay, so our time's up. If you have a question, please meet me later. Thank you.