Great. So, everyone, my name is Lisa and I'm a technical writer currently working at JetBrains, where I create documentation for IntelliJ IDEA. While my current role doesn't involve any APIs at all, I've spent plenty of time writing documentation for internal and external APIs, and let me tell you, getting the feedback to improve all those docs has been one of the biggest challenges of my career. And this struggle, even though it's now in the past, led me to explore new ways to tackle the problem. With AI advancing at lightning speed, I started to wonder: could it be the solution? I believe there is still plenty of low-hanging fruit there, and so today I'm excited to share a proof of concept that uses AI-driven user simulation to identify gaps in API documentation. It's not about replacing humans, it's about giving us the tools to do more effectively what we already do best.

When I said that there is still a lot of low-hanging fruit, at least some of you thought: come on, AI is already everywhere, whether it's needed or not. And yes, in this time of exploration we have so many ideas about how to use AI in documentation. Let's think of a few popular examples you've seen, heard, or talked about: improved search, intelligent code snippets, chatbots right on top of your documentation, summarization, highlighting, automated changelogs. All of that sounds really great, so many things that can make our lives easier, and our readers' lives easier. But there's one "but": to perform all of these tasks, AI needs to have enough context, enough information. Otherwise it's not going to give you the answers you need, or it's going to make things up, and you won't get what you want. And who's providing that information? A mere human being. So we return to the good old practice of gathering feedback from stakeholders and users, and planning well-structured documentation that incorporates that data.

And again, as I said in the beginning, regardless of the company size or the maturity of the documentation process, collecting actionable feedback on documentation has consistently been a bottleneck in every process I've experienced, and this issue is universal: large organizations, startups, open source projects, it's everywhere. So why does that happen?
There are several key reasons. Let's start with time constraints. First of all, there are stakeholders who just approve your pull requests with an "okay, looks good to me". They assume that if they have any problems later, they can rely on more experienced team members, for example, so they don't put enough time into reviewing your documentation. In the case of users, they just need a solution right away. If it's not easy to find in the documentation, they're going to find some workaround, they're going to go to other sources, like Stack Overflow or anything else that can help them solve their problem right now. So leaving feedback is kind of a last resort. Sadly, but it's an understandable choice if you can get your solution faster elsewhere.

Then there are expertise gaps, and here is an interesting thing: it cuts both ways, you can be too experienced or not experienced at all. Experienced users just fill gaps in the documentation with their own knowledge, so they won't even notice that something is lacking, and they assume that all these details are so basic that nobody needs them. With beginners, it's the other way around: they don't assume it's bad or incomplete documentation, they doubt themselves, and instead of recognizing a gap in the documentation they internalize the problem: "I should have known that, I should have learned it, it's my fault."

And all these barriers are even more pronounced in open source projects. High-quality documentation is crucial for open source adoption and usability, yet contributors wear many hats: they are developers, testers, and writers all at the same time, so they're definitely very experienced in the topic, and it's hard to change that perspective, it's hard to switch roles, it's hard to check your own documentation or to think like a beginner, because you haven't been a beginner for a long time already. Community feedback? Well, it's definitely not that easy either, especially in early-stage projects, because many users hesitate to give anything back, to "waste" their time, if they are not deeply invested in the project. And to make them invested, you need good documentation in the first place, so they can adopt your project and become deeply invested. So yeah, documentation is crucial, and without feedback it's not possible to fill the gaps, it's not possible to find them all by yourself, or at least it's very, very difficult.

So the idea is to address this problem with AI-driven user simulation. This approach allows us to test the documentation for gaps from multiple perspectives: it can be a beginner, it can be
a more experienced user, or it can be a specific scenario, let's say "I want to integrate this with that", and then you get more specific questions. And of course, that reduces the load on the writers themselves and on the stakeholders, and sometimes that's the same person.

So I think the choice of this prompting technique, user simulation, is quite obvious here, because, as I said already, context is very role- and scenario-specific, and as for bias, well, I cannot say that with LLMs, especially when they play a role, it's completely eliminated, but it is reduced, let's say. The workflow is super easy: you input your OpenAPI specification, extract the needed information, generate questions, cross-check these questions against the documentation, and you get the list of questions. The link is also available on the talk page, because I don't want to stay on this slide for too long.

The main magic under the hood is the prompt. This prompt is really basic, it's what I wanted to start with; of course it can be made more specific for your situation, for your project. So, just this simple one: the actor is a developer who uses the API documentation and wants to integrate with the API; the objective is to analyze the OpenAPI specification, make a list of potential questions, and cross-check them.

My interesting learning here is that for LLMs the default OpenAPI specification is still version 2, so if you haven't indicated that your specification is OpenAPI 3-point-something, it's going to check it against version 2, and the list of questions and gaps becomes "okay, it's not like in version 2". Next, some limitations. This part was not there initially, but I got a couple of rather philosophical questions (we'll return to that a bit later), and I wanted to have cleaner output, more actionable questions, not something random. There are several ways to achieve that cleaner output; one of them is the few-shot prompting technique, but I decided not to implement it that way for this proof of concept, as the resulting improvement was not significant compared to the effort required. I can just write one more sentence here and the output gets, if not 100% cleaner, a lot cleaner than before, and I decided that's a good enough result for a proof of concept. And the last thing is: how do you want to get your output?
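To make that flow concrete, here is a minimal sketch in Python, assuming the OpenAI Python client and the GPT-3.5 model mentioned later in the talk; the prompt text below only paraphrases the role/objective/restrictions/output structure described above and is illustrative, not the exact prompt from the proof of concept.

# Minimal sketch of the user-simulation flow: load an OpenAPI spec, ask an
# LLM to role-play a developer integrating with the API, and collect the
# questions the documentation leaves unanswered.
# Assumes the `openai` package (v1+) and an OPENAI_API_KEY environment variable.
import json
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = """\
Role: you are a developer who uses this API documentation and wants to integrate with the API.

Objective:
1. Analyze the OpenAPI specification below.
2. List the questions a developer would still have after reading it.
3. Cross-check every question against the specification before including it.

Restrictions and limitations:
- The specification uses OpenAPI 3.0; do not assume Swagger 2.0 defaults.
- Skip purely philosophical questions; keep only practical, actionable ones.

Output requirements:
- Group the questions by endpoint and HTTP method.

Specification:
{spec}
"""

def simulate_user(spec_path: str) -> str:
    """Return the list of documentation-gap questions for one OpenAPI spec."""
    with open(spec_path, encoding="utf-8") as f:
        spec = json.dumps(json.load(f), indent=2)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # per the talk, no need for the latest or largest model
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(spec=spec)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(simulate_user("petstore.json"))  # e.g. the Swagger Pet Store example spec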
So yeah, I wanted to have it this way. What I actually liked about this output: there were a lot of repeated questions across the different methods, for example about missing error messages and error codes. Maybe they could be grouped differently, but I like it going method by method.

In the beginning, I just tried to test it with a specification with intentional gaps. It was a piece of documentation I created a while ago for internal use, so we can rule out the possibility that it was used in LLM training. I simply deleted some parts of that specification, and all those gaps were found, and many more, to be honest. But for the sake of a good example, I've taken the Pet Store API, which I'm sure all of you have seen at least once, and some of you may even have memorized already, because if you work with APIs, it's the starting point.

So, the fun part: the first set of questions was full of philosophical ones, like "is it possible to delete an entity that doesn't exist?". On the one hand, such a question highlights the need to refine the output to make it cleaner, but on the other hand, when I shared this result with a developer I know, it led to a useful discussion about error handling and its clarity, for example whether you need specific error messages here and there. So it can be somewhat useful, but I think you don't want to get all those questions all the time.

And what about the good questions? How do you handle duplicate pets, that is, pets with the same name and attributes? Can the IDs be reused? And the handling of bulk operations, and whether they exist at all, is actually a huge question. And in fact, the Pet Store API does lack a lot of error codes and error messages; usually there is only one error code per endpoint, which is, well, not enough for good API documentation, let's say.

So yeah, we can work on the output more, but maybe keep playing a bit with the philosophical questions too, sometimes, at least for one attempt. And obviously this can be customized. First of all, you can use different user personas and different scenarios, and an LLM of your choice; I would say there was no big difference when I was using different LLMs, and you definitely don't need the latest version of the huge LLMs. For example, ChatGPT 3.5 is good enough, you don't need to go further.
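As a small illustration of that persona switch, here is how the actor description could be swapped while the rest of the flow stays the same; the persona wordings below are assumptions for illustration, not the exact ones used in the proof of concept.

# Sketch of persona customization: the same simulation, with only the actor
# description swapped to probe the documentation from different angles.
# These persona texts are illustrative, not taken from the proof of concept.
PERSONAS = {
    "junior": "a junior developer integrating a REST API for the first time",
    "senior": "a senior developer who has integrated dozens of REST APIs",
    "scenario": "a developer who wants to integrate the pet inventory endpoints "
                "into an existing e-commerce backend",
}

def build_prompt(persona_key: str, spec: str) -> str:
    """Compose the role-play prompt for one persona."""
    return (
        f"Role: you are {PERSONAS[persona_key]}.\n"
        "Objective: analyze the OpenAPI specification below, list the questions "
        "you would need answered before integrating, and cross-check each one "
        "against the specification.\n"
        "Output requirements: group the questions by endpoint and HTTP method.\n\n"
        f"Specification:\n{spec}"
    )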
So, knowing these capabilities, teams can further enhance the effectiveness of user simulation for identifying documentation gaps. And to sum up: the tech world is evolving, but the quality of documentation is still a cornerstone of successful adoption and usability, especially in the open source ecosystem, and AI's true potential is in helping human efforts, not replacing them. By using AI-driven user simulation, we can identify gaps and inconsistencies in API documentation with remarkable efficiency, I would say, and this approach not only reduces the manual burden but also brings a fresh, unbiased perspective.

What I've shown you is just the beginning, a very raw one, and it already demonstrates how we can use AI as a partner in the documentation testing process. Let it be just a starting point for your own experiments and customizations. This method, like any other, can flourish only when it meets community collaboration, so my call is definitely: try it out, test your own API documentation, and share your insights. Let's refine this idea together; after all, every step toward better documentation is a step toward better software. So yeah, thank you, and I'll be happy to hear your thoughts and questions, if you have any.

[Audience] Here's a question for you. It's not a criticism, the talk was clear, but the demonstration is quite small. Did you think about how to use this on real cases? Because sometimes the documentation is very big, so you're going to hit cost and context problems, and you'd need to switch to RAG or something. Have you thought about how to do that?

Yeah, repeating the question: have I thought about, or tried, using this on bigger documentation, where there's more context? I haven't tried it, but since we extract the needed information, we can make this flow a bit more layered: we can have more levels and add to the prompt only the context needed at the moment. So we want to test only this part of the documentation, and here is all the relevant, already extracted information. Yes, it should be possible. So yeah, it's a good point, a good step forward.
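One way to read the "more levels" idea from that answer is to scope the extraction step per endpoint, so that only the part of a large specification under test goes into the prompt; the naive chunking below is an illustrative assumption, not something from the proof of concept.

# Sketch of scoping the context for large specifications: extract one path item
# (plus shared schemas) at a time and run the user simulation per chunk, instead
# of sending the whole specification in a single prompt.
import json

def extract_path_chunk(spec: dict, path: str) -> dict:
    """Return a minimal spec containing a single path plus the shared schemas."""
    return {
        "openapi": spec.get("openapi", "3.0.0"),
        "info": spec.get("info", {}),
        "paths": {path: spec["paths"][path]},
        # Naive: include all schemas; a real version would follow only $ref links.
        "components": {"schemas": spec.get("components", {}).get("schemas", {})},
    }

def iter_chunks(spec_path: str):
    """Yield (path, partial_spec_json) pairs small enough for one prompt."""
    with open(spec_path, encoding="utf-8") as f:
        spec = json.load(f)
    for path in spec.get("paths", {}):
        yield path, json.dumps(extract_path_chunk(spec, path), indent=2)

# Usage: feed each chunk to the earlier prompt flow, one endpoint at a time:
#   for path, chunk in iter_chunks("petstore.json"):
#       print(path, ask_llm(PROMPT_TEMPLATE.format(spec=chunk)))  # ask_llm is hypothetical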
[Audience] Have you by any chance tried using this on less structured documentation, rather than something like an API spec?

No, I haven't tried it, and I doubt it would work. Even now, with APIs, where the documentation is structured, it creates these philosophical, sometimes weird questions, and if there were even more freedom to invent such questions, well, I think the output would be not clean at all, not actionable at all. So yeah, I believe this is designed for structured documentation.

[Audience] And could you say a little bit more about how you came up with the structure of the prompt, so that you have the role, the objective and so on, and the documentation part?

Let me get back to that slide... yeah. Because I'm a tech writer, I want a clear structure, and it's easier for me when it's not all mixed up together: it's just easier to try "okay, now I change only that part" and check whether I like the results more. Since I've already chosen user simulation as the general idea, of course there's going to be a role. Okay, we've said who's doing what, but what's the objective? We're doing something to achieve something. How the restrictions and limitations appeared here I've already discussed, and the output requirements, well, they just make it easier for us to work with the output.

Any more questions?

[Audience] Okay, have you got any more roles that you've tried out already, or is it just the developer?

So yeah, I've tried just varying, let's say, the level of the developer: a junior who is only learning how to do stuff, and a senior developer. With the junior there were a lot more basic questions, where I wanted to say: come on, if you're reading this documentation, why couldn't you google that before? That kind of basic questions. But at the same time, that's again maybe just my bias, because I've known all this for a long time; yes, it's that same problem with expertise, on both sides, low and high. And the senior questions, the senior developer questions, well,