Great. So, everyone, my name is Lisa and I'm a technical writer currently working at JetBrains, where I create documentation for IntelliJ IDEA. While my current role doesn't involve any APIs at all, I've spent plenty of time writing documentation for internal and external APIs, and let me tell you, getting the feedback to improve all those docs has been one of the biggest challenges of my career. And this struggle, even though it's now in the past, led me to explore new ways to tackle the problem. With AI advancing at lightning speed, I started to wonder: could it be the solution? I believe there is still plenty of low-hanging fruit there, and so today I'm excited to share a proof of concept that uses AI-driven user simulation to identify gaps in API documentation. It's not about replacing humans, it's about giving us the tools to do more effectively what we already do best.

When I said that there is still a lot of low-hanging fruit, at least some of you thought: come on, AI is already everywhere, whether it's needed or not. And yes, in this time of exploration we have so many ideas about how to use AI in documentation. Let's think of a few popular examples you've seen, heard, or talked about: improved search, intelligent code snippets, chatbots right on top of your documentation, summarization, highlighting, automated changelogs. All of that sounds really great, so many things that can make our lives easier, and our readers' lives easier. But there's one "but": to perform all of these tasks, AI needs to have enough context, enough information. Otherwise it's not going to give you the answers you need, or it's going to make things up, and you won't get what you want. And who's providing that information? A mere human being. So we return to the good old practice of gathering feedback from stakeholders and users, and planning well-structured documentation that incorporates that data.

And again, as I said in the beginning, regardless of the company size or the maturity of the documentation process, collecting actionable feedback on documentation has consistently been a bottleneck in every process I've experienced, and this issue is universal: large organizations, startups, open source projects, it's everywhere. So why does that happen?
There are several key reasons. Let's start with time constraints. First of all, there are stakeholders who just approve your pull requests with an "okay, looks good to me". They assume that if they have any problems later, they can rely on more experienced team members, for example, so they don't put enough time into reviewing your documentation. In the case of users, they just need a solution right away. If it's not easy to find in the documentation, they're going to find some workaround, they're going to go to other sources, like Stack Overflow or anything else that can help them solve their problem right now. So leaving feedback is kind of a last resort. Sadly, but it's an understandable choice if you can get your solution faster elsewhere.

Then there are expertise gaps, and here is an interesting thing: it cuts both ways, you can be too experienced or not experienced at all. Experienced users just fill gaps in the documentation with their own knowledge, so they won't even notice that something is lacking, and they assume that all these details are so basic that nobody needs them. With beginners, it's the other way around: they don't assume it's bad or incomplete documentation, they doubt themselves, and instead of recognizing a gap in the documentation they internalize the problem: "I should have known that, I should have learned it, it's my fault."

And all these barriers are even more pronounced in open source projects. High-quality documentation is crucial for open source adoption and usability, yet contributors wear many hats: they are developers, testers, and writers all at the same time, so they're definitely very experienced in the topic, and it's hard to change that perspective, it's hard to switch roles, it's hard to check your own documentation or to think like a beginner, because you haven't been a beginner for a long time already. Community feedback? Well, it's definitely not that easy either, especially in early-stage projects, because many users hesitate to give anything back, to "waste" their time, if they are not deeply invested in the project. And to make them invested, you need good documentation in the first place, so they can adopt your project and become deeply invested. So yeah, documentation is crucial, and without feedback it's not possible to fill the gaps, it's not possible to find them all by yourself, or at least it's very, very difficult.

So the idea is to address this problem with AI-driven user simulation. This approach allows us to test the documentation for gaps from multiple perspectives: it can be a beginner, it can be
a more experienced user, or it can be a specific scenario, let's say "I want to integrate this with that", and then you get more specific questions. And of course, that reduces the load on the writers themselves and on the stakeholders, and sometimes that's the same person.

So I think the choice of this prompting technique, user simulation, is quite obvious here, because, as I said already, context is very role- and scenario-specific, and as for bias, well, I cannot say that with LLMs, especially when they play a role, it's completely eliminated, but it is reduced, let's say. The workflow is super easy: you input your OpenAPI specification, extract the needed information, generate questions, cross-check these questions against the documentation, and you get the list of questions. The link is also available on the talk page, because I don't want to stay on this slide for too long.

The main magic under the hood is the prompt. This prompt is really basic, it's what I wanted to start with; of course it can be made more specific for your situation, for your project. So, just this simple one: the actor is a developer who uses the API documentation and wants to integrate with the API; the objective is to analyze the OpenAPI specification, make a list of potential questions, and cross-check them.

My interesting learning here is that for LLMs the default OpenAPI specification is still version 2, so if you haven't indicated that your specification is OpenAPI 3-point-something, it's going to check it against version 2, and the list of questions and gaps becomes "okay, it's not like in version 2". Next, some limitations. This part was not there initially, but I got a couple of rather philosophical questions (we'll return to that a bit later), and I wanted to have cleaner output, more actionable questions, not something random. There are several ways to achieve that cleaner output; one of them is the few-shot prompting technique, but I decided not to implement it that way for this proof of concept, as the resulting improvement was not significant compared to the effort required. I can just write one more sentence here and the output gets, if not 100% cleaner, a lot cleaner than before, and I decided that's a good enough result for a proof of concept. And the last thing is: how do you want to get your output?
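To make that flow concrete, here is a minimal sketch in Python, assuming the OpenAI Python client and the GPT-3.5 model mentioned later in the talk; the prompt text below only paraphrases the role/objective/restrictions/output structure described above and is illustrative, not the exact prompt from the proof of concept.

# Minimal sketch of the user-simulation flow: load an OpenAPI spec, ask an
# LLM to role-play a developer integrating with the API, and collect the
# questions the documentation leaves unanswered.
# Assumes the `openai` package (v1+) and an OPENAI_API_KEY environment variable.
import json
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = """\
Role: you are a developer who uses this API documentation and wants to integrate with the API.

Objective:
1. Analyze the OpenAPI specification below.
2. List the questions a developer would still have after reading it.
3. Cross-check every question against the specification before including it.

Restrictions and limitations:
- The specification uses OpenAPI 3.0; do not assume Swagger 2.0 defaults.
- Skip purely philosophical questions; keep only practical, actionable ones.

Output requirements:
- Group the questions by endpoint and HTTP method.

Specification:
{spec}
"""

def simulate_user(spec_path: str) -> str:
    """Return the list of documentation-gap questions for one OpenAPI spec."""
    with open(spec_path, encoding="utf-8") as f:
        spec = json.dumps(json.load(f), indent=2)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # per the talk, no need for the latest or largest model
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(spec=spec)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(simulate_user("petstore.json"))  # e.g. the Swagger Pet Store example spec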
So yeah, I wanted to have it this way. What I actually liked about this output: there were a lot of repeated questions across the different methods, for example about missing error messages and error codes. Maybe they could be grouped differently, but I like it going method by method.

In the beginning, I just tried to test it with a specification with intentional gaps. It was a piece of documentation I created a while ago for internal use, so we can rule out the possibility that it was used in LLM training. I simply deleted some parts of that specification, and all those gaps were found, and many more, to be honest. But for the sake of a good example, I've taken the Pet Store API, which I'm sure all of you have seen at least once, and some of you may even have memorized already, because if you work with APIs, it's the starting point.

So, the fun part: the first set of questions was full of philosophical ones, like "is it possible to delete an entity that doesn't exist?". On the one hand, such a question highlights the need to refine the output to make it cleaner, but on the other hand, when I shared this result with a developer I know, it led to a useful discussion about error handling and its clarity, for example whether you need specific error messages here and there. So it can be somewhat useful, but I think you don't want to get all those questions all the time.

And what about the good questions? How do you handle duplicate pets, that is, pets with the same name and attributes? Can the IDs be reused? And the handling of bulk operations, and whether they exist at all, is actually a huge question. And in fact, the Pet Store API does lack a lot of error codes and error messages; usually there is only one error code per endpoint, which is, well, not enough for good API documentation, let's say.

So yeah, we can work on the output more, but maybe keep playing a bit with the philosophical questions too, sometimes, at least for one attempt. And obviously this can be customized. First of all, you can use different user personas and different scenarios, and an LLM of your choice; I would say there was no big difference when I was using different LLMs, and you definitely don't need the latest version of the huge LLMs. For example, ChatGPT 3.5 is good enough, you don't need to go further.
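As a small illustration of that persona switch, here is how the actor description could be swapped while the rest of the flow stays the same; the persona wordings below are assumptions for illustration, not the exact ones used in the proof of concept.

# Sketch of persona customization: the same simulation, with only the actor
# description swapped to probe the documentation from different angles.
# These persona texts are illustrative, not taken from the proof of concept.
PERSONAS = {
    "junior": "a junior developer integrating a REST API for the first time",
    "senior": "a senior developer who has integrated dozens of REST APIs",
    "scenario": "a developer who wants to integrate the pet inventory endpoints "
                "into an existing e-commerce backend",
}

def build_prompt(persona_key: str, spec: str) -> str:
    """Compose the role-play prompt for one persona."""
    return (
        f"Role: you are {PERSONAS[persona_key]}.\n"
        "Objective: analyze the OpenAPI specification below, list the questions "
        "you would need answered before integrating, and cross-check each one "
        "against the specification.\n"
        "Output requirements: group the questions by endpoint and HTTP method.\n\n"
        f"Specification:\n{spec}"
    )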
So, knowing these capabilities, teams can further enhance the effectiveness of user simulation for identifying documentation gaps. And to sum up: the tech world is evolving, but the quality of documentation is still a cornerstone of successful adoption and usability, especially in the open source ecosystem, and AI's true potential is in helping human efforts, not replacing them. By using AI-driven user simulation, we can identify gaps and inconsistencies in API documentation with remarkable efficiency, I would say, and this approach not only reduces the manual burden but also brings a fresh, unbiased perspective.

What I've shown you is just the beginning, a very raw one, and it already demonstrates how we can use AI as a partner in the documentation testing process. Let it be just a starting point for your own experiments and customizations. This method, like any other, can flourish only when it meets community collaboration, so my call is definitely: try it out, test your own API documentation, and share your insights. Let's refine this idea together; after all, every step toward better documentation is a step toward better software. So yeah, thank you, and I'll be happy to hear your thoughts and questions, if you have any.

[Audience] Here's a question for you. It's not a criticism, the talk was clear, but the demonstration is quite small. Did you think about how to use this on real cases? Because sometimes the documentation is very big, so you're going to hit cost and context problems, and you'd need to switch to RAG or something. Have you thought about how to do that?

Yeah, repeating the question: have I thought about, or tried, using this on bigger documentation, where there's more context? I haven't tried it, but since we extract the needed information, we can make this flow a bit more layered: we can have more levels and add to the prompt only the context needed at the moment. So we want to test only this part of the documentation, and here is all the relevant, already extracted information. Yes, it should be possible. So yeah, it's a good point, a good step forward.
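One way to read the "more levels" idea from that answer is to scope the extraction step per endpoint, so that only the part of a large specification under test goes into the prompt; the naive chunking below is an illustrative assumption, not something from the proof of concept.

# Sketch of scoping the context for large specifications: extract one path item
# (plus shared schemas) at a time and run the user simulation per chunk, instead
# of sending the whole specification in a single prompt.
import json

def extract_path_chunk(spec: dict, path: str) -> dict:
    """Return a minimal spec containing a single path plus the shared schemas."""
    return {
        "openapi": spec.get("openapi", "3.0.0"),
        "info": spec.get("info", {}),
        "paths": {path: spec["paths"][path]},
        # Naive: include all schemas; a real version would follow only $ref links.
        "components": {"schemas": spec.get("components", {}).get("schemas", {})},
    }

def iter_chunks(spec_path: str):
    """Yield (path, partial_spec_json) pairs small enough for one prompt."""
    with open(spec_path, encoding="utf-8") as f:
        spec = json.load(f)
    for path in spec.get("paths", {}):
        yield path, json.dumps(extract_path_chunk(spec, path), indent=2)

# Usage: feed each chunk to the earlier prompt flow, one endpoint at a time:
#   for path, chunk in iter_chunks("petstore.json"):
#       print(path, ask_llm(PROMPT_TEMPLATE.format(spec=chunk)))  # ask_llm is hypothetical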
[Audience] Have you by any chance tried using this on less structured documentation, rather than something like an API spec?

No, I haven't tried it, and I doubt it would work. Even now, with APIs, where the documentation is structured, it creates these philosophical, sometimes weird questions, and if there were even more freedom to invent such questions, well, I think the output would be not clean at all, not actionable at all. So yeah, I believe this is designed for structured documentation.

[Audience] And could you say a little bit more about how you came up with the structure of the prompt, so that you have the role, the objective and so on, and the documentation part?

Let me get back to that slide... yeah. Because I'm a tech writer, I want a clear structure, and it's easier for me when it's not all mixed up together: it's just easier to try "okay, now I change only that part" and check whether I like the results more. Since I've already chosen user simulation as the general idea, of course there's going to be a role. Okay, we've said who's doing what, but what's the objective? We're doing something to achieve something. How the restrictions and limitations appeared here I've already discussed, and the output requirements, well, they just make it easier for us to work with the output.

Any more questions?

[Audience] Okay, have you got any more roles that you've tried out already, or is it just the developer?

So yeah, I've tried just varying, let's say, the level of the developer: a junior who is only learning how to do stuff, and a senior developer. With the junior there were a lot more basic questions, where I wanted to say: come on, if you're reading this documentation, why couldn't you google that before? That kind of basic questions. But at the same time, that's again maybe just my bias, because I've known all this for a long time; yes, it's that same problem with expertise, on both sides, low and high. And the senior questions, the senior developer questions, well,