WEBVTT 00:00.000 --> 00:19.120 All right, hello everyone, oh thank you, thank you, my name is Shireen Bellamy, I'm a 00:19.120 --> 00:26.160 senior developer advocate for AI security and quantum over at Cisco, and today I'm really 00:26.160 --> 00:32.080 excited to talk to you about going beyond MCP servers, because last year when I was doing a lot 00:32.080 --> 00:37.760 of my developer advocacy, a lot of the talks that I went to or spoke at had to do with the 00:37.760 --> 00:46.160 emerging agentic protocols and building MCP servers. So I'm excited to expand beyond 00:46.160 --> 00:52.960 that this year by looking at why network automations need knowledge graphs. And I'm going 00:52.960 --> 01:02.320 to do that based on an open source multi-agent system that I'm a part of; there's a group 01:02.320 --> 01:08.720 that I'm a part of at Cisco, and I'll explain more about that later on, but basically there's an 01:08.720 --> 01:13.680 open source project called coffee AGNTCY, and that's where I'm going to share the lessons that I learned. 01:15.440 --> 01:20.880 So a little bit about me, I'm not from Belgium, I'm from Boston, Massachusetts, 01:21.840 --> 01:28.720 I went there for grad school, and so yeah, my background is actually in AI and human-computer 01:28.720 --> 01:34.720 interaction, but ever since joining Cisco last year I've learned a lot about network automation 01:34.720 --> 01:43.840 and where AI leaks into the networking world. The way I was able to do that is through this 01:43.840 --> 01:49.360 AGNTCY project, which I'll talk about in a minute. As for other open source activities that I do, 01:49.360 --> 01:56.960 I give talks and attend meetups a lot for the PyData community in Boston and New York City. 01:56.960 --> 02:04.400 So if you're over there, say hello.
So what we're going to talk about: we're going to start off 02:04.400 --> 02:11.840 by just reflecting on the year of MCP and AGNTCY last year, then we're going to talk about an 02:11.840 --> 02:18.800 issue that I encountered while working on the project, and how I solved that by learning about the 02:18.800 --> 02:27.120 power of graphs. After that, I'll talk about how to try out coffee AGNTCY yourself and build on 02:28.160 --> 02:41.440 what I already did. Starting off with the year of MCP, 2025: MCP really started around 02:41.440 --> 02:48.800 the end of 2024, but I think it's fair to say that it clearly was widely adopted 02:48.800 --> 02:56.160 at a very large scale throughout last year. I don't think many people are arguing about whether 02:56.160 --> 03:04.240 or not MCP is a standardized protocol for agents to communicate, or that it solved a problem. 03:04.880 --> 03:11.200 That's very clear from the messaging last year, in my opinion. And in terms of things that 03:11.200 --> 03:17.760 were built with it, there are so many MCP servers that have been built that I didn't even feel 03:17.760 --> 03:22.720 comfortable putting a number on the slide, because I don't know how accurate the numbers are 03:22.720 --> 03:29.440 that I found online. There are so many out there; I feel like every day there's another couple 03:29.440 --> 03:35.360 of hundred, or a couple of tens, I don't know. So yeah, MCP: large, growing. 03:37.920 --> 03:42.480 But I also want to talk about the year of AGNTCY, right? I spoke about how we all know 03:42.480 --> 03:50.320 what MCP is. A show of hands: does anyone know what AGNTCY is in here? In the front, and that's because 03:50.320 --> 03:57.120 they're part of Team Cisco, so clearly I have some advocacy work to do here. The way that I learned 03:57.120 --> 04:05.280 about AGNTCY is through a reference application that I worked on. Now, what is AGNTCY? It's not just 04:05.280 --> 04:10.240 a Cisco project.
It's actually a collaboration of different companies that came together and said, 04:10.240 --> 04:18.720 hey, with all of this agentic AI talk that's happening, what does the internet of agents really look like? 04:18.720 --> 04:25.920 When everyone's talking about these protocols, what do these systems look like as they scale out? 04:25.920 --> 04:31.760 What can we mix and match and change in the in-between layers? That's what the internet of 04:31.760 --> 04:40.240 agents is about. So on the AGNTCY team, we have a reference application, we call it coffee AGNTCY, which is 04:40.240 --> 04:50.080 based on a coffee trade system, and there are two demos in there. There's the Corto demo, 04:50.080 --> 04:56.720 which is a simple two-agent system communicating, and what that's showing is a very simple example, 04:57.440 --> 05:02.000 not only of a protocol working for these agents to discover each other, but it's showing this 05:02.000 --> 05:09.760 concept, like I mentioned, of mix and match. So it's showing that if you look at the 05:09.760 --> 05:17.280 messaging layer and the transport layer that's involved in the protocol, you can use an A2A server 05:17.280 --> 05:23.600 and have a layer of SLIM, and then have these agents talk to each other. So it goes into 05:23.600 --> 05:32.400 that and shows it at a very micro level. Then what I'll be focusing on today is the Lungo application, 05:33.280 --> 05:40.960 which is a full multi-agent simulation with different patterns of communication. It's been extended a lot 05:40.960 --> 05:47.760 since this presentation was prepared, so I don't think we'll have time to answer all the questions about 05:47.760 --> 05:52.400 updates on it, but I do have a QR code with my link to it at the end if you want to 05:52.400 --> 06:00.160 message me through there.
But for today, I'm going to be talking about the very base layer of Lungo, 06:00.160 --> 06:11.760 how I got started, and how I learned what the 06:11.760 --> 06:22.160 problem was. Well, not with the protocols themselves, but with MCP. So before I get into that, 06:22.160 --> 06:28.320 again, this is what the diagram of the structure looks like. The way that the coffee 06:28.320 --> 06:35.040 multi-agent system works is that you start off with the auction supervisor agent as the buyer, 06:36.080 --> 06:41.200 and all these agents are designed with LangGraph; by the way, they're structured through LangGraph. 06:42.560 --> 06:48.880 The supervisor agent is able to communicate with all the farm agents from different companies 06:49.440 --> 06:57.760 and request information about their supply of beans and the resources that they have. 06:58.320 --> 07:06.960 On top of the A2A over a SLIM transport, which allows for 07:06.960 --> 07:16.320 pub/sub broadcasts and point-to-point communication as well, there are also two MCP 07:16.320 --> 07:22.960 servers already existing in the repo, which are the weather service Python file and the payment 07:23.120 --> 07:30.720 service Python file. There is a walkthrough available of how this works more in depth; I uploaded it online, so it should 07:30.720 --> 07:34.400 be publicly available after the talk, like all the QR codes and links that I have in here. 07:36.000 --> 07:42.720 This is just a quick presentation, so I'm going to fire through there. 07:44.400 --> 07:50.800 Here's just a quick screenshot from very far away of what the base looks like; I will zoom in 07:50.800 --> 07:57.040 later and show you what I did in the same UI. So the problem that I encountered while working with 07:57.040 --> 08:03.600 coffee AGNTCY is that there is an issue of infrastructure awareness among the agents.
If I asked 08:03.600 --> 08:11.680 it a simple question, like how many beans do you have, then yeah, it'll answer, but sometimes 08:11.760 --> 08:22.000 if I asked something that required a little bit more depth, like why did this order fail, 08:22.800 --> 08:28.480 it wouldn't be able to tell me specifically why, or it would give me a very long paragraph 08:30.320 --> 08:37.600 in response. What I decided the issues were, based on what I was going through, is that the context 08:37.680 --> 08:44.960 window was filling up really fast, because every query that I was sending in was basically given 08:44.960 --> 08:51.920 the whole graph and in turn returning way more than I needed. And that's because, in the typical 08:51.920 --> 08:58.800 structure, the LLM was doing all of the reasoning, and because of that, it was using 08:58.800 --> 09:07.200 way more tokens than needed. So in my research of how to fix this issue, I learned a lot about 09:08.160 --> 09:13.840 folks adding graphs to MCP servers, and reference applications that exist in that area. 09:14.640 --> 09:21.280 Anthropic has a reference application for knowledge graph memory as an MCP server. 09:21.280 --> 09:26.080 There's an MCP server for Neo4j, going back to what I said earlier. There's an MCP server for a lot of 09:26.080 --> 09:34.800 different things. But specifically when it comes to IT infrastructure issues, LinkedIn was able to 09:34.800 --> 09:41.520 improve the number of tickets that were resolved by integrating a knowledge graph with their 09:41.520 --> 09:47.760 MCP server. And Cisco also has a couple of case studies where they were able to do some 09:47.760 --> 09:53.120 deep network troubleshooting in that manner, and they were able to implement Jarvis. 09:53.120 --> 10:02.320
So given that, I'm going to call attempt one that issue that I spoke about, where the LLM 10:02.400 --> 10:08.960 is dumping all of this JSON, and in return I'm getting all of these tokens used up, because 10:08.960 --> 10:16.240 look how many characters are there, right? It's a lot of bloat. So what if, instead of this situation 10:16.240 --> 10:22.320 here, where you have the supervisor asking a question to the MCP server and the MCP server 10:22.320 --> 10:29.200 returning directly to the LLM to parse that JSON, what if the MCP server answered it directly? 10:29.200 --> 10:36.640 Now, I know I said that the talk is beyond MCP servers, but the key is what can we do to enhance it? 10:36.640 --> 10:43.840 Not, you know, we did MCP last year, let's get over it this year. I'm not on that boat. 10:44.720 --> 10:54.560 So attempt two, which is my actual first attempt, is looking at knowledge graph extraction, 10:54.640 --> 10:59.200 because of all of those case studies from different companies. I looked more into knowledge graphs 10:59.200 --> 11:06.640 and I learned about retrieval mechanisms that were out there, like KAG and GraphRAG. So 11:09.200 --> 11:15.440 the reason why these both interested me was because KAG does logical form guided reasoning 11:15.440 --> 11:21.040 with multi-hop Q&A. I figured network topology has a lot of hops you usually have to traverse to solve 11:21.040 --> 11:30.560 queries, so that sounds useful. And then GraphRAG, because it's known for extracting entities and 11:30.560 --> 11:36.640 relationships. Again, I just felt like, because of that natural graph traversal kind of property 11:36.640 --> 11:42.240 that network topologies naturally have, these two would be a good place to start in terms of looking 11:42.240 --> 11:50.320 at what it would look like if I could extract a knowledge graph.
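To make the attempt-one token bloat concrete, here is a minimal sketch; the topology structure, agent names, and numbers are invented for illustration and are not taken from the actual Lungo repo:

```python
import json

# Toy stand-in for the topology JSON the MCP server was dumping wholesale.
# All names and values here are made up for the sketch.
topology = {
    "agents": [
        {"name": "supervisor", "depends_on": ["slim-gateway"]},
        {"name": "brazil-farm", "depends_on": ["slim-gateway", "weather-mcp"]},
        {"name": "colombia-farm", "depends_on": ["slim-gateway", "payment-mcp"]},
    ]
}

# Attempt one in miniature: the whole graph goes to the LLM on every query.
full_dump = json.dumps(topology)

# What the question "what does the supervisor depend on?" actually needs:
supervisor = next(a for a in topology["agents"] if a["name"] == "supervisor")
answer = ", ".join(supervisor["depends_on"])

print(f"full dump: {len(full_dump)} chars, precise answer: {len(answer)} chars")
```

Even in this tiny example the full dump is an order of magnitude more characters than the precise answer, which is the gap the talk measures in the UI later.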
So what that procedure looked like was: 11:50.400 --> 11:56.160 I collected the Lungo deployment logs and network diagrams as text, all the information, 11:56.160 --> 12:01.840 fed it into both approaches, and then I generated a knowledge graph. And with that knowledge graph, 12:01.840 --> 12:08.960 I would just query different things, like what services depend on the SLIM gateway, which is the 12:08.960 --> 12:17.360 transport that I mentioned. What I found was that KAG was still missing some implicit connections, 12:17.440 --> 12:24.640 even though they claim to be better at the multi-hopping and stuff. So that wasn't a good idea. 12:24.640 --> 12:34.080 And then GraphRAG was better at summarizing, but it was very, very slow, and it kept 12:34.080 --> 12:40.720 rebuilding the topology every time, which is the same issue that I came across in attempt one. And to take 12:40.720 --> 12:48.400 it one step further, I learned that both of these are actually built to extract graphs from documents, 12:48.400 --> 12:56.640 which are, you know, just deterministic, static sources of information. So since Lungo's network 12:56.640 --> 13:00.560 topology is technically already structured data, because it's already structured in that 13:00.560 --> 13:10.080 JSON, why extract what we already have? Which led me to my final attempt, which is having a graph-backed 13:10.080 --> 13:18.720 MCP server. And what that looked like was: I ended up creating my own MCP server specifically 13:18.720 --> 13:30.400 compatible with that knowledge graph, but then I also attached a graph database to it. 13:30.560 --> 13:39.520 I'll get more into that, but the goal for this was to have the tokens more aligned with the 13:39.520 --> 13:45.280 output to reduce the token bloat, to have native traversal instead of generating the same thing twice, 13:45.280 --> 13:51.360 and to have more direct queries.
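The "why extract what we already have" point can be sketched in a few lines. The real graph-backed server in the talk sits on Neo4j and answers with Cypher; this in-memory dictionary is only a stand-in for that idea, and the edge list and node names are invented:

```python
# Minimal stand-in for the graph-backed MCP server idea: the topology is
# already structured data, so load it straight into a graph instead of
# asking an LLM to extract one. Node names here are invented examples.
topology = [
    ("supervisor", "DEPENDS_ON", "slim-gateway"),
    ("brazil-farm", "DEPENDS_ON", "slim-gateway"),
    ("brazil-farm", "DEPENDS_ON", "weather-mcp"),
    ("colombia-farm", "DEPENDS_ON", "payment-mcp"),
]

# Reverse index: target service -> services that depend on it.
dependents: dict[str, list[str]] = {}
for src, _rel, dst in topology:
    dependents.setdefault(dst, []).append(src)

def what_depends_on(service: str) -> list[str]:
    """Answer the query directly, returning only the relevant nodes.

    Roughly the Cypher equivalent would be:
    MATCH (s)-[:DEPENDS_ON]->(t {name: $service}) RETURN s.name
    """
    return sorted(dependents.get(service, []))

print(what_depends_on("slim-gateway"))
```

The point is that the server, not the LLM, does the traversal, so the response is exactly the nodes the question asked about rather than the whole graph.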
So I figure, if we're in a situation where the supervisor has a 13:51.360 --> 13:57.600 question for the server, and the server directly runs a query, a Cypher query, which is what you 13:57.600 --> 14:04.000 can use to traverse the database, then the supervisor would receive exactly the right amount of tokens. 14:05.600 --> 14:10.000 This is the proposed stack of what I used to do that. The orchestration was with LangGraph, 14:10.640 --> 14:19.040 the interface through the MCP server, the retrieval through Cypher, and the data is held in 14:19.120 --> 14:27.120 a knowledge graph in Neo4j. This is what a knowledge graph in Neo4j looks like, and it's the visual 14:27.120 --> 14:33.920 for the topology demo. So again, the two big things that I generated: it's the knowledge graph 14:33.920 --> 14:39.360 and the MCP server with several different features that align with network topology. 14:40.960 --> 14:48.960 And ultimately with that, the capability that I pitched to the team is this idea of impact analysis: 14:49.360 --> 14:57.760 identifying which visual node in the knowledge graph would be impacted if other servers go down. 14:57.760 --> 15:03.840 And that was done with a Cypher query, something like this, which would be something that 15:03.840 --> 15:11.920 can solve a question like: what is your blast radius? So here's an example of the approach 15:11.920 --> 15:17.920 that I did, in the same UI as the beginning, and in the interest of time I'm just going to skip around a 15:18.000 --> 15:29.840 little bit. But basically, what I'm asking for the first question is to show the approach that 15:29.840 --> 15:38.080 I implemented, and you'll see there's a very precise answer, and how many characters it is, 15:38.080 --> 15:46.480 where the token amount differs depending on what tokenizer is being used. So I just measured
So I just measured 15:46.640 --> 15:52.560 it in characters, and then there's also an example, if you say, can you use a approach one, 15:52.560 --> 15:57.920 which is the bad approach, you'll see the problem that I ran into, where it was generating 15:57.920 --> 16:05.280 just a large amount of text. So you can check that out on your own time, but what the UI is doing 16:05.280 --> 16:12.560 is recreating the story that I went through and giving you the opportunity to go through the code 16:12.560 --> 16:18.640 and expand upon it yourself. What about other use cases like routing dependencies? Well for that, 16:18.640 --> 16:23.920 it would be an example of a data change, maybe adding obviously routing information, and that 16:23.920 --> 16:29.680 physical topology we discussed, the questions would be something like, what prefixes are affected 16:29.680 --> 16:37.840 if router x fails. Policy dependencies, that would look like something like the data would be security 16:37.840 --> 16:44.960 groups, or ACLs, as graphs, or firewall rules. And the question would be like, what breaks if we 16:44.960 --> 16:54.000 remove this one firewall rule? So there are a lot of growth opportunities. Again, this is available 16:54.000 --> 16:59.840 online, so I'm not going to go to too much into depth, but I truly have a feeling that in the 16:59.840 --> 17:04.320 realm of context engineering, all of these are greater approaches to play around with the reference 17:04.400 --> 17:10.800 application. And today, what we spoke about, this is the full realm of agency and that 17:10.800 --> 17:16.400 Internet of Agents. We spoke about specifically the syntactic layer and the messaging layer. 17:16.400 --> 17:23.280 Syntactic being A2A, MCP and messaging layer, we didn't speak about it, but it was it's in the demo 17:23.280 --> 17:32.560 for like the mix and match concept we spoke about swim. 
So, getting started: when you get to the QR code 17:32.800 --> 17:35.920 for Lungo, you're going to see, again, there are two options. Remember, Lungo is the 17:35.920 --> 17:44.960 one we spoke about today. And then my demo is on my repo. So with that, I sped through the end; 17:44.960 --> 17:51.040 I'm sorry if it seemed a little rushed towards the end, but thank you very much. If you're interested 17:51.040 --> 17:56.320 in further questions about AGNTCY, the work that we do, or how to extend this, that's the QR 17:56.320 --> 18:02.960 code for my LinkedIn. And then the QR code for the demo that I showed, the quick UI, and for this 18:02.960 --> 18:17.840 presentation is on the other side. Great. So we do have a few minutes for questions. Some of the issues 18:17.840 --> 18:24.400 which you touched upon are expected to be solved, also for example with this A2A protocol. 18:24.400 --> 18:30.960 Do you have any take on how mature this is, and so on, from your side? 18:33.360 --> 18:39.520 Yes. So if I understand your question correctly, you're asking if there's room to mature beyond 18:39.520 --> 18:42.320 the components that I showed you today, like the A2A. 18:42.320 --> 18:59.680 Yes. So A2A and MCP are both essentially doing the same thing in terms of being 18:59.680 --> 19:05.680 two different communication patterns. However, what I'm showing today is more of a reference 19:05.680 --> 19:10.880 application for looking in between that. So, us as network people, we're interested, right, in how things 19:10.960 --> 19:19.840 are communicating underneath; that's what this is. Yeah. So if you were looking to swap out A2A with 19:19.840 --> 19:27.680 other components I've shown here, like maybe you want to add agent identity, agent 19:27.680 --> 19:35.120 directory, other things, yeah, it's more aligned with this. And one more question. 19:35.760 --> 19:48.480 Hi, thanks for the talk.
How much maintenance would it be to keep this effectively a knowledge 19:48.480 --> 19:55.680 graph, but like a customized one, with an MCP server, as the topology updates? 19:55.680 --> 20:02.480 Consider that if we talk about something more dynamic than just the network of physical devices 20:02.480 --> 20:07.840 we're talking about, like the policy changes that you mentioned, or a cloud native deployment 20:07.840 --> 20:16.640 that may change every minute. How high maintenance would it be to implement this approach 20:16.640 --> 20:25.360 in this case? Well, in that case, it depends on the architecture that you build. So, for example, 20:25.360 --> 20:29.440 if the policy changes, right, policies are changing every day, especially, you know, in Europe. 20:30.400 --> 20:37.760 So, when you deal with that, maybe if you alter any of these other areas, 20:37.760 --> 20:43.200 you can automate a certain section of that, and that way you're not working to keep up with 20:43.280 --> 20:45.200 those changes yourself. 20:48.480 --> 20:52.480 We have time for maybe one more question, anyone else? Yeah. 21:00.880 --> 21:06.160 Well, thank you for the presentation. First of all, it was very interesting, but my main 21:06.240 --> 21:13.680 question is, what do you think is the most useful strategy to fill up the graph, the 21:13.680 --> 21:22.560 base of the knowledge graph, whatever it is? What do you mean by fill up?
But, I mean, in this case, 21:22.560 --> 21:28.400 you mentioned that we already have structured data for the network topologies, etc., but let's 21:28.400 --> 21:33.120 imagine that we are doing this, let's say, in real life, so we're starting from an existing project 21:33.120 --> 21:39.120 or an existing architecture. What do you think would be the best way to start building this 21:39.120 --> 21:46.160 knowledge graph, or what do you use or suggest, what techniques we can use to build the graph 21:46.160 --> 21:54.320 before you can be able to use it? Um, would this existing architecture already have agentic properties 21:54.320 --> 22:02.880 within it, to any capacity? Yes. Yes, like what? I mean, we may have 22:02.960 --> 22:09.920 lots of documents that may define parts of the architecture, and I want to build the graph 22:09.920 --> 22:15.920 including, let's say, all this knowledge that is scattered around in different sources. Okay, 22:16.640 --> 22:22.800 so I would say for that, again, this is a very small scale, but you're talking about something 22:22.800 --> 22:28.960 that's existing. For that, keep in mind that chart that I just had on the screen, 22:29.040 --> 22:36.960 but then your focus would be more so on the messaging layer, because that's in charge of 22:36.960 --> 22:41.680 transporting the messages that you would need for these agents to come in and share 22:41.680 --> 22:47.840 the data and join the broadcast, through either pub/sub or request-response or whatever it is. 22:47.840 --> 22:53.120 So yeah, in that sense, if you want to automate it, I would say that all of the enhancement would 22:53.120 --> 22:57.120 be done in that layer. Thank you. Mm-hmm. 23:00.960 --> 23:03.920 Great, well, thank you very much for this insightful talk. 23:04.960 --> 23:06.960 Okay.