WEBVTT 00:00.000 --> 00:19.120 All right, hello everyone, oh thank you, thank you, my name is Shireen Bellamy, I'm a 00:19.120 --> 00:26.160 senior developer advocate for AI security and quantum over at Cisco, and today I'm really 00:26.160 --> 00:32.080 excited to talk to you about going beyond MCP servers, because last year when I was doing a lot 00:32.080 --> 00:37.760 of my developer advocacy, a lot of the talks that I went to or spoke at had to do with the 00:37.760 --> 00:46.160 emerging agentic protocols and building MCP servers. So I'm excited to expand beyond 00:46.160 --> 00:52.960 that this year by looking at why network automations need knowledge graphs. And I'm going 00:52.960 --> 01:02.320 to do that based on an open source multi-agent system that I'm a part of; there's a group 01:02.320 --> 01:08.720 that I'm a part of at Cisco, and I'll explain more about that later on, but basically there's an 01:08.720 --> 01:13.680 open source project called coffee AGNTCY, and that's where I'm going to share the lessons that I learned. 01:15.440 --> 01:20.880 So a little bit about me, I'm not from Belgium, I'm from Boston, Massachusetts, 01:21.840 --> 01:28.720 I went there for grad school, and so yeah, my background is actually in AI and human-computer 01:28.720 --> 01:34.720 interaction, but ever since joining Cisco last year I've learned a lot about network automation 01:34.720 --> 01:43.840 and where AI leaks into the networking world. The way I was able to do that is through this 01:43.840 --> 01:49.360 AGNTCY project, which I'll talk about in a minute. As for other open source activities that I do, 01:49.360 --> 01:56.960 I give talks and attend meetups a lot for the PyData community in Boston and New York City. 01:56.960 --> 02:04.400 So if you're over there, say hello.
So what we're going to talk about: we're going to start off 02:04.400 --> 02:11.840 by just reflecting on the year of MCP and AGNTCY last year, then we're going to talk about an 02:11.840 --> 02:18.800 issue that I encountered while working on the project, and how I solved that by learning about the 02:18.800 --> 02:27.120 power of graphs. After that, I'll talk about how to try out coffee AGNTCY yourself and build on 02:28.160 --> 02:41.440 what I already did. Starting off with the year of MCP, 2025: MCP really started around 02:41.440 --> 02:48.800 the end of 2024, but I think it's fair to say that it clearly was widely adopted 02:48.800 --> 02:56.160 at a very large scale throughout last year. I don't think many people are arguing about whether 02:56.160 --> 03:04.240 or not MCP is a standardized protocol for agents to communicate, or that it solved a problem. 03:04.880 --> 03:11.200 That's very clear from the messaging last year, in my opinion. And in terms of things that 03:11.200 --> 03:17.760 were built with it, there are so many MCP servers that have been built that I didn't even feel 03:17.760 --> 03:22.720 comfortable putting a number on the slide, because I don't know how accurate the numbers are 03:22.720 --> 03:29.440 that I found online. There are so many out there; I feel like every day there's another couple 03:29.440 --> 03:35.360 of hundred, or a couple of tens, I don't know. So yeah, MCP: large, growing. 03:37.920 --> 03:42.480 But I also want to talk about the year of AGNTCY, right? I spoke about how we all know 03:42.480 --> 03:50.320 what MCP is. A show of hands: does anyone know what AGNTCY is in here? In the front, and that's because 03:50.320 --> 03:57.120 they're part of Team Cisco, so clearly I have some advocacy work to do here. The way that I learned 03:57.120 --> 04:05.280 about AGNTCY is through a reference application that I worked on. Now, what is AGNTCY? It's not just 04:05.280 --> 04:10.240 a Cisco project.
It's actually a collaboration of different companies that came together and said, 04:10.240 --> 04:18.720 hey, with all of this agentic AI talk that's happening, what does the internet of agents really look like? 04:18.720 --> 04:25.920 When everyone's talking about these protocols, what do these systems look like as they scale out? 04:25.920 --> 04:31.760 What can we mix and match and change in the in-between layers? That's what the internet of 04:31.760 --> 04:40.240 agents is about. So on the AGNTCY team, we have a reference application, we call it coffee AGNTCY, which is 04:40.240 --> 04:50.080 based on a coffee trade system, and there are two demos in there. There's the Corto demo, 04:50.080 --> 04:56.720 which is a simple two-agent system communicating, and what that's showing is a very simple example, 04:57.440 --> 05:02.000 not only of a protocol working for these agents to discover each other, but it's showing this 05:02.000 --> 05:09.760 concept, like I mentioned, of mix and match. So it's showing that if you look at the 05:09.760 --> 05:17.280 messaging layer and the transport layer that's involved in the protocol, you can use an A2A server 05:17.280 --> 05:23.600 and have a layer of SLIM, and then have these agents talk to each other. So it goes into 05:23.600 --> 05:32.400 that and shows it at a very micro level. Then what I'll be focusing on today is the Lungo application, 05:33.280 --> 05:40.960 which is a full multi-agent simulation with different patterns of communication. It's been extended a lot 05:40.960 --> 05:47.760 since this presentation was prepared, so I don't think we'll have time to answer all the questions about 05:47.760 --> 05:52.400 updates on it, but I do have a QR code with my link to it at the end if you want to 05:52.400 --> 06:00.160 message me through there.
But for today, I'm going to be talking about the very base layer of Lungo, 06:00.160 --> 06:11.760 how I got started, and how I learned what the 06:11.760 --> 06:22.160 problem was. Well, not with the protocols themselves, but with MCP. So before I get into that, 06:22.160 --> 06:28.320 again, this is what the diagram of the structure looks like. The way that the coffee 06:28.320 --> 06:35.040 multi-agent system works is that you start off with the auction supervisor agent as the buyer, 06:36.080 --> 06:41.200 and all these agents are designed with LangGraph; by the way, they're structured through LangGraph. 06:42.560 --> 06:48.880 The supervisor agent is able to communicate with all the farm agents from different companies 06:49.440 --> 06:57.760 and request information about their supply of beans and the resources that they have. 06:58.320 --> 07:06.960 On top of the A2A over a SLIM transport, which allows for 07:06.960 --> 07:16.320 pub/sub broadcasts and point-to-point communication as well, there are also two MCP 07:16.320 --> 07:22.960 servers already existing in the repo, which are the weather service Python file and the payment 07:23.120 --> 07:30.720 service Python file. There is a walkthrough available of how this works more in depth; I uploaded it online, so it should 07:30.720 --> 07:34.400 be publicly available after the talk, like all the QR codes and links that I have in here. 07:36.000 --> 07:42.720 This is just a quick presentation, so I'm going to fire through there. 07:44.400 --> 07:50.800 Here's just a quick screenshot from very far away of what the base looks like; I will zoom in 07:50.800 --> 07:57.040 later and show you what I did in the same UI. So the problem that I encountered while working with 07:57.040 --> 08:03.600 coffee AGNTCY is that there is an issue of infrastructure awareness among the agents.
If I asked 08:03.600 --> 08:11.680 it a simple question, like how many beans do you have, then yeah, it'll answer, but sometimes 08:11.760 --> 08:22.000 if I asked something that required a little bit more depth, like why did this order fail, 08:22.800 --> 08:28.480 it wouldn't be able to tell me specifically why, or it would give me a very long paragraph 08:30.320 --> 08:37.600 in response. What I decided the issues were, based on what I was going through, is that the context 08:37.680 --> 08:44.960 window was filling up really fast, because every query that I was sending in was basically given 08:44.960 --> 08:51.920 the whole graph and in turn returning way more than I needed. And that's because, in the typical 08:51.920 --> 08:58.800 structure, the LLM was doing all of the reasoning, and because of that, it was using 08:58.800 --> 09:07.200 way more tokens than needed. So in my research of how to fix this issue, I learned a lot about 09:08.160 --> 09:13.840 folks adding graphs to MCP servers, and reference applications that exist in that area. 09:14.640 --> 09:21.280 Anthropic has a reference application for knowledge graph memory as an MCP server. 09:21.280 --> 09:26.080 There's an MCP server for Neo4j, going back to what I said earlier. There's an MCP server for a lot of 09:26.080 --> 09:34.800 different things. But specifically when it comes to IT infrastructure issues, LinkedIn was able to 09:34.800 --> 09:41.520 improve the number of tickets that were resolved by integrating a knowledge graph with their 09:41.520 --> 09:47.760 MCP server. And Cisco also has a couple of case studies where they were able to do some 09:47.760 --> 09:53.120 deep network troubleshooting in that manner, and they were able to implement Jarvis. 09:53.120 --> 10:02.320
So given that, I'm going to call attempt one that issue that I spoke about, where the LLM 10:02.400 --> 10:08.960 is dumping all of this JSON, and in return I'm getting all of these tokens used up, because 10:08.960 --> 10:16.240 look how many characters are there, right? It's a lot of bloat. So what if, instead of this situation 10:16.240 --> 10:22.320 here, where you have the supervisor asking a question to the MCP server and the MCP server 10:22.320 --> 10:29.200 returning directly to the LLM to parse that JSON, what if the MCP server answered it directly? 10:29.200 --> 10:36.640 Now, I know I said that the talk is beyond MCP servers, but the key is what can we do to enhance it? 10:36.640 --> 10:43.840 Not, you know, we did MCP last year, let's get over it this year. I'm not on that boat. 10:44.720 --> 10:54.560 So attempt two, which is my actual first attempt, is looking at knowledge graph extraction, 10:54.640 --> 10:59.200 because of all of those case studies from different companies. I looked more into knowledge graphs 10:59.200 --> 11:06.640 and I learned about retrieval mechanisms that were out there, like KAG and GraphRAG. So 11:09.200 --> 11:15.440 the reason why these both interested me was because KAG does logical form guided reasoning 11:15.440 --> 11:21.040 with multi-hop Q&A. I figured network topology has a lot of hops you usually have to traverse to solve 11:21.040 --> 11:30.560 queries, so that sounds useful. And then GraphRAG, because it's known for extracting entities and 11:30.560 --> 11:36.640 relationships. Again, I just felt like, because of that natural graph traversal kind of property 11:36.640 --> 11:42.240 that network topologies naturally have, these two would be a good place to start in terms of looking 11:42.240 --> 11:50.320 at what it would look like if I could extract a knowledge graph.
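To make the attempt-one token bloat concrete, here is a minimal sketch; the topology structure, agent names, and numbers are invented for illustration and are not taken from the actual Lungo repo:

```python
import json

# Toy stand-in for the topology JSON the MCP server was dumping wholesale.
# All names and values here are made up for the sketch.
topology = {
    "agents": [
        {"name": "supervisor", "depends_on": ["slim-gateway"]},
        {"name": "brazil-farm", "depends_on": ["slim-gateway", "weather-mcp"]},
        {"name": "colombia-farm", "depends_on": ["slim-gateway", "payment-mcp"]},
    ]
}

# Attempt one in miniature: the whole graph goes to the LLM on every query.
full_dump = json.dumps(topology)

# What the question "what does the supervisor depend on?" actually needs:
supervisor = next(a for a in topology["agents"] if a["name"] == "supervisor")
answer = ", ".join(supervisor["depends_on"])

print(f"full dump: {len(full_dump)} chars, precise answer: {len(answer)} chars")
```

Even in this tiny example the full dump is an order of magnitude more characters than the precise answer, which is the gap the talk measures in the UI later.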
So what that procedure looked like was: 11:50.400 --> 11:56.160 I collected the Lungo deployment logs and network diagrams as text, all the information, 11:56.160 --> 12:01.840 fed it into both approaches, and then I generated a knowledge graph. And with that knowledge graph, 12:01.840 --> 12:08.960 I would just query different things, like what services depend on the SLIM gateway, which is the 12:08.960 --> 12:17.360 transport that I mentioned. What I found was that KAG was still missing some implicit connections, 12:17.440 --> 12:24.640 even though they claim to be better at the multi-hopping and stuff. So that wasn't a good idea. 12:24.640 --> 12:34.080 And then GraphRAG was better at summarizing, but it was very, very slow, and it kept 12:34.080 --> 12:40.720 rebuilding the topology every time, which is the same issue that I came across in attempt one. And to take 12:40.720 --> 12:48.400 it one step further, I learned that both of these are actually built to extract graphs from documents, 12:48.400 --> 12:56.640 which are, you know, just deterministic, static sources of information. So since Lungo's network 12:56.640 --> 13:00.560 topology is technically already structured data, because it's already structured in that 13:00.560 --> 13:10.080 JSON, why extract what we already have? Which led me to my final attempt, which is having a graph-backed 13:10.080 --> 13:18.720 MCP server. And what that looked like was: I ended up creating my own MCP server specifically 13:18.720 --> 13:30.400 compatible with that knowledge graph, but then I also attached a graph database to it. 13:30.560 --> 13:39.520 I'll get more into that, but the goal for this was to have the tokens more aligned with the 13:39.520 --> 13:45.280 output to reduce the token bloat, to have native traversal instead of generating the same thing twice, 13:45.280 --> 13:51.360 and to have more direct queries.
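The "why extract what we already have" point can be sketched in a few lines. The real graph-backed server in the talk sits on Neo4j and answers with Cypher; this in-memory dictionary is only a stand-in for that idea, and the edge list and node names are invented:

```python
# Minimal stand-in for the graph-backed MCP server idea: the topology is
# already structured data, so load it straight into a graph instead of
# asking an LLM to extract one. Node names here are invented examples.
topology = [
    ("supervisor", "DEPENDS_ON", "slim-gateway"),
    ("brazil-farm", "DEPENDS_ON", "slim-gateway"),
    ("brazil-farm", "DEPENDS_ON", "weather-mcp"),
    ("colombia-farm", "DEPENDS_ON", "payment-mcp"),
]

# Reverse index: target service -> services that depend on it.
dependents: dict[str, list[str]] = {}
for src, _rel, dst in topology:
    dependents.setdefault(dst, []).append(src)

def what_depends_on(service: str) -> list[str]:
    """Answer the query directly, returning only the relevant nodes.

    Roughly the Cypher equivalent would be:
    MATCH (s)-[:DEPENDS_ON]->(t {name: $service}) RETURN s.name
    """
    return sorted(dependents.get(service, []))

print(what_depends_on("slim-gateway"))
```

The point is that the server, not the LLM, does the traversal, so the response is exactly the nodes the question asked about rather than the whole graph.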
So I figure, if we're in a situation where the supervisor has a 13:51.360 --> 13:57.600 question for the server, and the server directly runs a query, a Cypher query, which is what you 13:57.600 --> 14:04.000 can use to traverse the database, then the supervisor would receive exactly the right amount of tokens. 14:05.600 --> 14:10.000 This is the proposed stack of what I used to do that. The orchestration was with LangGraph, 14:10.640 --> 14:19.040 the interface through the MCP server, the retrieval through Cypher, and the data is held in 14:19.120 --> 14:27.120 a knowledge graph in Neo4j. This is what a knowledge graph in Neo4j looks like, and it's the visual 14:27.120 --> 14:33.920 for the topology demo. So again, the two big things that I generated: it's the knowledge graph 14:33.920 --> 14:39.360 and the MCP server with several different features that align with network topology. 14:40.960 --> 14:48.960 And ultimately with that, the capability that I pitched to the team is this idea of impact analysis: 14:49.360 --> 14:57.760 identifying which visual node in the knowledge graph would be impacted if other servers go down. 14:57.760 --> 15:03.840 And that was done with a Cypher query, something like this, which would be something that 15:03.840 --> 15:11.920 can solve a question like: what is your blast radius? So here's an example of the approach 15:11.920 --> 15:17.920 that I did, in the same UI as the beginning, and in the interest of time I'm just going to skip around a 15:18.000 --> 15:29.840 little bit. But basically, what I'm asking for the first question is to show the approach that 15:29.840 --> 15:38.080 I implemented, and you'll see there's a very precise answer, and how many characters it is, 15:38.080 --> 15:46.480 where the token amount differs depending on what tokenizer is being used. So I just measured
So I just measured 15:46.640 --> 15:52.560 it in characters, and then there's also an example, if you say, can you use a approach one, 15:52.560 --> 15:57.920 which is the bad approach, you'll see the problem that I ran into, where it was generating 15:57.920 --> 16:05.280 just a large amount of text. So you can check that out on your own time, but what the UI is doing 16:05.280 --> 16:12.560 is recreating the story that I went through and giving you the opportunity to go through the code 16:12.560 --> 16:18.640 and expand upon it yourself. What about other use cases like routing dependencies? Well for that, 16:18.640 --> 16:23.920 it would be an example of a data change, maybe adding obviously routing information, and that 16:23.920 --> 16:29.680 physical topology we discussed, the questions would be something like, what prefixes are affected 16:29.680 --> 16:37.840 if router x fails. Policy dependencies, that would look like something like the data would be security 16:37.840 --> 16:44.960 groups, or ACLs, as graphs, or firewall rules. And the question would be like, what breaks if we 16:44.960 --> 16:54.000 remove this one firewall rule? So there are a lot of growth opportunities. Again, this is available 16:54.000 --> 16:59.840 online, so I'm not going to go to too much into depth, but I truly have a feeling that in the 16:59.840 --> 17:04.320 realm of context engineering, all of these are greater approaches to play around with the reference 17:04.400 --> 17:10.800 application. And today, what we spoke about, this is the full realm of agency and that 17:10.800 --> 17:16.400 Internet of Agents. We spoke about specifically the syntactic layer and the messaging layer. 17:16.400 --> 17:23.280 Syntactic being A2A, MCP and messaging layer, we didn't speak about it, but it was it's in the demo 17:23.280 --> 17:32.560 for like the mix and match concept we spoke about swim. 
So, getting started: when you get to the QR code 17:32.800 --> 17:35.920 for Lungo, you're going to see, again, there are two options. Remember, Lungo is the 17:35.920 --> 17:44.960 one we spoke about today. And then my demo is on my repo. So with that, I sped through the end; 17:44.960 --> 17:51.040 I'm sorry if it seemed a little rushed towards the end, but thank you very much. If you're interested 17:51.040 --> 17:56.320 in further questions about AGNTCY, the work that we do, or how to extend this, that's the QR 17:56.320 --> 18:02.960 code for my LinkedIn. And then the QR code for the demo that I showed, the quick UI, and for this 18:02.960 --> 18:17.840 presentation is on the other side. Great. So we do have a few minutes for questions. Some of the issues 18:17.840 --> 18:24.400 which you touched upon are expected to be solved, also for example with this A2A protocol. 18:24.400 --> 18:30.960 Do you have any take on how mature this is, and so on, from your side? 18:33.360 --> 18:39.520 Yes. So if I understand your question correctly, you're asking if there's room to mature beyond 18:39.520 --> 18:42.320 the components that I showed you today, like the A2A. 18:42.320 --> 18:59.680 Yes. So A2A and MCP are both essentially doing the same thing in terms of being 18:59.680 --> 19:05.680 two different communication patterns. However, what I'm showing today is more of a reference 19:05.680 --> 19:10.880 application for looking in between that. So, us as network people, we're interested, right, in how things 19:10.960 --> 19:19.840 are communicating underneath; that's what this is. Yeah. So if you were looking to swap out A2A with 19:19.840 --> 19:27.680 other components I've shown here, like maybe you want to add agent identity, agent 19:27.680 --> 19:35.120 directory, other things, yeah, it's more aligned with this. And one more question. 19:35.760 --> 19:48.480 Hi, thanks for the talk.
How much maintenance would it be to keep this effectively a knowledge 19:48.480 --> 19:55.680 graph, but like a customized one, with an MCP server, as the topology updates? 19:55.680 --> 20:02.480 Consider that if we talk about something more dynamic than just the network of physical devices 20:02.480 --> 20:07.840 we're talking about, like the policy changes that you mentioned, or a cloud native deployment 20:07.840 --> 20:16.640 that may change every minute. How high maintenance would it be to implement this approach 20:16.640 --> 20:25.360 in this case? Well, in that case, it depends on the architecture that you build. So, for example, 20:25.360 --> 20:29.440 if the policy changes, right, policies are changing every day, especially, you know, in Europe. 20:30.400 --> 20:37.760 So, when you deal with that, maybe if you alter any of these other areas, 20:37.760 --> 20:43.200 you can automate a certain section of that, and that way you're not working to keep up with 20:43.280 --> 20:45.200 those changes yourself. 20:48.480 --> 20:52.480 We have time for maybe one more question, anyone else? Yeah. 21:00.880 --> 21:06.160 Well, thank you for the presentation. First of all, it was very interesting, but my main 21:06.240 --> 21:13.680 question is, what do you think is the most useful strategy to fill up the graph, the 21:13.680 --> 21:22.560 base of the knowledge graph, whatever it is? What do you mean by fill up?
But, I mean, in this case, 21:22.560 --> 21:28.400 you mentioned that we already have structured data for the network topologies, etc., but let's 21:28.400 --> 21:33.120 imagine that we are doing this, let's say, in real life, so we're starting from an existing project 21:33.120 --> 21:39.120 or an existing architecture. What do you think would be the best way to start building this 21:39.120 --> 21:46.160 knowledge graph, or what do you use or suggest, what techniques we can use to build the graph 21:46.160 --> 21:54.320 before you can be able to use it? Um, would this existing architecture already have agentic properties 21:54.320 --> 22:02.880 within it, to any capacity? Yes. Yes, like what? I mean, we may have 22:02.960 --> 22:09.920 lots of documents that may define parts of the architecture, and I want to build the graph 22:09.920 --> 22:15.920 including, let's say, all this knowledge that is scattered around in different sources. Okay, 22:16.640 --> 22:22.800 so I would say for that, again, this is a very small scale, but you're talking about something 22:22.800 --> 22:28.960 that's existing. For that, keep in mind that chart that I just had on the screen, 22:29.040 --> 22:36.960 but then your focus would be more so on the messaging layer, because that's in charge of 22:36.960 --> 22:41.680 transporting the messages that you would need for these agents to come in and share 22:41.680 --> 22:47.840 the data and join the broadcast, through either pub/sub or request-response or whatever it is. 22:47.840 --> 22:53.120 So yeah, in that sense, if you want to automate it, I would say that all of the enhancement would 22:53.120 --> 22:57.120 be done in that layer. Thank you. Mm-hmm. 23:00.960 --> 23:03.920 Great, well, thank you very much for this insightful talk. 23:04.960 --> 23:06.960 Okay.