WEBVTT 00:00.000 --> 00:18.880 And so now for the first real talk of the day, out, which is Victor, which is not 00:18.880 --> 00:26.120 boo, with amazing, who will be talking about monoliths, and very old monoliths, almost as 00:26.120 --> 00:31.400 old as this dev room, which I'm very curious to hear about, because I'm also dealing 00:31.400 --> 00:35.600 with very old codes around with a blast. 00:35.600 --> 00:43.160 Welcome to my talk on modularizing a 10-year monolith. 00:43.160 --> 00:44.480 My name is Victor Lyuboslavsky. 00:44.480 --> 00:48.240 I've been in tech for over 25 years and I'm currently a principal software engineer at 00:48.240 --> 00:52.040 Fleet Device Management. At Fleet, we make endpoint telemetry for corporate security 00:52.040 --> 00:55.880 teams and device management for corporate IT teams. 00:55.880 --> 00:59.440 With my current job, it took me roughly a year to figure out the code base, get familiar 00:59.440 --> 01:02.760 with the programming language, and understand our business domain. 01:02.760 --> 01:06.760 Around that time, I started to notice a consistent issue whenever working on our service 01:06.760 --> 01:07.760 package. 01:07.760 --> 01:13.200 The service package was the biggest one in our code base, and whenever I was recompiling a test 01:13.200 --> 01:15.880 in this package, it took roughly 30 seconds. 01:15.880 --> 01:20.320 So if I modified one character in a string, it took roughly 30 seconds to recompile the 01:20.320 --> 01:23.080 package and its unit tests. 01:23.080 --> 01:27.320 So I did a little digging. The Go programming language is known for fast compile times. 01:27.320 --> 01:31.840 However, the linking of all the packages was dominating the compile. 01:31.840 --> 01:33.640 From this analysis, the issue was obvious. 01:33.640 --> 01:38.200 The service package was just too big and had too many dependencies. 01:38.200 --> 01:41.880 I asked myself a simple question: what am I actually trying to do here? 
01:41.880 --> 01:47.320 I'm not trying to fight the build system or wrestle with a giant package. 01:47.320 --> 01:50.160 I'm just trying to write good software. 01:50.160 --> 01:55.440 And it shouldn't take 30 seconds of self therapy every time I change a string literal. 01:55.440 --> 01:58.680 So if I just want to write good software, what needs to change? 01:58.680 --> 02:02.200 I want a code base that feels light, understandable, and fast. 02:02.200 --> 02:05.080 One that helps me think instead of slowing me down. 02:05.080 --> 02:09.680 And I realized the only way to get there was to make the code base itself smaller. 02:09.680 --> 02:12.560 So we tried modularizing. 02:12.560 --> 02:17.000 Next, let's discuss our first attempt at modularization, where, as you might guess, things 02:17.040 --> 02:20.480 didn't quite go as smoothly as we hoped. 02:20.480 --> 02:22.280 Let's talk about our code base. 02:22.280 --> 02:26.360 Fleet Device Management is primarily MIT-licensed open source. 02:26.360 --> 02:30.080 We also maintain a small set of advanced features in a clearly separated directory 02:30.080 --> 02:31.720 under a different license. 02:31.720 --> 02:38.600 Our code base is roughly 10 years old with over 500,000 lines of Go backend code. 02:38.600 --> 02:42.480 This is a much simplified picture of the core of our server code. 02:42.480 --> 02:45.920 We have a single huge Go package for the API layer. 02:45.920 --> 02:47.120 It's our service package. 02:47.120 --> 02:50.560 It includes controllers and the service containing our business logic. 02:50.560 --> 02:52.800 The API layer calls the persistence layer. 02:52.800 --> 02:55.200 The persistence layer is another huge Go package. 02:55.200 --> 02:59.800 It constructs the SQL queries and executes them against our MySQL database. 02:59.800 --> 03:04.480 We also have an in-memory cache, a React front end, a CLI, and a bunch of other components 03:04.480 --> 03:06.520 which are not pictured here. 
03:06.520 --> 03:09.880 So at one of our engineering all-hands meetings, I gave a presentation. 03:09.880 --> 03:15.720 I described how we could scale our code base to be a more modular architecture. 03:15.720 --> 03:19.720 The core idea of my presentation was that when working on a new major feature, we could 03:19.720 --> 03:22.720 create a new module, a vertical slice. 03:22.720 --> 03:26.600 This new vertical slice would mirror the structure of the legacy code. 03:26.600 --> 03:30.480 We would actually organize it exactly like the legacy code. That way engineers would still 03:30.480 --> 03:33.640 roughly know where things are. 03:33.640 --> 03:40.200 This actually looks exactly like a microservice but still within a single compiled binary. 03:40.200 --> 03:45.680 My presentation seemed to land and I felt confident enough to try it on a real feature. 03:45.680 --> 03:49.120 So shortly afterwards, we started working on a new feature and this new feature seemed 03:49.120 --> 03:52.640 like an ideal candidate to try the new modularization approach. 03:52.640 --> 03:56.200 The feature was Android MDM, mobile device management. 03:56.200 --> 03:58.240 I created the new structure for the Android support. 03:58.240 --> 04:01.120 I created new packages and a dedicated directory. 04:01.120 --> 04:04.920 Most of the work was actually untangling several common parts so they could be reused 04:04.920 --> 04:08.680 between the legacy code and the new Android feature. 04:08.680 --> 04:13.200 I put up the PR, the pull request, and it just sat there. 04:13.200 --> 04:17.720 Our guidance is to review and merge PRs within 24 hours but no one wanted to touch this 04:17.720 --> 04:18.720 one. 04:18.720 --> 04:22.600 In retrospect, I should have done a better job communicating to the team that I was actually 04:22.600 --> 04:26.160 going to do what I proposed in my earlier presentation. 04:26.160 --> 04:28.600 Some engineers felt surprised by the changes. 
04:28.600 --> 04:32.480 They didn't feel like they had a voice in the approach, they didn't have an opportunity 04:32.480 --> 04:33.800 to speak up. 04:33.800 --> 04:36.800 So facing this resistance, I backtracked. 04:36.800 --> 04:41.960 I decided to take a step back and unsplit the MySQL layer so all requests would still 04:41.960 --> 04:44.920 go to the same MySQL schema. 04:44.920 --> 04:51.920 I also wrote two ADRs, architectural decision records, one to split the API layer and one 04:51.920 --> 04:56.680 to split the persistence layer for this Android feature. 04:56.680 --> 05:00.760 The ADR to split the API layer was approved but the ADR to split the persistence layer 05:00.760 --> 05:03.520 was rejected. 05:03.520 --> 05:06.600 Engineers voiced their concerns, and their feedback was consistent. 05:06.600 --> 05:11.760 This felt risky, confusing, and too big to do all at once. 05:11.760 --> 05:16.920 So I reverted the persistence layer changes. The new API layer now talked to the same 05:16.920 --> 05:21.160 large shared persistence package as the rest of the code base. 05:21.160 --> 05:25.760 And at that point, I moved off the feature and delivery pressure took over. Without sustained 05:25.760 --> 05:31.480 architectural ownership, the Android code slowly seeped back into the legacy system. 05:31.480 --> 05:33.040 So what do we have now? 05:33.040 --> 05:38.880 This picture roughly shows what we have. Certainly, at a high level, the code base doesn't 05:38.880 --> 05:45.880 appear very modular. 05:45.880 --> 05:50.760 Arguably, it's even more complex and harder to maintain than it was before. 05:50.760 --> 05:56.560 In practice, there's no clear separation between the legacy code and the Android feature. 05:56.560 --> 06:00.560 So next, let's discuss what we learned from our first attempt. 06:00.560 --> 06:05.760 I keep rediscovering that talking to people takes time and there's no shortcut. 
06:05.760 --> 06:10.760 Real time: meetings, follow-ups, side conversations. And the bigger the organization, 06:10.760 --> 06:12.960 the slower that loop gets. 06:12.960 --> 06:18.720 We also can't rely only on who speaks up first. Early voices are often smart and well-intentioned, 06:18.720 --> 06:20.880 but they don't represent everyone. 06:20.880 --> 06:25.480 To build real consensus, we have to actively solicit feedback, especially in 06:25.480 --> 06:28.000 one-on-ones and smaller forums. 06:28.000 --> 06:33.160 To succeed, we need strong commitment, both from engineers and managers, to the architecture 06:33.160 --> 06:36.240 and to the specific boundaries we're proposing. 06:36.240 --> 06:42.120 Without that commitment, engineers naturally revert to old patterns and architectural compromises 06:42.120 --> 06:47.480 creep back in, and without management support, short-term delivery pressure wins over long-term 06:47.480 --> 06:48.480 structure. 06:48.480 --> 06:50.560 Here's a small example. 06:50.560 --> 06:56.480 When I said modules, I meant architectural modules, cohesive code with clear boundaries, 06:56.480 --> 07:00.160 but some people thought I meant Go modules. 07:00.160 --> 07:05.160 Nobody was wrong, we just weren't using the same language and it took a face-to-face conversation 07:05.160 --> 07:07.280 to realize that. 07:07.280 --> 07:12.320 We thought the hard part was the architecture. It turned out the hard part was the conversations. 07:12.320 --> 07:14.600 Let's move on to the second lesson. 07:14.600 --> 07:16.800 What is an architectural change anyway? 07:16.800 --> 07:20.400 That sounds like a straightforward question, but in practice, it turns out to be one of 07:20.400 --> 07:23.800 the most political questions in engineering. 07:23.800 --> 07:26.000 Here's what we saw in reality. 07:26.000 --> 07:30.080 Structural changes in our code base rarely showed up as architecture work. 
07:30.080 --> 07:32.680 Instead, they went in as part of feature work. 07:32.680 --> 07:36.960 If you framed your change as part of feature work and included it in a PR with other feature 07:36.960 --> 07:42.000 changes, it moved, even if it broke existing architectural patterns. 07:42.000 --> 07:45.880 After all, we needed to get all the feature changes in quickly. Features are what make money 07:45.880 --> 07:47.320 for the company. 07:47.320 --> 07:52.080 But if you framed your change as a standalone architectural improvement, it stalled. 07:52.080 --> 07:55.080 It stalled because it was prioritized behind feature work. 07:55.080 --> 07:58.640 It stalled because engineers wanted to have bigger discussions around it, and it 07:58.640 --> 08:03.320 stalled because it wasn't providing immediate value to the company. 08:03.320 --> 08:06.040 So what's the main lesson here about architectural changes? 08:06.040 --> 08:11.040 If you don't define how architecture is allowed to change, it changes anyway and often it 08:11.040 --> 08:12.800 changes behind your back. 08:12.800 --> 08:15.520 Let's move on to the third lesson. 08:15.520 --> 08:18.280 One thing we were missing was clear ownership. 08:18.280 --> 08:23.880 In a modular system, not every API endpoint belongs cleanly to a single service. 08:23.880 --> 08:28.240 Some endpoints mostly orchestrate work across multiple services, and this was true for 08:28.240 --> 08:32.640 us because the new Android feature still needed to talk to legacy code. 08:32.640 --> 08:36.680 The mistake was leaving that orchestration unowned. 08:36.680 --> 08:41.320 Every API endpoint still needs a clear home, a specific module and a specific team that's 08:41.320 --> 08:43.080 responsible for it. 08:43.080 --> 08:48.720 When an endpoint is just glue, someone has to own that glue. 08:48.720 --> 08:50.200 So where are we at this point? 08:50.200 --> 08:52.240 The first attempt didn't quite succeed. 
08:52.240 --> 08:56.760 We still have the pain points of a large, tightly coupled code base, but hopefully we learned 08:56.760 --> 08:57.760 a few things. 08:57.760 --> 08:59.400 Now we have experience. 08:59.400 --> 09:01.680 So where to go from here? 09:01.680 --> 09:05.520 At this point, I realized the importance of the people part in architecture. 09:05.520 --> 09:09.200 And I felt like I couldn't convince the company on my own. 09:09.200 --> 09:11.960 I still didn't feel like an architecture expert. 09:11.960 --> 09:13.920 The first attempt was kind of a mess. 09:13.920 --> 09:18.960 So why would other engineers take my next proposal seriously and not see it as just another crazy 09:18.960 --> 09:19.960 idea? 09:19.960 --> 09:23.960 I needed someone else's help. I needed something like an appeal to authority. I needed 09:23.960 --> 09:27.680 to find someone else that had already done this transition. 09:27.680 --> 09:30.520 And I needed to analyze the way they approached it. 09:30.520 --> 09:35.760 Unfortunately, I did not find any large open-source Go projects making a transition to a 09:35.760 --> 09:41.400 modular monolith, so as far as I know, we're the first large open-source Go project 09:41.400 --> 09:42.400 doing this. 09:42.400 --> 09:44.160 Yay, us. 09:44.160 --> 09:48.080 The next best reference was GitLab. 09:48.080 --> 09:53.120 GitLab runs a Ruby on Rails monolith, which they decided to break up into a modular 09:53.120 --> 09:54.120 monolith. 09:54.120 --> 09:59.320 They documented the decision publicly, and it turned my proposal from a crazy idea into 09:59.320 --> 10:00.880 a proven pattern. 10:00.880 --> 10:02.720 We weren't trying to invent something new. 10:02.720 --> 10:06.880 We were trying to follow a trail that was already there. 10:06.880 --> 10:08.600 Next, how to start? 10:08.600 --> 10:13.680 This time there was no obvious new feature to anchor the modularization work. 
10:13.680 --> 10:17.000 So we had several internal discussions with the engineering team. 10:17.000 --> 10:20.720 We aligned on domain-driven design and bounded contexts. 10:20.720 --> 10:25.160 But the hard question was where to draw that first boundary. 10:25.160 --> 10:31.240 I looked at our code organization, had a few conversations with my AI friend, and proposed 10:31.240 --> 10:33.960 several ways to slice the monolith. 10:33.960 --> 10:37.280 As you can imagine, this process is a lot to get your head around, trying to tease 10:37.280 --> 10:40.800 apart a 10-year-old system into self-contained modules. 10:40.800 --> 10:43.160 I felt like I was barely scratching the surface. 10:43.160 --> 10:47.360 I felt like I was truly missing a lot of the details and edge cases. 10:47.360 --> 10:51.560 There were a lot of unknowns. Starting a modularization effort with anything beyond a bare 10:51.560 --> 10:53.360 minimum seemed risky. 10:53.360 --> 10:58.480 Also, all the engineers generally agreed that we wanted to take an incremental approach. 10:58.480 --> 11:01.960 We ultimately agreed on activity audit. 11:01.960 --> 11:05.760 This context is about recording things that had already happened in our system. 11:05.760 --> 11:09.480 For example, recording when a new user was created, when a new host enrolled, when 11:09.480 --> 11:13.440 a key configuration was changed, and a ton of other activities. 11:13.440 --> 11:18.680 Next, I wrote up a detailed architectural decision record and put it up for review by the 11:18.680 --> 11:19.680 team. 11:19.680 --> 11:23.160 Here's the link in case someone can't wait to dive into the details. 11:23.160 --> 11:25.880 Again, this ADR was framed as a pilot. 11:25.880 --> 11:31.160 This is the first bounded context we were creating before a bigger rollout. 11:31.160 --> 11:33.800 Several times, I wondered whether this whole thing was going to go anywhere. 
11:33.800 --> 11:38.200 Sometimes I felt like everyone was on board, other times I felt like I kept running 11:38.200 --> 11:40.040 into a brick wall. 11:40.040 --> 11:44.520 I also felt I was deep in the weeds and details of this re-architecture. 11:44.520 --> 11:48.680 I didn't quite see the full picture myself, and I had a suspicion that other engineers 11:48.680 --> 11:49.680 didn't either. 11:49.680 --> 11:54.320 I needed to convince myself and others why we were doing this. 11:54.320 --> 12:00.280 I needed an argument, and I needed to use it not just once, but every chance I got. 12:00.280 --> 12:04.080 So when I talked to the team, I roughly framed it like this. 12:04.080 --> 12:07.160 Every successful engineering org eventually hits the same wall. 12:07.160 --> 12:11.720 The monolith grows beyond what humans can safely understand. 12:11.720 --> 12:16.280 At that point, teams start colliding with each other's changes, ownership gets fuzzy, and 12:16.280 --> 12:18.680 small changes become risky. 12:18.680 --> 12:22.280 We're already seeing that, and that's not a failure of the people, it's a failure of 12:22.280 --> 12:24.200 the structure. 12:24.200 --> 12:28.520 Modularization isn't about process or slowing anyone down, it's about creating clear 12:28.520 --> 12:33.600 boundaries and ownership, so teams can move fast with confidence as the system continues 12:33.600 --> 12:34.600 to grow. 12:34.600 --> 12:38.600 All right, now let's talk about the actual architecture. 12:38.600 --> 12:42.600 The high-level idea is almost the same as where we started with the first attempt. A 12:42.600 --> 12:48.080 new feature or new bounded context will be in its own module with its own controller, service, 12:48.080 --> 12:49.600 and persistence layers. 12:49.600 --> 12:54.280 The controllers and service will be able to call methods on other services in other bounded 12:54.280 --> 12:55.280 contexts. 
12:55.280 --> 13:01.040 However, they will not be able to access the persistence layer of another bounded context. 13:01.040 --> 13:05.720 Here's the directory structure, directly out of the bounded context. The service and 13:05.720 --> 13:12.000 MySQL directories match the API layer and the persistence layer from the previous diagram. 13:12.000 --> 13:17.800 The integration test down here should be able to fully test this bounded context without 13:17.800 --> 13:21.200 requiring parts from the rest of the application. 13:21.200 --> 13:25.760 Testing this bounded context with other bounded contexts will be done in a higher-level integration 13:25.760 --> 13:26.760 test. 13:26.760 --> 13:31.040 I can talk about the decisions that went into this for a while, so catch me after the session 13:31.040 --> 13:33.160 if you'd like to dive deeper. 13:33.160 --> 13:37.640 Now let's talk about the decision not to split the database schema. 13:37.640 --> 13:42.720 When people hear modular monolith, they often assume that means multiple databases. 13:42.720 --> 13:46.400 We deliberately did not do that, we kept the single schema. 13:46.400 --> 13:50.520 Our problems weren't database scale or load. They were unclear ownership, broad service 13:50.520 --> 13:53.800 layers, and code that was hard to reason about. 13:53.800 --> 13:57.680 Splitting the database wouldn't have fixed that, it just would have added operational 13:57.680 --> 14:01.080 complexity and new failure modes. 14:01.080 --> 14:05.320 Splitting a database is one of the most expensive architectural decisions you can make, and 14:05.320 --> 14:06.920 it freezes boundaries. 14:06.920 --> 14:10.080 We were still learning where those boundaries should be. 14:10.080 --> 14:14.840 So we modularized the code first, preserved optionality, and decided we'd 14:14.840 --> 14:20.680 only split the database later if scaling actually required it. 
14:20.680 --> 14:24.640 Now everywhere we turned, we heard recommendations to use ports and adapters. 14:24.640 --> 14:30.080 However, a lot of our business logic lives in SQL, so pretending the database is just 14:30.080 --> 14:32.880 an adapter doesn't make sense for us. 14:32.880 --> 14:37.120 Ports and adapters doesn't work for our system as a whole, but one idea from it fits 14:37.120 --> 14:39.080 extremely well. 14:39.080 --> 14:43.640 That idea: bounded contexts do not share domain models directly. 14:43.640 --> 14:49.320 Any cross-context communication must go through an explicit, owned contract. In practice 14:49.320 --> 14:51.720 that means explicit module boundaries. 14:51.720 --> 14:55.960 Now, earlier I talked about how we ended up with large packages. 14:55.960 --> 15:00.840 When a shared common package is used for cross-context communication, it creates a specific 15:00.840 --> 15:01.840 problem. 15:01.840 --> 15:06.640 When two bounded contexts share the same common types, they are no longer independent. 15:06.640 --> 15:11.240 A change in one context silently changes the behavior of the other. At that point the 15:11.240 --> 15:14.880 boundary only exists in our heads, not in the code. 15:14.880 --> 15:19.600 So for cross-context communication, shared packages aren't reuse. Instead, they're hidden 15:19.600 --> 15:21.480 coupling. 15:21.480 --> 15:26.480 That's why we require explicit, owned contracts at context boundaries. 15:26.480 --> 15:31.200 When one bounded context needs something from another, it doesn't reach into its internals. 15:31.200 --> 15:36.680 Instead, the providing context exposes a small explicit interface along with the types 15:36.680 --> 15:39.360 that belong to that interface. 15:39.360 --> 15:45.040 Those interfaces and types are owned by the provider, they live with the bounded context 15:45.040 --> 15:47.520 and they evolve on its terms. 
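[Editor's sketch, not from the talk] The provider-owned contract described here could look roughly like the following in Go. Every name in it (Activity, ActivityRecorder, NewActivityRecorder, EnrollHost) is hypothetical and invented for illustration; it is not Fleet's actual API. The sketch lives in one file so it runs on its own; in a real code base the contract, the implementation, and the consumer would sit in separate packages.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// --- Contract owned by a hypothetical "activity" bounded context ---

// Activity is the public type that travels with the contract.
type Activity struct {
	Kind      string
	ActorID   uint
	CreatedAt time.Time
}

// ActivityRecorder is the small interface other contexts may depend on.
type ActivityRecorder interface {
	Record(a Activity) error
	ListByActor(actorID uint) []Activity
}

// --- Implementation, private to the providing context ---

// inMemoryRecorder stands in for the real persistence layer (assumption:
// the real one would talk to the database).
type inMemoryRecorder struct {
	items []Activity
}

// NewActivityRecorder returns the contract type, never the concrete type.
func NewActivityRecorder() ActivityRecorder {
	return &inMemoryRecorder{}
}

func (r *inMemoryRecorder) Record(a Activity) error {
	if a.Kind == "" {
		return errors.New("activity kind is required")
	}
	r.items = append(r.items, a)
	return nil
}

func (r *inMemoryRecorder) ListByActor(actorID uint) []Activity {
	var out []Activity
	for _, a := range r.items {
		if a.ActorID == actorID {
			out = append(out, a)
		}
	}
	return out
}

// --- Consumer in another context: it sees only the contract ---

// EnrollHost records an activity through the interface and knows nothing
// about how or where activities are stored.
func EnrollHost(rec ActivityRecorder, actorID uint) error {
	return rec.Record(Activity{
		Kind:      "host_enrolled",
		ActorID:   actorID,
		CreatedAt: time.Now(),
	})
}

func main() {
	rec := NewActivityRecorder()
	if err := EnrollHost(rec, 7); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("activities for actor 7:", len(rec.ListByActor(7)))
}
```

Because the constructor returns the interface, a consumer cannot reach the concrete type or its storage, so the provider stays free to change both.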
15:47.520 --> 15:51.440 Other contexts can depend on the contracts but not on the implementation. 15:51.440 --> 15:55.840 This is what makes the boundary real in code, not just in our heads. 15:55.840 --> 16:00.960 Now sometimes a new bounded context still needs data or behavior that lives in legacy code. 16:00.960 --> 16:05.680 In those cases, we don't let the new code talk to legacy directly. Instead, we introduce 16:05.680 --> 16:09.080 an anti-corruption layer, or ACL. 16:09.080 --> 16:12.400 The ACL is the only place that understands both worlds. 16:12.400 --> 16:17.320 It translates legacy concepts, types and quirks into something that new contexts can work 16:17.320 --> 16:18.800 with safely. 16:18.800 --> 16:23.280 Notice that this ACL package is outside the new bounded context. 16:23.280 --> 16:27.240 It keeps legacy semantics from leaking into the new code and gives us a clear 16:27.240 --> 16:31.280 seam we can replace or delete later. 16:31.280 --> 16:34.920 The goal of the ACL isn't elegance, it's containment. 16:34.920 --> 16:37.160 Now let's continue with the story. 16:37.160 --> 16:46.200 Before we changed the code, we had one big issue. 16:46.200 --> 16:52.680 Buy-in. If this was going to work, we didn't want this to feel like something Victor 16:52.680 --> 16:57.160 implemented and we just had to live with it. 16:57.160 --> 17:00.720 Our software engineers needed to feel like the end result was theirs. 17:00.720 --> 17:04.600 Not something imposed, not something slipped in quietly. 17:04.600 --> 17:07.720 Something they actually owned. 17:07.720 --> 17:11.320 We left the ADR open for discussion for about four weeks. 17:11.320 --> 17:14.120 This was longer than any ADR we'd done before. 17:14.120 --> 17:18.640 We wanted time for people to read it, question it, disagree with it and sit with it. 17:18.640 --> 17:23.480 The goal wasn't fast approval, the goal was shared understanding. 
17:23.480 --> 17:26.840 Next we made a deliberate decision about reviews. 17:26.840 --> 17:30.600 Every pull request related to this ADR would be reviewed by the four tech leads from 17:30.600 --> 17:31.600 each product group. 17:31.600 --> 17:32.880 That did two things. 17:32.880 --> 17:37.280 First, it made sure they were genuinely on board. Second, it meant their teams were 17:37.280 --> 17:40.120 represented, not surprised later. 17:40.120 --> 17:44.280 Surprises are great for birthdays, not for architecture. 17:44.280 --> 17:49.040 Architecture stopped being something decided by one person and started being something carried 17:49.040 --> 17:51.080 by the org. 17:51.080 --> 17:53.720 We also didn't want the implementation to be rushed. 17:53.720 --> 17:58.640 The work followed our normal design process and we made progress visible to everyone. 17:58.680 --> 18:04.920 Visibility turned a risky architectural shift into a series of small understandable steps. 18:04.920 --> 18:08.720 Now at this point, it may seem like we were putting a lot of process in place. 18:08.720 --> 18:12.360 It probably sounds like we were saying, we need to have everything figured out before we 18:12.360 --> 18:13.360 start. 18:13.360 --> 18:16.040 And to be honest, that felt a little backwards. 18:16.040 --> 18:21.120 This first bounded context was supposed to be a pilot, not a well-engineered masterpiece. 18:21.120 --> 18:25.800 And a pilot is supposed to discover things, which means sometimes taking a step forward, 18:25.800 --> 18:28.560 only to backtrack when things aren't working. 18:28.560 --> 18:33.320 If we already had all the answers, if we were able to lay out all the steps 18:33.320 --> 18:38.160 for this refactoring top to bottom, then we wouldn't need a pilot, right? 18:38.160 --> 18:42.760 This means we needed to do a POC, a proof of concept, to discover things that we didn't 18:42.760 --> 18:43.760 know about. 
18:43.760 --> 18:48.160 The goal of the POC was to expose hidden coupling and understand which boundaries were 18:48.160 --> 18:50.200 actually possible. 18:50.200 --> 18:55.000 So I started with a proof of concept, not a full implementation, just a scaffold. The idea 18:55.000 --> 18:56.000 was simple. 18:56.000 --> 18:58.600 I made the basic shape of a bounded context. 18:58.600 --> 19:03.840 That means using some shared utilities for HTTP, some middleware, basic MySQL access, 19:03.840 --> 19:08.480 nothing fancy, just enough structure for a dummy hello-world endpoint. 19:08.480 --> 19:12.720 Then I ran a dependency analysis and that's where things got interesting. 19:12.720 --> 19:19.200 That simple scaffold was pulling in half our monolith, not because the domain needed it, 19:19.200 --> 19:23.000 this hello-world domain of the scaffold didn't need anything yet, but because all of 19:23.000 --> 19:27.640 these utility packages did. The glue code was the monolith. 19:27.640 --> 19:32.840 The things we thought were harmless helpers were actually acting like dependency magnets. 19:32.840 --> 19:36.720 And there was another issue hiding in plain sight, a God package. 19:36.720 --> 19:43.160 Anytime someone needed a new type and didn't know where it belonged, it went into this fleet package 19:43.160 --> 19:45.000 at the bottom. 19:45.000 --> 19:49.160 Over time that large fleet package picked up its own dependencies, which were not shown 19:49.160 --> 19:54.040 on the previous slide, which meant everything that imported it picked those up too, 19:54.040 --> 19:58.520 and that made circular dependencies incredibly easy to create and incredibly hard to 19:58.520 --> 19:59.920 reason about. 19:59.920 --> 20:04.960 That fleet package existed to make things easier, but it ended up doing the opposite. 
20:04.960 --> 20:09.360 At that point it became clear we couldn't just extract a bounded context right away. 20:09.360 --> 20:14.880 We were missing a layer. Before we could pull anything out, we needed a platform layer, 20:14.880 --> 20:20.320 code that handled infrastructure concerns, HTTP helpers, database wiring, middleware, 20:20.320 --> 20:23.640 without dragging domain assumptions along with it. 20:23.640 --> 20:27.600 We needed to differentiate between two types of code. 20:27.600 --> 20:32.400 Domain code should express business rules and platform code should make those rules possible 20:32.400 --> 20:33.800 to run. 20:33.800 --> 20:39.280 Before we could modularize the domain, we had to modularize the foundation. 20:39.280 --> 20:43.280 Another issue we had to solve from our first attempt was that even with good documentation, 20:43.280 --> 20:49.360 even with ADRs, even with buy-in, nothing stops coupling from sneaking back in, especially 20:49.360 --> 20:54.000 in a large code base, and especially nowadays when changes aren't only written by humans, 20:54.000 --> 21:00.400 but also by AI coding agents that optimize for "make it work" and not "preserve architecture". 21:00.400 --> 21:05.840 Architecture that lives in people's heads doesn't scale, and it doesn't survive time. 21:05.840 --> 21:11.280 That's why we realized we needed architectural checks, 21:11.280 --> 21:16.880 sometimes known as fitness functions. 21:16.880 --> 21:22.960 Automated checks that run continuously, checks that don't care who wrote the code, human or AI. 21:22.960 --> 21:28.080 Specifically, we needed checks to enforce boundaries: which packages can depend on which, 21:28.080 --> 21:33.120 what the platform layer isn't allowed to import, what a bounded context is not allowed to 21:33.120 --> 21:37.600 import. Because once those rules are explicit, they stop being simply opinions. 21:37.600 --> 21:39.800 They become executable. 
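[Editor's sketch, not from the talk] A boundary check of this kind can be very small. The Go program below is a minimal illustration under stated assumptions: the import graph is hard-coded, the package paths under example.com/app are invented, and the Rule type and Violations function are hypothetical names. The talk does not say which tool Fleet uses. In practice the import graph might come from the Imports field printed by `go list -json ./...`, and the check would run in CI.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Rule says: packages under From must not import packages under Forbidden.
type Rule struct {
	From      string
	Forbidden string
}

// underPath reports whether pkg is prefix itself or a sub-package of it.
// Matching on a path separator avoids treating "app/platformx" as part of
// "app/platform".
func underPath(pkg, prefix string) bool {
	return pkg == prefix || strings.HasPrefix(pkg, prefix+"/")
}

// Violations returns one sorted message per forbidden import edge.
func Violations(imports map[string][]string, rules []Rule) []string {
	var out []string
	for pkg, deps := range imports {
		for _, r := range rules {
			if !underPath(pkg, r.From) {
				continue
			}
			for _, dep := range deps {
				if underPath(dep, r.Forbidden) {
					out = append(out, fmt.Sprintf("%s must not import %s", pkg, dep))
				}
			}
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	// Hypothetical import graph: package path -> direct imports.
	imports := map[string][]string{
		"example.com/app/platform/http":    {"net/http", "example.com/app/legacy/fleet"},
		"example.com/app/activity/service": {"example.com/app/platform/http"},
		"example.com/app/activity/mysql":   {"example.com/app/legacy/datastore"},
	}
	rules := []Rule{
		// Platform code must stay free of legacy domain packages.
		{From: "example.com/app/platform", Forbidden: "example.com/app/legacy"},
		// The new bounded context must not import legacy code directly.
		{From: "example.com/app/activity", Forbidden: "example.com/app/legacy"},
	}
	for _, v := range Violations(imports, rules) {
		fmt.Println("boundary violation:", v)
	}
}
```

A CI job can fail the build whenever the returned list is non-empty, which is the point at which a rule stops being an opinion and becomes executable.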
21:39.800 --> 21:46.600 Now all of that tooling matters, but let's circle back to the core problem we're trying to solve. 21:46.600 --> 21:52.040 Looking back, the real constraint wasn't Go, it wasn't MySQL, and it wasn't even the monolith. 21:52.040 --> 21:54.200 The real constraint was human attention. 21:54.200 --> 21:59.560 How much of a system one person can hold in their head and how confidently they can make a change. 21:59.560 --> 22:06.600 Once a system grows beyond that limit, progress slows, no matter how good the engineers are. 22:06.600 --> 22:10.680 We often talk about architecture as if it's for computers. 22:10.680 --> 22:14.920 But computers don't care how big your code base is, humans do. 22:14.920 --> 22:17.560 Architecture is really an interface for people. 22:17.560 --> 22:23.400 It defines what you need to understand, what you can safely ignore, and what you're allowed to change. 22:23.400 --> 22:27.080 When that interface is unclear, every change feels risky. 22:27.080 --> 22:30.680 When it's clear, people move faster, even in a large system. 22:31.640 --> 22:36.920 For a long time, I thought the hard part was architecture or code or tooling. 22:36.920 --> 22:42.440 It wasn't. The hardest part was alignment, shared language, shared understanding, and shared ownership. 22:43.640 --> 22:48.760 Modularization isn't really about modules, it's about respecting the limits of human attention, 22:48.760 --> 22:52.840 and designing systems people can understand, trust, and change without fear. 22:53.560 --> 22:59.000 Tools matter, patterns matter, but architecture lasts only when it's owned by more than one person. 23:01.000 --> 23:03.800 So before I wrap up, I want to pause on these. 23:03.800 --> 23:06.440 These are some of the teams using Fleet in production today. 23:06.440 --> 23:09.160 They're very different organizations with very different constraints. 
23:09.720 --> 23:13.720 We learned a lot from working with customers like these, and many of the lessons I talked about 23:13.720 --> 23:17.640 come from operating at this kind of scale under real conditions. 23:19.080 --> 23:20.200 That's it for this talk. 23:20.200 --> 23:23.320 If this story resonated with you, I'd love to hear about your own. 23:24.040 --> 23:26.520 Problems are rarely neat, even when the code bases are. 23:27.400 --> 23:31.960 Here are a few links about Fleet. We are hiring Go developers right now, 23:33.320 --> 23:35.960 and you can find some links about me. 23:37.000 --> 23:38.040 And thanks for listening. 23:39.000 --> 23:39.800 Thank you very much.