<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[An engineers brain dump ~ Connor Avery]]></title><description><![CDATA[Everything's an engineering problem]]></description><link>https://cavery.dev</link><generator>RSS for Node</generator><lastBuildDate>Thu, 23 Apr 2026 00:16:49 GMT</lastBuildDate><atom:link href="https://cavery.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Architecting for "X"]]></title><description><![CDATA[Background
In recent years, I've worked across many varying architectures on AWS in a consulting capacity.
These architectures and the applications which ran on them were built to do different things. All of them come with fundamental trade-offs towa...]]></description><link>https://cavery.dev/architecting-for-x</link><guid isPermaLink="true">https://cavery.dev/architecting-for-x</guid><category><![CDATA[AWS]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Devops articles]]></category><category><![CDATA[Azure]]></category><category><![CDATA[GCP]]></category><category><![CDATA[engineering]]></category><category><![CDATA[Solutions architecture]]></category><dc:creator><![CDATA[Connor Avery]]></dc:creator><pubDate>Wed, 13 Dec 2023 18:49:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1702493274959/68c25145-bbbe-4dac-aa51-3f92dcc20508.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-background">Background</h2>
<p>In recent years, I've worked across many varying architectures on AWS in a consulting capacity.</p>
<p>These architectures and the applications which ran on them were built to do different things. All of them come with fundamental trade-offs towards what they were building for. Some organisations might want to be cost-effective, whereas others desire to be highly performant and accept a higher cost.</p>
<p>What you are building for is your "X" and architecting for that will inevitably come with trade-offs. Without knowing your "X", your architecture won't satisfy your requirements. Allow me to take you through the understanding of building good architecture for your "X".</p>
<p>What I'd like you to keep in mind when continuing to read this blog is that there is unfortunately no hard or fast set of rules you can apply to your architectural design: everything you decide has a cost in some regard and it's not always monetary.</p>
<p>This blog is inspired by a series of talks I've recently given at AWS Events in the UK. In these talks, I revisited concepts that, while familiar to some, are crucial for all engineers—whether they're just starting out, mid-career, or highly experienced. My aim is to encourage learning, provide a grounding in fundamentals, and offer deeper insights into the field.</p>
<p>Use this post to encourage discussion: share it with your fellow engineers &amp; be ready to have productive conversations around your understanding of architecture and even how your current architecture is built.</p>
<h2 id="heading-architecture-who">Architecture, who?</h2>
<p>When I'm talking about Architecture, I mean your system's underpinning structure. Architecture covers a wide range of aspects, from how your services are hosted, where your data is stored, how you access data, who has access and your network topology.</p>
<p>Think of a skyscraper in your nearest city: without a good foundation, you can't build anything. Without an initial understanding of what you need to build upon that foundation, any attempt at the next Burj Khalifa will come with severe issues and potential risks.</p>
<h3 id="heading-why-you-should-care">Why you should care</h3>
<p>Your architecture governs quite a lot without you realising, from simple things like your developer experience to more important aspects such as testability and security.</p>
<p>Whether you are starting from scratch, looking to implement new features or re-evaluating current architecture to better fit your "X", these three elements should cement (excuse the pun) the reason your architecture is important to you.</p>
<ul>
<li><p>Defines the 'work' ahead - Whether building from scratch or refactoring, any changes to architecture require work to achieve them.</p>
</li>
<li><p>Defines your domain boundaries - Not everyone cares about or needs access to every part of your system. Equally, neither does any one particular part of your system.</p>
</li>
<li><p>Defines the balance of your "X" - Whether that be performance, cost, success (fault tolerance) or other.</p>
</li>
</ul>
<h3 id="heading-architecture-software">Architecture  🤝 Software</h3>
<p>There is an interconnected nature between architecture and software. Think about a hypothetical application for a moment: the application has been programmed to read and write data from a MySQL database.</p>
<p>The decision as to how that data will be stored has largely been determined already: it needs to be MySQL schema compliant, and any change has a cost associated with it, namely refactoring the software to work with another type of database and 'moving' the data to its new location.</p>
<p>In a parallel universe, that piece of software has yet to be written and the questions are just being asked of how and where it stores data, evaluating the use cases ahead of time.</p>
<p>How the data is stored, its access patterns and availability requirements may warrant a NoSQL database as the 'better' choice.</p>
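<p>One way to keep that cost of change lower is to have the software talk to a small storage contract rather than to a specific database. The sketch below is only an illustration of that idea: <code>OrderStore</code>, <code>InMemoryOrderStore</code> and <code>place_order</code> are hypothetical names I've made up for the example, and the in-memory backend stands in for whichever database you eventually choose.</p>

```python
from typing import Optional, Protocol


class OrderStore(Protocol):
    """Hypothetical storage contract the application depends on."""

    def save(self, order_id: str, order: dict) -> None: ...

    def load(self, order_id: str) -> Optional[dict]: ...


class InMemoryOrderStore:
    """Stand-in backend; a MySQL or NoSQL version would expose the same methods."""

    def __init__(self) -> None:
        self._orders: dict = {}

    def save(self, order_id: str, order: dict) -> None:
        self._orders[order_id] = dict(order)

    def load(self, order_id: str) -> Optional[dict]:
        return self._orders.get(order_id)


def place_order(store: OrderStore, order_id: str, items: list) -> dict:
    """Application logic only knows about the OrderStore contract."""
    order = {"id": order_id, "items": items, "status": "placed"}
    store.save(order_id, order)
    return order


store = InMemoryOrderStore()
place_order(store, "order-1", ["book"])
print(store.load("order-1"))
```

<p>The trade-off hasn't disappeared: the data still has to be moved if the backend changes, but the application code that calls <code>place_order</code> wouldn't need to.</p>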
<p>Other examples of this interconnected nature include:</p>
<ul>
<li><p>Highly available application - Data on any individual server must be treated as ephemeral in case of failure. Web 2.0 websites relied on local server storage a lot of the time and therefore weren't compatible with horizontal scaling, as the same data wasn't present on the other servers behind the load balancer.</p>
</li>
<li><p>Fault-tolerant application - Any individual part of the architecture could fail and the system still functions or "fails gracefully" as retry functionality, dead letter queues or other failure scenarios are handled by the architecture and software.</p>
</li>
<li><p>Whether actions are synchronous or asynchronous - You wouldn't want an API call (service A) waiting on an application that only runs on a schedule (service B). In this example, you might want to architect service B to not run on a schedule and in fact, be available 24/7. Or better yet, question why service A is synchronous - does it need to be?</p>
</li>
</ul>
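<p>To make the fault-tolerant point a little more concrete, here is a minimal sketch of retries backed by a dead letter queue. It assumes an in-process list of messages rather than a real queueing service, and <code>process_with_retries</code> and <code>flaky_handler</code> are hypothetical names used purely for illustration.</p>

```python
from collections import deque


def process_with_retries(messages, handler, max_attempts=3):
    """Try each message up to max_attempts times; park failures in a dead letter queue."""
    processed = []
    dead_letters = deque()
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(message))
                break
            except Exception as error:
                if attempt == max_attempts:
                    # Give up on this message but keep the system moving.
                    dead_letters.append((message, str(error)))
    return processed, list(dead_letters)


def flaky_handler(message):
    """Hypothetical handler that rejects one malformed message."""
    if message == "bad":
        raise ValueError("cannot parse message")
    return message.upper()


done, dead = process_with_retries(["a", "bad", "b"], flaky_handler)
print(done)  # ['A', 'B']
print(dead)  # [('bad', 'cannot parse message')]
```

<p>The failing message doesn't block the healthy ones, and it is kept somewhere it can be inspected later, which is the "fails gracefully" behaviour in miniature.</p>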
<p>I won't go into too much detail here, but the architecture &amp; the software that resides upon it influence each other when deciding many aspects, such as Server vs. Serverless or Monolith vs. Micro-service. Your "X" plays a part in this decision.</p>
<p>I've written a blog on exactly those trade-offs recently; take a read (later of course) of <a target="_blank" href="https://cavery.dev/monoliths-in-the-21st-century-necessity-or-nostalgia">Monoliths in the 21st Century: Necessity or Nostalgia.</a></p>
<h3 id="heading-diagrams">Diagrams</h3>
<p>Architecture is often visualised to make it easier to understand and follow the flow of actions through the system as a whole. You'll have likely seen a diagram like the one below before [created on Diagrams.net]:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689098611688/4875a635-a0ad-4e41-96c8-76db79efea19.png" alt="A hypothetical ecommerce website systems architectural design using AWS services." class="image--center mx-auto" /></p>
<p>The diagram I've created for a hypothetical e-commerce website leaves a lot to be desired. However, the quality of a diagram and what it must include is like linting - everyone has an opinion and nothing will ever satisfy everyone.</p>
<p>What matters is that your architecture, or architectural diagram is understandable for yourself, your team and the people around you. Change it if not!</p>
<h2 id="heading-architectural-influences">Architectural Influences</h2>
<p>When looking to build your architecture, you must first understand the landscape you find yourself in.</p>
<p>There are elements you control, others you can influence and some you must simply abide by.</p>
<h3 id="heading-understand-your-landscape">Understand your landscape</h3>
<p>You might be working within a heavily regulated industry, where considerations such as the anonymisation of data are a legal requirement, or where encryption must be performed by a hardware security module rather than by software-defined methods because the utmost security is required. This is the regulatory impact.</p>
<p>Your organisation may wish to impose mandatory requirements and restrictions. For instance, in this blog post I've referenced Amazon Web Services, but your organisation may make it mandatory to use Microsoft Azure as the cloud solution of choice. The difference mightn't seem great but there are pros and cons to consider on all sides. For instance, in a hypothetical situation, AWS offers more 'free' usage of managed services such as Lambda functions vs. Azure, where Azure Functions offer no free invocations (this difference immediately affects your financial cost). This is the business decision impact.</p>
<p>Knowledge, previous experience and skills of yourself, your team and your organisation as a whole will also need to be understood. If you are designing or extending existing architecture, you're also restricted <em>slightly</em> by the existing infrastructure and applications if you can't re-design or refactor OR it mightn't be 'owned' by you and therefore can't be influenced. This is the restriction impact.</p>
<h3 id="heading-whats-your-x">What's your X?</h3>
<p>With your landscape understood, you now have an empty canvas where the borders have been defined. Next is to understand your X: what are you architecting towards?</p>
<p>Your 'Y' is the application you wish to make available and its infrastructure requirements - Which mightn't be defined yet.</p>
<p>The X comes in many shapes and sizes, but it is a balance between, and not exclusive to, any individual attribute: Cost, Performance and Success (the ability to handle failure).</p>
<p>You might find similarities between this pyramid of architecture and something else, the CAP theorem. An advantage for any particular side is a disadvantage for one or two of the others.</p>
<p>How you architect your system will continuously bring forward the questioning and reasoning against your X.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701965544379/652a9209-3ed1-431a-b6f4-0484fb3e7406.png" alt class="image--center mx-auto" /></p>
<p>For example, a fully serverless application with relatively stable traffic may be the cheapest cost to run monthly, but due to the knowledge and skills required may cost more to develop vs. a conventional server (serverful) design.</p>
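<p>If it helps to reason about that balance, you could sketch it as a weighted score. Everything in the example below is an assumption for illustration: the weights, the 1 to 5 ratings and the <code>score_design</code> helper are hypothetical, and your own estimates should replace them.</p>

```python
def score_design(weights, ratings):
    """Weighted sum of ratings (1 = poor, 5 = strong) for each attribute of X."""
    return sum(weights[name] * ratings[name] for name in weights)


# Hypothetical X: cost matters most, then success, then performance.
weights = {"cost": 0.5, "performance": 0.2, "success": 0.3}

# Illustrative ratings only; real values come from your own estimates.
designs = {
    "serverless": {"cost": 4, "performance": 3, "success": 4},
    "serverful": {"cost": 3, "performance": 4, "success": 3},
}

scores = {name: score_design(weights, ratings) for name, ratings in designs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

<p>Change the weights to favour performance and the same two designs can swap places, which is really the point: the "best" design depends on your X.</p>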
<p>In conjunction, you don't need to be fully embedded in any particular 'camp', i.e. micro-service vs. monolithic, serverless vs. serverful. You can and should use each tool appropriately for how it best serves you and fits your X. Use the right tool for the job.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701965782097/f86e0832-3e4a-4dca-a066-a239b69a0683.png" alt="Credit: Yan Cui, LinkedIn." class="image--center mx-auto" /></p>
<p>Credit: <a target="_blank" href="https://www.linkedin.com/in/theburningmonk/">Yan Cui on LinkedIn.</a></p>
<h3 id="heading-the-well-architected-framework">The Well-Architected Framework</h3>
<p>It'd be remiss of me to not call out Amazon Web Services' very own framework for creating robust and stable architectures for systems and the decisions you need to weigh up.</p>
<p>To not repeat what is already available, please visit the <a target="_blank" href="https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html">well-architected framework website</a>.</p>
<h2 id="heading-principles-of-great-architecture">Principles of Great Architecture</h2>
<h3 id="heading-the-best-thing-at-the-time">The best thing at the time</h3>
<p>As has been mentioned, your architecture can and will change depending on various factors not limited to what skills or knowledge you hold at the moment, the imposed SLAs and current requirements on your system.</p>
<p>What you build today may be perfectly suitable for now but the future has a vote and may warrant a change. That doesn't mean you've violated this principle if done correctly. If you've considered all factors possible, you've taken a step forward with the information you have at hand. The reflection principle will offer more information and offer insight to improve.</p>
<p>Don't get stuck in analysis paralysis, real usage will provide enough insight to confirm suspicions.</p>
<ul>
<li><p>What skills and knowledge do you have right now?</p>
</li>
<li><p>Do you know your real service level agreements?</p>
</li>
<li><p>What is your backup policy? Do you need one?</p>
</li>
<li><p>Are you repeating yourself?</p>
</li>
</ul>
<h3 id="heading-explainable-and-justifiable">Explainable and justifiable</h3>
<p><strong>"Brain before build" -</strong> Designing your system before writing any code is strongly advised!</p>
<p>Having resources such as documentation or diagrams for the design of your architecture and all associated reasonings for the choices you've made, will allow you to collaborate with others, validate assumptions and ensure you are architecting towards your X.</p>
<p>Don't underestimate the wildcard questions when designing architecture; without thought-provoking questions, important considerations can be overlooked.</p>
<ul>
<li><p>Are you re-inventing something that already exists? It could be that a SaaS product already does as you require.</p>
</li>
<li><p>Is it worth refactoring OR building new?</p>
</li>
<li><p>What could your future use cases be? How are you ensuring you aren't 'trapping' yourself?</p>
</li>
</ul>
<h3 id="heading-causes-reflection">Causes reflection</h3>
<p>Great architecture, both the act of designing it and the final product, will hopefully cause reflection on existing system processes. You may identify inefficiencies or areas for real improvement because you understand your architectural influences and landscape.</p>
<p>Reflection happens at all stages of the design.</p>
<ul>
<li><p>Is what we're building achieving the original requirements?</p>
<ul>
<li>Are the requirements fixing the problem or masking it?</li>
</ul>
</li>
<li><p>Is it maintainable?</p>
</li>
<li><p>Is it testable?</p>
</li>
<li><p>Are you using the correct thing at the correct time?</p>
</li>
</ul>
<h2 id="heading-whats-to-come">What's to come?</h2>
<p>With everything we've discussed above, it's very likely that in the future we could see AI tooling which could calculate an approximate cost both monetarily and engineering timewise. Tooling such as this could heavily influence decisions of which architecture to choose for your "X".</p>
<p>Something else to consider is the growth of Software-as-a-Service products which will complete a wide range of requirements from applications and architecture and it may become more beneficial to purchase "off the shelf" than to build your own. It's also important when considering SaaS products that it's someone else's problem, you are simply gaining value.</p>
<h2 id="heading-still-confused-where-should-you-start">Still confused, where should you start?</h2>
<p>As mentioned earlier, it's very difficult to understand how your "X" will define your architecture without having a full context of your situation and the problem you are trying to solve.</p>
<p>Some people reside strongly in opposing thought camps; when starting with your architecture, some believe in going fully Serverless and therefore more likely "micro-service" architecture.</p>
<p>Whereas others believe it's quicker to produce a monolith and work backwards.</p>
<p>In my personal opinion, I don't care for either - I use what I feel fits best, with the available knowledge, tools at hand and my "X". I may be inclined to build a monolith if my "X" is speed to market (an "X" I've not spoken of so far) or be inclined to build a hybrid server-serverless architecture if the system requires it.</p>
<p>Whichever you feel comfortable with right now, start with that. You'll soon reflect and change for new requirements going forward and have a better understanding when you revisit your design. With more information, you'll be able to choose the best thing at that time once more.</p>
]]></content:encoded></item><item><title><![CDATA[The Engineers Playbook: Handling Incidents]]></title><description><![CDATA[Depending on your working environment, you may experience incidents differently from someone else. Every organisation has their way of dealing with incidents, how they triage them and what paperwork is required (hopefully not literally).
Despite the ...]]></description><link>https://cavery.dev/the-engineers-playbook-handling-incidents</link><guid isPermaLink="true">https://cavery.dev/the-engineers-playbook-handling-incidents</guid><category><![CDATA[engineering]]></category><category><![CDATA[incident response]]></category><category><![CDATA[incident management]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Connor Avery]]></dc:creator><pubDate>Thu, 27 Jul 2023 19:30:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1690390038060/69716149-259a-4276-8f9f-56ed37b28f2f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Depending on your working environment, you may experience incidents differently from someone else. Every organisation has their way of dealing with incidents, how they triage them and what paperwork is required (hopefully not literally).</p>
<p>Despite the differences, there's one thing in common, it's usually the responsibility of engineers &amp; DevOps to get back to business as usual as quickly as possible.</p>
<p>In my time as a Consultant Engineer, I've experienced various ways in which teams are built, route-to-lives operate and how incidents are found, reported and dealt with. In all of these various teams, I've had to deal with incidents, i.e. something has unfortunately gone offline or users have suffered a degradation of the service they expect.</p>
<p>Incidents happen - Don't think for a moment that 'firefighting' doesn't happen in the likes of Google and Amazon. Heck, Amazon created the Well-Architected Framework after a disaster!</p>
<p>I want to offer up my thought process for dealing with incidents, from the perspective of the engineers and DevOps. As much as I'd not wish for an incident to happen to you, I know at some point it unfortunately will.</p>
<p>I believe this playbook will bring order to the chaos. Don't take it as gospel if you don't wish to; build upon it so it suits your needs, but whatever you do, don't create an unwieldy monster. Keep it sharp, consistent and useful.</p>
<h2 id="heading-elect-an-owner">Elect an owner</h2>
<p>As is often the case in the chaos, various people are 'pulled in' to offer their expertise and insight. They could be members from within your team or wider depending on the incident.</p>
<p>Ultimately it doesn't matter who is there from the beginning, but in my experience the moment an incident is raised, it should be owned. Some incident management software requires owners to be specified; for the sake of the paperwork, this owner doesn't need to be an engineer.</p>
<p>When I say an owner, I mean the engineer leading the charge to handle the incident. The elected owner doesn't have to be the individual with the most context in any particular area, but someone able to adapt their calendar and other obligations to resolve the most pressing matter.</p>
<blockquote>
<p>Too many cooks spoil the broth</p>
</blockquote>
<p>I love the quote 'Too many cooks spoil the broth'. It sums up nicely that everyone is operating on an agenda of their own based on their experiences and knowledge, but ultimately that could be to the detriment of the goal. Gordon Ramsay elects a head chef to make the calls and final decisions for a reason.</p>
<p>Electing an owner isn't about giving someone an ego boost, it's about having a spearhead individual who can be at the receiving end of the hypothetical funnel of all of the context surrounding the incident.</p>
<h2 id="heading-gather-the-facts">Gather the facts</h2>
<p>As the owner, you need to gather the facts relating to the incident, paint a picture of the system in your mind and understand the effects that have happened. With this knowledge, you'll be primed to assess the impact and mitigate the problem.</p>
<p>A take on the typical who, what, where and when covers most of what you need when approaching fact collection methodically.</p>
<ul>
<li><p>What has or hasn't happened that shouldn't or should have? - Journey through the system</p>
</li>
<li><p>What can you prove quickly? Using metrics, statuses and logs.</p>
</li>
<li><p>What are the SLAs?</p>
</li>
<li><p>What is the risk? To the business, to regulatory requirements or to security.</p>
</li>
<li><p>When could the action happen again?</p>
</li>
<li><p>When did it start? What was the condition of the system before this?</p>
</li>
<li><p>Who is affected?</p>
</li>
<li><p>...</p>
</li>
</ul>
<h2 id="heading-assess-the-impact">Assess the impact</h2>
<p>An issue may arise anywhere in a system, and an incident could be raised for an effect caused as a byproduct. With the facts at hand, you are probably already narrowing your scope of investigation - but hold fire. If you narrow your view too much, you may lose sight of the root cause.</p>
<p>You must understand the system's chain of events and all available 'check points' - places where you can validate conditions and expected behaviour; to best equip yourself to identify the root cause.</p>
<p>In a micro-service architecture, one such checkpoint could be your service logs between every micro-service or a queue with messages waiting to process etc.</p>
<p>Here is an example to illustrate what I mean a little better.</p>
<details><summary>Example</summary><div data-type="detailsContent"><strong>System</strong>: An image hosting website. <strong>Action</strong>: A user has uploaded an image in the correct format. <strong>Effect</strong>: A scheduled job which compresses new images to save storage space has encountered an error as it is unable to process the file and compress the image appropriately, as the file is not actually an image.</div></details>

<p>In the example, we can see that a bug in an 'upstream' service was the cause but the effect has been felt in a downstream process.</p>
<p>Let's use the example and assess the incident with a narrow scope:</p>
<ul>
<li><p>What journey was affected? The image compression journey.</p>
</li>
<li><p>How many users were affected? Just the one, confirmed by monitoring and logs.</p>
</li>
<li><p>How was the business affected and any known risks? Low risk as there are negligible cost implications.</p>
</li>
<li><p>What SLA was affected? No SLA is expected in this process.</p>
</li>
</ul>
<p>Given the example provided you may be assuming the scheduled compression job is to blame.</p>
<p>What if I told you that the file was originally an image but in transfer during upload bytes of data were lost resulting in file corruption? You can begin to understand why a narrow scope shouldn't be your immediate reaction and the effect is not always the cause.</p>
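<p>As a rough illustration of how that kind of corruption could be caught at the point of upload rather than in the compression job, here is a small integrity check. It assumes the client sends a SHA-256 digest alongside the file, which is my assumption for the example and not something the hypothetical system above is known to do; <code>sha256_of</code> and <code>upload_is_intact</code> are made-up helper names.</p>

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def upload_is_intact(received: bytes, expected_digest: str) -> bool:
    """Compare what arrived with the digest the client computed before sending."""
    return sha256_of(received) == expected_digest


original = b"\x89PNG\r\n\x1a\n" + b"pixel-data" * 4
digest_from_client = sha256_of(original)

# Simulate bytes being lost in transfer.
truncated = original[:-5]

print(upload_is_intact(original, digest_from_client))   # True
print(upload_is_intact(truncated, digest_from_client))  # False
```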
<p>Retracing the steps of the system to the point of issue is a much better approach, using all of the information available to you to check that the system is operating as it 'should' at each step.</p>
<p>Using the example, if we were to retrace the steps, we may not have concluded <em>yet</em> what the cause of the issue was, but using the 'check points' available to you, you will have an idea whether the system operated as intended or didn't and up to what part of the journey.</p>
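<p>Retracing the journey through its 'check points' can be sketched in a few lines. The checkpoint names and the <code>evidence</code> values below are hypothetical, standing in for whatever your logs, metrics and queues tell you about the image hosting example.</p>

```python
def first_failed_checkpoint(checkpoints):
    """Walk the journey in order and return the first checkpoint that fails, if any."""
    for name, check in checkpoints:
        if not check():
            return name
    return None


# Hypothetical evidence gathered from logs and metrics for the image example.
evidence = {
    "upload_request_logged": True,
    "stored_size_matches_upload": False,
    "compression_job_succeeded": False,
}

checkpoints = [
    ("upload received", lambda: evidence["upload_request_logged"]),
    ("file stored intact", lambda: evidence["stored_size_matches_upload"]),
    ("image compressed", lambda: evidence["compression_job_succeeded"]),
]

print(first_failed_checkpoint(checkpoints))  # file stored intact
```

<p>The compression job did fail, but walking the journey in order points at the earlier step, which is where the investigation should start.</p>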
<p>Issues come from a wide range of causes, so unfortunately there isn't a concise enough decision tree to find the cause for you.<br />However, you can ask the following questions at every step you retrace to further shrink the scope.</p>
<p>With the facts you've gathered earlier, ask yourself the following:</p>
<ul>
<li><p>What journey is affected? Preliminary scope setting - Don't get too fixated though.</p>
</li>
<li><p>What high-level 'state' was the system in before and after the issues started? i.e. Available vs. Offline</p>
</li>
<li><p>What metrics are abnormal? What (if any) alerts were fired? i.e. Error rate increased.</p>
</li>
<li><p>What conditions would lead to the high-level state change and metrics becoming abnormal?</p>
</li>
</ul>
<p>As the owner, you might want to lean on other teams with context to areas outside of your knowledge or understanding, to best grasp the answers to such questions.</p>
<p>Communication is key and even better when done often!</p>
<h3 id="heading-also-have-you-checked">Also, have you checked</h3>
<ul>
<li><p>A recent release - Check your CI / CD pipelines for recent activity, the release calendar for dependency releases etc.</p>
</li>
<li><p>Configuration properties - Check recent commits for config in code changes. Does your environment config match?</p>
</li>
<li><p>Connectivity - DNS, Network routing, ACLs</p>
</li>
<li><p>Availability - A system was unavailable at the time, and dependent services had no failure tolerance</p>
</li>
</ul>
<p>Once you've identified the issue, even at a high level, you can begin to mitigate it.</p>
<h2 id="heading-mitigate-the-problem">Mitigate the problem</h2>
<p>There unfortunately isn't a magic wand here, your system is unique to you (and your company) and therefore only you (and them together) will be able to fully comprehend the problem to resolve it.</p>
<p>However, in my experience of dealing with incidents, these guidelines have worked well in the past to keep everyone informed, work the problem and resolve the issue swiftly.</p>
<ul>
<li><p>Who do you need to notify (stakeholders, users)?</p>
</li>
<li><p>Is the problem getting worse? i.e. Continually happening or one-off.</p>
<ul>
<li><p>Can you stop the system safely temporarily to reduce the impact? 'Stop the bleed'</p>
</li>
<li><p>Can you disable a subset of functionality so that the whole system isn't compromised? Graceful degradation.</p>
</li>
</ul>
</li>
<li><p>Is a fix required to resume normal service behaviour?</p>
<ul>
<li><strong>Suggestion</strong>: Fix forward if it can be repaired in hours, and roll back if in days.</li>
</ul>
</li>
<li><p>Can a privileged user manually correct behaviour? (Break-glass scenario)</p>
</li>
</ul>
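<p>Disabling a subset of functionality is often done with some form of feature flag. The sketch below keeps the flags in a plain dictionary, which is an assumption for the sake of a runnable example; <code>feature_flags</code>, <code>disable_feature</code> and <code>handle_upload</code> are hypothetical names.</p>

```python
# Hypothetical in-memory flags; a real system might load these from config.
feature_flags = {"image_compression": True, "image_upload": True}


def disable_feature(name):
    """Switch off one piece of functionality without stopping the whole system."""
    feature_flags[name] = False


def handle_upload(filename):
    """Accept uploads, and only queue compression while that feature is enabled."""
    steps = ["stored " + filename]
    if feature_flags["image_compression"]:
        steps.append("queued compression")
    else:
        steps.append("compression skipped (degraded mode)")
    return steps


print(handle_upload("cat.png"))
disable_feature("image_compression")  # 'stop the bleed'
print(handle_upload("dog.png"))
```

<p>Users can still upload while the troublesome part of the journey is switched off, buying you time to work on the fix.</p>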
<p>If you've correctly assessed the situation and can begin processing the requirements to resolve the issue, you'll be out of the woods in no time.</p>
<h2 id="heading-retrospective-view">Retrospective view</h2>
<blockquote>
<p>Escape the forest so that you can see the wood for the trees.</p>
</blockquote>
<p>Once you've gotten past the issue, and mitigations have been completed to return the system to working order and correct any missteps, it is important to realise it isn't over. Luckily the chaos and pressure are behind you.</p>
<p>Just as retrospectives hold value at the end of agile sprints, I believe they hold value when looking back at incidents. They offer a no-blame, honest and open forum to highlight the multiple areas of improvement to the system, testing and release structure.</p>
<p>Anybody can organise a retrospective for incidents - but I believe it is best for the elected owner mentioned prior, to be available and in attendance. That valuable full-picture context will pay its dues here. Schedule this as soon as reasonably possible.</p>
<ul>
<li><p>Monitoring and alerting - Was it comprehensive enough for this incident? What about other teams and systems?</p>
</li>
<li><p>Testing - Are there any missing scenarios (automated) that could have identified this?</p>
</li>
<li><p>Release process - Could it be better controlled or simplified? Could you implement A/B releases &amp; testing?</p>
</li>
<li><p>Application fault tolerance - Retries, DLQ, graceful degradation</p>
</li>
<li><p>Architecture or application self-healing - Could your system automatically recover?</p>
</li>
<li><p>Synchronous vs. Asynchronous - Is something synchronous that would perform better asynchronously or vice-versa?</p>
</li>
</ul>
<p>Whatever you identify, you should categorise with a strong bias towards the highest priority. By that, I mean prioritising long-term improvements higher than future feature work. If your system can fail, for any known situation, you risk monetary, reputation and time losses.</p>
<h3 id="heading-fail-with-grace">Fail with grace</h3>
<blockquote>
<p>Everything fails all the time - Werner Vogels</p>
</blockquote>
<p>The philosophy of 'graceful degradation', which I aptly rephrase to 'fail with grace', is to build architecturally sound applications with built-in redundancy and graceful degradation of service that minimises the likelihood of complete system failure.</p>
<p>In essence, maintaining the minimum functionality that can be supported in a system in the event of a failure while also ensuring recovery from such.</p>
<p>Even in your existing system, you can begin building applications and architecture to fail with grace today - it doesn't have to have been a decision at the inception of the system.</p>
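<p>A small, hedged example of what failing with grace can look like in code: prefer live data, fall back to the last known good value, and only then fall back to a safe default. The recommendation service here is entirely hypothetical, as are the <code>get_recommendations</code> and <code>unavailable_service</code> names.</p>

```python
def get_recommendations(user_id, fetch, cache, default):
    """Prefer live data, fall back to the last known good value, then a default."""
    try:
        result = fetch(user_id)
        cache[user_id] = result
        return result, "live"
    except Exception:
        if user_id in cache:
            return cache[user_id], "cached"
        return default, "default"


def unavailable_service(user_id):
    """Hypothetical dependency that is currently failing."""
    raise ConnectionError("recommendation service is down")


cache = {"user-1": ["book", "lamp"]}
print(get_recommendations("user-1", unavailable_service, cache, []))
print(get_recommendations("user-2", unavailable_service, cache, []))
```

<p>The page still renders for both users; one sees slightly stale recommendations and the other sees none, rather than either of them seeing an error.</p>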
<h2 id="heading-tldr-from-incident-inception-to-resolution">TL;DR From incident inception to resolution</h2>
<ul>
<li><p><strong>Elect an owner quickly</strong> - Too many cooks spoil the broth. Someone with a technical understanding to orchestrate the actions and remediations.</p>
</li>
<li><p><strong>Gather the facts, methodically</strong> - What hasn't happened that should? What can we prove quickly (metrics or errors)? What are the SLAs?</p>
</li>
<li><p><strong>Assess the impact</strong> - Which journeys on the system are affected? How many users? What is the risk? What teams need engaging for a 'context champion'?</p>
</li>
<li><p><strong>Mitigate the problem</strong> - Who do you need to escalate or notify (users)? How can you stop the 'bleed' right now? Is a fix required to get back to normal service behaviour?</p>
</li>
<li><p><strong>Retro &amp; build for 'fail with grace'</strong> - Your highest priority should be re-assessing the problem, how could you improve resiliency? Build systems to 'fail with grace'.</p>
</li>
</ul>
<h2 id="heading-read-further">Read further</h2>
<p><a target="_blank" href="https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html">10 Lessons from 10 Years of Amazon Web Services</a> - Werner Vogels</p>
<p><a target="_blank" href="https://sre.google/workbook/incident-response/">Google SRE - Incident Response</a></p>
<p><a target="_blank" href="https://principlesofchaos.org/">Principles of Chaos Engineering</a></p>
]]></content:encoded></item><item><title><![CDATA[Monoliths in the 21st Century: Necessity or Nostalgia]]></title><description><![CDATA[Allow me to set the scene, on Wed 17th May at the DTX expo, I was a panel member discussing the place and presence of monolithic applications within systems, now and going forward. There were a lot of interesting points raised by the other panel memb...]]></description><link>https://cavery.dev/monoliths-in-the-21st-century-necessity-or-nostalgia</link><guid isPermaLink="true">https://cavery.dev/monoliths-in-the-21st-century-necessity-or-nostalgia</guid><category><![CDATA[AWS]]></category><category><![CDATA[Microservices]]></category><dc:creator><![CDATA[Connor Avery]]></dc:creator><pubDate>Fri, 19 May 2023 08:35:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1690390900159/e7684144-f172-4fae-9487-9d18121413cd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Allow me to set the scene, on Wed 17th May at the <a target="_blank" href="https://dtxevents.io/manchester/en/page/dtx-manchester">DTX expo</a>, I was a panel member discussing the place and presence of monolithic applications within systems, now and going forward. There were a lot of interesting points raised by the other panel members and myself, regarding the huge momentum shift towards serverless micro-services and whether or not monoliths should be sentenced to /dev/null.</p>
<p>Allow me to cover a few of the headline points and express my opinion, in words, as to why I feel we are looking at the original question slightly incorrectly.</p>
<h2 id="heading-what-is-a-monolith">What is a monolith?</h2>
<p>Within the tech industry, there are many acronyms and terms bandied around with the assumption that the receiving audience knows what they mean. Monolith is one such term: everyone 'kind of' knows what it means, but it can look and feel slightly different to everyone.</p>
<p>Previously, when everyone practised waterfall, monolithic applications were often the result of the engineering team's efforts. Today's monoliths aren't much different. They often have a single codebase, where all functionality, dependencies and modules are stored. Additionally, the modules are interdependent and therefore can't be deployed independently.</p>
<p>So thinking back to that time, or if you are like me and were still in nappies, think back to the tales your parents told you of the software in that era.</p>
<p>The software of that time was revolutionary: spreadsheet software was pioneering the way everyone did, well, everything. To get the newest version each year, you'd have to purchase it outright and install it whole. This was the 'traditional' monolith - everything that was required came in one package.</p>
<p>Fast forward to the 21st century and the world has changed, produced some fantastic movies in the process (Fast &amp; Furious Tokyo Drift, bby!) and morphed the definition of a monolith slightly. We no longer build everything to be a 'desktop' application, we build for the web first (exceptions apply). This means that our users only need a browser to be able to use our product or service. As such, the definition of a monolith is now a designation of the applications and services which are the sum total or a large part of a wider system, for that web product or service.</p>
<p>For me, a monolith is any application which has more than a single responsibility, where a responsibility is the smallest unit of a task. What do I mean by the term 'smallest unit of a task'? Take an API as an example. I would class each API endpoint that does something slightly different as a task. It is the smallest unit of value available (note: not the same as a unit test). Monolithic API applications have always existed and always will. However, they also make great candidates for micro-services, which I'll touch on shortly.</p>
<h2 id="heading-every-monolith-isnt-a-monolith">Every 'monolith' isn't a monolith</h2>
<p>Now that we are working with a common understanding of what a monolith has historically been, let's look at how the term is used today. Over my years in the industry, I've seen people class everything short of a micro-service as a monolith. I disagree that everything up to a full-fledged micro-service application is a monolith. There is a middle ground.</p>
<p>Think of Monolith &lt;-&gt; Micro-service as a linear scale of application depth and complexity. You can say that additional names or stages are missing from that linear scale. However, naming anything on a scale would require it to have defined conditions as to when it is 'something' - not something I want to do right now. For the sake of this blog, let's imagine that the linear scale looks like this (no definition supplied - your warranty may vary):</p>
<p>Monolith &lt;-&gt; Macro-service &lt;-&gt; Micro-service</p>
<h2 id="heading-what-are-the-differences">What are the differences?</h2>
<p>Monoliths differ greatly compared to macro or microservices. Monoliths are typically deployed as a single unit, whereas microservices are independently deployed and there could be hundreds of units.</p>
<p>When it comes to scalability, you must scale the entire monolithic application to meet your needs, even if only a small subset of functionality is receiving high demand. As you've probably guessed, this is wasteful of resources. Microservices, in contrast, are already loosely coupled and designed to be independently scalable, which allows for much quicker, more responsive scaling.</p>
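<p>To put rough numbers on that waste, here is a minimal sketch in Python. The component names, memory footprints and replica counts are made up for illustration, and the sketch deliberately ignores the per-service overhead (runtimes, network hops) that real micro-services carry.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    memory_mb: int        # hypothetical memory footprint of this component
    replicas_needed: int  # replicas required to meet its current demand


def monolith_memory_mb(components):
    # Every monolith replica carries every component, so the busiest
    # component dictates how many copies of the whole application run.
    replicas = max(c.replicas_needed for c in components)
    return replicas * sum(c.memory_mb for c in components)


def microservice_memory_mb(components):
    # Each component scales on its own demand only.
    return sum(c.replicas_needed * c.memory_mb for c in components)


# Illustrative values only: 'checkout' is the part under heavy demand.
components = [
    Component("checkout", 256, 10),
    Component("catalogue", 512, 2),
    Component("reporting", 1024, 1),
]

print(monolith_memory_mb(components))      # 10 * (256 + 512 + 1024) = 17920
print(microservice_memory_mb(components))  # 2560 + 1024 + 1024 = 4608
```

<p>With these made-up numbers, the monolith reserves 17,920 MB to serve demand that only really lands on 'checkout', whereas scaling each part independently needs 4,608 MB.</p>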
<p>As mentioned earlier, monolithic applications are often held in a single codebase. A microservice approach would typically house each part of the functionality in a separate codebase.</p>
<h2 id="heading-what-place-do-they-have-in-the-21st-century">What place do they have in the 21st Century?</h2>
<p>Defining where a monolith truly belongs would not be the right thing to do. Your system, the problem you are trying to solve and the way you work are all factors in the equation and will always be (slightly) different to everyone else. Therefore there is no hard and fast rule of 'Monoliths should only be created for X problem'.</p>
<p>However, some problem cases lean more towards building an application of a monolithic nature than a micro-service nature. You may want to ask yourself the following, "For the problem I am trying to solve...":</p>
<ul>
<li><p>What are the different stages or tasks it must do?</p>
</li>
<li><p>What are the SLAs on the completion of those tasks?</p>
</li>
<li><p>How resilient to failure should it be?</p>
</li>
<li><p>Could any of its tasks be done in parallel?</p>
</li>
<li><p>Are the tasks event-driven in nature or simply timed / batch events?</p>
</li>
<li><p>What is the team size, skills and knowledge?</p>
</li>
</ul>
<p>These high-level questions are not meant to determine whether the result should be monolithic or micro-service, but they do indicate which direction to lean. If you lean toward the micro-service side but your application doesn't fit the definition of a micro-service, it would be classified as a macro-service. It's a step in the right direction!</p>
<h2 id="heading-i-have-a-monolith-what-can-i-do">I have a monolith, what can I do?</h2>
<p>You may already have a monolith (or several) in your system. If it works, tests pass and it delivers the value you want it to - that is perfectly fine! (For now.) You should not be in a hurry to move to the latest technological trends. Evaluate every trend for its pros and cons, and consider the cost and effort required to achieve the benefits.</p>
<p>You should reflect on your monolith and ask yourself these questions:</p>
<ul>
<li><p>Are you able to maintain your application easily? (Adding new features, fixing bugs)</p>
</li>
<li><p>Does your application scale easily?</p>
</li>
<li><p>Are you able to quickly deliver new value to your customers?</p>
</li>
</ul>
<p>If any of your answers is no, it might be time to consider giving the application some attention. The journey a monolith must take to become a micro-service is difficult to define, as your problem and obstacles will be different to anyone else's.</p>
<p>You could consider the following when determining the journey ahead:</p>
<ul>
<li><p>Splitting the monolith into a modular monolith</p>
</li>
<li><p>Look to reduce the scope of what the monolith is to achieve</p>
</li>
<li><p>Consequently, make smaller applications/services for the scope you've moved out of the monolith</p>
</li>
<li><p>Re-build the application from the ground up in a micro-service approach</p>
</li>
</ul>
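<p>As one possible illustration of the first three options, here is a minimal sketch in Python of a module boundary inside a monolith. The names (<code>Billing</code>, <code>InProcessBilling</code>, <code>Orders</code>) are hypothetical and not taken from any real system. The idea is that modules only talk to each other through a small interface, so a module can later be moved out into its own service without rewriting its callers.</p>

```python
from typing import Protocol


class Billing(Protocol):
    """The only surface other modules are allowed to depend on (hypothetical)."""

    def charge(self, customer_id: str, pence: int) -> str: ...


class InProcessBilling:
    """Billing module that still lives inside the monolith."""

    def charge(self, customer_id: str, pence: int) -> str:
        return f"charged {customer_id} {pence}p in-process"


class Orders:
    """Orders module: it knows the Billing interface, not its implementation."""

    def __init__(self, billing: Billing):
        self._billing = billing

    def place(self, customer_id: str, pence: int) -> str:
        return self._billing.charge(customer_id, pence)


# Today everything is wired together in one deployable unit.
orders = Orders(InProcessBilling())
print(orders.place("customer-1", 500))
```

<p>If billing is later moved out of the monolith, only the object passed to <code>Orders</code> needs to change, for example to a class that calls the new service over the network while keeping the same <code>charge</code> method.</p>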
<h2 id="heading-i-only-use-micro-services-i-dont-even-need-a-monolith">I only use micro-services, I don't even need a monolith</h2>
<p>That statement may well be the case. The problem being solved can be done so purely by micro-services alone. It is still important, however, to understand when and where micro-services meet their limits.</p>
<p>Micro-services are fantastic but do come with their cons. It is important to continually re-evaluate your system, how it is built and whether it continues to be fit for purpose for the value you are trying to achieve. As you may have read, Amazon Prime Video recently re-evaluated its services and infrastructure and recognised that there were both performance and cost improvements to be made, despite using micro-services and serverless infrastructure. You can read more about that on the <a target="_blank" href="https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90">Prime Video Tech Blog</a>.</p>
<p>Both a blessing and a curse of building a distributed system is that you build a spider's web of loosely coupled, inter-connected services that each resolve one task at a time. Often these tasks are part of a chain of events: once one task has been completed, the next service picks up where it left off and continues, like a conveyor belt. To facilitate this, infrastructure such as object storage or a database is used to safely hand over to the next step, as micro-services are built to be ephemeral. All of this incurs network calls, storage reads and writes and ultimately service capacity limitations - in the form of quotas, maximum throughput or latency.</p>
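<p>The conveyor belt can be sketched in a few lines of Python. This is a toy model under stated assumptions: <code>CountingStore</code> is an in-memory stand-in for object storage or a database, and the three string operations stand in for real tasks. It only counts the hand-offs; it does not model latency, quotas or cost.</p>

```python
class CountingStore:
    """In-memory stand-in for object storage or a database (hypothetical)."""

    def __init__(self):
        self._objects = {}
        self.reads = 0
        self.writes = 0

    def put(self, key, value):
        self.writes += 1
        self._objects[key] = value

    def get(self, key):
        self.reads += 1
        return self._objects[key]


# Three tiny 'tasks' that form a chain of events.
STAGES = [str.strip, str.lower, str.split]


def run_distributed(payload, store):
    # Each stage acts like an ephemeral service: it reads the previous
    # result from the store, does its one task, and writes the result back.
    store.put("stage-0", payload)
    for i, stage in enumerate(STAGES, start=1):
        previous = store.get(f"stage-{i - 1}")
        store.put(f"stage-{i}", stage(previous))
    return store.get(f"stage-{len(STAGES)}")


def run_consolidated(payload):
    # The same tasks chained in-process: no intermediate storage hand-offs.
    result = payload
    for stage in STAGES:
        result = stage(result)
    return result


store = CountingStore()
print(run_distributed("  Hello World ", store), store.reads, store.writes)
print(run_consolidated("  Hello World "))
```

<p>Both paths produce the same result, but the distributed path performs four reads and four writes against the store for a three-step chain, while the consolidated path performs none. Each of those hand-offs would be a network call in a real system.</p>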
<p>Some of these issues may never impact your system, as the problem you are aiming to solve doesn't introduce them.</p>
<p>As the Prime Video team recognised, by consolidating already 'small' tasks together, they can reduce the effects highlighted above. They may have moved away from being 'fully' micro-services, but by no means have they created a monolith like the ones we know from history. They've moved ever so slightly left on the scale, towards a macro-service, to reduce the effect of the cons of micro-service and serverless architecture. They have retained many of the pros that micro-services offered them originally, as they could remain 'serverless'.</p>
<h2 id="heading-a-monolith-can-be-serverless">A Monolith can be Serverless</h2>
<p>One common misconception is that a monolithic application can only be deployed on infrastructure classified as 'serverful'. That is, you deploy it to a traditional server and must maintain that server or fleet and are unable to benefit from the many advantages Serverless infrastructure offers. This isn't the case.</p>
<p>There is no obstacle preventing a monolithic application from being containerized and hosted on a serverless container service like <a target="_blank" href="https://docs.aws.amazon.com/AmazonECS/latest/userguide/what-is-fargate.html">AWS ECS Fargate</a>. However, if your application exceeds the size limits of on-demand Serverless infrastructure like <a target="_blank" href="https://aws.amazon.com/lambda/">AWS Lambda</a>, you won't be able to take advantage of the additional benefits Lambda provides.</p>
<p>This is where micro-services shine. They are very small, as they've been built to complete a single unit of work. Each is one of many applications that together make up a service, system or product.</p>
<p>Ultimately you must consider the pros vs. cons of the infrastructure and what it can offer you, against what value you are trying to achieve.</p>
<h2 id="heading-monoliths-are-here-to-stay">Monoliths are here to stay</h2>
<p>Hopefully, the points highlighted here, which were covered (some briefly) in the panel discussion on Wednesday, offer food for thought when you think about the architectural structure of both your application and infrastructure. Additionally, you should be able to form your own opinion on my view that there is a linear scale between monolith &lt;-&gt; micro-service, and that we should stop thinking of them as being so clear-cut.</p>
<p>In my opinion, monoliths are a necessity or a necessary evil. Their shape may be much different from that of the early '90s applications but we've all still got work to do. They require modernisation as much as possible in the form of modularisation, separation of concerns and shrinking the overall scope and size. These efforts will move the applications more towards the centre line of the linear scale and begin to reduce the impact of the cons and even gain some pros usually enjoyed by micro-services.</p>
<p>I would always advocate considering micro-services and Serverless first and foremost.</p>
]]></content:encoded></item></channel></rss>