I just released software and it’s riddled with bugs. What can I do?

on March 11, 2020

Picture this: You just released code into the production environment, but shortly after, you realize that you’ve also injected a number of defects into production. (Yeah. We know. That never happens 😊) So, how do you recover? At Lighthouse, we recommend a three-phase approach. Let’s take a quick look, then dive deeper into each phase.

Phase 1: Emergency Triage of Critical Bugs (system crashes or major functionality is broken) – What can you do quickly to ensure your customers can still use your software?
Phase 2: Understand Scope and Severity of Major Bugs – What tasks do you and your team need to complete to resolve the defects before attempting another deployment into production?
Phase 3: New Processes – What measures and systems do you need in place to ensure this doesn’t happen again?

Phase 1: Emergency Triage

First things first: ask yourself, “Is it in our best interest to revert to the previous version of this software, or to keep the defects live and fix them in place?” More often than not, leadership will need to be involved in this decision, or at least notified so they can provide recommendations.

Once there’s leadership buy-in, act as quickly as possible. If you choose to fix the live version, we recommend alerting your developers and testers and bringing them all into a “Tiger Team” emergency meeting. Start by clearly communicating that this was a failure of process, not people, and that everyone is here to focus on fixing these known issues. Don’t allow anyone to play the “blame game”; it will kill your morale. You’ll have plenty of time later to figure out how it happened.

Now, clearly identify and demonstrate the bugs, and work collaboratively on real-time solutions. Determine whether each bug can be fixed very quickly or whether you need to publish a workaround. Get buy-in from your product owner/leadership on the proposed solution or workaround before implementing it. Then, go to town and make it happen. While the developers are coding the updates, the testers should be building test cases that verify the fix is correct.

This will generally result in an emergency patch. Do your best to keep lower-priority bugs out of this patch, as including them will often complicate the release, cause other bugs, and confuse your customers.

Phase 2: Understand Scope of Major Bugs

Now that you are done profusely sweating over the critical bugs, take a breath and begin to assess the remaining major and minor bugs. We recommend running a quick and dirty impact assessment so your leadership is well informed and customer support teams can effectively communicate the impact to your customers.

Work with your product owner to plan these into an upcoming point release, and work them into a sprint quickly. If you can, avoid the temptation to disrupt your active sprint, because doing so often just causes churn in your sprint team. Just as with any other backlog item, estimate the story points (or t-shirt size them).
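The split between Phase 1 and Phase 2 boils down to a simple rule: critical bugs (system crashes or major functionality broken) go into the emergency patch, and everything else is planned into a point release. As an illustration only, here is a minimal Python sketch of that rule. The `Bug` and `Severity` types, the three severity levels, and the sample ticket keys are hypothetical, so map them to whatever your bug tracker actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical severity levels; map these to your own tracker's values.
    CRITICAL = 1  # system crashes or major functionality is broken
    MAJOR = 2
    MINOR = 3


@dataclass
class Bug:
    key: str          # hypothetical ticket key, e.g. "APP-102"
    summary: str
    severity: Severity


def triage(bugs):
    """Split bugs into an emergency patch (Phase 1) and a point-release list (Phase 2).

    Only critical bugs go into the patch, which keeps it small and easy to verify.
    """
    patch = [b for b in bugs if b.severity is Severity.CRITICAL]
    # Defer everything else, most severe first, for planning into an upcoming sprint.
    backlog = sorted(
        (b for b in bugs if b.severity is not Severity.CRITICAL),
        key=lambda b: b.severity.value,
    )
    return patch, backlog


if __name__ == "__main__":
    bugs = [
        Bug("APP-101", "Typo on settings page", Severity.MINOR),
        Bug("APP-102", "Checkout crashes on submit", Severity.CRITICAL),
        Bug("APP-103", "Report export is slow", Severity.MAJOR),
    ]
    patch, backlog = triage(bugs)
    print("Emergency patch:", [b.key for b in patch])
    print("Point release:", [b.key for b in backlog])
```

Keeping the patch list limited to critical items is deliberate: everything else rides along with the next planned release instead of complicating the emergency fix.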

Phase 3: New Processes

This final phase boils down to learning from mistakes so you can prevent the same ones in the future. It’s about moving from being reactive to proactive. It’s a core belief for us at Lighthouse that people are not the problem when something goes awry; it’s the process that failed the people. In all likelihood, the issue came about because there wasn’t a trigger or mechanism in place to prevent it.

Here are some things that we recommend you put in place to ensure that your production deployments are always successful (OK, that might have been a stretch, but you know what we mean).

Test Automation: Before you build automated test cases, make sure you understand why you chose to automate certain aspects of your application. Don’t apply test automation with a broad brush before you understand the investment involved and where that investment will pay off. Typically, test automation maximizes ROI in areas that are weak or have a large number of dependencies.
Test Sooner: Involve your testers earlier in your software development life cycle (SDLC). Have them do quality checks on user stories that are still in your dev team’s runway and haven’t made it into the Sprint yet. Also, have your testers work with your business analysts and Scrum Masters to make sure they consider alternative scenarios when writing acceptance criteria and business rules, and when considering how the individual user stories contribute to the overall feature. High-quality user stories result in fewer bugs and less rework. Use tools and processes that help ensure high-quality code is being developed, such as cyclomatic complexity analysis, code reviews, regular lessons learned, and training.
Always Test: Give your testers access to a production-like environment where they can continuously test the application for usability and performance. Doing this early in the process, and in this type of environment, is a big investment, but it’s worth it when your application remains scalable and reliable.
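Cyclomatic complexity, mentioned under Test Sooner, counts the independent paths through a function: roughly one plus the number of decision points (branches, loops, and extra boolean conditions). Functions with high scores need more test cases to cover and are more likely to hide bugs. Here is a minimal, simplified sketch in Python using the standard-library `ast` module. The `report` helper and the default threshold of 10 are our own illustrative assumptions, and dedicated analyzers (for example, the `mccabe` checker used by flake8) handle more cases than this does.

```python
import ast

# Node types that each add one independent path through a function.
_DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points.

    Simplification: nested functions are also counted toward their parent.
    """
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra branches.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Each comprehension loop and each of its `if` filters is a branch.
            complexity += 1 + len(node.ifs)
    return complexity


def report(source: str, threshold: int = 10) -> dict:
    """Return {function_name: complexity} for functions scoring above `threshold`."""
    flagged = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            if score > threshold:
                flagged[node.name] = score
    return flagged


# Hypothetical sample code to analyze.
SAMPLE = '''
def shipping_cost(order):
    if order.total > 100 and order.domestic:
        return 0
    for item in order.items:
        if item.oversized:
            return 25
    return 5
'''

if __name__ == "__main__":
    # Low threshold so the small sample function gets flagged.
    print(report(SAMPLE, threshold=3))
```

In the sample, the two `if` statements, the `for` loop, and the extra `and` condition each add one path to the baseline of one. A check like this can run in your build pipeline so overly complex functions are flagged for review before they reach testers.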

We’ve seen many clients struggle to improve the quality of their production releases. To help paint a picture, here’s a quick example of how we helped one client successfully triage their production issues while also putting control measures in place.

This client had continuous problems with bug-laden releases, and their leadership was fed up, demanding they correct the problems. So, their solution was to delay their release until all bugs were corrected. This caused their planned monthly release to extend to 2-3 months, and as they extended the schedule, the product owner and leadership kept requesting they “slip some easy changes into the release.” As a result, the scope continued to increase, the complexity increased, the confusion increased, the time to deliver increased, and trust eroded between the leadership and the development team. Almost everyone was finger-pointing and doing CYA.

We did a quick assessment of the organization and discovered a number of challenges, most of which were related to people, culture, and poor leadership. As we helped peel back the layers of their challenges, we discovered they had really good people who wanted to do a good job and were reasonably technically competent.

We worked with them to ensure they could continue to deliver new functionality while we were helping them improve their approach. We helped them standardize how they prioritized defects and took them through Phase 1 – Emergency Triage, Phase 2 – Understand Scope of Major Bugs, and much of Phase 3 – New Processes (as described above).

When we weren’t helping on specific Sprints and releases, we helped them by educating the team and leadership on core software principles. We helped define their governance and provided some much-needed structure to their requirements management processes, so they began to capture all user stories and bugs in Jira. We taught them the importance of sizing their backlog items and established a process to prioritize their backlog into Sprints, point releases, and major releases. We delivered our True North™ metrics for the test engineers so they would know when the application was stable and ready for release.

Probably most importantly, we succeeded in educating the leadership on how critical it is to let the development team succeed by focusing on delivering specific (and pre-agreed) backlog items, executing the planned work, and achieving a no-kidding “Done Done” state. That focus resulted in stable Sprints and eventually stable releases. It didn’t happen overnight, but as they all started tasting success, they wanted more, and they continue to improve their processes and hold firm to their priorities, governance, and structure. We continue to provide them reach-back software consulting to help them work through new issues, ultimately moving them from a very reactive organization to a proactive one.

Frankly, we could write a book (or two) on this topic. So, if you’re putting out fires that made it into production or your software development feels really chaotic, reach out to us! We’ve helped organizations ranging from 3 developers to more than 600 developers. We’re happy to help you think through how the above phases and tips apply to your organization.
