Rules to Better DevOps - 23 Rules
If you still need help, visit our DevOps consulting page and book in a consultant.
You should know what's going on with your errors and usage.
The goal should be:
A client calls and says: "I'm having problems with your software."
Your answer: "Yes I know. Each morning we check the health of the app and we already saw a new exception. So I already have an engineer working on it."
Take this survey to find out your DevOps index: https://docs.google.com/forms/d/e/1FAIpQLSeYdMVMuWo1onEr688-BbGviCwJQjecgqAqi8-Bf91IotOOCw/viewform
Before you begin your journey into DevOps, you should assess yourself and see where your project is at and where you can improve.
Once you’ve identified the manual processes in Stage 1, you can start looking at automation. The best tool for build and release automation is Azure DevOps.
Now that your team is spending less time deploying the application, you’ve got more time to improve other aspects of the application, but first you need to know what to improve.
Here are a few easy things to gather metrics on:
Application Logging (Exceptions)
See how many errors are being produced, and aim to reduce this as the product matures:
- Application Insights
- Visual Studio App Center (for mobile)
It's not only exceptions you should be looking at, but also how your users are using the application, so you can see where you should invest your time:
- Application Insights - https://rules.ssw.com.au/why-you-want-to-use-application-insights
- Google Analytics
- RayGun.io (Pulse)
Application/Server performance – track how your code is running in production, so you can tell whether you need to provision more servers or increase hardware specs to keep up with demand
Collecting stats about the application isn't enough, you also need to be able to measure the time spent in the processes used to develop and maintain the application. You should keep an eye on and measure:
- Sprint Velocity
- Time spent in testing
- Time spent deploying
- Time spent getting a new developer up to speed
- Time spent in Scrum ceremonies
- Time taken for a bug to be fixed and deployed to production
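As a concrete illustration, the last metric in the list above can be computed from two timestamps. This is a minimal sketch; the function name and dates are made up for the example.

```python
from datetime import datetime

def lead_time_days(logged_at, deployed_at):
    """Time taken for a bug to go from logged to deployed, in days."""
    return (deployed_at - logged_at).total_seconds() / 86400

# Hypothetical bug: logged at 9am on the 1st, fix deployed 3pm on the 4th
print(lead_time_days(datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 4, 15, 0)))
# → 3.25
```

Tracking this number sprint over sprint tells you whether your pipeline improvements are actually shortening the path to production.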
The last set of metrics you should be looking at revolves around the code and how maintainable it is. You can use tools like:
- Code Analysis
Now that you’ve got the numbers, you can then make decisions on what needs improvement and go through the DevOps cycle again.
Here are some examples:
For exceptions, review your exception log (ELMAH, RayGun, HockeyApp)
- Add the important ones onto your backlog for prioritization
- Add an ignore to the exceptions you don't care about to reduce the noise (e.g. 404 errors)
- You can do this as the exceptions appear, or prior to doing your Sprint Review as part of the backlog grooming
- You don't have to get the exception log down to 0, just action the important ones and aim to reduce the noise so that the log is still useful
- For code quality, add "Code Auditor and ReSharper warnings at 0 on files you've changed" to your Definition of Done
- For code quality, add SonarQube to identify and track your technical debt
- For application/server performance, add automated load tests, add code to auto scale up on Azure
- For application usage, concentrate on features that get used the most and improve and streamline those features
Often an incorrect process is the main source of problems. Developers should be able to focus on what is important for the project rather than getting stuck on things that cause them to spin their wheels.
- Are devs getting bogged down in the UI?
- Do you have continuous integration and deployment?
- Do you have a Schema Master?
- Do you have a DevOps Master?
- Do you have a Scrum Master?
Note: Keep this brief since it is out of scope. If this step is problematic, there are likely other things you need to discuss with the developers about improving their process. For example, are they using Test Driven Development? Are they checking in regularly? All this and more should be saved for the Team & Process Review.
DevOps and Scrum complement each other very well. Scrum is about inspecting and adapting with the help of the Scrum ceremonies (Standup, Review, Planning and Retro). DevOps is all about Building, Measuring and Improving with the help of tools and automation.
With DevOps, we add tools to help us automate slow processes like build and deployment, then add metrics to quantify our processes. We then gather the metrics and figure out what can be improved.
For example with Exception Handling, you may be using a tool like Raygun.io or Elmah and have 100s of errors logged in them. So what do you do with these errors? You can:
- Add each one to your backlog
- Add a task to each sprint to "Get exceptions to 0"
The problem with the above is that not all exceptions are equal, and most of the time they are not more important than the planned PBIs being worked on. No developer likes spending a whole sprint just looking at exceptions. What should happen is:
- Have the exceptions visible in your development process (e.g. surfaced in Slack, or checked before Sprint Planning)
- Triage the exceptions and add them to the backlog if they are urgent and important
- Add ignore filters to the exception logging tool to ignore errors you don't care about (e.g. 404s)
- Prioritize the exceptions on the backlog
The goal here is to make sure you're not missing important exceptions and to reduce the noise. You want these tools to support your efforts and make you more productive, not just be another time sink.
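The triage steps above can be sketched in a few lines. This is a hypothetical example, assuming your logging tool can export exceptions with a status code and an occurrence count; the field names are made up.

```python
IGNORED_STATUS_CODES = {404}  # noise we have decided not to care about

def triage(exceptions):
    """Drop ignorable noise, then rank the rest for backlog prioritization."""
    actionable = [e for e in exceptions
                  if e.get("status_code") not in IGNORED_STATUS_CODES]
    # Most frequent first - these are the candidates for the backlog
    return sorted(actionable, key=lambda e: e["count"], reverse=True)

logged = [
    {"type": "NullReferenceException", "status_code": 500, "count": 42},
    {"type": "HttpException", "status_code": 404, "count": 900},
    {"type": "SqlTimeoutException", "status_code": 500, "count": 7},
]
print([e["type"] for e in triage(logged)])
# → ['NullReferenceException', 'SqlTimeoutException']
```

Note how the noisiest item (900 ignorable 404s) disappears entirely, leaving a short, prioritized list that is actually actionable.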
Knowing the holistic health of your application is important once it has been deployed into production. Getting feedback on your availability, errors, performance, and usage is an important part of DevOps. We recommend using Application Insights, as getting it set up and running is quick, simple and relatively painless.
Application Insights will tell you if your application goes down or runs slowly under load. If there are any uncaught exceptions, you'll be able to drill into the code to pinpoint the problem. You can also find out what your users are doing with the application so that you can tune it to their needs in each development cycle.
- You need a portal for your app
- You need to know spikes are dangerous
- You need to monitor
To add Application Insights to your application, make sure you follow the rule Do you know how to set up Application Insights?
Can't use Application Insights? Check out the following rule Do you use the best exception handling library?
You've set up your Application Insights as per the rule 'Do you know how to set up Application Insights', your daily failed requests are down to zero, and you've tightened up any major performance problems.
Now you will discover that understanding your users' usage within your app is child's play.
Application Insights provides developers with two different levels of usage tracking. The first is provided out of the box and is made up of the user, session, and page view data. However, it is more useful to set up custom telemetry, which enables you to track users effectively as they move through your app.
It is very straightforward to add these to an application by adding a few lines of code to the hot points of your app. Follow Application Insights API for custom events and metrics to learn more.
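To make the idea concrete, here is a sketch of what custom event tracking looks like at a hot point. The TelemetryClient below is a stand-in that mimics the shape of the Application Insights TrackEvent call, not the real SDK; the event and property names are invented.

```python
class TelemetryClient:
    """Stand-in for an Application Insights-style client (not the real SDK)."""
    def __init__(self):
        self.events = []  # the real client batches these and sends them to Azure

    def track_event(self, name, properties=None):
        """Record a named custom event with optional key/value properties."""
        self.events.append({"name": name, "properties": properties or {}})

telemetry = TelemetryClient()

# Call at the 'hot points' of your app, e.g. when a user completes a checkout
telemetry.track_event("CheckoutCompleted", {"paymentMethod": "credit-card"})
```

A handful of well-named events like this is usually enough to answer "which features actually get used?" in each development cycle.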
Feel constricted by the Application Insights custom events blade? Then you can export your data and display it in PowerBI in a number of interesting ways.
Previously we would have had to perform a complicated setup to allow Application Insights and Power BI to communicate (follow How to connect Application Insights to Power BI via Azure Stream Analytics to learn more). Now it is as easy as adding the Application Insights content pack.
Once you have set up your Application Insights as per the rule 'Do you know how to set up Application Insights' and you have your daily failed requests down to zero, you can start looking for performance problems. You will discover that uncovering your performance-related problems is relatively straightforward.
The main focus of the first blade is the 'Overview timeline' chart, which gives you a bird's-eye view of the health of your application.
Developers can see the following insights:
- Number of requests to the server and how many have failed (First blue graph)
- The breakdown of your page load times (Green Graph)
- How the application is scaling under different load types over a given period
- When your key usage peaks occur
Always investigate the spikes first. Notice how the two blue spikes line up? They should be investigated. However, the green peak sits at four hours, so it is definitely the first thing we'll look at.
The 'Average of Browser page load time by URL base' graph shows that a single request took four hours, so it is important to examine that request.
It would be nice to see the prior week for comparison, however, we're unable to in this section.
At this point, we'll create a PBI to investigate the problem and fix it.
(Suggestion to Microsoft, please allow annotating the graph to say we've investigated the spike)
The other spike which requires investigation is in the server response times. To investigate it, click on the blue spike. This will open the Server response blade that allows you to compare the current server performance metrics to the previous weeks.
In this view, we find performance related issues when the usage graph shows similarities to the previous week but the response times are higher. When this occurs, click and drag on the timeline to select the spike and then click the magnifying glass to ‘zoom in’. This will reload the ‘Average of Server response time by Operation name’ graph with only data for the selected period.
Looking beyond the Average Response Times
High average response times are easy to find and indicate an endpoint that is usually slow - so this is a good metric to start with. But sometimes a low average value can contain many successful fast requests hiding a few much slower requests.
Application Insights plots out the distribution of response time values, allowing potential issues to be spotted.
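A tiny sketch shows why the distribution matters more than the mean; the numbers are invented for illustration.

```python
import math
import statistics

# 95 fast requests hide 5 requests that take 8 seconds each
response_times_ms = [100] * 95 + [8000] * 5

def percentile(values, p):
    """Nearest-rank percentile: the value below which p% of requests fall."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

print(statistics.mean(response_times_ms))  # → 495 - looks acceptable
print(percentile(response_times_ms, 99))   # → 8000 - the real story
```

An average of 495 ms looks healthy, while the 99th percentile exposes the handful of users waiting eight seconds - exactly the case a distribution view surfaces and an average hides.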
Application Insights can provide an overwhelming amount of errors in your web application, so use just-in-time bug processing to handle them.
The goal is to check your web application's dashboard each morning and find zero errors. But what happens if there are multiple errors? Don't panic; follow this process to improve your application's health.
Once you have found an exception you can drill down into it to discover more context around what was happening. You can find out the user's browser details, what page they tried to access, as well as the stack trace (Tip: make sure you follow the rule on How to set up Application Insights to enhance the stack trace).
It's easy to be overwhelmed by all these issues, so don't create a bug for each issue or even the top 5 issues. Simply create one bug for the most critical issue. Reproduce, fix and close the bug then you can move onto the next one and repeat. This is just-in-time bug processing and will move your application towards better health one step at a time.
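The process above can be sketched as a loop: always take the single most critical issue, fix it, and only then look at the next one. The `affected_users` field is hypothetical - use whatever severity signal your tool provides.

```python
def process_just_in_time(issues, fix_bug):
    """Create and fix one bug at a time, most critical first."""
    fixed = []
    while issues:
        worst = max(issues, key=lambda i: i["affected_users"])
        fix_bug(worst)          # reproduce, fix, close - one bug, not a batch
        issues.remove(worst)
        fixed.append(worst["name"])
    return fixed

backlog = [
    {"name": "Search is slow", "affected_users": 40},
    {"name": "Checkout crash", "affected_users": 120},
]
print(process_just_in_time(backlog, fix_bug=lambda bug: None))
# → ['Checkout crash', 'Search is slow']
```

The point of the loop body is its granularity: one bug in flight at a time, so the team is never staring at a wall of 100 open bugs.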
Your team should always be ensuring that the health of the application is continually improving.
The best way to do that is to check the exceptions that are being logged in the production application. Every morning, fix the most serious bug logged over the last week. After it is fixed then email yesterday's application health to the Product Owner.
There's traditional error logging software like Log4Net or Elmah, but they just give you a wall of errors that are duplicated and don't give you the ability to mark anything as complete. You'll need to manually clear out the errors and move them into your task tracking system (Azure DevOps/VisualStudio.com).
This is where RayGun or Application Insights comes into the picture. RayGun gives you the following features:
- Grouping exceptions
- Ignoring/filtering exceptions
- Triaging exceptions (mark them as resolved)
- Integrations with TFS/VisualStudio.com (to create a Bug) and Slack
- Tracking the exceptions to a deployment
- Seeing which errors are occurring the most often
To: Adam Subject: Raygun Health Check for TimePro
Please find below the Raygun Health Check for TimePro:
Figure: Email with Raygun application health report
Use Microsoft's Exploratory Testing - Test & Feedback extension - to perform exploratory tests on web apps directly from the browser.
Capture screenshots, annotate them and submit bugs as you explore your web app - all directly from the Chrome (or Firefox) browser. Test on any platform (Windows, Mac or Linux) and on different devices. No need for predefined test cases or test steps. Track your bugs in the cloud with Azure DevOps.
Video: Ravi walks Adam through the exploratory testing extension - You can also watch on SSW TV
Video: Ravi Shanker and Adam Cogan talk about the test improvements in Azure DevOps and the Chrome Test & Feedback extension - You can also watch on SSW TV
- Go to Visual Studio Marketplace and install "Test & Feedback".
- Click "Add to Chrome" to add the extension to the browser on your computer.
- Go to Chrome.
- Start a session by clicking on the Chrome extension and then clicking 'Start a session'.
- Upload the screenshot to a PBI.
When reporting bugs and giving product feedback, it is essential that you are as descriptive as possible.
In the case of bugs, the goal is enough detail so the developer can reproduce the error to find out what the problem is.
In the case of suggested features it is best to:
- Draft your suggestion
- Call the Product Owner sharing screens, then add the text “checked by XXX”
- If a backlog exists, save the Issue/PBI and @mention relevant people (they should get an email) as per https://www.ssw.com.au/rules/when-you-use-mentions-in-a-pbi
- If the client will not get an automatic nicely formatted email with all the text, then send the email with the URL of the Issue/PBI
Try to have one issue/PBI/email per bug/suggestion, but if the bugs/suggestions are related or very small (e.g. they are all on the same page) then you should group them together in a single email.
To: email@example.com Subject: Your software
I'm having a problem with your PerformancePro software. When I run it, it says something about registration and then exits.
Can you tell how to fix this?
Figure: Bad Example - This email isn't going to help the developer much - it is vague and has no screen capture, and gives no alternate way for the developer to contact the user regarding the issue
To: firstname.lastname@example.org Subject: Your software
I'm having a problem with your PerformancePro software. When I run it, this is what happens:
I have the latest version of all my software. I am running Windows 10 and Office365.
Can you please investigate and let me know how to proceed?
Figure: Good Example - This email includes the product name and version, the category of the issue (BUG), a screen capture and contact number, and shows that the user's system is up to date
A great template to follow is the Functional Bug template from the ASP.NET open-source project. Spending time to provide as much detail as possible - ensuring you have the three critical components of Steps to reproduce, Expected outcome, and Actual outcome - will save both you and the developer time and frustration in the long run.
Also, make sure your descriptions are detailed and useful, as that can make finding the solution quicker and easier.
Make sure you always explain and give as many details as you can of how you got an error or a bad experience.
To: Rebecca Subject: SSW TV
Where is SSW TV on the navigation?
Figure: Bad example - Lack of details
To: Rebecca Subject: Can't find SSW TV link on SSW website
Navigated to ssw.com.au
Scrolling down looking for a big graphic like "CHECK OUT SSW TV! CLICK HERE!" (Nothing) Me, thinking… "Hmm... let's try the menu at the top..."
About Us? Nope. Services? Nope. Products and Support? Nope. Training? Nope. User Group? Nope. Rules? Nope.
Me, thinking... "OK. Now where? Most likely, the SSW company description will list it..." Navigates to About Us... scrolling down... nothing.
Me, thinking... "OK. Weird. Let's go back." Me, goes back to homepage. Me, thinking... "Is there a site map?" Scrolls to bottom of page. Clicks sitemap link. Me, thinking... "Ctrl+F for TV? Nope." Me, gives up... types tv.ssw.com.au to try and get lucky. Huzzah!
- Can you help users to get to SSW TV from ssw.com.au?
Figure: Good example - We can easily identify more than one way to improve the UX
Figure: Good example - Recording bug reports in a video can make the issue clearer to see
Figure: Good example - Giving feature requests via video
Who should you email, the Product Owner or the Tech Lead?
It depends on the team, but often the Product Owner is busy. If you know the Tech Lead and your suggestion is obviously a good one and not too much work, then you should email the Tech Lead and Cc the Product Owner.
The Product Owner can always respond if they don't like the suggestion:
- For a bug email:
Subject: Bug - xxx (or use PBI @mention)
- For a new feature email:
Subject: Suggestion - xxx (or use PBI @mention)
Note: You may have a group email such as email@example.com. You would only Cc this email for greater visibility.
Do you use emojis for PBI titles?
When you add a bug/suggestion to a backlog, it's a good idea to add an emoji to the title. Not only does it look nicer, but people can look at the item and take in the necessary information quickly.
This means that anyone looking at the backlog can glean an item's nature at a glance, rather than having to read each one to know what category it is (5 bugs, 2 features, etc.):
- 🐛 Bug - Calendar is not showing on iOS devices
- ✨ Feature - Add 'Back to menu' item to top navigation
Whenever you are writing code, you should always make sure it conforms to your team's standards. If everyone is following the same set of rules, someone else's code will look more familiar and more like your code - ultimately easier to work with.
No matter how good a coder you are, you will always miss things from time to time, so it's a really good idea to have a tool that automatically scans your code and reports on what you need to change in order to improve it.
Visual Studio has a great Code Analysis tool to help you look for problems in your code. Combine this with Jetbrains' ReSharper and your code will be smell free.
The levels of protection are:
Get ReSharper to green on each file you touch. You want the files you work on to be left better than when you started. See Do you follow the boyscout rule?
Tip: You can run through a file and tidy it very quickly if you know two great keyboard shortcuts:
- Alt + [Page Down/Page Up] : Next/Previous Resharper Error / Warning
- Alt + Enter: Smart refactoring suggestions
The next level is to use Code Auditor.
Note: Document any rules you've turned off.
The next level is to use Link Auditor.
Note: Document any rules you've turned off.
The next level is to use StyleCop to check that your code has consistent style and formatting.
Run Code Analysis (was FxCop) with the default settings or ReSharper with Code Analysis turned on
Ratchet up your Code Analysis Rules until you get to 'Microsoft All Rules'
The next level is to document any rules you've turned off.
All of these rules allow you to disable rules that you're not concerned about. There's nothing wrong with disabling rules you don't want checked, but you should make it clear to developers why those rules were removed.
Create a GlobalSuppressions.cs file in your project with the rules that have been turned off and why.
The gold standard is to use SonarQube, which gives you the code analysis of the previous levels, as well as the ability to analyze technical debt and to see which code changes had the most impact on it.
Code Coverage shows how much of your code is covered by tests and can be a useful tool for showing how effective your unit testing strategy is. However, it should be looked at with caution.
- You should focus on *quality* not *quantity* of tests.
- You should write tests for fragile code first and not waste time testing trivial methods.
- Remember the 80-20 rule - very high test coverage is a noble goal but there are diminishing returns.
- If you're modifying code, write the test first, then change the code, then run the test to make sure it passes (AKA red-green-refactor).
- You should run your tests regularly (see Do you follow a Test Driven Process?). Ideally, they'll be part of your build (see Do you know the minimum builds to create for your project?).
Figure: See how Slack can be set up to improve your DevOps
With all these different tools being used to collect information in your application, a developer will frequently need to visit many different sites to get information like:
- Was the last build successful?
- What version is in production?
- What errors are being triggered on the app?
- Is the server running slow?
- What is James working on?
This is where a tool like Slack comes in handy. It can help your team aggregate this information from many separate sources into one dedicated channel for your project. The other benefits also include a new team member instantly having access to the full history of the channel as well so no conversations are lost.
At SSW we integrate Slack with:
- Octopus Deploy
- Visual Studio
Even better, you can create bots in Slack to manage things like deployments and updating release notes.
(Before you configure continuous deployment) You need to ensure that the code that you have on the server compiles. A successful CI build without deployment lets you know the solution will compile.
When naming documents, use kebab-case to separate words to make your files more easily discoverable.
Bad example: File name uses a space to separate words
As far as search goes, using spaces is actually a usable option. What makes spaces less-preferable is the fact that the URL to this document will have those spaces escaped with the sequence %20. E.g. sharepoint/site/library/Monthly%20Report.docx. URLs with escaped spaces are longer and less human-readable.
Read more on Do you remove spaces from your folders and filenames?
Bad example: CamelCase - File name doesn't have spaces but also doesn't contain any separators between words
This is a popular way to combine words as a convention in variable declarations in many coding languages, but shouldn't be used in document names as it is harder to read. Also, a file name without spaces means that the search engine doesn't know where one word ends and the other one begins. This means that searching for 'monthly' or 'report' might not find this document.
Figure: OK example - underscored (Snake_Case) URLs have good readability but are not recommended by Google
Underscores are not valid word separators for search in SharePoint, and not recommended by others. Also, sometimes underscores are less visible to users, for example, when a hyperlink is underlined. When reading a hyperlink that is underlined, it is often possible for the user to be mistaken by thinking that the URL contains spaces instead of underscores. For these reasons it is best to avoid their use in file names and titles.
Good Example: kebab-case - File name uses dashes to separate words
A hyphen (or dash) is the best choice, because it is understood both by humans and all versions of SharePoint search.
You may use an uppercase first letter in Kebab-Case; however, it's important to keep it consistent.
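The naming rules above are easy to automate. A minimal sketch (the regex handles spaces, CamelCase and underscores; edge cases such as unusual acronyms may need more care):

```python
import re

def to_kebab_case(filename):
    """Convert 'Monthly Report.docx', 'MonthlyReport.docx' or
    'Monthly_Report.docx' to 'monthly-report.docx'."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:                        # no extension at all
        stem, ext = filename, ""
    # Split on spaces/underscores and on CamelCase word boundaries
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", stem)
    return "-".join(w.lower() for w in words) + (("." + ext) if dot else "")

print(to_kebab_case("Monthly Report.docx"))   # → monthly-report.docx
print(to_kebab_case("MonthlyReport.docx"))    # → monthly-report.docx
print(to_kebab_case("Monthly_Report.docx"))   # → monthly-report.docx
```

A script like this can be run over an existing document library to find (or fix) files that break the convention.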
- Add relevant metadata where possible
If a document library is configured with metadata fields, add as much relevant information as you can. Metadata is more highly regarded by search than the contents of documents, so by adding relevant terms to a document's metadata, you will almost certainly have a positive effect on the relevance of search results.
- Use descriptive file names and titles
The file name and title are regarded more highly by search than the content within documents. Also, the title or file name is what is displayed in the search results, so by making it descriptive, you are making it easier for people who perform searches to identify the purpose of your document.
TFS and Windows Azure work wonderfully together. It only takes a minute to configure continuous deployment from Visual Studio Online (visualstudio.com) to a Windows Azure Web Site or Cloud Service.
This is by far the simplest method to achieve continuous deployment of your websites to Azure. But if your application is more complicated, or you need to run UI tests as part of your deployment, you should use Octopus Deploy instead, as per the Do you use the best deployment tool rule.
Suggestion to Microsoft: We hope this functionality comes to on-premise TFS and IIS configurations in the next version.
When a new developer joins a project, there is often a sea of information that they need to learn right away to be productive. This includes things like:
- Who the Product Owner is and who the Scrum Master is
- Where the backlog is
- Where the automated builds are
- Where the staging and production environments are
- How to set up the development environment for the project
Make it easy for the new developer by putting all this information in a central location like the Visual Studio dashboard.
The dashboard should contain:
- Who the Product Owner is and who the Scrum Master is
- The Definition of Ready and the Definition of Done
- When the daily standups occur and when the next sprint review is scheduled
- The current sprint backlog
- Show the current build status
Show links to:
- Staging environment
- Production environment
- Any other external service used by the project e.g. Octopus Deploy, Application Insights, RayGun, Elmah, Slack
You should also add the standard _Instructions.docx to your solution file for additional details on getting the project up and running in Visual Studio.
For particularly large and complex projects, you can use an induction tool like SugarLearning to create a course for getting up to speed with the project.
Continuous deployment is a set of processes and systems in place where every change is proven to be deployable to production and then deployed to production. E.g. DB migrations, code changes, metadata changes, scripts, etc.
At a minimum, teams need to ensure that (a) all changes are sanitized by an automated continuous deployment pipeline, and (b) changes at the end of each sprint are deployed to production.
View more detailed rules at Rules to Better Continuous Deployment with TFS
Often, deployment is either done manually or as part of the build process. But deployment is a completely different step in your lifecycle. It's important that deployment is automated, but done separately from the build process.
There are two main reasons you should separate your deployment from your build process:
- You're not dependent on your servers for your build to succeed. Similarly, if you need to change deployment locations, or add or remove servers, you don't have to edit your build definition and risk breaking your build.
- You want to make sure you're deploying the *same* (tested) build of your software to each environment. If your deployment step is part of your build step, you may be rebuilding each time you deploy to a new environment.
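A sketch of the second point: the deployment step takes an already-built, versioned artifact and promotes that same package through each environment, rather than rebuilding per environment. The package name and environments are illustrative only.

```python
def deploy(artifact, environment, deployment_log):
    """Deploy a pre-built package - deployment never triggers a rebuild."""
    deployment_log.append(f"{artifact} -> {environment}")

# Produced once by the CI build, then promoted unchanged
artifact = "MyApp.1.4.2.nupkg"
deployment_log = []
for env in ["dev", "staging", "production"]:
    deploy(artifact, env, deployment_log)   # same tested bits everywhere

print(deployment_log)
```

Because every entry in the log references the one artifact, what you tested in staging is byte-for-byte what reaches production.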
The best tool for deployments is Octopus Deploy.
Octopus Deploy allows you to package your projects as NuGet packages, publish them to the Octopus server, and deploy the packages to your configured environments. Advanced users can also perform other tasks as part of a deployment, like running integration and smoke tests, or notifying third-party services of a successful deployment.
Version 2.6 of Octopus Deploy introduced the ability to create a new release and trigger a deployment when a new package is pushed to the Octopus server. Combined with Octopack, this makes continuous integration very easy from Team Foundation Server.
What if you need to sync files manually?
Then you should use an FTP client that allows you to update only the files you have changed. FTP Sync and Beyond Compare are recommended, as they compare all the files on the web server to a directory on a local machine (including date updated and file size) and report which file is newer and which files will be overridden by uploading or downloading. You should only make changes on the local machine, so you can always upload files from the local machine to the web server.
This process allows you to keep a local copy of your live website on your machine - a great backup as a side effect.
Whenever you make changes on the website, they should be uploaded as soon as they are approved. Tick the box that says "sync sub-folders", but when you click sync, be careful to check any files that may be marked for a reverse sync; you should reverse the direction on those files. For most general editing tasks, changes should be uploaded as soon as they are done - don't leave it until the end of the day, or you won't be able to remember which pages you've changed. When you upload a file, you should sync EVERY file in that directory, as it's highly likely that un-synced files have been changed by someone and forgotten to be uploaded. Also make sure that folders deleted on the local server are deleted on the remote server.
If you are working on some files that you do not want to sync, then put a _DoNotSyncFilesInThisFolder_XX.txt file in the folder (replace XX with your initials). If you see files that are due to be synced and you don't see this file, then find out who changed them and tell them to sync. The reason for this TXT file is so that people don't keep chasing up files that are deliberately left un-synced.
NOTE: Immediately before deployment of an ASP.NET application with FTP Sync, you should ensure that the application compiles - otherwise it will not work correctly on the destination server (even though it still works on the development server).