How DRY should you be?
Before we explore the trade-offs between coding options, I want to define, for the purposes of this article, what DRY is and why developers should apply it.
The DRY principle (or Don’t Repeat Yourself) was first documented by Andy Hunt and Dave Thomas in “The Pragmatic Programmer” and states that every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
Applying this principle, we extract code into methods and call them; this reuse makes development and maintenance of software easier. Rather than copying and pasting code we want to reuse, creating a method reduces the maintenance burden. By creating one single point of truth, we fix bugs and change the code in one location instead of many.
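To make that concrete, here is a minimal C# sketch – the discount rule and all the names are invented purely for illustration – showing duplicated logic being extracted into a single method:

```csharp
// Before: the same discount rule has been copied and pasted into two places.
public static class OrderTotals
{
    public static decimal BasketTotal(decimal subtotal) =>
        subtotal > 100m ? subtotal * 0.9m : subtotal; // 10% off orders over 100

    public static decimal InvoiceTotal(decimal subtotal) =>
        subtotal > 100m ? subtotal * 0.9m : subtotal; // same rule, second copy
}

// After: one authoritative representation of the rule. A bug fix or a
// change to the discount now happens in exactly one place.
public static class DryOrderTotals
{
    public static decimal BasketTotal(decimal subtotal) => ApplyDiscount(subtotal);
    public static decimal InvoiceTotal(decimal subtotal) => ApplyDiscount(subtotal);

    private static decimal ApplyDiscount(decimal subtotal) =>
        subtotal > 100m ? subtotal * 0.9m : subtotal;
}
```

If the business later changes the discount to 15%, only ApplyDiscount needs editing and retesting.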
DRY code has a number of advantages:
- It is easy to maintain – rather than having multiple blocks of code that do the same thing, a single routine for that functionality means bug fixing happens in one place.
- It is easy to read – abstracted routines with small signatures can be easier to read than long repeating lines of code.
- Code in a routine is easier to reuse.
- Fewer unit tests – tests only have to be written for the one routine, not for each copy of identical code.
- Fewer lines of code.
When you combine the above, DRY code saves developers time.
The DRY principle is sometimes violated in the following situations:
- If code analysis indicates there is duplicate code, a developer may decide that refactoring a working code base costs more in development and testing effort than leaving the duplicates in place.
- Whilst two routines may look identical now, it is known that at some point in the future they will diverge.
- Passing too many variables into a routine can make the routine harder to read and burdens developers with understanding the context of each variable, as the sketch below illustrates.
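To illustrate that last point, here is a contrived C# sketch – all names and parameters are invented – of a shared routine that has been generalised to serve every caller:

```csharp
using System;

public static class Totals
{
    // Over-generalised shared routine: every caller must understand the
    // context of all six parameters, even those that do not apply to it.
    public static decimal CalculateTotal(
        decimal subtotal,
        decimal discountRate,
        decimal discountThreshold,
        bool applyTax,
        decimal taxRate,
        bool roundToNearestPenny)
    {
        var total = subtotal > discountThreshold
            ? subtotal * (1m - discountRate)
            : subtotal;

        if (applyTax)
        {
            total *= 1m + taxRate;
        }

        return roundToNearestPenny ? Math.Round(total, 2) : total;
    }
}
```

In a case like this, two simpler routines that happen to overlap may be easier to read and maintain than one fully generalised method.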
Libraries
Libraries are an extension of the DRY methodology. They allow code to be reused across projects, freeing developers to concentrate on the tasks that are key to the project they are working on.
Umbraco is a great example of this: by using Umbraco to content manage websites and apps, developers can focus on how the pages and apps are built. Developers no longer need to build the user interface and database interactions that would otherwise be required for administrators to manage their websites and applications.
Using recognised libraries in a project brings a number of benefits:
- They are tested by the software teams developing them.
- Development teams around the world that use them become testers for the libraries, ironing out issues faster.
- Documentation for recognised libraries tends to be of a high standard, code samples are generally easy to find, and meetups and communities such as https://our.umbraco.com/ exist.
- Potential new hires recognise the libraries.
In-House Libraries
In-house libraries allow developers to collect knowledge and share it across the team. They prevent repetition of tasks and tend to build on commercially available and open source platforms.
There are generally two approaches to building core libraries:
- A technical architect reviews the requirements of a number of projects, finds the duplicated functionality and designs an architecture around it.
- A developer wants to reuse functionality from one project in another, and writes a library to do this.
In either case, in-house libraries should always reside in their own source control repository, allowing bug fixes and new requirements to be implemented independently of existing projects.
Creation of in-house libraries introduces technical debt for the in-house development team:
- They must be tested, maintained and documented.
- Onboarding of new team members must include an introduction to the libraries – if team members don’t know about them, they are not going to use them.
- New developers must appreciate the quirks the libraries have due to the iterations they have been through and the implementations they have been used in.
- Addition and modification of features must be controlled. When developing functionality it is important to remember that even if a feature is only being used in one project, or it requires a lot of dependencies, it will be available in all projects that reference that library. Likewise, deleting a feature or amending its signature will impact all projects referencing that code.
A great way to manage features is to build the libraries into packages and host them on a NuGet server. This means projects can reference known versions of the in-house libraries, allowing a managed roll-out of changes to older projects. If methods are to be made obsolete and later removed, mark the older routines with the Obsolete attribute; this allows programs to continue to compile while making developers aware of the code changes at compile time.
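As a minimal sketch of that last point – the method names here are invented for illustration, but ObsoleteAttribute itself is part of the .NET base class library (see the link at the end of this article):

```csharp
using System;

public static class UrlHelpers
{
    // Callers still compile, but get a compiler warning pointing at the
    // replacement, giving consuming projects a deprecation window.
    [Obsolete("Use Slugify instead; this method will be removed in v3.0.")]
    public static string MakeSlug(string input) => Slugify(input);

    // Passing true as the second argument turns the warning into a
    // compile-time error – the final step before deleting the method.
    [Obsolete("MakeUrl has been removed; use Slugify.", true)]
    public static string MakeUrl(string input) => Slugify(input);

    public static string Slugify(string input) =>
        input.Trim().ToLowerInvariant().Replace(' ', '-');
}
```

Combined with versioned NuGet packages, this gives consuming projects time to move off a method before it disappears.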
It is important that the roll-out of bug fixes is managed, as the impact of the changes will need to be reviewed and tested on a case-by-case basis.
Libraries and Tech Debt – A History Lesson
In-house libraries are born at a point in time. Many years ago I worked on an information management system that pushed data to a number of websites. It started life as a VB6 project; the data it used was stored in XML format in a SQL Server database and accessed via ODBC.
For data access, a wrapper was written that dealt with such things as SQL injection attacks. The XML was transformed into objects that could then be manipulated on the website, and a web-class-based architecture was written which allowed front end developers to white label websites easily.
Then VB.NET was released. VB.NET obsoleted web classes and replaced them with web forms. Rather than rewrite the website to work with VB.NET, a new layer was introduced which mimicked web classes. The data layer was rewritten to use ADO.NET.
Over time the rest of the components were replaced with VB.NET equivalents – but the in-house web class architecture stayed.
Around the late 2000s it became hard to recruit VB.NET developers, and the decision was made to migrate to C#. This involved writing wrappers for the libraries and integrating them into C# apps. The faux “web class” architecture remained, so newer technologies like Entity Framework and MVC were never exposed at the website end, and developers never worked with the latest functionality.
New developers and contractors that joined the team would therefore need to be trained up in older methods rather than using the more recent functionality.
The lessons learned were that core libraries save time and made white labelling websites quick. Developers could move between projects easily with minimal learning time, and after the initial teething troubles the systems were robust. The trade-off was that developers had no space to learn or evolve.
No Libraries
When no libraries exist, the downside is that code, including bugs, is copied from one project to another. On the upside, developers are free to pick the libraries of their choice, meaning they can use the latest tools.
The impact of no libraries is:
- Knowledge is not easily reused.
- Testers can no longer assume that because something was implemented one way in one project it will be implemented the same way in the new one, so testing effort increases.
- Potential duplication of bugs across projects.
- Unit testing effort is increased.
- Support effort can increase.
Risks around no libraries can be managed to some extent by knowledge sharing.
The impact of developers being allowed to choose their own libraries is:
- The required level of knowledge increases, as colleagues need to learn the latest tools as well as maintain projects using older ones.
- Context switching becomes harder.
- Recruitment becomes harder – as the range of tools grows, so does the recruitment skill list.
Risks around different tools can be managed to some extent with code reviews.
With no in-house libraries the cost of creating similar apps never reduces.
Summary
Allowing developers to choose their own libraries fits the agile manifesto, as each team picks the tools required for the project. However, mandating a set of libraries makes it easier for developers to know what to expect when they move between projects. This comes at the risk of stifling innovation and of creating “cookie cutter” developers.
Whether or not in-house libraries are used, development teams need to communicate and share knowledge of their implementations. However, when in-house libraries do not exist and there is no agreement on the tools to use across projects, more knowledge needs to be shared each time.
Finally, when creating libraries, a plan must be made for their reuse across projects, their maintenance and their eventual retirement.
Finding Duplicate Code
There are a number of tools available that can help find duplicate code, such as ReSharper, SonarQube and the tools built into Visual Studio (dependent on version). These will enable the developer to reduce code duplication within a project.
- Visual Studio 2017 Enterprise edition comes with a component for finding duplicates; more information on installing the component can be found at https://blog.craigtp.co.uk/Post/2017/09/28/Analyze_Solution_For_Code_Clones_missing_in_Visual_Studio_2017. Earlier versions of Visual Studio come with Code Clone Detection: https://msdn.microsoft.com/en-us/library/hh205279.aspx
- ReSharper - https://www.jetbrains.com/help/resharper/dupFinder.html
- TeamCity - https://confluence.jetbrains.com/pages/viewpage.action?pageId=113084077
- SonarQube - https://blog.sonarsource.com/manage-duplicated-code-with-sonar
For more information on marking code as obsolete see https://docs.microsoft.com/en-us/dotnet/api/system.obsoleteattribute.message?view=netframework-4.7.2
Rachel Breeze
Rachel is on Twitter as @BreezeRachel