Testing in Production with Feature Toggles in .NET Core

I’ve always been a big fan of testing, and often think about ways to improve the testability of our system. The most effective testing is to test what your users actually use, and that is to test in production. This can be quite a scary thought to some: shipping untested code out to our live system, with real users, and hoping it doesn’t break!

Of course, the simple solution would be to run two “Production Environments” side by side, and use one exclusively for testing, and only once you are satisfied everything is working, push the tested changes across to the other environment. This doesn’t make much sense though.

There are a few potential issues with this approach. Firstly, you’ve just doubled your flat production costs (excluding costs which are accrued by traffic). Convincing the people with the credit card that this is a good idea can be quite a hard sell, and rightfully so. We’ve essentially created a beefed-up version of a “test environment”.

Maybe you’re in a position where increasing costs like this isn’t a concern. Unfortunately, it still doesn’t cover our requirement of “testing in production”: we’ll be testing in a “production-like” environment. We won’t have data generated by users, and we won’t have live traffic continuing to use the system as we’re testing our feature. It’s just not the same.

To effectively test in production, it’s important you have the appropriate monitoring and logging in place, as well as a team comfortable supporting a continuously changing system. Continuously deploying to production is something that you will become more comfortable with over time. It’s important to have good observability of your system.

It’s also worth noting that testing in production, and continuously pushing to production, isn’t for everyone. If you’re in a highly regulated field, for example, you may not be able to for legal reasons. However, feature toggling may still be of use to you in slightly different contexts, so hopefully you will still take something away from this post.

Feature Toggling and Testing in Production

So what do we mean by “Feature Toggling”? The most simplistic answer I can provide for this is:

If a pre-set condition is true, where the appropriate value comes from a source, the feature should be enabled, otherwise the feature should be disabled, and the code should not execute.

I’ve purposefully made the description here quite open. I haven’t specified where the “appropriate value” is actually populated. There are so many sources we could use here, but let’s explore a few:

  • A hard-coded configuration value
  • A claim on an authorised user
  • The tenant Id of the current request

In these examples, I’ve picked out some that are particularly useful for testing purposes, but you could expand this to feature toggle on IP allow/block lists, or try to look up the user’s location based on request headers or a specified address and only allow certain features in certain regions. The same concepts apply!

You can also use the same concept to A/B test features, or slowly roll out features by only enabling 10% of all requests to use the new flow, for example.

To go into more detail, it makes sense to show some code samples. Although the following code samples target C# and .NET Core specifically, I hope the concepts are easily translatable to your language of choice.

Setting up Feature Toggling in .NET Core 3.1

For the following samples we’ll be utilising the Microsoft.FeatureManagement package. I was first made aware of this package by the blog post series on feature flags in ASP.NET by Andrew Lock. The library has evolved slightly since then, but that series still covers how it works in a lot more detail for those who are interested.

A brief overview of the package though:

  • It uses the Microsoft.Extensions.Configuration bits to determine if the feature is enabled
  • It has some built in “Filters” which can be very handy for getting up and running
  • It is extensible enough for us to also create our own filters for more specific flagging
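
As a rough illustration of the built-in filters, the package includes a percentage filter under the alias Microsoft.Percentage. The sketch below assumes the same AdvancedSearch feature name used later in this post, and would enable the feature for roughly 10% of checks. Two caveats: the filter decides randomly on every evaluation, so the same user may get different results between requests, and depending on the package version you may need to register it yourself with AddFeatureFilter<PercentageFilter>().

```json
{
  "FeatureManagement": {
    "AdvancedSearch": {
      "EnabledFor": [
        {
          "Name": "Microsoft.Percentage",
          "Parameters": {
            "Value": 10
          }
        }
      ]
    }
  }
}
```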

Setting up feature toggling and using it on a hard-coded configuration value

This is the simplest way of feature flagging. There are no dynamic values. It’s on or off. True or false.

First off, let’s create a features.json file for handling our feature configuration. As we’ll be using Microsoft.Extensions.Configuration for this, we could equally define our feature configuration in environment variables, ini files, yaml files, Azure, AWS, wherever. My feature config looks like this:

{
    "FeatureManagement": {
        "AdvancedSearch": true
    }
}

Now in our application, we need to load up this configuration ready to be used. You could include this in your app’s wider configuration by adding to the ConfigurationBuilder already being used, but I like keeping my feature config separate, so here’s how I’ll set up the config:

// requires: using Microsoft.Extensions.Configuration;
var featureConfig = new ConfigurationBuilder()
                .AddJsonFile("features.json", optional: true)
                .Build();
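
Since the same settings could equally come from environment variables, here is a minimal sketch of layering them over the JSON file. It assumes the Microsoft.Extensions.Configuration.Json and Microsoft.Extensions.Configuration.EnvironmentVariables packages are referenced; the class name FeatureConfigFromEnvironment is made up for this example. A double underscore in a variable name maps to the section separator, so FeatureManagement__AdvancedSearch becomes the key FeatureManagement:AdvancedSearch.

```csharp
using System;
using Microsoft.Extensions.Configuration;

// Hypothetical helper, not part of the original post
public static class FeatureConfigFromEnvironment
{
    public static IConfiguration Build()
    {
        return new ConfigurationBuilder()
            .AddJsonFile("features.json", optional: true)
            // Providers added later win, so an environment variable
            // overrides the value from features.json
            .AddEnvironmentVariables()
            .Build();
    }
}
```

With this in place, setting FeatureManagement__AdvancedSearch=false on one machine would switch the feature off there without touching the JSON file.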

Finally, as the bits in Microsoft.Extensions.* can be incredibly viral and end up all over your application, we need to register the feature management components to our container. If you’re using ASP.NET Core here, the following code will fit into your ConfigureServices method, otherwise you may need to create a ServiceCollection and use it elsewhere.


var services = new ServiceCollection(); //omit if using ASP.NET Core
services.AddFeatureManagement(featureConfig);
var serviceProvider = services.BuildServiceProvider(); //omit if using ASP.NET Core

That’s all the setup necessary! If we’re utilising dependency injection (this is true by default in ASP.NET Core), we can just take a dependency on IFeatureManager. If not, we can pull the instance out of our service provider:

var featureManager = serviceProvider.GetRequiredService<IFeatureManager>();

Once we have our featureManager instance available to use, we can begin to do some flagging!

if (await featureManager.IsEnabledAsync("AdvancedSearch"))
{
    // execute the advanced search
}
else
{
    // execute the simple search instead
}
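
When dependency injection is available, the same check could live in a class that receives IFeatureManager through its constructor. This is only a sketch: SearchService and its placeholder return values are hypothetical, and it assumes the class is registered in the same container as the feature management services.

```csharp
using System.Threading.Tasks;
using Microsoft.FeatureManagement;

// Hypothetical service, not part of the original post
public class SearchService
{
    private readonly IFeatureManager featureManager;

    // The container supplies IFeatureManager once AddFeatureManagement has been called
    public SearchService(IFeatureManager featureManager)
    {
        this.featureManager = featureManager;
    }

    public async Task<string> SearchAsync(string query)
    {
        if (await this.featureManager.IsEnabledAsync("AdvancedSearch"))
        {
            // Placeholder for the advanced search
            return $"advanced results for '{query}'";
        }

        // Placeholder for the simple search
        return $"simple results for '{query}'";
    }
}
```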

This is the simplest way to get up and running with feature toggling, but we’re not able to effectively test in production using this. It will allow us to push code to production and disable it whilst we test elsewhere though, which is a huge step forward. Let’s explore some other ways of feature toggling which we can utilise on our production environment.

Feature Toggling on a user claim

So far, our AdvancedSearch feature will either be enabled for everyone, or not. As we’re relying on user claims here, this will only work with ASP.NET Core. We’ll show an example of how to do something similar when we come to toggling by tenant Id.

Let’s make it so it’s only enabled for users who are our beta testers, or internal users, identified by a claim instead. Let’s start by expanding our features.json:

{
  "FeatureManagement": {
    "AdvancedSearch": {
      "EnabledFor": [
        {
          "Name": "Claims",
          "Parameters": {
            "AllowedClaims": ["Employee","BetaTester"]
          }
        }
      ]
    }
  }
}

Now, this is a little more complex than either being enabled or not, so we’ll need to create our own FeatureFilter to handle this. Firstly, let’s create a class which understands the shape of the underlying configuration:

public class ClaimsFilterSettings
{
    public string[] AllowedClaims { get; set; }
}

Secondly, we’ll need to implement the filter. The filter will take a dependency on IHttpContextAccessor so we can grab the current HttpContext and make a decision on whether to allow the current user access:

[FilterAlias("Claims")]
public class ClaimsFeatureFilter : IFeatureFilter
{
    private readonly IHttpContextAccessor httpContextAccessor;

    public ClaimsFeatureFilter(IHttpContextAccessor httpContextAccessor)
    {
        this.httpContextAccessor = httpContextAccessor;
    }

    public Task<bool> EvaluateAsync(FeatureFilterEvaluationContext context)
    {
        var settings = context.Parameters.Get<ClaimsFilterSettings>();

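        // HttpContext is null outside of a request, so treat that case as "disabled"
        var user = this.httpContextAccessor.HttpContext?.User;
        if (user == null)
        {
            return Task.FromResult(false);
        }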

        // IFeatureFilter is async by default
        // and we're not doing anything async here, so we'll need to use Task.FromResult
        return Task.FromResult(user.Claims.Any(x => settings.AllowedClaims.Contains(x.Type)));
    }
}

So here we take the current user on the HTTP context, and return true if any of their claims match the ones we say are “Allowed”. This concept has been described in more detail by Andrew Lock; I have modernised it and made it accept any of the allowed claims instead of requiring all of them.
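
To make the difference between “any” and “all” concrete, here is a small standalone sketch that only uses System.Security.Claims. The ClaimMatching helper and the sample user are made up for illustration; the HasAnyClaim check mirrors the expression used in the filter above.

```csharp
using System;
using System.Linq;
using System.Security.Claims;

// Hypothetical helper, not part of the original post
public static class ClaimMatching
{
    // True when the user holds at least one of the allowed claim types
    public static bool HasAnyClaim(ClaimsPrincipal user, string[] allowedClaims) =>
        user.Claims.Any(claim => allowedClaims.Contains(claim.Type));

    // True only when the user holds every one of the required claim types
    public static bool HasAllClaims(ClaimsPrincipal user, string[] requiredClaims) =>
        requiredClaims.All(required => user.HasClaim(claim => claim.Type == required));
}

public static class Program
{
    public static void Main()
    {
        // A beta tester who is not an employee
        var identity = new ClaimsIdentity(new[] { new Claim("BetaTester", "true") });
        var user = new ClaimsPrincipal(identity);
        var allowed = new[] { "Employee", "BetaTester" };

        Console.WriteLine(ClaimMatching.HasAnyClaim(user, allowed));  // True
        Console.WriteLine(ClaimMatching.HasAllClaims(user, allowed)); // False
    }
}
```

With “any”, a beta tester gets the feature without also needing to be an employee, which is usually what you want for a list of allowed groups.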

Once we have our filter all configured, we need to tell the feature manager to use it. For that, we need to update the registration.

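// The filter depends on IHttpContextAccessor, which ASP.NET Core does not register by default
services.AddHttpContextAccessor();
services.AddFeatureManagement(this.configuration)
    .AddFeatureFilter<ClaimsFeatureFilter>();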

Finally, we can test our advanced search functionality in production without breaking the existing “simple” search for our existing users!

What if we want to be less granular, and rely on multi-tenancy to help us test our functionality? This actually has more underlying benefits than just testing. We can charge some “tenants” extra and enable functionality for them too: they pay more, and get more features! For the purpose of this example, let’s stick to using tenants for testing purposes.

Feature Toggling on a particular tenant

For this example, I would like to consider 3 separate tenants, all using the same code and the same databases. The tenant separation is at an application-logic level, rather than an infrastructure level. Those tenants will be:

  • Actual Real Live Users, or ARLU as an identifier
  • Manual Annoying Tests, or MAT as an identifier
  • Automatic Awesome Testing, or AAT as an identifier

I’ve made the distinction between MAT and AAT because we may disable certain things for automated testing, like actually purchasing items or transferring money, whereas we may want to do those things in the manual tests because a human can make sensible decisions.

OK, so we have our tenants defined. Now, I only want to enable our new “AdvancedSearch” feature for MAT and AAT, leaving ARLU on the old search:

{
  "FeatureManagement": {
    "AdvancedSearch": {
      "EnabledFor": [
        {
          "Name": "Tenants",
          "Parameters": {
            "AllowedTenants": ["MAT","AAT"]
          }
        }
      ]
    }
  }
}

Similar to our claims testing, we’ll need some settings so that our application can understand this configuration.

public class TenantFilterSettings
{
    public string[] AllowedTenants { get; set; }
}

So far, so good. It’s almost identical to our ClaimsFilterSettings. We’ll also need a custom feature filter, but this time we’ll want a ContextualFeatureFilter, which will include some additional context when making a decision. For this to work, we’ll need to define our context:

public class TenantFeatureContext
{
    public string TenantId { get; set; }
}

This class will be used to tell the feature filter what the current tenant is. Now we can implement our ContextualFeatureFilter:

[FilterAlias("Tenants")]
public class TenantsFeatureFilter : IContextualFeatureFilter<TenantFeatureContext>
{
    public Task<bool> EvaluateAsync(FeatureFilterEvaluationContext featureFilterContext, TenantFeatureContext appContext)
    {
        var settings = featureFilterContext.Parameters.Get<TenantFilterSettings>();

        //Again, we're not doing anything async here
        return Task.FromResult(settings.AllowedTenants.Contains(appContext.TenantId));
    }
}

You’ll notice we’re now implementing a generic IContextualFeatureFilter interface which makes our EvaluateAsync method take our context as a parameter.

We’ll also need to let the feature manager know about our feature filter before it will be used:

services.AddFeatureManagement(this.configuration)
    .AddFeatureFilter<TenantsFeatureFilter>();

However, unlike the ClaimsFeatureFilter, this won’t work out of the box. The feature manager has no way of knowing what the TenantFeatureContext actually is.

We need to explicitly pass through the context when checking if the feature is enabled.

var tenantId = "MAT"; // May actually be pulled from message metadata, parameters from an HTTP request, a database call, anywhere.
var context = new TenantFeatureContext {TenantId = tenantId};

Finally, we need to change how we check if the feature is enabled by including the context:

if (await featureManager.IsEnabledAsync("AdvancedSearch", context))
{
    // execute the advanced search
}
else
{
    // execute the simple search instead
}

All that’s different now is that we’re using an overload of IsEnabledAsync and including our context.
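
The same Tenants filter could also express the MAT/AAT distinction made earlier, where automated tests should not really purchase anything. As a sketch, a second flag (RealPurchases is a hypothetical name, not from the original setup) might be enabled for live users and manual testers only:

```json
{
  "FeatureManagement": {
    "RealPurchases": {
      "EnabledFor": [
        {
          "Name": "Tenants",
          "Parameters": {
            "AllowedTenants": ["ARLU", "MAT"]
          }
        }
      ]
    }
  }
}
```

Checking it would look the same as before, passing "RealPurchases" and the tenant context to IsEnabledAsync.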

Wrapping Up

Now that we can turn features on/off for specific users, in specific contexts, we can comfortably test in production without our users seeing a broken implementation, half-baked code, or a feature that is solely used for testing purposes. It may add a little noise to your codebase, but having this flexibility in your system is incredibly powerful and worth the extra code.

It’s worth keeping in mind that once a new feature is enabled for everyone, it may be worth going back and removing the feature toggle. If it’s always on, it’s just extra code that you have to maintain. Remember to delete it early if you can.