How to Identify Content Refresh Candidates, In Detail

For my first blog entry on 2024’s ledger (BTW, happy new year!) I’m going to do a little double-duty. As you can see from a little slice of my Asana task list, I owe our account managers an updated SOP for identifying client refresh candidates.

However, since it’s that time of year when everyone’s “best tools of 2023” listicles start to age like milk, I figure our account managers aren’t the only people who might have use for a primer on identifying content refreshes. So this is both an SOP for them and a blog post for everyone because, well, why not?

A Brief Definition of Content Refreshes and Their Value Prop

Before diving in, I’d be remiss not to provide a little context. First, when I talk about refreshing content, I mean quite simply modifying any previously published piece of content on a site. The nature of those edits will generally involve updating the article with additional, more current information (hence “refresh”).

As a quick example, consider this post about content managers that I wrote back in 2020. In it, I talk about being a blogger for ten years, which is no longer true. So I could “refresh” it by going in and correcting that to thirteen years…and then applying for AARP.

As for why you would do this, hobbyists might do it for a purist’s love of correctness. But in the business world, the main motivation is generally to preserve or increase traffic to the content, usually from search engines.

Search Engines and Content “Freshness”

At this point, I’m going to ask you to set aside your preconceived notion that “Google ‘likes’ fresh content,” if you have it. Anthropomorphizing the search engine isn’t helpful for our purposes here. There’s a less magical and more grounded explanation for executing refreshes.

On a long enough timeline, nothing you publish is truly evergreen. If you publish some kind of viral hot take, you’ll get views for a week, then nothing, in what the SEO industry refers to as the “spike of hope and flatline of nope.”

But if you target keyword ranking and search engine traffic, you’ll typically have a lifecycle where traffic to your post gradually increases for maybe a year or two and then gradually decays over several years. I describe that in a lot more detail in this post about modeling organic traffic. The decay will come, and the best you can hope for is a gentle slope when it does.

Enter the refresh.

A content refresh aims to slow the decay or even temporarily restore traffic growth. By routinely modifying posts to maintain the latest information on their topic, you ensure that searchers continue to find the content valuable, prompting the search engine to continue to rank the article and bring traffic to it.

I should note that, for the rest of this post, I’m going to assume that whoever is executing a refresh is doing so in an SEO-best-practice way, without going into detail about what that looks like. This post is already long enough without that digression.

Factors

So why not just routinely and obsessively refresh all of your content, as a sort of traffic decay prophylactic? Well, setting aside the obvious issue of cost/labor, especially for high-volume content sites, let’s now consider the variables involved in deciding when to execute a refresh.

To do this, I’ll do what I specialize in: ruin the art of content with math. I’ll briefly identify the factors to consider when looking for refresh candidates. From there, each section below will get “mathier” as we go, allowing you to bail out when you’ve had enough.

Here are the factors that impact the decision of whether or not to refresh a post:

Content age
Current content accuracy
Content traffic potential
Cost of refreshing
Potential value of refreshing
Traffic trend
Risk tolerance vis-à-vis current traffic

And generally speaking, we want a process that takes relevant factors into account and first compiles a list of candidates for refresh. There will then generally be a secondary process to evaluate the candidates, prioritizing and/or culling them.

For our account managers, this is generally as simple as gathering refresh candidates and having the client approve. If you control all of the content, the prioritization process will generally account for cost and prioritize based on potential gains.

I realize this is all abstract, but don’t worry. The 101 treatment of this will serve as a simple, concrete example.

Refresh Identification and Curation 101: A Simple Process for Our Account Managers

I’m going to start with the process, and then explain the rationale. Here’s a dead simple summary of the two things we need to do:

Find any obviously outdated blog posts via a title search.
Find any blog posts ranking between positions 4 and 15 for their “best” keyword.

Identification Step-By-Step

You’re going to make this list in two passes and later eliminate any duplicates you find.

First, find the obviously outdated posts. You do that with an advanced Google search:

site:{targetsite} intitle:2023|2022|2021|2020|2019

Here’s what this would look like against the New York Times, for instance:

This will give you a list of any pieces of content on the site with a previous year in the title—a common occurrence with listicle-type articles that will need refreshing. You’ll want to add these URLs to a spreadsheet of candidates.
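If you run this search for a lot of client sites, it can help to generate the query rather than retype the years every January. Here's a minimal sketch of that; the function name, the `example.com` domain, and the five-year lookback are my own placeholders, not part of any tool.

```python
def outdated_title_query(site: str, current_year: int, years_back: int = 5) -> str:
    """Build the site:/intitle: search string for titles mentioning a previous year."""
    # Count backward from last year, e.g. 2023, 2022, ... for current_year=2024.
    previous_years = [str(current_year - offset) for offset in range(1, years_back + 1)]
    return f"site:{site} intitle:{'|'.join(previous_years)}"


# Hypothetical domain, used only for illustration.
print(outdated_title_query("example.com", 2024))
# site:example.com intitle:2023|2022|2021|2020|2019
```

Paste the printed string into Google as-is. Adjust `years_back` if the site has older listicles worth catching.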

Next up, let’s find articles ranking between 4 and 15. To do this, you’ll log into Ahrefs and use site explorer on the target site, navigating to “top pages.” Here’s what that looks like for our property, makemeaprogrammer.com.

Now, you’re going to click on “+keyword filters” and filter by position, entering 4–15.

Apply the filter, and let ’er rip. This will result in a list of URLs that occupy a position between 4 and 15 for what Ahrefs considers to be the URL’s “best” keyword. Click export, and add this list of URLs to our candidate list.

From here, paste the Ahrefs Excel export into a Google sheet and add any URLs from the outdated title search that aren’t already in there. This is our candidate list.
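If you'd rather not eyeball duplicates in the sheet, the merge is easy to script. This is a rough sketch, assuming you've already pulled the URLs from each pass into two plain lists. The function name and the trailing-slash normalization are my assumptions, and the URLs are made up.

```python
def merge_candidates(ahrefs_urls, outdated_title_urls):
    """Combine both passes into one candidate list, keeping the first copy of each URL."""
    seen = set()
    merged = []
    for url in list(ahrefs_urls) + list(outdated_title_urls):
        # Assumption: a URL with and without a trailing slash is the same page.
        key = url.rstrip("/")
        if key not in seen:
            seen.add(key)
            merged.append(url)
    return merged


# Hypothetical URLs for illustration.
ahrefs_pass = ["https://example.com/a", "https://example.com/b/"]
title_pass = ["https://example.com/b", "https://example.com/c"]
print(merge_candidates(ahrefs_pass, title_pass))
```

The Ahrefs URLs go first so that their rows, with the position and volume data, are the ones that survive deduplication.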

If You Don’t Have Ahrefs

Our account managers all have this, but if you don’t, you can still implement the spirit of the activity. The easiest thing to do would be to comb this list of SERP trackers for an inexpensive one or one with a free trial and use it to find rankings for your site.

But you can also accomplish this manually by taking the pages on your site that earn search traffic and simply googling their best keyword, noting where you come up.

Prioritizing and Culling

With a candidate list in place, we need to cull, then prioritize. For us, it’s ultimately up to the client what to refresh, but we certainly want to go to them with a curated, prioritized list and a set of recommendations.

Here’s how we cull, going through the list one by one:

If the article has been modified in the last six months, remove it as a candidate.
If an article with a year in the title is obviously about some temporal event (e.g., “Our trip to WhateverCon, 2023”), remove it as a candidate.
If the article is unrelated to the keyword (e.g., an article about semi trucks ranks for the name of a random truck driver that the article happens to mention), remove it as a candidate.

With everything culled, we now prioritize, which is also dead simple in the 101 edition. Sort by “Current Top Keyword: Volume,” descending.

We now have a prioritized list of refresh candidates ready to present and/or execute.
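For anyone who wants to see the cull-then-sort logic spelled out, here's a small sketch. The `Candidate` fields are hypothetical stand-ins for columns in your sheet, the two boolean flags are still judgment calls a human makes, and I'm approximating "six months" as 183 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Candidate:
    url: str
    last_modified: date
    top_keyword_volume: int            # "Current Top Keyword: Volume" from the export
    is_temporal_event: bool = False    # human judgment, e.g. a conference recap
    keyword_is_relevant: bool = True   # human judgment, e.g. not an accidental ranking


def cull_and_prioritize(candidates, today, min_age_days=183):
    """Drop candidates that fail any cull rule, then sort by keyword volume, descending."""
    cutoff = today - timedelta(days=min_age_days)
    kept = [
        c for c in candidates
        if c.last_modified <= cutoff      # not modified in the last ~six months
        and not c.is_temporal_event
        and c.keyword_is_relevant
    ]
    return sorted(kept, key=lambda c: c.top_keyword_volume, reverse=True)
```

Call it with your candidate rows and today's date. What comes back is the list in the order you'd present it to the client.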

The Rationale

Let’s pause now and revisit the factors I mentioned above in considering why this approach makes sense.

Anything with a previous year in the title is now inaccurate and thus going to plummet in traffic, if it hasn’t already. The risk of a refresh here is nil, and the potential upside is high.
We want to ignore any article ranking worse than position 15 (that is, 16 or lower on the page), since there’s a very high chance that it has no traffic and isn’t capable of earning any. This way, we don’t waste time or money on a pointless refresh with no upside value.
We want to ignore any article ranking better than position 4 (that is, positions 1 through 3) since it’s successfully ranking already. As they say, “Don’t mess with success” (the risk factor is high).
We want to hold off on any article touched within the last six months because the effort may be wasted, especially with newly published articles. It takes a while for things to settle into their eventual ranking position, and if you fiddle too much, it might be pointless labor.

So, at the 101 level, we have a generally low-risk, high-upside way to productively identify and execute content refreshes. In other words, you can be confident that following this process will yield good results, without worrying too much about underlying data and probabilities.

Refresh Identification 201: Traffic Trends

To explain the rationale for the more in-depth approaches, I need to explain a little bit about probability and game theory. Enter the math.

Refreshes as Blackjack Hands

Whenever you refresh a live piece of content, you’re expecting it to rank better and earn more traffic. And this is usually what will happen. But sometimes, for whatever reason, it will actually drop in rankings and traffic.

So every time you touch a post, you do so knowing that it will probably help but knowing it might instead hurt. You want to do it anyway because the expected value of the activity is positive. It’s more likely to help than hurt, so you live with the handful of times it hurts while trying to minimize the impact of the “hurts.”

Think of this as a blackjack game where you’re the casino. You play the game knowing that you won’t win every hand but knowing that you will win more than you lose. The idea, then, is to play lots of hands, stacking the “win more than lose” and minimizing the impact of chance.

This is the real reason for the “4–15” ranking heuristic above. A losing hand of a refresh is a lot less of a bummer for an article ranking in position 12 than for one ranking in position 1, but a winning hand can rocket you up the ladder.
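To put rough numbers on the blackjack idea, here's a tiny expected-value sketch. Every figure in it (the 70% win rate, the visit gains and losses) is invented purely for illustration; I'm not claiming these are real odds for refreshes.

```python
def expected_traffic_change(p_win, gain_if_win, loss_if_lose):
    """Expected change in monthly visits from one refresh.

    p_win is the chance the refresh helps; loss_if_lose is a positive number of visits.
    """
    return p_win * gain_if_win - (1 - p_win) * loss_if_lose


# Hypothetical page in position 12: little traffic to lose, lots to gain.
position_12 = expected_traffic_change(p_win=0.7, gain_if_win=400, loss_if_lose=30)

# Hypothetical page in position 1: little room to climb, lots to lose.
position_1 = expected_traffic_change(p_win=0.7, gain_if_win=50, loss_if_lose=600)

print(round(position_12), round(position_1))  # about +271 and -145 visits
```

Same odds of winning the hand, but the payout table is completely different, which is why the position 12 page is the better bet.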

Expanding the “Nothing to Lose” Umbrella

But raw ranking is really just a reductive shortcut for evaluating risk (whether we have much to lose or not). There are other scenarios in which we have nothing to lose (or gain):

What if a piece of content ranks #1 for a term but has been steadily declining in traffic?
What if a piece of content ranks, say, 5th for a term, but the term has no volume, and the post earns no traffic?

Our previous approach would ignore the first situation, even though it calls for a refresh, while executing a pointless refresh in the second situation.

To drive this home, let’s return to Make Me a Programmer. At the time of writing, it ranks #1 (below a featured snippet) for “git without github.”

But take a look at its month-over-month traffic. It’s declining, potentially because of that UMich featured snippet in the mix:

“Don’t mess with success” doesn’t apply here because losing more than 30% of its traffic is hardly success. This is a situation where I’d do a refresh, particularly with a mind toward earning that featured snippet.

On the flip side, at the time of writing, Make Me a Programmer ranks fifth for the term “what non programming skills do programmers need” (counting various SERP widgets as occupying positions). And yet, in spite of this page-one appearance, it has earned five visitors in five months. This seems like a waste of time and money to refresh.
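Here's one way to sketch that expanded logic in code, layering a traffic check on top of the 4–15 rule. The function names, the 30% decline threshold, and the 50-visit floor are all assumptions I picked for the example, and the monthly visit series below are invented, not the actual Make Me a Programmer numbers.

```python
def traffic_change(monthly_visits):
    """Fractional change from the first month in the series to the last."""
    first, last = monthly_visits[0], monthly_visits[-1]
    if first == 0:
        return 0.0
    return (last - first) / first


def refresh_verdict(position, monthly_visits, decline_threshold=-0.30, min_total_visits=50):
    """Decide whether a page is worth refreshing, using rank and traffic trend together."""
    if sum(monthly_visits) < min_total_visits:
        return "skip: no traffic to win"
    if traffic_change(monthly_visits) <= decline_threshold:
        return "refresh: traffic is declining"
    if 4 <= position <= 15:
        return "refresh: ranks 4-15"
    return "leave alone"


# Hypothetical series: a #1 page that has lost 35% of its traffic.
print(refresh_verdict(1, [1000, 900, 800, 650]))
# A page ranking fifth that earned five visitors in five months.
print(refresh_verdict(5, [1, 1, 1, 1, 1]))
```

The order of the checks matters: the traffic floor runs first so a page-one ranking for a zero-volume term never makes the list, and the decline check runs before the rank check so a slipping #1 still gets flagged.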