Welcome to my first blog post.

Today, I want to discuss the correlation between the phrases someone types into a search engine and where they are in the purchasing funnel. This measurable relationship can provide a significant financial benefit in pay-per-click (PPC) keyword advertising. PPC ads are sold at auction, and advertisers bid on specific search terms. If you can calculate search ambiguity and correlate it with a particular stage in the buyer’s journey, you can outbid your competition for the phrases that yield higher conversion rates.

Intro to the Purchasing Funnel

First, let’s define the purchasing funnel, courtesy of Wikipedia:

The purchasing funnel defines distinct stages experienced by a shopper. Note that the downward arrow on the left-hand side of the image shows that the number of potential customers shrinks at each funnel stage. This reduction is an important observation we will return to in a moment.

Text Ambiguity

Now, let us explore search text ambiguity. A phrase can be considered ambiguous if there are multiple interpretations or multiple “correct” answers. For example, the question “What is ACM?” could yield at least 32 unique and unrelated answers that would all be correct due to the overused ACM acronym.

Another variation of text ambiguity may occur when a PPC marketer bids on keyword phrases using a matching option called broad match. Broad match shows your advertisement whenever your keywords appear in a search, in any order, even alongside words you did not specify. Consider the PPC keyword phrase “Hammer Drill”. Broad match can match that exact phrase, but also a large number of variations, including:

  • Dewalt Hammer Drills
  • DCD805B Hammer Drill
  • Milwaukee Hammer Drills
  • What is better a Drill or a Hammer
  • Drill Bits for Hammer Drill
  • Where to Sell a Hammer Drill
  • …
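The broad-match behavior described above can be sketched in a few lines of Python. This is a simplified model of the description in this post, not any ad platform’s actual matching logic; the function names and the naive plural folding (“drills” -> “drill”) are assumptions for illustration.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (keeping "/")."""
    return re.findall(r"[a-z0-9/]+", text.lower())


def normalize(token: str) -> str:
    """Naive plural folding (an assumption, not real stemming): "drills" -> "drill"."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def broad_match(keyword: str, query: str) -> bool:
    """Simplified broad match: every keyword token appears somewhere in the
    query, in any order, with any number of extra words around it."""
    query_terms = {normalize(t) for t in tokens(query)}
    return all(normalize(t) in query_terms for t in tokens(keyword))


if __name__ == "__main__":
    for q in ["Dewalt Hammer Drills", "What is better a Drill or a Hammer", "Cordless Drill"]:
        print(q, "->", broad_match("Hammer Drill", q))
```

Under this model, every query in the list above matches the keyword phrase “Hammer Drill”, while a query such as “Cordless Drill” does not, which illustrates how broad match pulls in ambiguous traffic.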

If you intend to match a user search to the most likely product page that will yield a sale, search ambiguity will make your job much harder.

Real World Search Example

Now, imagine you are about to embark on a home improvement project and need to drill a series of holes into your home’s brick exterior. You are a newbie, so you must first research the technique on Google and YouTube. Take a look at the augmented funnel graph below, where I illustrate the progression of search phrases as our DIY searcher moves through the purchasing funnel.

For a specific store, potential customers drop out of the purchasing funnel at each stage for various reasons. A store may not sell hammer drills, Dewalt products, or the specific model, or its prices may not be competitive.

Here are a few observations about the purchase funnel:

  • Search phrases become less ambiguous and more targeted as the funnel progresses
  • Conversion rates will increase as search phrases become less ambiguous
  • There is an inverse correlation between how often a phrase is used and its specificity

So, let’s break this down a bit, starting with the progression of search phrase specificity. Notice how the number of viable products is reduced with each query, going from every possible method for drilling a hole in brick, to all hammer drills, to highly rated hammer drills, to Dewalt hammer drills, and finally to a specific Dewalt model number.

I often find it fascinating how search phrases can “telegraph” where a searcher is in the purchasing funnel. As searchers zero in on their preferred product, their search phrases become more precise, indicating they are closer to purchasing. The specificity of search phrases correlates directly with higher conversion rates.

The disappointing aspect of very precise search phrases, often referred to as “the long tail” by search engine marketers, is that they are not encountered as frequently as early-stage phrases. For this reason, many marketers focus on the beginning of the funnel and ignore the money phrases with high conversion rates.

Detecting Specificity in Search

Earlier, I mentioned that ambiguity could be measured. I have found great success with a variation of the inverse document frequency (IDF) metric, so let’s first go over its definition. Note that my IDF calculations exclude stop words such as “a”, “and”, and “the”. This stop word list comes from the Lucene codebase and represents common words that are ignored when computing search relevance.
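A minimal Python sketch of the stop-word filtering step. The stop list below is intended to mirror Lucene’s default English stop set, but treat it as an assumption and check it against the Lucene version you use; the tokenizer regex and the `content_terms` name are hypothetical.

```python
import re

# Stop words excluded before computing IDF. Intended to mirror Lucene's
# default English stop set (assumption: verify against your Lucene version).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def content_terms(text: str) -> list[str]:
    """Lowercase and tokenize text, drop stop words, and remove duplicates
    while keeping first-seen order."""
    kept: dict[str, None] = {}
    for token in re.findall(r"[a-z0-9/]+", text.lower()):
        if token not in STOP_WORDS:
            kept.setdefault(token, None)
    return list(kept)


if __name__ == "__main__":
    print(content_terms("What is better a Drill or a Hammer"))
```

For example, `content_terms("What is better a Drill or a Hammer")` keeps only `["what", "better", "drill", "hammer"]`.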

For this discussion, a document contains the title, description, and attributes of a specific retail product such as a Dewalt DCD805B hammer drill. My specific corpus contains about 250 million retail products.

From Wikipedia:

The inverse document frequency is a measure of how much information the word provides, i.e., how common or rare it is across all documents. It is the logarithmically scaled inverse fraction of the documents that contain the word (obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient):

$$\mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}$$

with

  • $N$: total number of documents in the corpus, $N = |D|$
  • $|\{d \in D : t \in d\}|$: number of documents where the term $t$ appears (i.e., $\mathrm{tf}(t, d) \neq 0$). If the term is not in the corpus, this will lead to a division by zero. It is therefore common to adjust the numerator to $1 + N$ and the denominator to $1 + |\{d \in D : t \in d\}|$.
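A minimal Python sketch of the smoothed single-term formula, $\log\frac{1 + N}{1 + |\{d \in D : t \in d\}|}$. It uses the natural logarithm (the base only rescales scores), and the toy corpus is hypothetical, with each product document reduced to its set of terms.

```python
import math

# Hypothetical toy corpus: each product document reduced to its set of terms.
TOY_CORPUS: list[set[str]] = [
    {"dewalt", "dcd805b", "hammer", "drill"},
    {"milwaukee", "hammer", "drill"},
    {"dewalt", "impact", "driver"},
    {"masonry", "drill", "bit"},
]


def idf(term: str, documents: list[set[str]]) -> float:
    """Smoothed IDF: log((1 + N) / (1 + df)), where N is the corpus size and
    df is the number of documents containing the term."""
    n = len(documents)
    df = sum(1 for doc in documents if term in doc)
    return math.log((1 + n) / (1 + df))


if __name__ == "__main__":
    for term in ["drill", "dewalt", "dcd805b", "saw"]:
        print(f"{term}: {idf(term, TOY_CORPUS):.3f}")
```

A common term such as “drill” scores low, a model number such as “dcd805b” scores high, and a term missing from the corpus (“saw”) gets the largest score instead of triggering a division by zero.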

Please note: in the definition above, t is a single term. For my ambiguity calculations, t is instead all of the words in a phrase, excluding stop words, and the order in which they occur in the document is not taken into account.
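A Python sketch of this phrase-level variant. It assumes a document counts toward the denominator only when it contains every non-stop-word term of the phrase, in any order; that reading, the abbreviated stop list, and the product titles are all assumptions for illustration.

```python
import math
import re

# Abbreviated stop list for this sketch only (assumption); use the full list in practice.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "or", "the", "to", "with"})


def terms(text: str) -> frozenset[str]:
    """Non-stop-word terms of a phrase or document as an unordered set,
    so word order is ignored."""
    return frozenset(
        t for t in re.findall(r"[a-z0-9/]+", text.lower()) if t not in STOP_WORDS
    )


def phrase_idf(phrase: str, documents: list[frozenset[str]]) -> float:
    """Smoothed IDF where a document counts toward df only if it contains
    every term of the phrase. A phrase made only of stop words matches
    every document and scores 0 (maximally ambiguous)."""
    wanted = terms(phrase)
    df = sum(1 for doc in documents if wanted <= doc)
    return math.log((1 + len(documents)) / (1 + df))


# Hypothetical product titles standing in for a real catalog.
CATALOG = [terms(title) for title in [
    "Dewalt DCD805B 20V 1/2 inch Hammer Drill",
    "Dewalt Cordless Hammer Drill Kit",
    "Milwaukee Hammer Drill",
    "Bosch Hammer Drill",
    "Masonry Drill Bit Set for Brick",
    "Dewalt Impact Driver",
]]

if __name__ == "__main__":
    for query in ["drill", "hammer drill", "dewalt hammer drill", "dewalt dcd805b hammer drill"]:
        print(f"{query}: {phrase_idf(query, CATALOG):.3f}")
```

Scores rise as the query moves down the funnel from “drill” to the specific model number, and “drill hammer” scores the same as “hammer drill” because word order is ignored.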

So, what is an optimal inverse document frequency when screening PPC keywords? That entirely depends on your marketing goals. A brand marketing campaign might use much looser IDF restrictions, but keep in mind that each stage of the purchasing funnel requires a different type of landing page. The idea that one could algorithmically identify instances where the wrong landing page was used is quite fascinating! Combining a keyword’s poor conversion rate with a measure of its ambiguity can point to a fruitful course of action, or even prevent mistakes before a new PPC campaign has launched.
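One way to act on that idea, sketched in Python under clearly hypothetical assumptions: poorly converting keywords are flagged when their ambiguity does not fit the type of landing page they point to. The `KeywordStats` fields, the page-type labels, and both thresholds are placeholders to be tuned, not values from this post.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class KeywordStats:
    phrase: str
    idf: float              # ambiguity score: lower means more ambiguous
    conversion_rate: float  # conversions per click
    landing_page: str       # hypothetical labels: "product", "category", "guide"


def flag_landing_page_mismatches(
    rows: Iterable[KeywordStats],
    idf_threshold: float = 2.0,    # hypothetical cutoff between broad and specific
    min_conversion: float = 0.01,  # hypothetical "poor conversion" cutoff
) -> Iterator[tuple[str, str]]:
    """Yield (phrase, reason) for poorly converting keywords whose ambiguity
    does not fit the landing page type."""
    for row in rows:
        if row.conversion_rate >= min_conversion:
            continue  # converting acceptably; leave it alone
        specific = row.idf >= idf_threshold
        if not specific and row.landing_page == "product":
            yield row.phrase, "ambiguous phrase sent to a single product page"
        elif specific and row.landing_page != "product":
            yield row.phrase, "specific phrase sent to a broad page"
```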

I tend to target instant sales for my business, so my optimal IDF score is tuned via a feedback loop that optimizes for maximum profit. It is important to note that this algorithm does not optimize for the highest conversion rate. Instead, it optimizes for the highest projected profit by taking into account keyword search volume and average cost per click. In marketing, you are always battling the inverse correlation between precision and volume. Precision yields a great user experience, but volume pays the bills.
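A minimal sketch of the profit-versus-conversion trade-off, assuming a simple hypothetical model: projected clicks are monthly searches times an assumed click-through rate, and each click earns the conversion rate times the margin per sale, minus the average cost per click. The field names, the profit model, and the threshold search are illustrative assumptions, not a description of the production feedback loop.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    phrase: str
    idf: float
    monthly_searches: int
    ctr: float              # assumed click-through rate of the ad
    conversion_rate: float  # conversions per click
    margin_per_sale: float  # profit per conversion before ad spend
    avg_cpc: float          # average cost per click


def projected_profit(k: Keyword) -> float:
    """Hypothetical model: clicks * (conversion_rate * margin - cost per click)."""
    clicks = k.monthly_searches * k.ctr
    return clicks * (k.conversion_rate * k.margin_per_sale - k.avg_cpc)


def best_idf_threshold(keywords: list[Keyword], candidates: list[float]) -> float:
    """Return the minimum-IDF cutoff whose surviving keywords have the highest
    total projected profit (not the highest conversion rate)."""
    def total(threshold: float) -> float:
        return sum(projected_profit(k) for k in keywords if k.idf >= threshold)
    return max(candidates, key=total)


# Made-up numbers for illustration only.
EXAMPLE = [
    Keyword("hammer drill", 0.5, 100_000, 0.02, 0.002, 40.0, 0.50),
    Keyword("dewalt hammer drill", 1.5, 20_000, 0.03, 0.02, 40.0, 0.60),
    Keyword("dewalt dcd805b hammer drill", 3.0, 1_000, 0.05, 0.08, 40.0, 0.70),
]

if __name__ == "__main__":
    print(best_idf_threshold(EXAMPLE, [0.0, 1.0, 2.0]))
```

With these made-up numbers, the most specific keyword has the best conversion rate, yet the 1.0 cutoff wins because it also keeps a second profitable keyword, while admitting the broad “hammer drill” phrase would lose money.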

I mentioned that I had been lucky with a variation of IDF. The trick I employed, specifically when evaluating potential PPC keywords to bid on at scale (e.g., assessing billions of permutations), was to pick a normalized search filter phrase and apply it to my corpus before calculating IDF on the resulting document subset. This technique provided several benefits. By enforcing specificity, it targeted the later stages of the purchasing funnel. It also improved the landing page experience by ensuring more targeted search results. For example, consider the following search phrases where the anchor filters are in bold:

  • Dewalt Hammer Drill
  • 18v Hammer Drill
  • 1/2 Inch Hammer Drill
  • Dewalt 18v 1/2 Inch Hammer Drill
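The anchor-filter technique can be sketched as two steps: keep only the documents that contain the anchor phrase, then compute phrase IDF inside that subset. In this Python sketch, “hammer drill” is a hypothetical anchor, the product titles are made up, and a document counts as matching only when it contains every non-stop-word term (an assumption).

```python
import math
import re

# Abbreviated stop list for this sketch only (assumption).
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "or", "the", "to", "with"})


def terms(text: str) -> frozenset[str]:
    """Unordered set of non-stop-word terms."""
    return frozenset(
        t for t in re.findall(r"[a-z0-9/]+", text.lower()) if t not in STOP_WORDS
    )


def anchored_idf(anchor: str, phrase: str, documents: list[frozenset[str]]) -> float:
    """Filter the corpus to documents containing every anchor term, then
    compute smoothed phrase IDF within that subset only."""
    anchor_terms = terms(anchor)
    subset = [doc for doc in documents if anchor_terms <= doc]
    wanted = terms(phrase)
    df = sum(1 for doc in subset if wanted <= doc)
    return math.log((1 + len(subset)) / (1 + df))


# Hypothetical product titles standing in for a real catalog.
CATALOG = [terms(title) for title in [
    "Dewalt 18v 1/2 inch Hammer Drill",
    "Dewalt 18v Hammer Drill Kit",
    "Milwaukee 18v 1/2 inch Hammer Drill",
    "Bosch 1/2 inch Hammer Drill",
    "Dewalt 18v Impact Driver",
    "Masonry Drill Bit Set",
]]

if __name__ == "__main__":
    for phrase in ["Dewalt Hammer Drill", "18v Hammer Drill",
                   "1/2 Inch Hammer Drill", "Dewalt 18v 1/2 Inch Hammer Drill"]:
        print(f"{phrase}: {anchored_idf('hammer drill', phrase, CATALOG):.3f}")
```

The impact driver and the drill bit set never enter the subset, so scores are computed only against products that match the anchor, and the fully specified phrase scores highest.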

The last example has the highest conversion rate because it represents the smallest document set and the latest stage in the purchasing funnel. It also has poor search volume…

In Conclusion

Being aware of ambiguity in my data and leaning into precise segmentation has paid dividends over the years. In my next article, I will touch on how I used segmentation to precisely predict conversion rates for low-volume long-tail PPC keywords.


I’m James Krewson

On this blog, I share knowledge gained from my years of experience working on fascinating and complex data science problems.
