How is WDF*IDF calculated?

WDF*IDF calculates the term weight W of a given word i in a document j. The formula multiplies the within-document frequency (WDF) of word i in document j by the inverse document frequency (IDF) of that word across many relevant documents. The formula is therefore W_i,j = WDF_i,j * IDF_i.

What is the function of IDF and how is it calculated?

The inverse document frequency is hard to explain simply by translating its name. In short, this formula sets the keyword in relation to all documents that contain this word. Two document counts are needed to calculate it: the number of all relevant documents on the same topic (N_D), which is divided by the number of documents that contain the keyword being examined (N_i). As with WDF, the logarithm compresses the result to dampen outliers. The formula for calculating IDF is: IDF_i = log10 (1 + N_D / N_i)

Why is WDF*IDF indispensable for effective SEO?

Keywords and content on a page continue to play an important role in search engine optimisation. Although the influence of these factors has decreased over the years, they are still an important indicator of the relevance of a website. That is why a good WDF*IDF analysis is essential to cover all important keywords and use them in the text. Many WDF*IDF tools show, among other things, which keywords are still missing that occur very frequently in other documents. In addition, WDF*IDF helps in SEO with the holistic creation of content. In addition to keywords, the result shows further terms that often appear in connection with the main keyword. This can further increase the relevance of the content. If the topic is not yet fully described, these relevance-enhancing keywords reveal further topics that have not been considered before. Thus, you can use WDF*IDF tools to ensure that all keywords appear in the text and are sufficiently covered and, in addition, further enrich the text with relevance-enhancing terms. The content then appears very relevant and extensive to the search engine crawler, increasing the likelihood of a good ranking.

What is the difference between TF*IDF and WDF*IDF?

The two formulas are often used synonymously. In fact, however, there is a difference. WDF sets the weighting of a term in relation to the rest of the document and compresses it with the logarithm. TF, on the other hand, is a conventional calculation of keyword density, i.e. the ‘term frequency’. Formerly a proven tool, keyword density is rarely used today because the mere count of a keyword does not provide any reliable information about its weighting. The formulas are therefore very similar, but lead to different results. While TF*IDF tends to produce more extreme values and reacts more sensitively, WDF*IDF produces a compressed value that is easier to understand and less sensitive to so-called ‘outliers’.

What are the disadvantages of WDF*IDF?

Despite the many advantages of WDF*IDF, this formula is not a panacea for SEO. Even with WDF*IDF, the content must be well and attractively written and should be supported with graphics or images. After all, content is just one of many factors for effective SEO. The loading time of the website and, above all, mobile optimisation play at least as important a role as content. Last but not least, the competition for the keywords matters, as does how Google classifies the domain as a whole. With WDF*IDF, the content can be optimised very well to satisfy the user's intention and to cover the topic in detail. However, WDF*IDF analyses words based on their frequency, without fully taking into account the context or semantic meaning. This can lead to a neglect of the overall context and the relevance of the content. It is therefore important to remember that it is no substitute for expertise, knowledge and research. We therefore recommend using WDF*IDF as an aid to optimisation rather than as a replacement for them. You should also be aware that WDF*IDF can encourage keyword stuffing, which often makes texts unpleasant to read. It is therefore advisable to use WDF*IDF sensibly and deliberately, rather than thoughtlessly peppering your text with the words found.

WDF*IDF

The abbreviation WDF*IDF stands for “Within Document Frequency * Inverse Document Frequency”. WDF*IDF can be used to analyze and evaluate texts. WDF determines how relevant a word is within the content, and IDF determines the weighting of that word compared to other documents with similar content.

Calculate and understand WDF*IDF

While the formula WDF*IDF is not particularly difficult, the calculation of the two individual parts WDF and IDF requires significantly more mathematical understanding. First, let’s take a look at the main formula and shed some light on it:

W_i,j= WDF_i,j * IDF_i

In words: WDF*IDF calculates the term weight W of a specified word i in document j. The formula multiplies the within-document frequency of word i in document j by the inverse document frequency of this word across many relevant documents.

An example for easier understanding:

In our wiki there is an article on the topic “target group”. Now we want to examine how the main keyword “target group” performs in relation to other texts of the same category. To find out this weighting, we calculate the WDF*IDF value of this keyword. The higher the value, the more relevance and weight the keyword has compared to other documents.

The WDF value of our article is 0.5392 and the IDF value is 2.5117. If we multiply the two values, we get the WDF*IDF value of 1.354. This is a good value. We will now take a closer look at how the individual values come about in detail.

The calculation and function of WDF

The name “Within Document Frequency” already describes the purpose of WDF quite well. WDF calculates how often a certain word occurs in the text. Some people might now think of the well-known keyword density, and WDF does work similarly. However, the logarithm in the formula compresses the result, so that very frequent repetition of the keyword does not lead to a significantly better score. The relative frequency of a word is calculated by putting the keyword in relation to the other words in the text. This is meant to discourage senseless keyword stuffing.

To calculate the WDF value, you need the frequency of the keyword (i) in the document (j) and the total length (L) of the text in words. Take the base-2 logarithm of the frequency plus one, then divide it by the base-2 logarithm of L.
Small tip: If you don’t have a log2 button on your calculator, you can still calculate log2 of a value with ln(VALUE) / ln(2).

WDF_i,j = log₂ (Freq_i,j + 1) / log₂ (L)

Let’s stick with our example:

In our article, the keyword “target group” appears a total of 37 times. This corresponds to a Freq_i,j of 37. The total length of the text (L) is 850 words. If we put these values into our formula, we get:

WDF = log2 (37 + 1) / log2 (850)

WDF = 0.5392

If we increase the keyword count of “target group” by 20, to a total of 57, this would result in a WDF value of 0.6019. Although the keyword count has increased by 54%, the WDF value has only increased by 0.0627 points. If, on the other hand, the keyword “target group” appeared only 17 times in the text, i.e. 20 times fewer, the WDF value would be 0.4285. So it has decreased by more than 0.1. This shows that it does not help much to mention the keyword unnaturally often in a text.
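
The three WDF values above can be reproduced with a few lines of Python. This is only a minimal sketch of the formula as stated in this article; the function name wdf and its parameter names are our own choice for illustration. The results agree with the figures above up to rounding in the last digit.

```python
import math


def wdf(freq: int, length: int) -> float:
    """Within-document frequency: log2(freq + 1) / log2(length).

    freq   -- how often the keyword occurs in the document (Freq_i,j)
    length -- total number of words in the document (L)
    """
    if freq < 0 or length < 2:
        # log2(1) is 0, so a one-word document would mean dividing by zero.
        raise ValueError("freq must be >= 0 and length must be >= 2")
    return math.log2(freq + 1) / math.log2(length)


# The calculator tip from above: log2(x) can also be written as ln(x) / ln(2).
assert math.isclose(math.log2(38), math.log(38) / math.log(2))

# Keyword counts from the example, all for a text of 850 words.
for count in (17, 37, 57):
    print(count, wdf(count, 850))
```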

The calculation and function of IDF

The “Inverse Document Frequency” is harder to explain simply by translating its name. In short, this formula relates the keyword to all documents that contain this word. Two document counts are needed for the calculation: the number of all relevant documents on the same topic (N_D), which is divided by the number of documents that contain the examined keyword (N_i). Again, the logarithm compresses the result to dampen outliers.

IDF_i = log10 (1 + N_D / N_i)

Let’s return to our example:

We have classified the text within the main category “Marketing”. According to Google, there are about 2,970,000,000 results (N_D) for this term. Now we still need the number of documents in this area that contain our keyword “target group”. To find it, we simply enter “marketing target group” in Google and get around 9,170,000 results (N_i). Let’s now put these numbers into the formula:

IDF = log10 (1 + 2,970,000,000 / 9,170,000)

IDF = 2.5117
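
The same calculation as a short Python sketch. The function name idf and its parameter names are illustrative assumptions; the two counts are the approximate Google result numbers quoted in the example above.

```python
import math


def idf(total_docs: float, docs_with_term: float) -> float:
    """Inverse document frequency: log10(1 + N_D / N_i).

    total_docs     -- number of relevant documents on the topic (N_D)
    docs_with_term -- number of those documents containing the keyword (N_i)
    """
    if docs_with_term <= 0:
        raise ValueError("docs_with_term must be greater than 0")
    return math.log10(1 + total_docs / docs_with_term)


# Counts from the example: "Marketing" and "marketing target group".
idf_value = idf(2_970_000_000, 9_170_000)
print(round(idf_value, 4))

# Term weight from the main formula, W = WDF * IDF, with the WDF from the example.
print(round(0.5392 * idf_value, 3))
```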

In combination, the WDF*IDF formula thus gives an approximate indication of how relevant the keyword is compared to other documents with the same topic and keyword. Of course, the reliability and accuracy of the results increase with the number of documents analyzed. In practice, however, the effort required to include all documents is too high. Furthermore, it is important that this analysis is performed for each important keyword in a text, in order to cover the topic as comprehensively as possible and not to forget anything.
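
Running the analysis for every term in a text can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular WDF*IDF tool works: the tokenizer only handles single lowercase words (so a two-word keyword like “target group” is split), the comparison corpus is a small hypothetical list of texts that includes the analyzed document itself, and the function names tokenize and wdf_idf_scores are our own.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very simple tokenizer (assumption): lowercase single words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wdf_idf_scores(document: str, corpus: list[str]) -> dict[str, float]:
    """Return the WDF*IDF weight of every term in `document`.

    WDF = log2(freq + 1) / log2(L)   (L = number of words in the document)
    IDF = log10(1 + N_D / N_i)       (N_D = corpus size, N_i = docs with the term)
    """
    tokens = tokenize(document)
    length = len(tokens)
    if length < 2:
        raise ValueError("document must contain at least two words")

    counts = Counter(tokens)
    corpus_terms = [set(tokenize(text)) for text in corpus]
    n_d = len(corpus_terms)

    scores = {}
    for term, freq in counts.items():
        n_i = sum(term in terms for terms in corpus_terms)
        if n_i == 0:
            continue  # term does not occur in the corpus, so IDF is undefined
        wdf = math.log2(freq + 1) / math.log2(length)
        idf = math.log10(1 + n_d / n_i)
        scores[term] = wdf * idf
    return scores


# Hypothetical mini corpus; the first text is the document being analyzed.
corpus = [
    "target group analysis defines the target group for marketing",
    "marketing budgets and marketing channels for the campaign",
    "the campaign reaches customers through marketing",
]
scores = wdf_idf_scores(corpus[0], corpus)
for term, weight in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term:<10} {weight:.4f}")
```

In this toy corpus, a word that appears in every document (“marketing”) receives a lower weight than an equally frequent word that appears in only one document (“analysis”), which is the effect of the IDF factor.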

WDF*IDF is indispensable for effective SEO

Keywords and content of a page still play an important role for search engine optimization. Although the influence of these factors has decreased over the years, they are still an important indicator of the relevance of a website. Therefore, a good WDF*IDF analysis is essential to cover all important keywords and use them in the text. Many WDF*IDF tools show, among other things, which keywords that occur very often in other documents are still missing from your own text.

Moreover, WDF*IDF helps in SEO with holistic content creation. Besides keywords, the result shows other terms that frequently appear in connection with the main keyword. This can further increase the relevance of the content. If the topic is not yet comprehensively described, these relevance-increasing keywords uncover further topic areas that were not previously considered.

Thus, with the help of WDF*IDF tools, you can ensure that all keywords appear in the text and are sufficiently treated and additionally enrich the text with relevance-increasing terms. For the crawler of the search engines, the content then appears very relevant and extensive, so that the probability of a good ranking increases.
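
The idea of finding missing keywords can be illustrated with a small Python sketch. It is a deliberately simplified assumption of what such a check might look like, not the method of any actual WDF*IDF tool: the function name missing_terms, the min_share threshold and the sample texts are all hypothetical, and real tools would additionally weight the terms and filter out stop words.

```python
import re


def tokenize(text: str) -> set[str]:
    """Set of lowercase single words in a text (simplified assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def missing_terms(own_text: str, other_texts: list[str], min_share: float = 0.5) -> list[str]:
    """Terms found in at least `min_share` of the other texts but not in our own."""
    own_terms = tokenize(own_text)
    others = [tokenize(text) for text in other_texts]
    if not others:
        return []

    # Count in how many of the other documents each term occurs.
    doc_freq: dict[str, int] = {}
    for terms in others:
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    threshold = min_share * len(others)
    return sorted(
        term for term, count in doc_freq.items()
        if count >= threshold and term not in own_terms
    )


# Hypothetical texts for illustration only.
own = "target group definition"
competitors = [
    "target group persona and segmentation",
    "persona based segmentation of a target group",
    "buyer persona examples",
]
print(missing_terms(own, competitors, min_share=0.6))
```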

The small but important difference between WDF*IDF and TF*IDF

The two formulas are often used synonymously. In fact, however, there is a difference. WDF, as we just learned, puts the weight of a term in relation to the rest of the document and compresses it by the logarithm. TF, on the other hand, is a common calculation of keyword density, or “term frequency.” Formerly a proven tool, keyword density is rarely used today, as the mere number of a keyword does not provide reliable information about its weighting.

The formulas are therefore very similar, but produce different results. WDF*IDF produces a compressed value that is easier to understand and less sensitive to so-called “outliers”, while TF*IDF analysis tends to output more extreme values and reacts more sensitively.
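
The difference in sensitivity can be made visible with a short comparison. In this sketch TF is assumed to be plain keyword density (keyword count divided by text length), which is one common definition; WDF uses the formula from this article. The keyword counts are those of the “target group” example plus two larger, hypothetical counts.

```python
import math


def tf(freq: int, length: int) -> float:
    """Keyword density (assumed TF definition): count divided by text length."""
    return freq / length


def wdf(freq: int, length: int) -> float:
    """Within-document frequency: log2(freq + 1) / log2(length)."""
    return math.log2(freq + 1) / math.log2(length)


LENGTH = 850  # words in the example article
base_tf = tf(37, LENGTH)
base_wdf = wdf(37, LENGTH)

# Growth of both measures relative to the original 37 mentions.
for count in (37, 57, 74, 148):
    tf_growth = tf(count, LENGTH) / base_tf
    wdf_growth = wdf(count, LENGTH) / base_wdf
    print(f"{count:>4} mentions: TF x{tf_growth:.2f}, WDF x{wdf_growth:.2f}")
```

TF grows in direct proportion to the keyword count, whereas WDF rises much more slowly, which is why stuffing a text with the keyword pays off far less under WDF*IDF.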

WDF*IDF is not the holy content grail

Despite the many advantages of WDF*IDF, this formula is not a panacea for SEO. Even with WDF*IDF, the content must be well and attractively written and should be supported with graphics or images. After all, the content is only one of many factors for effective SEO. The loading time of the website and especially mobile optimization play at least as big a role as the content. Last but not least, the competition for the keywords also matters, as does how Google ranks the domain overall.

With WDF*IDF, the content can therefore be optimized very well to satisfy the user intention and cover the topic in detail. However, it cannot replace expertise, knowledge and research. We therefore recommend using WDF*IDF as an aid to optimization rather than as a replacement for them. In addition, you should always keep in mind that WDF*IDF can lead to keyword stuffing, which often makes texts unpleasant to read. It is therefore advisable to use WDF*IDF sensibly and thoughtfully, instead of thoughtlessly peppering the text with the words found.

Questions about WDF*IDF:

What does WDF*IDF mean?

WDF*IDF stands for “Within Document Frequency * Inverse Document Frequency” and indicates the relevance and weighting of a keyword within a document compared to all documents with similar content.

How does the WDF*IDF analysis work?

The WDF*IDF formula evaluates your own text and puts it in relation to other documents on the same topic. You can then optimize your own text accordingly.

What is the benefit of a WDF*IDF analysis for SEO texts?

It ensures that the most important keywords are mentioned frequently enough and compares their weighting with the content of other websites. It also helps to find keywords that increase relevance.