SEOIt.eu

Search Engine Optimisation Tips

Markov Chains for Text Content


A Markov chain describes the probability of an event occurring given the event (or events) that occurred immediately before it.

For example, if we take a large corpus of text, such as the complete works of Shakespeare, we can build a frequency table of how often each letter follows each other letter. We might find that the letter ‘b’ follows the letter ‘a’ with probability p(b), while the letter ‘c’ follows ‘a’ with probability p(c). Doing this for every pair of letters builds up a matrix. We use 27 characters (the 26 letters plus a space) to represent the alphabet; this is known as our event space. A first-order matrix therefore contains 27 × 27 = 729 elements.

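$$P(X_i = b \mid X_{i-1} = a) = \frac{N(ab)}{\sum_{c} N(ac)}$$

where N(ab) is the number of times b directly follows a, and the denominator is the number of times a is followed by any character.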
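A minimal Python sketch of how the first-order table and the estimate above could be computed. The function names, and the choice to fold punctuation and digits into spaces, are my own assumptions for illustration; the post does not say how the corpus is cleaned. The result is stored as nested dictionaries, so a pair that never occurs is simply absent, which corresponds to a zero entry in the 27 × 27 matrix.

```python
import string
from collections import Counter, defaultdict

# The 27-symbol event space: 26 lowercase letters plus a space.
ALPHABET = string.ascii_lowercase + " "


def normalise(text):
    """Lower-case the text, turn anything outside ALPHABET into a space and
    collapse runs of spaces (an assumed cleaning step, not from the post)."""
    cleaned = "".join(ch if ch in ALPHABET else " " for ch in text.lower())
    return " ".join(cleaned.split())


def first_order_probabilities(text):
    """Estimate P(X_i = b | X_{i-1} = a) = N(ab) / sum_c N(ac).

    Returns {a: {b: probability}}; pairs never seen are absent (probability 0).
    """
    text = normalise(text)
    pair_counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        pair_counts[a][b] += 1  # N(ab): how often b directly follows a
    probs = {}
    for a, followers in pair_counts.items():
        total = sum(followers.values())  # sum_c N(ac): how often a is followed by anything
        probs[a] = {b: n / total for b, n in followers.items()}
    return probs


probs = first_order_probabilities("To be, or not to be!")
print(probs["t"])
```

In this tiny example ‘t’ is followed by ‘o’ twice and by a space once, so the estimated probabilities are 2/3 and 1/3. A real table needs a corpus many orders of magnitude larger.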

Similarly, we can build a second-order matrix containing the probability of a letter occurring after two given letters, i.e. the probability of the next letter given that the previous two were b and e. This matrix has 27 times as many entries, since each row is now indexed by a pair of characters: aa, ab, ac, …, ba, bb, bc, …, za, zb, …, zz (plus the pairs involving a space). That gives 27 × 27 = 729 rows of 27 entries each, or 19,683 elements.

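Given a sequence of random variables X_1, X_2, …, a Markov chain of order n is one in which the next value depends only on the previous n values:

$$P(X_{i+1} = c \mid X_i, X_{i-1}, \ldots, X_1) = P(X_{i+1} = c \mid X_i, X_{i-1}, \ldots, X_{i-n+1})$$

The second-order case above, for example, is $P(X_{i+1} = c \mid X_i = a, X_{i-1} = b)$: the probability that c follows the pair ba.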
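The same counting generalises to any order n: the row key becomes the previous n characters instead of a single one. This is a hypothetical helper, sketched under the assumption that the text has already been cleaned to the 27-symbol alphabet; with that alphabet there are up to 27ⁿ possible contexts, although a real corpus only fills a fraction of them.

```python
from collections import Counter, defaultdict


def order_n_probabilities(text, n):
    """Estimate P(X_{i+1} = c | X_i, ..., X_{i-n+1}) from already-cleaned text.

    Keys are n-character contexts, oldest character first (the rows of the
    order-n matrix); values map each next character c to its probability.
    """
    if n < 1:
        raise ValueError("order n must be at least 1")
    counts = defaultdict(Counter)
    for i in range(len(text) - n):
        context, nxt = text[i:i + n], text[i + n]
        counts[context][nxt] += 1
    probs = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probs[context] = {c: k / total for c, k in followers.items()}
    return probs


# Second order: the row for the pair "ab" in a tiny example string.
print(order_n_probabilities("abcabd", 2)["ab"])
```

Here "ab" is followed once by ‘c’ and once by ‘d’, so each gets probability 0.5. Setting n = 1 gives back the first-order table.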

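An important property of a Markov matrix, also known as a transition matrix, is that the probabilities out of each state sum to 1. With rows indexed by the preceding letter, as above, each row sums to 1; in the transposed convention often used for PageRank, each column does.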
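To check the sum-to-one property on a concrete table, the sketch below builds the dense 27 × 27 first-order matrix (rows indexed by the current character, columns by the next) and tests that every row is non-negative and sums to 1. Giving an unseen letter a uniform row is an assumption made here to keep the matrix stochastic; the post does not say what to do with letters that never appear. The function names are hypothetical.

```python
import string

# 27-symbol event space: 26 lowercase letters plus a space.
ALPHABET = string.ascii_lowercase + " "


def transition_matrix(text):
    """Dense first-order matrix: M[i][j] = P(next = ALPHABET[j] | current = ALPHABET[i])."""
    index = {ch: k for k, ch in enumerate(ALPHABET)}
    size = len(ALPHABET)
    counts = [[0] * size for _ in range(size)]
    for a, b in zip(text, text[1:]):
        if a in index and b in index:  # ignore characters outside the event space
            counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a letter with no observations gets a uniform row.
        matrix.append([c / total for c in row] if total else [1 / size] * size)
    return matrix


def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(all(p >= 0 for p in row) and abs(sum(row) - 1) < tol for row in matrix)


m = transition_matrix("to be or not to be")
print(is_row_stochastic(m))
```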

Markov chains are often represented by directed graphs in which each edge is labelled with the probability of following it. This is where the connection with PageRank comes in: in the PageRank calculation we use the link graph of the web to determine the probability that a page will be visited. Each node of the graph is a page (a state of the chain), and each edge contributes an element of the transition matrix.

Calculating PageRank amounts to finding the steady state of this chain. In general, the steady state (stationary distribution) of a Markov matrix is its eigenvector with eigenvalue 1, so finding that eigenvector gives the PageRank.
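The steady state does not require computing every eigenvalue: repeatedly multiplying a probability vector by the transition matrix (power iteration) converges to the eigenvector with eigenvalue 1. The sketch below applies this to a tiny, made-up link graph using the commonly quoted PageRank damping factor of 0.85. Spreading the rank of a page with no outgoing links evenly over all pages is one common convention, assumed here; it is not necessarily what Google does.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a small link graph.

    links maps each page to the list of pages it links to. Returns a dict of
    page -> rank; the ranks sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # With probability (1 - damping) the surfer jumps to a random page.
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page (assumed convention): spread its rank over every page.
                for q in pages:
                    new[q] += damping * rank[p] / n
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


# Hypothetical three-page site: a links to b and c, b links to c, c links back to a.
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

In this example c ends up with the highest rank because both other pages link to it, and b the lowest because it only receives half of a's vote.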

All this is very interesting, but what the hell does it have to do with SEO? It all came from wanting to perform experiments on search engines. We all know pages need content to be indexed, so I wanted to be able to create content that would look like English. I found that pages filled with random text were not indexed. Presumably there is some sort of filter to remove nonsense; after all, random text is a relatively well-known technique for creating spam emails. Does anyone know anything about the filters that search engines apply to content before it is indexed? Markov discriminators such as CRM114 can tell which words follow other words in spam, and are better than Bayesian spam identification.
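To illustrate the idea behind a Markov discriminator, here is a toy scorer. It is not CRM114's actual algorithm and not anything a search engine is known to use. It trains first-order character counts on a sample of English and reports the average log-probability per transition of a new string; random text scores lower because its letter pairs are rare or unseen. The function names and the add-one smoothing over a 27-symbol alphabet (to avoid taking the log of zero) are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train(text):
    """First-order character counts: counts[a][b] = how often b follows a."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts


def avg_log_prob(counts, text, alphabet_size=27):
    """Average log P(next | current) over the text, with add-one smoothing.

    Higher (closer to zero) means the letter transitions look more like the
    training text; random strings tend to score lower.
    """
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return float("-inf")
    total = 0.0
    for a, b in pairs:
        followers = counts.get(a, Counter())  # .get avoids adding unseen letters
        p = (followers[b] + 1) / (sum(followers.values()) + alphabet_size)
        total += math.log(p)
    return total / len(pairs)


# A tiny training sample; a real filter would need a large corpus.
model = train("the cat sat on the mat and the hat")
print(avg_log_prob(model, "the cat"), avg_log_prob(model, "xqz kvj"))
```

A threshold on this score, tuned on known good and bad pages, would give a crude "looks like English" filter.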

A second attempt would be to create text based on a Markov chain of probabilities. A text generator could create the content required to help index my experimental pages.

Google Brand Update Hits UK SEOs


Back in February 2009, some SEOs started to notice major changes in the US rankings for quite generic terms. Aaron Wall of SEO Book wrote about this.

The changes seemed to favour sites belonging to brand-name companies.
Matt Cutts, Google’s head of webspam, confirmed on video that there had been a minor update to the rankings, nicknamed Vince after the engineer who worked on it, but said it was less about brands than about favouring trust, authority and reputation.
A few months on, many SEOs are reporting that sites which have ranked in the top five positions for years are losing those positions for no apparent reason.

In late June a number of people noticed that the Vince update had hit the UK results. Long-standing rankings were being lowered, and several SEOs wrote about the changes on their blogs, asking: are big brands being given preferential treatment by Google?
It seems clear that only a few terms, all quite generic, are being affected by the changes.

No-one is entirely sure at the moment what factors are required for Google to take your site seriously. Speculation is rife across a number of blogs. Shark SEO had some reasonable suggestions, as did David Harry at Fire Horse Trail.

For concrete examples we have to look at the data. SEOUnique published ranking data for the phrase "travel insurance" over the period of the Vince update. They also suggest changes for other highly competitive generic terms, such as "car insurance", job-related terms and domestic heating oil prices.

iCrossing give more examples of affected keywords, and they seem to fall broadly in the travel and finance sectors. It is too early to say whether other sectors will be significantly affected.

If we are dealing with a search landscape where big brands dominate, how can the little guy compete? If the brand effect applies only to the most generic terms, then realistically these would be largely out of reach of companies without deep pockets.

It is not all doom and gloom. Smaller companies can still target more specific terms. A generic term may drive traffic, but that traffic probably does not convert well. More specific queries are better targeted for conversion and, over time, may increase the site’s perceived trust in Google’s eyes.

Energetic Searching


An interesting post on the official Google blog estimates the energy used to perform an average search query. Remember the news article which claimed that two Google searches use the same amount of energy as boiling half a kettle of water (the article was subsequently revised)?

Not quite. Google engineers calculate that an average query uses around 1 kJ. To put this into context, a resting human body radiates around 100 W (100 J/s) as heat, so one search uses about as much energy as your body gives off in 10 seconds of doing nothing. A top cyclist can generate around 600 W (600 J/s) when pedalling furiously. So typing a query into Google does still take a significant amount of energy. The carbon dioxide produced by a search is around 0.2 g.

Multiply that by the thousands of queries that are being generated per second and that is an awful lot of energy. Next time you search perhaps you should think a bit more carefully about your search terms before pressing the search button.
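The back-of-the-envelope numbers in this post can be reproduced with a few lines of Python. The per-query figures (about 1 kJ and 0.2 g of CO2) are the ones quoted above; the rate of 1,000 queries per second is a hypothetical lower bound standing in for "thousands of queries per second", not a measured figure.

```python
ENERGY_PER_QUERY_J = 1_000  # ~1 kJ per average query, as quoted above
CO2_PER_QUERY_G = 0.2       # ~0.2 g of CO2 per query, as quoted above
RESTING_BODY_W = 100        # heat radiated by a resting human body
CYCLIST_W = 600             # top cyclist pedalling furiously


def seconds_of(power_watts, energy_joules=ENERGY_PER_QUERY_J):
    """How many seconds a source of the given power takes to supply one query's energy."""
    return energy_joules / power_watts


def fleet_load(queries_per_second):
    """Total power (W) and CO2 (g/s) for a given query rate."""
    return queries_per_second * ENERGY_PER_QUERY_J, queries_per_second * CO2_PER_QUERY_G


print(seconds_of(RESTING_BODY_W), seconds_of(CYCLIST_W))
print(fleet_load(1_000))  # hypothetical rate: 1,000 queries per second
```

With these inputs, one query equals about 10 seconds of resting body heat or under 2 seconds of a top cyclist's output, and 1,000 queries per second corresponds to about 1 MW of power and 200 g of CO2 every second.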

From SEO the Game to SEO It


My new blog starts here. Let me tell you something about myself. I am a search engine optimisation specialist. I used to write for the blog SEOtheGame.com, but it was not under my control, so I have decided to write my own blog. The aim is still the same: to keep an eye on what is happening in search engine optimisation and to provide useful tips on how you can optimise your site.

Hopefully, I will continue where the old blog left off. I can be found on Twitter under the name drnedflanders.

© 2009 SEOIt.eu. All Rights Reserved.

This blog is powered by Wordpress and Magatheme by Bryan Helmig.