The World Wide Web, or WWW as it’s commonly called, consists of more than 60 trillion web pages and files linked together by a complex set of hyperlinks. Search engines help us navigate this tremendous amount of data and find the most relevant information for our search in a matter of seconds. Search engines are a regular part of daily life for almost everyone, but few take the time to understand how they work. If you’re reading this, I assume you want to know. Although the computing power required to perform the billions of searches Google, Yahoo! and Bing handle every day is enormous, how they do it can be summarized in 3 simple steps.
Step 1: Crawling – This is the process by which the information is found.
Before Google can show you which website or webpage is most relevant to your search, it has to find the webpage first! As there is no way humans could possibly keep track of more than 60 trillion web pages (with more added every day), search engines rely on software called spiders to do so. These spiders “crawl” the web and build lists of the words they find on the sites they crawl.
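To make the idea concrete, here is a minimal sketch of a spider in Python. It is not how any real search engine is built: the three pages in TOY_WEB, their URLs and the PageParser and crawl names are all made up for illustration, and the pages live in a dictionary rather than being fetched over the network. It simply follows links from page to page and builds a list of words for each page it visits.

```python
from collections import deque
from html.parser import HTMLParser
import re

# Hypothetical in-memory "web" (URL -> HTML). A real spider would fetch pages over HTTP.
TOY_WEB = {
    "a.example/": '<h1>Coffee guide</h1><p>Fresh coffee beans.</p><a href="b.example/">brewing</a>',
    "b.example/": '<p>Brewing coffee at home.</p><a href="a.example/">guide</a><a href="c.example/">tea</a>',
    "c.example/": '<p>Green tea basics.</p>',
}


class PageParser(HTMLParser):
    """Collects the outgoing links and the visible words of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Lowercase the text and split it into simple alphanumeric words.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(start_url, web):
    """Breadth-first crawl from start_url; returns {url: list of words}."""
    seen = {start_url}
    queue = deque([start_url])
    word_lists = {}
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:  # dead link: nothing to read
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        word_lists[url] = parser.words
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return word_lists


pages = crawl("a.example/", TOY_WEB)
```

Starting from a single page, the spider reaches all three pages because each one is linked from a page it has already visited. A page that nothing links to would never be found this way.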
Crawling is used to find 2 types of information:
- New webpages with valuable content which could be relevant to users
- Current webpages which have been updated with new content
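One simple way a spider could tell these two cases apart is to keep a fingerprint of every page it has seen. The sketch below is an assumption on our part, not a description of what Google does; the fingerprint and classify names are hypothetical. It hashes the page content and compares it with the hash stored on the previous visit.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def classify(url, content, known):
    """Label a page 'new', 'updated' or 'unchanged' and remember its fingerprint.

    known is a dict mapping URL -> fingerprint from the previous crawl.
    """
    digest = fingerprint(content)
    previous = known.get(url)
    known[url] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"
```

Calling classify on the same URL three times, with the content changed only on the third call, returns "new", then "unchanged", then "updated".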
Google’s spiders have a huge job in front of them, so they will not re-crawl the same sites or pages unless there is a reason to. Each search engine has developed its own algorithms to decide which pages to crawl and how frequently. Some websites are easier for spiders to crawl than others. Websites that are built in Flash, JavaScript or AJAX can be harder to crawl, as can a page which contains lots of images or video. If you work with an Austin search engine optimization company like Neon Ambition, we can help ensure your website’s information architecture is built in a way that makes it easy for the spiders to crawl and index your site.
Step 2: Indexing – The storing of information
Once the spiders have collected the information, it needs to be stored. This happens constantly: the web expands rapidly every day, so the spiders are always crawling for new information which needs to be indexed. Fundamentally, the index is a huge database stored on giant servers all over the world. This makes it quickly accessible during a search no matter where in the world you are located. It also means that when you perform a search you are not actually searching the web, but the search engine’s index of the web.
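A common way to build this kind of index is an inverted index, which maps each word to the pages that contain it. The sketch below is a toy version under our own assumptions (the tokenize, build_index and search names and the sample pages are hypothetical), but it shows why a search only has to look things up in the index instead of reading the whole web.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, standing in for what the spiders collected.
documents = {
    "a.example/": "Fresh coffee beans",
    "b.example/": "Brewing coffee at home",
    "c.example/": "Green tea basics",
}
index = build_index(documents)
```

With this index, search(index, "coffee") returns both coffee pages, while search(index, "coffee home") narrows the result to b.example/ alone.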
The information found by the spider is analysed and a list of words/phrases within the document is built. During the process the following factors are considered:
Weight or importance of the word/phrase:
Depending on where in the document the word is found, the search engines will assign a value to its importance. They assume that a word or phrase mentioned near the top of the document is more important than one found at the very end. Just as a newspaper has headings, so too do websites, and search engines give more weight to words in headings than to words in the body of the article. A word or phrase used in a link is another indicator of importance. The big three search engines all have slightly different algorithms for assigning weight to words in their own index. This is why, if you perform a search on Google and then the same search on Bing, you will get different results and rankings.
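Here is a rough sketch of how position, headings and links might feed into a word’s weight. The numbers are invented for illustration only (search engines do not publish theirs), and the term_weight function and its parameters are hypothetical.

```python
import re

# Illustrative weights only; these values are assumptions, not real engine settings.
HEADING_WEIGHT = 3.0  # word appears in a heading
LINK_WEIGHT = 2.0     # word appears in link text
BODY_WEIGHT = 1.0     # word appears in the body
EARLY_BONUS = 0.5     # extra credit for appearing near the top of the body
EARLY_WORDS = 50      # how many leading body words count as "near the top"


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weight(term, headings, body, link_texts):
    """Score one word for one page from where it appears on that page."""
    term = term.lower()
    score = 0.0
    for heading in headings:
        score += HEADING_WEIGHT * tokenize(heading).count(term)
    for text in link_texts:
        score += LINK_WEIGHT * tokenize(text).count(term)
    for position, word in enumerate(tokenize(body)):
        if word == term:
            bonus = EARLY_BONUS if position < EARLY_WORDS else 0.0
            score += BODY_WEIGHT + bonus
    return score
```

Changing any of the constants changes the scores, and therefore the order of the results, which is one way to picture why two search engines can rank the same pages differently.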
Number of times the word/phrase appears:
Good quality content on a particular subject will likely mention the search term many times. However, there are cases of spam where a word is used excessively, without adding value, in an attempt to achieve a better listing. This is called keyword stuffing, and the search engines are onto business owners and SEO companies who practice this approach.
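One simple signal for keyword stuffing is keyword density: the share of all words on a page taken up by a single keyword. The sketch below is only an illustration. The 20% threshold and the keyword_density and looks_stuffed names are our own assumptions, and real search engines rely on far more sophisticated checks.

```python
import re


def keyword_density(text, keyword):
    """Return the fraction of words in text that equal keyword (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def looks_stuffed(text, keyword, threshold=0.2):
    """Flag text whose keyword density is above the (assumed) threshold."""
    return keyword_density(text, keyword) > threshold


stuffed = "cheap shoes cheap shoes cheap shoes buy cheap shoes"
natural = "Our guide compares running shoes for trail and road use"
```

In the stuffed example, "cheap" makes up 4 of the 9 words, so it is flagged. In the natural example, "shoes" is 1 word in 10, so it is not.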
Step 3: Ranking – Showing the user the most relevant content
The page a search engine shows after you make a search tries to present the most relevant and useful pages at the top of the list of results. This is the part of search engines that most of us are most familiar with: our search results. As we perform our search, the search engines work to find the pages in their index that are most relevant to it. Their algorithms take more than 200 different factors into consideration to score each relevant page, decide which is most relevant, and rank the pages for you. Most of us never look past the ten results on the first page, even though there are often ten or more pages of results, each listing ten different web pages.
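As a toy illustration of scoring and ranking, the sketch below combines just three made-up factors instead of 200. The factor names, their weights, and the score_page and rank functions are all hypothetical. Each candidate page gets one combined score, and the pages are sorted so the highest score comes first.

```python
# Hypothetical factors and weights; real engines use many more and do not publish them.
FACTOR_WEIGHTS = {
    "term_weight": 0.6,    # how strongly the page matches the search words
    "freshness": 0.1,      # how recently the page was updated
    "inbound_links": 0.3,  # how many other pages link to it
}


def score_page(factors):
    """Combine a page's factor values (each assumed to be 0.0 to 1.0) into one score."""
    return sum(
        weight * factors.get(name, 0.0)
        for name, weight in FACTOR_WEIGHTS.items()
    )


def rank(candidates, top_n=10):
    """Return up to top_n URLs ordered from highest to lowest score."""
    ordered = sorted(
        candidates,
        key=lambda url: score_page(candidates[url]),
        reverse=True,
    )
    return ordered[:top_n]


candidates = {
    "a.example/": {"term_weight": 1.0, "freshness": 0.0, "inbound_links": 0.0},
    "b.example/": {"term_weight": 0.5, "freshness": 1.0, "inbound_links": 1.0},
}
results = rank(candidates)
```

Here b.example/ outranks a.example/ even though it matches the search words less strongly, because its other factors lift its combined score higher. The default top_n of 10 mirrors the ten results on a first page.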
This video from Google helps to explain these 3 steps further:
So there you have it, folks: the basic principles behind how search engines work. If you are wondering how easily Google can index your site, or whether your content is relevant to the searches you want to show up for, please give us a call or fill out our contact form. Our SEO consultants would be happy to help.