Without a doubt, the most important part of search engine optimization is making sure that search engines can actually crawl your entire site. In this lesson you will learn a few tips for making sure that your website has no roadblocks that keep it from being crawled and indexed. Let’s review a few ways to check your crawl status.
Google Site Search Modifier
One quick way to tell whether your entire site is being crawled by Google is to use the “site:” operator in a search. For example, go to Google and type in site:yourwebsite.com. Google will show a list of results covering the pages it has crawled and indexed on your site, along with an approximate result count. This is most useful when you know roughly how many pages you have published: it can reveal that Google has picked up far more URLs than your site should have, or not nearly enough.
The second way to get this information from Google is through Webmaster Tools. Log in to Webmaster Tools and go to Crawl > Sitemaps. This quickly shows how many pages you have submitted in your sitemap and how many of them Google has crawled. If it shows that Google isn’t crawling a significant number of your pages, then there is an issue you need to look into. The Crawl Stats page also gives you an idea of how many pages Google is crawling on your site each day.
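Both checks above depend on knowing how many pages you have actually published. As a rough sketch, assuming your sitemap follows the standard sitemaps.org format, you could count its entries yourself and compare that number with what Google reports. The function name and the sample sitemap below are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <loc> entries in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return sum(1 for _ in root.iter(f"{SITEMAP_NS}loc"))


# Hypothetical sitemap content, not taken from any real site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourwebsite.com/</loc></url>
  <url><loc>https://yourwebsite.com/about</loc></url>
  <url><loc>https://yourwebsite.com/contact</loc></url>
</urlset>"""

print(count_sitemap_urls(SAMPLE_SITEMAP))  # prints 3
```

If this count is far from the number of results Google shows for your site, that gap is a good place to start investigating.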
If your site is having major crawl issues, the first thing you should check is your robots.txt file. Unfortunately, it is very common for a robots.txt mistake to keep major sections of a site from being crawled, and this is typically fairly easy to find. Here is a resource for checking your syntax: RobotsTxt.org
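One way to spot such a mistake is to test a handful of important URLs against your robots.txt rules. The sketch below uses Python's standard urllib.robotparser module. The rules and URLs are hypothetical examples, and this parser's matching is not guaranteed to be identical to Googlebot's, so treat the result as a first check rather than a final answer.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that accidentally blocks the whole blog section.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /blog/
"""

# Hypothetical pages that you expect to be crawlable.
SAMPLE_URLS = [
    "https://yourwebsite.com/",
    "https://yourwebsite.com/blog/first-post",
    "https://yourwebsite.com/products/widget",
]

print(blocked_urls(SAMPLE_ROBOTS, SAMPLE_URLS))
# prints ['https://yourwebsite.com/blog/first-post']
```

Any URL in the output that you want indexed points to a rule worth reviewing.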
Perhaps the best source of site crawl information is your server logs. They tell you which spiders are crawling your site and provide a full breakdown of that activity. If you can see that your site is having significant crawl errors, this type of in-depth server log analysis is necessary. To find the issue, you will need to pull the following information from your server logs: Host, Date, Page/File, Response Code, Referrers, and User Agent. For an incredible resource on this topic, see the in-depth guide to finding site crawl issues in server logs at Moz.com.
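The fields listed above can be pulled out with a short script. This is a minimal sketch that assumes your server writes the common "combined" access log format used by Apache and Nginx. If your log format differs, the pattern will need adjusting. The sample log lines, IP addresses, and function names are hypothetical.

```python
import re
from collections import Counter

# Matches one line of the combined log format:
# host, date, request line, response code, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def googlebot_errors(lines):
    """Count 4xx/5xx responses per (path, status) where the user agent mentions Googlebot."""
    errors = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and "Googlebot" in entry["user_agent"] and entry["status"][0] in "45":
            errors[(entry["path"], entry["status"])] += 1
    return errors


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2014:13:55:36 +0000] "GET /blog/old-post HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2014:13:56:02 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2014:13:57:10 +0000] "GET /missing HTTP/1.1" 404 512 "https://yourwebsite.com/" "Mozilla/5.0"',
]

for (path, status), count in googlebot_errors(SAMPLE_LOG).items():
    print(path, status, count)
```

Note that the user agent string can be spoofed, so matching on "Googlebot" alone only approximates real Google crawl activity.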