How Googlebot interacts with your website
Mar 07
If I’m [googlebot] indexing for regular web search, and I see links to MP3s and videos, I probably won’t download those. Similarly, if I see a JPG, I will treat it differently than an HTML or PDF link. For instance, JPG is much less likely to change frequently than HTML, so I will check the JPG for changes less often to save bandwidth. Meanwhile, if I’m looking for links as Google Scholar, I’m going to be far more interested in the PDF article than the JPG file. Downloading doodles (like JPGs) and videos of skateboarding dogs is distracting for a scholar—do you agree?
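To make that concrete, here is a minimal Python sketch of the kind of decision described above: whether to fetch a link at all, and how often to recheck it. It is only an illustration, not Google's actual logic. The policy tables, the recheck intervals, and the names `should_fetch` and `recheck_days` are all made up for this example, and skipping images entirely for the scholar crawl is a simplification of "far more interested in the PDF".

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical policy: file extensions each crawl purpose does not download.
MEDIA = {".mp3", ".mp4", ".avi", ".mov"}
IMAGES = {".jpg", ".jpeg", ".png", ".gif"}
SKIP = {
    "web_search": MEDIA,        # "I probably won't download those"
    "scholar": MEDIA | IMAGES,  # simplification: a scholar crawl ignores images
}

# Hypothetical recheck intervals in days; images change less often than HTML.
RECHECK_DAYS = {".jpg": 30, ".jpeg": 30, ".pdf": 14}
DEFAULT_RECHECK_DAYS = 1


def extension(url):
    """Return the lowercased file extension of the URL path ('' if none)."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def should_fetch(url, purpose):
    """Decide whether a crawl with the given purpose downloads this link."""
    return extension(url) not in SKIP[purpose]


def recheck_days(url):
    """How long to wait before checking the link for changes again."""
    return RECHECK_DAYS.get(extension(url), DEFAULT_RECHECK_DAYS)
```

With these made-up tables, a regular web search crawl would skip `song.mp3` but still fetch `photo.jpg`, only rechecking it far less often than an HTML page.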
—
After actually downloading a file, I use the Content-Type header to check whether it really is HTML, an image, text, or something else. If it’s a special data type like a PDF file, Word document, or Excel spreadsheet, I’ll make sure it’s in the valid format and extract the text content. Maybe it has a virus; you never know. If the document or data type is really garbled, there’s usually not much to do besides discard the content.
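As a rough illustration of that second step, the Python sketch below classifies a download by its Content-Type header and discards a PDF whose body does not start with the `%PDF-` signature. This is my own simplified take, not Googlebot's real code: the function names and the category labels are assumptions, and a real crawler would validate Word and Excel files too.

```python
# MIME types treated as "special" documents whose text gets extracted.
DOCUMENT_TYPES = {
    "application/pdf",
    "application/msword",
    "application/vnd.ms-excel",
}


def media_type(content_type_header):
    """Strip parameters such as '; charset=UTF-8' and lowercase the type."""
    return content_type_header.split(";", 1)[0].strip().lower()


def classify(content_type_header):
    """Map a Content-Type header to a coarse category (labels are made up)."""
    mtype = media_type(content_type_header)
    if mtype in ("text/html", "application/xhtml+xml"):
        return "html"
    if mtype.startswith("image/"):
        return "image"
    if mtype in DOCUMENT_TYPES:
        return "document"
    if mtype.startswith("text/"):
        return "text"
    return "other"


def looks_like_pdf(body):
    """PDF files begin with the bytes '%PDF-'."""
    return body.startswith(b"%PDF-")


def handle(content_type_header, body):
    """Return the category, or 'discard' if a claimed PDF is garbled."""
    kind = classify(content_type_header)
    is_pdf = media_type(content_type_header) == "application/pdf"
    if is_pdf and not looks_like_pdf(body):
        return "discard"
    return kind
```

For example, `handle("application/pdf", b"<html>...")` returns `"discard"`, because the header claims a PDF but the content says otherwise.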
A very interesting read on what Googlebot does when it accesses your website, and how it decides what to download.
Link:
http://googlewebmastercentral.blogspot.com/…