
Question: Why did Google stop indexing pages from our sitemap.xml?


We're seeing some pages that exist in our sitemap.xml but are inexplicably missing from Google's public search index.

You can't download http://superuser.com/sitemap.xml -- we protect this file because there have been issues with it in the past -- but Googlebot can. We have verified via Google Webmaster Tools that the sitemap.xml file was fetched today and is rated OK with no errors (green checkmark).


The sitemap.xml contains a list of the last 50,000 questions on our site that were asked. For example, this question ...

http://superuser.com/questions/201610/how-to-see-the-end-of-a-long-chain-of-symbolic-links

... exists in the sitemap.xml as ...


<url>
  <loc>http://superuser.com/questions/201610/how-to-see-the-end-of-a-long-chain-of-symbolic-links</loc>
  <lastmod>2010-10-20</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.2</priority>
</url>

Searching for "How to see the end of a long chain of symbolic links" returns only one result, from questionhub.com, which is scraping our data (a whole different problem).

If you increment the question number and do an exact search for each question's title, you will see this pattern persist.

These URLs are in sitemap.xml but they are not showing up in Google's index -- and yet they show up on sites that scrape our Creative Commons data. Why would that be?
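One way to double-check the "is it really in the sitemap" half of this is to parse a saved copy of the file and look the URL up. The sketch below is only an illustration: `listed_urls` and `SAMPLE` are made-up names, and the sample document is a single-entry stand-in for the real (protected) sitemap.xml, assumed to use the standard sitemaps.org namespace.

```python
from xml.etree import ElementTree as ET

# Namespace used by the sitemap protocol; needed for element lookups.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical one-entry stand-in for the real sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://superuser.com/questions/201610/how-to-see-the-end-of-a-long-chain-of-symbolic-links</loc>
    <lastmod>2010-10-20</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.2</priority>
  </url>
</urlset>"""


def listed_urls(xml_text):
    """Return the set of <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}


urls = listed_urls(SAMPLE)
target = ("http://superuser.com/questions/201610/"
          "how-to-see-the-end-of-a-long-chain-of-symbolic-links")
print(target in urls)
```

This only confirms that a URL is listed in the file; it says nothing about whether Google has indexed it.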

 


asked Sep 13, 2013 in Java Interview Questions by rajesh
edited Sep 12, 2013

2 Answers

0 votes
I think Google might be having a hard time indexing your pages; 50,000 URLs is a lot. My suggestion is to break your sitemap down into smaller pieces and list them in a sitemap index, like so:

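<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>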



If you break it down, you will have better luck getting those 50,000 URLs indexed.

Sitemaps.org explanation of the issue

    You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). If you would like, you may compress your Sitemap files using gzip to reduce your bandwidth requirement; however the sitemap file once uncompressed must be no larger than 10MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.

    If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file. Sitemap index files may not list more than 50,000 Sitemaps and must be no larger than 10MB (10,485,760 bytes) and can be compressed. You can have more than one Sitemap index file. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file.
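The splitting described in the passage above could be sketched roughly as follows in Python. This is a minimal illustration, not how any particular site generates its sitemaps: the function names, the `sitemapN.xml` file naming, and the `base_url` parameter are all assumptions, and the sketch enforces only the 50,000-URL limit (it does not check the 10MB size limit or gzip the output).

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemaps.org text


def chunk(urls, size=MAX_URLS):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Return the XML text of one Sitemap file listing `urls`."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_index(sitemap_urls):
    """Return the XML text of a Sitemap index pointing at `sitemap_urls`."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def split_sitemap(urls, base_url, size=MAX_URLS):
    """Return (index_xml, {filename: sitemap_xml}) for a list of URLs.

    File names (sitemap1.xml, sitemap2.xml, ...) are a hypothetical
    convention chosen for this sketch.
    """
    files = {}
    for n, part in enumerate(chunk(urls, size), start=1):
        files[f"sitemap{n}.xml"] = build_urlset(part)
    index_xml = build_index([f"{base_url}/{name}" for name in files])
    return index_xml, files
```

For example, `split_sitemap(all_urls, "http://www.example.com", size=10_000)` would produce one index document plus one sitemap document per 10,000 URLs, each of which would then be written to disk and served at the listed location.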
answered Sep 13, 2013 by rajesh
edited Sep 12, 2013

...