We're seeing some pages that exist in our sitemap.xml but are inexplicably missing from Google's public search index.

You can't download the file yourself -- we protect it because there have been issues with it in the past -- but Googlebot can. We have verified via Google Webmaster Tools that the sitemap.xml file was fetched today and is rated OK with no errors (green checkmark).

The sitemap.xml lists the 50,000 most recently asked questions on our site. For example, this question ...

... exists in the sitemap.xml as ...

Searching for "How to see the end of a long chain of symbolic links" gives only one result, and it is a site that is scraping our data (a whole different problem).

You can increment the question number, do an exact search for each question's title, and you will see the same pattern persist.
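For anyone who wants to repeat the first half of this check (confirming that a given URL really is listed in a sitemap), here is a rough sketch in Python. It assumes the file follows the standard sitemaps.org format; the sample XML, the example.com URLs, and the function names are all made up for illustration, since our real sitemap.xml isn't publicly downloadable.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data; not taken from the real sitemap.xml.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/questions/101/first-title</loc></url>
  <url><loc>https://example.com/questions/102/second-title</loc></url>
</urlset>
"""


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def missing_from_sitemap(xml_text, candidates):
    """Return the candidate URLs that are NOT listed in the sitemap, in order."""
    listed = sitemap_urls(xml_text)
    return [url for url in candidates if url not in listed]
```

Calling `missing_from_sitemap(SAMPLE_SITEMAP, urls)` with a list of consecutive question URLs returns the ones the sitemap does not contain. Whether a listed URL is actually in Google's index still has to be checked by searching, as described above.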

These URLs are in sitemap.xml but they are not showing up in Google's index -- and yet they show up on sites that scrape our Creative Commons data. Why would that be?


