We're seeing some pages that exist in our
sitemap.xml but are inexplicably missing from Google's public search index.
You can't download http://superuser.com/sitemap.xml -- we protect this file because there have been issues with it in the past -- but Googlebot can. We have verified via Google Webmaster Tools that the
sitemap.xml file was fetched today and is rated OK with no errors (green checkmark).
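The post doesn't say how the file is protected, but a common approach is to gate it by User-Agent. A minimal sketch, assuming a simple allow-list check (note that User-Agent strings are trivially spoofed, so production setups usually also verify the client IP via reverse DNS):

```python
# Hypothetical sketch: serve sitemap.xml only to known crawlers.
# The allow-list below is an assumption, not the actual mechanism used.
ALLOWED_CRAWLERS = ("Googlebot", "bingbot")

def may_fetch_sitemap(user_agent: str) -> bool:
    """Return True if the request's User-Agent names an allowed crawler.

    User-Agent is easy to forge; a robust check would also confirm the
    requester's IP resolves back to a known crawler hostname.
    """
    return any(bot in user_agent for bot in ALLOWED_CRAWLERS)

print(may_fetch_sitemap(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(may_fetch_sitemap("Mozilla/5.0 (Windows NT 10.0)"))  # False
```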
sitemap.xml contains a list of the last 50,000 questions asked on our site. For example, this question ...
... exists in the
sitemap.xml as ...
Searching for "How to see the end of a long chain of symbolic links" returns only one result: a page on questionhub.com, which is scraping our data (a whole different problem).
You can increment the question number, do an exact-phrase search for that question's title, and see this pattern persist.
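To spot-check this yourself, one way is to build an exact-phrase query URL for each title. A minimal sketch (the helper name is mine, not something from the site):

```python
from urllib.parse import quote_plus

def exact_title_query(title: str) -> str:
    """Build a Google exact-phrase search URL for a question title.

    Wrapping the title in double quotes asks Google for an exact match,
    which makes missing-from-index pages easy to spot.
    """
    return "https://www.google.com/search?q=" + quote_plus(f'"{title}"')

print(exact_title_query("How to see the end of a long chain of symbolic links"))
```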
These URLs are in sitemap.xml, but they are not showing up in Google's index -- and yet they show up on sites that scrape our Creative Commons data. Why would that be?