Web sites have a simple way to tell search engines how to index their content: a file called robots.txt, placed at the root of the site, that spells out the rules for how "robots" may come in and traverse its pages. The convention was established in 1994, around the time the first search engines were being born.
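For example, a site that wanted to keep every crawler out of its photo and private pages might serve something like this at /robots.txt (a made-up file for illustration, not any real site's rules):

    # Hypothetical example: block all crawlers from /photos/ and /private/
    User-agent: *
    Disallow: /photos/
    Disallow: /private/

A crawler that honors the convention reads this file first and skips the disallowed paths; note that compliance is voluntary, not enforced.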
It resurfaced in the news not long ago when Rupert Murdoch was making waves in the media about how search engines steal and reuse his content, particularly by aggregating news stories on sites like news.google.com. Google's shameless response, much to Murdoch's embarrassment, was simply: "place a robots.txt file and tell us not to go there." For the record, Murdoch's news sites never did, and the story ended there.
Now, with the Facebook IPO only a few days away, I was curious to see the contents of Facebook's robots.txt file. In a nutshell, most search engines are explicitly told not to index many of the main pages like photos, feeds, etc. So here is Facebook collecting tons of information on people (pictures, preferences, and so on), and yet Google can't even touch it. Ouch. That's like going into a fight with your best weapon, except the weapon doesn't work.
You can see it here: http://www.facebook.com/robots.txt
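If you want to poke at it programmatically, Python's standard library includes a robots.txt parser, so you can ask what a given crawler is allowed to fetch. A quick sketch (the /photos/ path is just an illustrative URL; the answers depend on whatever the live file says when you run it):

    # Fetch Facebook's live robots.txt and check access for a few user agents.
    # Uses Python's built-in urllib.robotparser; /photos/ is only an example path.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://www.facebook.com/robots.txt")
    rp.read()  # download and parse the current rules

    for agent in ("Googlebot", "Bingbot", "*"):
        allowed = rp.can_fetch(agent, "http://www.facebook.com/photos/")
        print(agent, "may fetch /photos/:", allowed)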
Google's own plus.google.com has a robots.txt file as well, which can be viewed directly at plus.google.com/robots.txt