author    | Nick White <git@njw.me.uk> | 2011-08-15 19:08:28 +0100
committer | Nick White <git@njw.me.uk> | 2011-08-15 19:08:28 +0100
commit    | e6037966d0fc676b78bce9a4dd0b7776ab9f4a7b (patch)
tree      | ab4dc60b2344d0ee0106070384f6e22a49d905a4 /LEGAL
parent    | 236fe25f4560e072bcb1a8cc2f5d1f50f30dbfe5 (diff)
Add lots of TODOs
Diffstat (limited to 'LEGAL')
-rw-r--r-- | LEGAL | 7
1 file changed, 4 insertions, 3 deletions
@@ -16,14 +16,15 @@ See section 5.3 of http://www.google.com/accounts/TOS.
 
 Their robots.txt allows certain book URLs, but disallows others.
-We use two types of URL:
+We use three types of URL:
+http://books.google.com/books?id=<bookid>&printsec=frontcover
 http://books.google.com/books?id=<bookid>&pg=<pgcode>&jscmd=click3
 http://books.google.com/books?id=<bookid>&pg=<pgcode>&img=1&zoom=3&hl=en&<sig>
 
 robots.txt disallows /books?*jscmd=* and /books?*pg=*. However,
 Google consider Allow statements to overrule disallow statements
 if they are longer. And they happen to allow /books?*q=subject:*.
-So, we append that to both url types (it has no effect on them),
-and we are obeying robots.txt
+So, we append that to the urls (it has no effect on them), and
+we are obeying robots.txt
 
 Details on how Google interprets robots.txt are at
 http://code.google.com/web/controlcrawlindex/docs/robots_txt.html
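The trick the LEGAL text describes is that Google's robots.txt lets a longer Allow pattern (/books?*q=subject:*) override the shorter Disallow patterns (/books?*jscmd=*, /books?*pg=*), so appending a harmless q=subject: query keeps the URLs permitted. The C sketch below, which is not the repository's actual code, shows how the three URL types named in the diff might be assembled with that suffix; the function names, buffer handling, and the example book id and page code are assumptions for illustration only.

```c
/* Sketch: build the three Google Books URL types with "&q=subject:"
 * appended, so the longer robots.txt Allow rule applies.
 * Function names and example ids are illustrative, not from the repo. */
#include <stdio.h>

/* Front-cover metadata URL for a book id. */
static void frontcover_url(char *buf, size_t len, const char *bookid)
{
	snprintf(buf, len,
	    "http://books.google.com/books?id=%s&printsec=frontcover"
	    "&q=subject:", bookid);
}

/* Page-information (jscmd=click3) URL for a book id and page code. */
static void pageinfo_url(char *buf, size_t len, const char *bookid,
    const char *pgcode)
{
	snprintf(buf, len,
	    "http://books.google.com/books?id=%s&pg=%s&jscmd=click3"
	    "&q=subject:", bookid, pgcode);
}

/* Page-image URL; sig is the signature query fragment Google supplies. */
static void pageimg_url(char *buf, size_t len, const char *bookid,
    const char *pgcode, const char *sig)
{
	snprintf(buf, len,
	    "http://books.google.com/books?id=%s&pg=%s&img=1&zoom=3&hl=en&%s"
	    "&q=subject:", bookid, pgcode, sig);
}

int main(void)
{
	char url[1024];

	/* EXAMPLEID, PA1 and sig=XYZ are placeholder values. */
	frontcover_url(url, sizeof url, "EXAMPLEID");
	printf("%s\n", url);
	pageinfo_url(url, sizeof url, "EXAMPLEID", "PA1");
	printf("%s\n", url);
	pageimg_url(url, sizeof url, "EXAMPLEID", "PA1", "sig=XYZ");
	printf("%s\n", url);
	return 0;
}
```

Because the appended query only adds a search term, it does not change which page or image Google returns, which is why the LEGAL text says it "has no effect" on the URLs.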