author: Nick White <git@njw.me.uk> 2011-08-15 19:08:28 +0100
committer: Nick White <git@njw.me.uk> 2011-08-15 19:08:28 +0100
commit: e6037966d0fc676b78bce9a4dd0b7776ab9f4a7b
tree: ab4dc60b2344d0ee0106070384f6e22a49d905a4
parent: 236fe25f4560e072bcb1a8cc2f5d1f50f30dbfe5
Add lots of TODOs
Diffstat (limited to 'LEGAL')
-rw-r--r-- LEGAL 7
1 files changed, 4 insertions, 3 deletions
diff --git a/LEGAL b/LEGAL
index d305f90..1e788d9 100644
--- a/LEGAL
+++ b/LEGAL
@@ -16,14 +16,15 @@ See section 5.3 of http://www.google.com/accounts/TOS.
Their robots.txt allows certain book URLs, but disallows
others.
-We use two types of URL:
+We use three types of URL:
+http://books.google.com/books?id=<bookid>&printsec=frontcover
http://books.google.com/books?id=<bookid>&pg=<pgcode>&jscmd=click3
http://books.google.com/books?id=<bookid>&pg=<pgcode>&img=1&zoom=3&hl=en&<sig>
robots.txt disallows /books?*jscmd=* and /books?*pg=*. However,
Google consider Allow statements to overrule disallow statements
if they are longer. And they happen to allow /books?*q=subject:*.
-So, we append that to both url types (it has no effect on them),
-and we are obeying robots.txt
+So, we append that to the urls (it has no effect on them), and
+we are obeying robots.txt
Details on how Google interprets robots.txt are at
http://code.google.com/web/controlcrawlindex/docs/robots_txt.html
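
The longest-match reasoning in the patched text can be checked with a small sketch. This is not code from the repository: the rule list holds only the three patterns named in LEGAL (the real robots.txt has many more), the book id and page code are placeholders, the appended value `&q=subject:a` is an assumed example of "that", and the tie-break in favour of Allow is an assumption based on Google's documented behaviour.

```python
import re

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path, rules):
    """rules is a list of (directive, pattern); directive is 'allow' or 'disallow'."""
    best_len = -1
    best_allow = True  # a path matched by no rule is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = directive == "allow"
            # The longest matching pattern wins; on a tie, Allow wins (assumed).
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                best_allow = allow
    return best_allow

# Only the three patterns mentioned in LEGAL, not the full robots.txt.
RULES = [
    ("disallow", "/books?*jscmd=*"),
    ("disallow", "/books?*pg=*"),
    ("allow", "/books?*q=subject:*"),
]

BOOKID = "EXAMPLEID"  # hypothetical book id
plain = "/books?id=%s&pg=PA1&jscmd=click3" % BOOKID
padded = plain + "&q=subject:a"  # assumed form of the appended parameter

print(is_allowed(plain, RULES))
print(is_allowed(padded, RULES))
```

Under these assumptions the unpadded URL is disallowed, because only the two Disallow patterns match it. The padded URL is allowed, because the Allow pattern (19 characters) is longer than either Disallow pattern (15 and 12 characters).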