From e6037966d0fc676b78bce9a4dd0b7776ab9f4a7b Mon Sep 17 00:00:00 2001
From: Nick White
Date: Mon, 15 Aug 2011 19:08:28 +0100
Subject: Add lots of TODOs

---
 LEGAL | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/LEGAL b/LEGAL
index d305f90..1e788d9 100644
--- a/LEGAL
+++ b/LEGAL
@@ -16,14 +16,15 @@ See section 5.3 of http://www.google.com/accounts/TOS.
 
 Their robots.txt allows certain book URLs, but disallows others.
-We use two types of URL:
+We use three types of URL:
+http://books.google.com/books?id=&printsec=frontcover
 http://books.google.com/books?id=&pg=&jscmd=click3
 http://books.google.com/books?id=&pg=&img=1&zoom=3&hl=en&
 
 robots.txt disallows /books?*jscmd=* and /books?*pg=*. However,
 Google consider Allow statements to overrule disallow statements
 if they are longer. And they happen to allow /books?*q=subject:*.
-So, we append that to both url types (it has no effect on them),
-and we are obeying robots.txt
+So, we append that to the urls (it has no effect on them), and
+we are obeying robots.txt
 
 Details on how Google interprets robots.txt are at
 http://code.google.com/web/controlcrawlindex/docs/robots_txt.html
--
cgit v1.2.3
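The precedence rule this patch relies on — among all Allow/Disallow patterns that match a URL, the longest (most specific) one wins, with Allow winning a length tie — can be sketched roughly as below. The rule set mirrors the patterns named in the LEGAL text; the `is_allowed` helper and its use of `fnmatch` for `*` wildcards are illustrative simplifications, not Google's actual matcher (which also handles `$` anchors and percent-encoding).

```python
import fnmatch

# Illustrative rule set, taken from the patterns quoted in LEGAL.
RULES = [
    ("Disallow", "/books?*jscmd=*"),
    ("Disallow", "/books?*pg=*"),
    ("Allow",    "/books?*q=subject:*"),
]

def is_allowed(path):
    # Collect every rule whose pattern matches the path. robots.txt
    # patterns are prefix matches, so append '*' for fnmatch's
    # full-string matching.
    matches = [(len(pat), verdict) for verdict, pat in RULES
               if fnmatch.fnmatch(path, pat + "*")]
    if not matches:
        return True  # no rule matches: allowed by default
    # Longest pattern wins; on a length tie, Allow takes precedence.
    best_len = max(length for length, _ in matches)
    verdicts = {v for length, v in matches if length == best_len}
    return "Allow" in verdicts

print(is_allowed("/books?id=X&pg=1&jscmd=click3"))
# False: only the Disallow rules match.
print(is_allowed("/books?id=X&pg=1&jscmd=click3&q=subject:Art"))
# True: the Allow pattern (19 chars) outranks the Disallow ones (15 and 12).
```

This is exactly the loophole the patch text describes: appending `q=subject:` to a URL makes the longer Allow rule match, overriding the shorter Disallow rules.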