Abide by Google's robots.txt, and lay out legal issues

author: Nick White <hg@njw.me.uk> 2011-08-07 12:46:52 +0100
committer: Nick White <hg@njw.me.uk> 2011-08-07 12:46:52 +0100
commit: 62563596f477238d480fe4a701544413b6c722f5 (patch)
tree: 91379e6c654e9e9ff47793a892aa990e737e85dc /LEGAL
parent: 3d08e78700331588f6d43db725cc361f841c012d (diff)
1 files changed, 27 insertions, 0 deletions
diff --git a/LEGAL b/LEGAL
new file mode 100644
index 0000000..ec1a2c8
--- /dev/null
+++ b/LEGAL
@@ -0,0 +1,27 @@
+# Getgbook
+
+## TOS
+
+Google's terms of service forbid using anything but a browser
+to access their sites. This is absurd and ruinous.
+See section 5.3 of http://www.google.com/accounts/TOS.
+
+Thankfully, however, for Google Books one is only bound to it
+"for digital content you purchase through the Google Books
+service," which does not affect this program.
+See http://www.google.com/googlebooks/tos.html
+
+## robots.txt
+
+Their robots.txt allows certain book pages, but disallows
+others.
+
+We use two types of URL:
+http://books.google.com/books?id=<bookid>&pg=<pgcode>&jscmd=click3
+http://books.google.com/books?id=<bookid>&pg=<pgcode>&img=1&zoom=3&hl=en&<sig>
+
+robots.txt disallows /books?*jscmd=* and /books?*pg=*. However,
+Google consider Allow statements to overrule Disallow statements
+if they are longer. And they happen to allow /books?*q=subject:*.
+So we append a q=subject: parameter to both URL types (it has no
+effect on them), and we are obeying robots.txt.
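
The precedence argument in the robots.txt section can be checked with a small sketch. It is not part of getgbook and is not Google's parser. It assumes that `*` matches any run of characters, that a trailing `$` anchors the end of the path, that the longest matching pattern wins, and that Allow wins a tie. Only the three rules quoted in LEGAL are included, and the names `RULES`, `pattern_to_regex` and `is_allowed` are made up for illustration.

```python
import re

# Only the rules quoted in LEGAL; the rest of Google's robots.txt is omitted.
RULES = [
    ("disallow", "/books?*jscmd=*"),
    ("disallow", "/books?*pg=*"),
    ("allow", "/books?*q=subject:*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' is a wildcard and a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the longest matching rule for path is an Allow rule.

    A path that matches no rule is treated as allowed.
    """
    best = None  # (pattern length, rule is Allow)
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Tuple comparison: longer pattern first, then Allow over Disallow.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# "abc" and "PA1" are placeholder values for <bookid> and <pgcode>.
plain = "/books?id=abc&pg=PA1&jscmd=click3"
print(is_allowed(plain))                    # False
print(is_allowed(plain + "&q=subject:a"))   # True
```

Under these assumptions the Allow pattern (19 characters) outranks both Disallow patterns (15 and 12 characters), which is why appending the extra parameter changes the verdict.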