# Getgbook
## TOS
Google's terms of service forbid using anything but a browser
to access their sites. This is absurd and ruinous.
See section 5.3 of http://www.google.com/accounts/TOS.
Fortunately, for Google Books one is only bound by it
"for digital content you purchase through the Google Books
service," which does not apply to this program.
See http://www.google.com/googlebooks/tos.html.
## robots.txt
Their robots.txt allows certain book pages, but disallows
others.
We use two types of URL:
http://books.google.com/books?id=<bookid>&pg=<pgcode>&jscmd=click3
http://books.google.com/books?id=<bookid>&pg=<pgcode>&img=1&zoom=3&hl=en&<sig>
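A minimal Python sketch of how the two URL forms could be assembled. The helper names `click3_url` and `image_url` are hypothetical, `"abc"` and `"PA1"` are placeholder values, and `sig` is treated as an opaque query fragment obtained elsewhere, appended verbatim as the `<sig>` placeholder suggests.

```python
from urllib.parse import urlencode

BASE = "http://books.google.com/books"


def click3_url(bookid: str, pgcode: str) -> str:
    """Page-info URL of the form ...?id=<bookid>&pg=<pgcode>&jscmd=click3 (hypothetical helper)."""
    # dicts keep insertion order, so parameters appear in the order given
    query = urlencode({"id": bookid, "pg": pgcode, "jscmd": "click3"})
    return f"{BASE}?{query}"


def image_url(bookid: str, pgcode: str, sig: str) -> str:
    """Page-image URL; `sig` is an opaque fragment appended as-is (hypothetical helper)."""
    query = urlencode({"id": bookid, "pg": pgcode, "img": "1", "zoom": "3", "hl": "en"})
    return f"{BASE}?{query}&{sig}"


if __name__ == "__main__":
    print(click3_url("abc", "PA1"))
    # http://books.google.com/books?id=abc&pg=PA1&jscmd=click3
```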
robots.txt disallows /books?*jscmd=* and /books?*pg=*. However,
Google apply the most specific (longest) matching rule, so an Allow
rule overrules a Disallow rule that it is longer than. And they happen
to allow /books?*q=subject:*.
So we append &q=subject: to both URL types (it has no effect on the
response), and we are therefore obeying robots.txt.
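To check the reasoning, here is a minimal Python sketch of longest-match rule evaluation in the spirit of Google's documented precedence (the longest matching rule path wins; on a tie, Allow wins). It is not Google's parser: it only understands the `*` wildcard and a trailing `$`, the rule list holds just the three rules quoted above rather than the full robots.txt, and the names `rule_to_regex` and `is_allowed` are hypothetical. Paths are matched against the path plus query string, as robots.txt rules are.

```python
import re


def rule_to_regex(path: str) -> "re.Pattern[str]":
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    pattern = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_allowed(url_path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: (directive, path) pairs, directive being 'allow' or 'disallow'."""
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, path in rules:
        if rule_to_regex(path).match(url_path):  # match() anchors at the start
            length = len(path)
            # The longest matching rule wins; on equal length, allow wins.
            if length > best_len or (length == best_len and directive == "allow"):
                best_len = length
                allowed = directive == "allow"
    return allowed


# Only the rules quoted in the text, not Google's complete robots.txt.
RULES = [
    ("disallow", "/books?*jscmd=*"),     # 15 characters
    ("disallow", "/books?*pg=*"),        # 12 characters
    ("allow", "/books?*q=subject:*"),    # 19 characters
]

if __name__ == "__main__":
    page = "/books?id=abc&pg=PA1&jscmd=click3"  # placeholder book id and page code
    print(is_allowed(page, RULES))                  # False
    print(is_allowed(page + "&q=subject:", RULES))  # True
```

The appended `&q=subject:` makes the 19-character Allow rule match, and since it is longer than either Disallow rule it decides the outcome.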