author Nick White <git@njw.me.uk> 2011-10-08 11:55:21 +0100
committer Nick White <git@njw.me.uk> 2011-10-08 11:55:21 +0100
commit e79fe370ec4a2478046a4dda7026fcadaefcf931
tree 0ad393f1b7cadd0d35fe6e5ecaad4ec3dd1d20ac
parent 67cac80cd657640f07ff57d6bf3419170082693d
Add notes to getgbook man page
-rw-r--r-- TODO 10
-rw-r--r-- getgbook.1 15
2 files changed, 15 insertions, 10 deletions
diff --git a/TODO b/TODO
index 961c97e..4b0bf82 100644
--- a/TODO
+++ b/TODO
@@ -4,8 +4,6 @@ before 1.0: create bn tool, fix http bugs, be unicode safe, package for osx & wi
# other todos
-mention in getgbook man page that not all pages may be available in one run, but try later / from a different ip and it will try to fill in the gaps (can replace notes section here, too)
-
use the correct file extension depending on the image type (for google and amazon
the first page is a jpg, all the others are png)
@@ -44,11 +42,3 @@ it. gui could either be done from the java directly, or from
xml; both are gross options. see:
http://developer.android.com/resources/tutorials/hello-world.html
http://marakana.com/forums/android/examples/49.html
-
-### notes
-
-Google will give you up to 5 cookies which get useful pages in immediate succession. It will stop serving new pages to the ip, even with a fresh cookie. So the cookie is certainly not everything.
-
-If one does something too naughty, all requests from the ip to books.google.com are blocked with a 403 'automated requests' error for 24 hours. What causes this ip block is less clear. It certainly isn't after just trying lots of pages with 5 cookies. It seems to be after requesting 100 new cookies in a certain time period - 100 in 5 minutes seemed to do it, as did 100 in ~15 minutes.
-
-The method of getting all pages from book webpage does miss some; they aren't all listed. These pages can often be requested, though, though at present getgbook can't, as if a page isn't in its initial structure it won't save the url, even if it's presented.
diff --git a/getgbook.1 b/getgbook.1
index 9440030..b38af69 100644
--- a/getgbook.1
+++ b/getgbook.1
@@ -29,3 +29,18 @@ line).
is the unique ID Google assigns to each book. It is 12
characters long. It can be found by looking for the 'id='
part of the URL of its Google Books page.
+.SH NOTES
+Some pages of "limited preview" books are never available.
+.PP
+Book pages vary in availability depending on the location of
+your IP.
+.PP
+getgbook will not try to download pages that have already
+been downloaded, so stopping and then starting it later will
+continue from where it left off.
+.PP
+getgbook uses several cookies to get as many pages as possible.
+However Google Books also limits the number of pages based on
+IP address, for 24 hours. Therefore if not all pages are
+downloaded it may be worth rerunning getgbook in 24 hours, or
+from a different IP.