Restructure getgbook code

author: Nick White <git@njw.me.uk> 2011-08-21 21:14:24 +0100
committer: Nick White <git@njw.me.uk> 2011-08-21 21:14:24 +0100
commit: 6b059ae1888b0cf8d38c7fe9b4f5c10ec28ab7b6 (patch)
tree: 05e45a7b7f53277b6877f4c029e3d13ac45d281a /TODO
parent: fc43d1cacbb62fd854960901688e1b9b9752e7cd (diff)
1 files changed, 1 insertions, 5 deletions
diff --git a/TODO b/TODO
index 4eb35e4..6b08e9f 100644
--- a/TODO
+++ b/TODO
@@ -31,14 +31,10 @@ have websummary.sh print the date of release, e.g.
 
 mkdir of bookid and save pages in there
 
-add cmdline arguments for stdin parsing
-
-merge pageinfo branch
-
 ### notes
 
 Google will give you up to 5 cookies which get useful pages in immediate succession. It will stop serving new pages to the ip, even with a fresh cookie. So the cookie is certainly not everything.
 
 If one does something too naughty, all requests from the ip to books.google.com are blocked with a 403 'automated requests' error for 24 hours. What causes this ip block is less clear. It certainly isn't after just trying lots of pages with 5 cookies. It seems to be after requesting 100 new cookies in a certain time period - 100 in 5 minutes seemed to do it, as did 100 in ~15 minutes.
 
-The method of getting all pages from book webpage does miss some; they aren't all listed. These pages can often be requested, though.
+The method of getting all pages from book webpage does miss some; they aren't all listed. These pages can often be requested, though at present getgbook can't request them: if a page isn't in its initial structure it won't save the url, even if it's presented.
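
The limitation described in the last note (a url being dropped when its page is not in the initial structure) could be avoided by adding an entry whenever an unknown page turns up. The following is only a minimal sketch of that idea, not getgbook's actual code: the `Page` and `PageList` types, the functions `findpage`, `addpage` and `saveurl`, the size limits, and the page name format are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAMELEN 16   /* assumed maximum page name length, including NUL */
#define URLLEN 1024  /* assumed maximum url length, including NUL */

/* Hypothetical page record; not getgbook's real structure. */
typedef struct {
	char name[NAMELEN]; /* page identifier, e.g. "PA5" (illustrative) */
	char url[URLLEN];   /* empty string until a url has been seen */
} Page;

typedef struct {
	Page *pages;
	size_t len, cap;
} PageList;

/* Return the entry for name, or NULL if it is not in the list.
 * The pointer is only valid until the next addpage() call. */
Page *findpage(PageList *l, const char *name)
{
	size_t i;

	for (i = 0; i < l->len; i++)
		if (strcmp(l->pages[i].name, name) == 0)
			return &l->pages[i];
	return NULL;
}

/* Append an entry with no url yet. Returns NULL if the name is
 * too long or memory cannot be allocated. */
Page *addpage(PageList *l, const char *name)
{
	Page *p;
	size_t ncap;

	if (strlen(name) >= NAMELEN)
		return NULL;
	if (l->len == l->cap) {
		ncap = l->cap ? l->cap * 2 : 8;
		p = realloc(l->pages, ncap * sizeof(*p));
		if (p == NULL)
			return NULL;
		l->pages = p;
		l->cap = ncap;
	}
	p = &l->pages[l->len++];
	strcpy(p->name, name);
	p->url[0] = '\0';
	return p;
}

/* Save url for the named page, adding the page first if it was
 * not in the initial list. Returns 0 on success, -1 on failure. */
int saveurl(PageList *l, const char *name, const char *url)
{
	Page *p;

	assert(l != NULL);
	if (strlen(url) >= URLLEN)
		return -1;
	if ((p = findpage(l, name)) == NULL && (p = addpage(l, name)) == NULL)
		return -1;
	strcpy(p->url, url);
	return 0;
}
```

A zero-initialised `PageList` is a valid empty list, so a caller can fill it from the book webpage with `addpage` and then call `saveurl` for every url that is presented, whether or not the page was listed. The linear search keeps the sketch short; a book with many pages might prefer a sorted array or hash table.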