# before 0.4

mkdir a directory named after the book id and save the pages in there
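
a rough sketch of how that could look, assuming the page data is
already in a buffer; the function name and the .png suffix are
made up, not the real code:

	#include <errno.h>
	#include <stdio.h>
	#include <sys/stat.h>

	/* create a directory named after the book id (if it doesn't
	 * exist yet) and write one page's data into it */
	int
	savepageindir(const char *bookid, int pagenum, const char *buf, size_t len)
	{
		char path[1024];
		FILE *f;

		if(mkdir(bookid, 0755) == -1 && errno != EEXIST)
			return -1;
		snprintf(path, sizeof(path), "%s/%04d.png", bookid, pagenum);
		if((f = fopen(path, "wb")) == NULL)
			return -1;
		fwrite(buf, 1, len, f);
		fclose(f);
		return 0;
	}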

have websummary.sh print the date of release, e.g.
  getxbook 0.3 (sig) (2011-08-02)

# other utils

getabook

getbnbook

# other todos

use wide string functions when dealing with data returned over http; it's known to be utf8
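
one way that could look, assuming the utf8 text arrives as a
nul-terminated buffer and setlocale(LC_CTYPE, "") has been called
once at startup (the function name is invented):

	#include <stdlib.h>
	#include <wchar.h>

	/* convert a nul-terminated utf8 buffer into a wide string so
	 * the wcs* functions can be used on it; caller frees the result */
	wchar_t *
	towidestr(const char *utf8)
	{
		size_t n;
		wchar_t *w;

		n = mbstowcs(NULL, utf8, 0);	/* length needed, excluding nul */
		if(n == (size_t)-1)
			return NULL;
		if((w = malloc((n + 1) * sizeof(wchar_t))) == NULL)
			return NULL;
		mbstowcs(w, utf8, n + 1);
		return w;
	}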

bug in get(): the \r\n\r\n after the http headers isn't detected if it's split across recv buffers
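
a sketch of one possible fix, with invented names (the real get()
surely reads into its buffer differently): keep the last few bytes
of the previous chunk in front of the next recv so the terminator
is still found when it straddles the boundary.

	#include <string.h>
	#include <sys/socket.h>
	#include <sys/types.h>

	#define BUFSIZE 4096

	/* read until the end of the http headers, detecting "\r\n\r\n"
	 * even when it is split across recv buffers. a real version
	 * would also keep whatever body bytes arrive after it. */
	int
	readheaders(int fd)
	{
		char buf[3 + BUFSIZE + 1];
		ssize_t n;
		size_t carry = 0, have;

		while((n = recv(fd, buf + carry, BUFSIZE, 0)) > 0) {
			have = carry + (size_t)n;
			buf[have] = '\0';
			if(strstr(buf, "\r\n\r\n") != NULL)
				return 0;	/* end of headers found */
			carry = have >= 3 ? 3 : have;
			memmove(buf, buf + have - carry, carry);
		}
		return -1;	/* connection closed before headers ended */
	}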

create a tcl/tk gui wrapper which asks which book downloader to run and for the book id, then shows the output of stdout & stderr as it runs

package for osx and windows

try supporting 3xx in get, if it can be done in a few lines:
 get the Location line, free buf, and return a new iteration.
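
the header parsing half of that might look something like this;
the function name and buffer sizes are only illustrative, and
https locations aren't handled:

	#include <string.h>

	/* pull host and path out of a Location: header in an http
	 * response, so get() could free its buffer and retry with them */
	int
	parselocation(const char *buf, char *host, size_t hlen, char *path, size_t plen)
	{
		const char *p, *e;

		if((p = strstr(buf, "\r\nLocation: http://")) == NULL)
			return -1;
		p += strlen("\r\nLocation: http://");
		if((e = strchr(p, '/')) == NULL || (size_t)(e - p) >= hlen)
			return -1;
		memcpy(host, p, e - p);
		host[e - p] = '\0';
		p = e;
		if((e = strstr(p, "\r\n")) == NULL || (size_t)(e - p) >= plen)
			return -1;
		memcpy(path, p, e - p);
		path[e - p] = '\0';
		return 0;
	}

get() would then free its buffer and loop (or recurse) with the
new host and path.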

add https support to get
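
one possible route, assuming an already-connected socket and an
openssl dependency (which may not fit the project); the function
name is made up:

	#include <openssl/ssl.h>

	/* wrap a connected socket in tls; on success, SSL_read and
	 * SSL_write replace recv and send on it. the SSL_CTX is
	 * leaked here; a real version would keep or free it. */
	SSL *
	tlswrap(int fd)
	{
		SSL_CTX *ctx;
		SSL *ssl;

		SSL_library_init();
		if((ctx = SSL_CTX_new(SSLv23_client_method())) == NULL)
			return NULL;
		if((ssl = SSL_new(ctx)) == NULL)
			return NULL;
		SSL_set_fd(ssl, fd);
		if(SSL_connect(ssl) != 1) {
			SSL_free(ssl);
			return NULL;
		}
		return ssl;
	}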

write some little tests

building for android would likely be rather tricky, but would
be nice. it would work by modifying the getgbook src slightly,
redefining function calls so they can be found from the java
side (via jni), and then writing java code to call it. the gui
could be done either directly from java or from xml; both are
gross options. see:
http://developer.android.com/resources/tutorials/hello-world.html
http://marakana.com/forums/android/examples/49.html
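
the native side of that could be a thin jni wrapper along these
lines; the java package, class name, and the refactored getgbook
entry point are all invented for illustration:

	#include <jni.h>

	int getgbook_getbook(const char *bookid);	/* hypothetical refactored entry point */

	/* called from java as Native.getBook(bookid) */
	JNIEXPORT jint JNICALL
	Java_org_example_getxbook_Native_getBook(JNIEnv *env, jobject obj, jstring bookid)
	{
		const char *id = (*env)->GetStringUTFChars(env, bookid, NULL);
		jint ret = getgbook_getbook(id);
		(*env)->ReleaseStringUTFChars(env, bookid, id);
		return ret;
	}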

## getgbook

use realloc to size the pages memory structure appropriately
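
a minimal sketch of that, with a placeholder Page struct standing
in for whatever getgbook actually stores:

	#include <stdlib.h>
	#include <string.h>

	typedef struct {
		char url[1024];
		char name[80];
	} Page;

	/* grow the pages array on demand; returns the (possibly moved)
	 * array, or NULL if realloc failed (old block is untouched) */
	Page *
	addpage(Page *pages, int *npages, int *maxpages)
	{
		Page *p;
		int newmax;

		if(*npages == *maxpages) {
			newmax = *maxpages ? *maxpages * 2 : 64;
			if((p = realloc(pages, newmax * sizeof(Page))) == NULL)
				return NULL;
			pages = p;
			*maxpages = newmax;
		}
		memset(&pages[*npages], 0, sizeof(Page));
		(*npages)++;
		return pages;
	}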

### notes

Google will give you up to 5 cookies which get useful pages in immediate succession. After that it will stop serving new pages to the ip, even with a fresh cookie. So the cookie is certainly not everything.

If one does something too naughty, all requests from the ip to books.google.com are blocked with a 403 'automated requests' error for 24 hours. What causes this ip block is less clear. It certainly doesn't happen after just trying lots of pages with 5 cookies. It seems to be triggered by requesting 100 new cookies in a certain time period - 100 in 5 minutes seemed to do it, as did 100 in ~15 minutes.

The method of getting all pages from the book webpage does miss some; they aren't all listed there. These pages can often still be requested, but at present getgbook can't, as it won't save the url of a page that isn't in its initial structure, even if the url is presented.