got a stack trace when a connection seemingly timed out (after around 30 successful calls to -p)
getgmissing doesn't work well with preview books, as it will always get the first ~40 pages and then hit an ip block. getgfailed does a better job
list all binaries in readme and what they do
# other utils
getgbooktxt (a separate program, as it gets text from html pages, which getgbook no longer uses)
getabook
getbnbook
# other todos
try supporting 3xx in get, if it can be done in a few lines: read the Location header, free buf, and issue the request again for the new url (see the sketch below)
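a rough sketch of how that loop might look; http_get() and its malloc'd headers-plus-body return value are assumptions for illustration, not get's real interface:

```c
/* sketch only: follow up to a few 3xx redirects by re-requesting the
 * url from the Location header. http_get() is a stand-in for whatever
 * get really does; assumed to return a malloc'd buffer holding the
 * response headers followed by the body, or NULL on failure. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char *http_get(const char *url);

char *get_follow(const char *url)
{
	char loc[1024];
	char *buf = NULL;
	int i, code;

	for (i = 0; i < 5; i++) {	/* cap redirects so a loop can't run forever */
		free(buf);
		buf = http_get(url);
		if (buf == NULL || sscanf(buf, "HTTP/%*[0-9.] %d", &code) != 1)
			return buf;
		if (code < 300 || code > 399)
			return buf;	/* not a redirect, we're done */
		char *l = strstr(buf, "\r\nLocation: ");
		if (l == NULL || sscanf(l + 12, "%1023[^\r\n]", loc) != 1)
			return buf;
		url = loc;		/* free buf and try again with the new url */
	}
	return buf;
}
```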
add https support to get
write some little tests
## getgbook
have file extension be determined by file type, rather than assuming png
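one way to do that is to sniff the first bytes of the downloaded image; this helper is a hypothetical sketch, not anything currently in getgbook:

```c
/* sketch only: choose an extension from the image's magic bytes
 * instead of assuming every page is a png. */
#include <string.h>

static const char *imgext(const unsigned char *buf, size_t len)
{
	if (len >= 8 && memcmp(buf, "\x89PNG\r\n\x1a\n", 8) == 0)
		return "png";
	if (len >= 3 && memcmp(buf, "\xff\xd8\xff", 3) == 0)
		return "jpg";
	if (len >= 6 && (memcmp(buf, "GIF87a", 6) == 0 || memcmp(buf, "GIF89a", 6) == 0))
		return "gif";
	return "png";	/* unknown: keep the current assumption */
}
```

the saved filename would then use whatever this returns rather than a hard-coded ".png"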
think about whether the default behaviour should be to download everything, rather than requiring -a
to be fast and efficient it's best to crank through all the json first, filling in an array of page structs as we go
this requires slightly fuller json support
could consider making a json reading module, a la confoo, to make ad-hoc memory structures from json
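for illustration, the page array filled in by that json pass could look something like this; the field names and addpage() are made up here, not existing code:

```c
/* sketch only: per-page bookkeeping filled in during one pass over
 * the json, before any page images are requested. */
#include <stdlib.h>
#include <string.h>

struct page {
	int num;		/* page number */
	char name[16];		/* google's page id, e.g. "PA4" (assumed) */
	char url[1024];		/* image url, empty until the json supplies it */
	int done;		/* set once the image has been saved */
};

/* grow the array by one as the json mentions each new page */
static struct page *addpage(struct page **pages, int *npages, const char *name)
{
	struct page *new, *p;

	new = realloc(*pages, (*npages + 1) * sizeof(*new));
	if (new == NULL)
		return NULL;
	*pages = new;
	p = &new[*npages];
	memset(p, 0, sizeof(*p));
	strncpy(p->name, name, sizeof(p->name) - 1);
	p->num = (*npages)++;
	return p;
}
```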
Note: looks like google allows around 3 page requests per cookie session, and exactly 31 per ip per [some time period]. If the time period were known, a script could get all it can, collect a list of failures, wait, then retry the failures, and so on. It would also have to stop at some point; some pages just aren't available