Note: it looks like google allows around 3 page requests per cookie session, and about 40 per ip per [some time period]. If I knew the time period, and once stdin retry is working, I could make a script that gets all it can, collects a list of failures, waits, then retries the failures, and so on. These would also have to stop at some point; some pages just aren't available.
make sure i'm checking all lib calls that could fail
make sure all arrays are used within bounds
strace to check paths taken are sensible
use defined constants rather than e.g. 1024
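a sketch of what replacing bare 1024s might look like; the constant and function names here are made up, not the real ones in the code:

```c
#include <stdio.h>

/* hypothetical names; one named constant instead of bare 1024s scattered about */
#define BUFSIZE 1024
#define URLMAX  256

/* sketch: build a page url into a fixed buffer, bounds-checked.
   returns 1 if it fitted, 0 if it would have been truncated. */
int buildurl(char *buf, size_t len, const char *id, int page)
{
	/* example.invalid is a placeholder host, not the real url scheme */
	int n = snprintf(buf, len, "https://example.invalid/book/%s?pg=%d", id, page);
	return n >= 0 && (size_t)n < len;
}
```

checking the snprintf return against the named size also covers the bounds-checking item above.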
getgbooktxt (a separate program, as it gets text from html pages, which getgbook doesn't use any more)
getabook
getbnbook
openlibrary.org?
# once it is basically working #
try supporting 3xx in get, if it can be done in a few lines: grab the Location line, free buf, and return a new iteration.
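the Location-grabbing half of that might look something like this; it's a sketch only, assuming get() holds the raw response headers in a string, and the real get() would free its buffer and recurse on the returned url:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* sketch: if resp is a 3xx response, return a malloc'd copy of the
   Location target, else NULL. caller frees. */
char *redirecturl(const char *resp)
{
	const char *p, *e;
	char *url;
	if (strncmp(resp, "HTTP/1.", 7) != 0 || resp[9] != '3')
		return NULL;              /* only 3xx responses redirect */
	if (!(p = strstr(resp, "\r\nLocation: ")))
		return NULL;
	p += strlen("\r\nLocation: ");
	if (!(e = strstr(p, "\r\n")))
		return NULL;
	if (!(url = malloc(e - p + 1)))
		return NULL;
	memcpy(url, p, e - p);
	url[e - p] = '\0';
	return url;
}
```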
add https support to get
to be fast and efficient it's best to crank through all the json first, filling in an array of page structs as we go
this requires slightly fuller json support
could consider making a json reading module, à la confoo, to make ad-hoc memory structures from json
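the page-struct array might be shaped roughly like this; field names and sizes are guesses, and the json parsing itself is left out:

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical layout; field names are guesses */
typedef struct {
	char name[16];    /* e.g. "PA7" */
	char url[512];    /* filled in later from the json */
	int  num;         /* order of appearance */
} Page;

/* grow-as-needed array of pages, filled while cranking through the json.
   returns a pointer to the new slot, or NULL on allocation failure. */
Page *addpage(Page **pages, int *n, const char *name)
{
	Page *p = realloc(*pages, (*n + 1) * sizeof(Page));
	if (!p)
		return NULL;
	*pages = p;
	p = &(*pages)[*n];
	memset(p, 0, sizeof(Page));
	strncpy(p->name, name, sizeof(p->name) - 1);
	p->num = (*n)++;
	return p;
}
```

one realloc per page is simple; if it's too slow the array could grow geometrically instead.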
write helper scripts like trymissing
write some little tests
have file extension be determined by file type, rather than assuming png
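sniffing the first few magic bytes would cover that; a sketch, with "png" kept as the fallback since that's the current assumption:

```c
#include <string.h>

/* pick an extension from the file's magic bytes instead of assuming png */
const char *imgext(const unsigned char *buf, size_t len)
{
	if (len >= 8 && !memcmp(buf, "\x89PNG\r\n\x1a\n", 8))
		return "png";
	if (len >= 3 && !memcmp(buf, "\xff\xd8\xff", 3))
		return "jpg";
	if (len >= 6 && (!memcmp(buf, "GIF87a", 6) || !memcmp(buf, "GIF89a", 6)))
		return "gif";
	return "png";   /* fall back to the current assumption */
}
```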
think about whether default functionality should be dl all, rather than -a