From ff292deb12c9def19ec3b9d624bc29f396eb2726 Mon Sep 17 00:00:00 2001
From: Nick White
Date: Sun, 7 Aug 2011 14:21:47 +0100
Subject: Update documentation, including add README

---
 TODO | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

(limited to 'TODO')

diff --git a/TODO b/TODO
index 6aaf198..7023e06 100644
--- a/TODO
+++ b/TODO
@@ -1,13 +1,5 @@
-got a stack trace when a connection seemingly timed out (after around 30 successful calls to -p)
-
-getgmissing doesn't work brilliantly with preview books as it will always get 1st ~40 pages then get ip block. getgfailed will do a better job
-
-list all binaries in readme and what they do
-
 # other utils
 
-getgbooktxt (different program as it gets from html pages, which getgbook doesn't any more)
-
 getabook
 
 getbnbook
@@ -24,12 +16,13 @@ write some little tests
 
 ## getgbook
 
-have file extension be determined by file type, rather than assuming png
-
-think about whether default functionality should be dl all, rather than -a
+Note: looks like google allows around 3 page requests per cookie session, and exactly 31 per ip per [some time period > 18 hours]. If I knew the time period, could make a script that gets maybe 20 pages, waits for some time period, then continues.
 
-to be fast and efficient it's best to crank through all the json 1st, filling in an array of page structs as we go - this requires slightly fuller json support - could consider making a json reading module, ala confoo, to make ad-hoc memory structures from json
+got a stack trace when a connection seemingly timed out (after around 30 successful calls to -p). enable core dumping and re-run (note have done a small amount of hardening since, but bug is probably still there).
 
-Note: looks like google allows around 3 page requests per cookie session, and exactly 31 per ip per [some time period]. If I knew the time period, could make a script that gets all it can, gets a list of failures, waits, then tries failures, etc. Note these would also have to stop at some point; some pages just aren't available
+running it from scripts (getgfailed.sh and getgmissing.sh), refuses to Ctrl-C exit, and creates 2 processes, which may be killed independently. not related to torify
+ multiple processes seems to be no bother
+ ctrl-c seems to be the loop continuing rather than breaking on ctrl-c; e.g. pressing it enough times to end loop works.
+ due to ctrl-c on a program which is using a pipe continues the loop rather than breaking it. using || break works, but breaks real functionality in the scripts
+ see seq 5|while read i; do echo run $i; echo a|sleep 5||break; done vs seq 5|while read i; do echo run $i; echo a|sleep 5; do
+ trapping signals doesn't help; the trap is only reached on last iteration; e.g. when it will exit the script anyway
-- 
cgit v1.2.3