# before 0.4

make a directory named after the bookid and save pages in there
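
A minimal sketch of the idea; the 0755 mode and the %04d.png page naming are illustrative assumptions, not necessarily the scheme getxbook will use:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* make a directory named after the book id; one left
     * over from an earlier run is fine */
    int
    makebookdir(char *bookid)
    {
        if (mkdir(bookid, 0755) == -1 && errno != EEXIST)
            return -1;
        return 0;
    }

    /* build the path of a page file inside that directory */
    void
    pagepath(char *buf, size_t len, char *bookid, int num)
    {
        snprintf(buf, len, "%s/%04d.png", bookid, num);
    }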

have websummary.sh print the date of release, e.g.
  getxbook 0.3 (sig) (2011-08-02)

# other utils

getabook

getbnbook

# other todos

use wide string functions when dealing with data returned over http; it's known to be utf8
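
A sketch of one way to do that with the standard multibyte functions, assuming setlocale(LC_CTYPE, "") is called at startup and the locale is utf8 capable:

    #include <locale.h>
    #include <stdlib.h>
    #include <wchar.h>

    /* convert a utf8 buffer (e.g. an http response) to a wide
     * string so per-character operations are safe. assumes a
     * utf8 locale was set at startup. caller frees the result. */
    wchar_t *
    towide(char *utf8)
    {
        size_t n;
        wchar_t *w;

        n = mbstowcs(NULL, utf8, 0); /* measure first */
        if (n == (size_t)-1)
            return NULL;
        if (!(w = malloc((n+1) * sizeof(wchar_t))))
            return NULL;
        mbstowcs(w, utf8, n+1);
        return w;
    }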

bug in get(): the \r\n\r\n after the http headers is missed if it is split between recv buffers
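
The usual cause is searching only the newest chunk for the terminator. A sketch of a fix; readheaders() is a hypothetical stand-in for the header-reading part of get():

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* accumulate recv chunks into one buffer and search the
     * whole buffer each time, so \r\n\r\n is found even when
     * split across two recv calls. returns the offset of the
     * body in buf, or -1. */
    ssize_t
    readheaders(int fd, char *buf, size_t size)
    {
        size_t len = 0;
        ssize_t n;
        char *end;

        while (len < size - 1) {
            if ((n = recv(fd, buf+len, size-1-len, 0)) <= 0)
                return -1;
            len += n;
            buf[len] = '\0';
            if ((end = strstr(buf, "\r\n\r\n")))
                return end + 4 - buf;
        }
        return -1; /* headers larger than buf */
    }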

use HTTP/1.1 with "Connection: close" header
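
Host is mandatory in 1.1, and Connection: close asks the server to end the connection after the response, so it can still be read until EOF (though note a 1.1 server may still send chunked transfer encoding, which would need decoding). A sketch of building such a request:

    #include <stdio.h>

    /* build an http/1.1 GET request; returns the length
     * written, as snprintf does */
    int
    mkrequest(char *buf, size_t size, char *host, char *path)
    {
        return snprintf(buf, size,
            "GET %s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "Connection: close\r\n"
            "\r\n", path, host);
    }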

try supporting 3xx in get, if it can be done in a few lines: get the Location header, free buf, and return a new iteration.
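
A sketch of that shape; get() returning a malloc'd response and gethdr() pulling out a header value are hypothetical signatures for illustration:

    #include <stdlib.h>
    #include <string.h>

    char *get(char *host, char *path);  /* hypothetical: returns malloc'd response */
    int gethdr(char *buf, char *name, char *val, size_t len); /* hypothetical: 0 on success */

    /* follow up to depth redirects. assumes a same-host
     * Location for brevity; a full version would parse the url. */
    char *
    getfollow(char *host, char *path, int depth)
    {
        char *buf, loc[1024];

        if (!(buf = get(host, path)))
            return NULL;
        if (depth > 0 && !strncmp(buf, "HTTP/1.1 3", 10)
            && !gethdr(buf, "Location", loc, sizeof(loc))) {
            free(buf);
            return getfollow(host, loc, depth - 1);
        }
        return buf;
    }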

add https support to get
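
One approach is to wrap the already-connected socket in TLS and swap recv/send for the library's read/write calls; OpenSSL (1.1 or later) is used below purely as an example, the choice of library is an assumption:

    #include <openssl/ssl.h>

    /* wrap a connected socket in tls. after this, use
     * SSL_read/SSL_write instead of recv/send. certificate
     * verification is omitted for brevity. */
    SSL *
    tlswrap(int fd, char *host)
    {
        SSL_CTX *ctx;
        SSL *ssl;

        if (!(ctx = SSL_CTX_new(TLS_client_method())))
            return NULL;
        ssl = SSL_new(ctx);
        SSL_CTX_free(ctx); /* ssl keeps its own reference */
        if (!ssl)
            return NULL;
        SSL_set_tlsext_host_name(ssl, host); /* sni, needed by most hosts */
        SSL_set_fd(ssl, fd);
        if (SSL_connect(ssl) != 1) {
            SSL_free(ssl);
            return NULL;
        }
        return ssl;
    }

Link with -lssl -lcrypto; real use would also verify the certificate chain.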

write some little tests
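
They could stay as small as a main() full of assert()s against the pure functions, e.g. for the hypothetical mkrequest() sketched above:

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    int mkrequest(char *buf, size_t size, char *host, char *path);

    int
    main(void)
    {
        char buf[256];

        mkrequest(buf, sizeof(buf), "example.com", "/index.html");
        assert(!strncmp(buf, "GET /index.html HTTP/1.1\r\n", 26));
        assert(strstr(buf, "Host: example.com\r\n"));
        assert(strstr(buf, "Connection: close\r\n"));
        puts("all tests passed");
        return 0;
    }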

## getgbook

### notes

Google will give you up to 5 cookies that get useful pages in immediate succession. After that it stops serving new pages to the ip, even with a fresh cookie, so the cookie is certainly not everything.

If one does something too naughty, all requests from the ip to books.google.com are blocked with a 403 'automated requests' error for 24 hours. What triggers this ip block is less clear. It certainly isn't just trying lots of pages with 5 cookies. It seems to be requesting 100 new cookies in a certain time period: 100 in 5 minutes did it, as did 100 in ~15 minutes.

The method of getting all pages from the book webpage does miss some; they aren't all listed there. These missing pages can often still be requested, but at present getgbook can't: if a page isn't in its initial structure it won't save the url, even if one is presented.