author Nick White <git@njw.me.uk> 2011-09-08 17:12:30 +0100
committer Nick White <git@njw.me.uk> 2011-09-08 17:12:30 +0100
commit 45a96ef428c9808abcc6d4043f149eb6ac825ece
tree 1818f6bf3fbff21cea2fb2228a8a0f04cb55eac3
parent 6caa822d0ad5afb0d019e1b2afbbc53847d7179d
Add notes for how to get pages from amazon and barnes and noble
-rw-r--r-- plans/abook 33
-rw-r--r-- plans/bnbook 25
2 files changed, 58 insertions, 0 deletions
diff --git a/plans/abook b/plans/abook
new file mode 100644
index 0000000..0418bbd
--- /dev/null
+++ b/plans/abook
@@ -0,0 +1,33 @@
+final img urls look like:
+http://sitb-images.amazon.com/Qffs+v35lepeP2icY2OteGGgTPZO7sxgfhv6+rCKfpWLrJyvNAksvFu4WzV79TodydVXgzoaP3o=
+http://sitb-images.amazon.com/Qffs+v35lepeP2icY2OteGGgTPZO7sxgsXoL4TS0WgsVOflj1z8cVkwoGTF8uqsrBObiKx03xck=
+
+ugly, but need no cookie, and can be re-downloaded
+
+feature is called variously 'search inside this book' ('sitb') or 'look inside this book' ('litb')
+
+
+http://www.amazon.com/gp/reader/0140442278/ is reader link, but appears to just redirect back to book link, with js pre-opened
+
+sitb js library (not very obfuscated):
+http://z-ecx.images-amazon.com/images/G/01/digital/sitb/reader/v4/201010271203/sitb-library-js._V176048133_.js
+
+
+loadBookData looks promising - does ajax method:"getBookData",asin:asin - line 4903. this uses the SITB_READER_AJAX_URL. http://www.amazon.com/gp/search-inside/service-data?method=getBookData&asin=0140442278
+
+some page urls are contained in that. under the title 'litbPages', i.e. 'look inside the book'. other ajax calls are definitely the place to look; follow usage of AJAX_URL and jquery.ajax
+
+* note: https works :)
+
+the main book data only contains the initial pages linked to from sidebar; others are available from the interface by using next/prev buttons, or by scrolling. investigate the ajax calls further
+
+getSBData gives more metadata, not relevant here
+
+goToPage gives lots of good stuff (line 2972); page requested plus nearby ones. official arg requests use a 'token', but seem to be able to get away without one
+
+http://www.amazon.com/gp/search-inside/service-data?method=goToPage&asin=0140442278&page=27
+
+not all pages are available; if not, we get this (still a 200, mind):
+
+{"error":{"text":{"key":"PAGE_NOT_AVAILABLE_TEXT"},"title":{"key":"PAGE_NOT_AVAILABL
+E_TITLE"},"reftag":"rdr_bar_view"}}
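
The plans/abook notes above describe two service-data calls (getBookData and goToPage) and an error payload that comes back with HTTP status 200. A minimal Python sketch of building those URLs and spotting the error follows. The helper names `service_url` and `error_key` are hypothetical, and nothing here contacts Amazon; the only response shape assumed is the error JSON quoted in the notes.

```python
import json
from urllib.parse import urlencode

# Endpoint copied from the notes; whether it still responds is not checked here.
SERVICE_URL = "http://www.amazon.com/gp/search-inside/service-data"


def service_url(method, asin, **params):
    """Build a service-data URL for a method (e.g. goToPage) and an ASIN."""
    query = {"method": method, "asin": asin}
    query.update(params)  # extra arguments such as page=27
    return SERVICE_URL + "?" + urlencode(query)


def error_key(body):
    """Return the error text key from a JSON body, or None if there is no error.

    The status code cannot be relied on, since the notes say an
    unavailable page still comes back as a 200.
    """
    data = json.loads(body)
    error = data.get("error") if isinstance(data, dict) else None
    if not error:
        return None
    return error.get("text", {}).get("key")
```

For example, `service_url("goToPage", "0140442278", page=27)` reproduces the goToPage URL given in the notes, and `error_key` applied to the quoted error body returns `PAGE_NOT_AVAILABLE_TEXT`.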
diff --git a/plans/bnbook b/plans/bnbook
new file mode 100644
index 0000000..ea7200f
--- /dev/null
+++ b/plans/bnbook
@@ -0,0 +1,25 @@
+barnes & noble have a 'book viewer'/'see
+inside'. it is flash, which sucks. 'powered by zinio.com'
+
+http://search.barnesandnoble.com/Orange-Sunshine/Nicholas-Schou/e/9780312551834
+
+it seems you're only 'allowed' to open the viewer 2 or 3
+times per book per ip. if not, you get plain html, but
+seemingly only back & front covers:
+
+http://search.barnesandnoble.com/booksearch/imageviewer.asp?ean=9780312551834&imId=65316517
+http://search.barnesandnoble.com/booksearch/imageviewer.asp?ean=9780312551834&imId=65309649
+
+the flash urls are of the form:
+http://search2.barnesandnoble.com/BookViewer/?ean=9780061655944
+
+which leads to xml:
+http://search2.barnesandnoble.com/BookViewer/bookxml.asp?ean=9780061655944
+which leads to:
+http://search2.barnesandnoble.com/DigBooks/viewer/bookviewmanager.aspx?op=getbookinfo&ean=9780061655944
+this is xml, which has paths for different qualities (for
+low quality, use tn rather than fp):
+http://search2.barnesandnoble.com/digbooks/proxy/54/9780312361983/%d/fp
+the %d can be replaced with the page number
+requesting a page which isn't available
+(freevendstatus="false") returns the standard book swf
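
The plans/bnbook notes describe a page path template in which %d stands for the page number and the trailing fp can be swapped for tn to get the low-quality image. A minimal Python sketch of that substitution follows. The function name `page_url` and its `low_quality` flag are hypothetical, and the template is simply the example path from the notes, not something read from the getbookinfo xml.

```python
# Example path template copied from the notes; in practice it would come
# from the getbookinfo xml for the book in question.
TEMPLATE = "http://search2.barnesandnoble.com/digbooks/proxy/54/9780312361983/%d/fp"


def page_url(template, page, low_quality=False):
    """Fill in the page number and optionally switch fp to tn."""
    # str.replace is used instead of the % operator so that any other
    # percent signs in a template would be left alone.
    url = template.replace("%d", str(page))
    if low_quality and url.endswith("/fp"):
        url = url[: -len("fp")] + "tn"
    return url
```

For example, `page_url(TEMPLATE, 3)` gives the full-quality path for page 3, and `page_url(TEMPLATE, 3, low_quality=True)` gives the same path ending in tn. Per the notes, a page with freevendstatus="false" would return the standard book swf rather than an image, so a caller would still need to check what comes back.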