this post was submitted on 11 Sep 2023
10 points (100.0% liked)

datahoarder


Google Books allows viewing the scans in colour, but when I click the option to download the PDF, I am provided only with a black-and-white version.

Is it known how to obtain the original colour images, outside of inspect-elementing each page one by one?

[–] bela@lemm.ee 2 points 1 year ago* (last edited 1 year ago) (2 children)

I just spent a bit too much time making this (it was fun), so don't even tell me if you're not going to use it.

You can open up a desired book's page, start this first script in the console, and then scroll through the book:

// Collect the source URLs of every page scan image as you scroll through the book.
let imgs = new Set();

function cheese() {
  for(let img of document.getElementsByTagName("img")) {
    // Page scans sit inside a container with the "pageImageDisplay" class.
    if(img.parentElement.parentElement.className == "pageImageDisplay") imgs.add(img.attributes["src"].value);
  }
}

// Poll every 5 ms so no page is missed while you scroll.
setInterval(cheese, 5);

And once you're done you may run this script to download each image:

// Fetch an image and return an object URL pointing at the downloaded blob.
function toDataURL(url) {
  return fetch(url).then((response) => {
    return response.blob();
  }).then(blob => {
    return URL.createObjectURL(blob);
  });
}

async function asd() {
  for(let img of imgs) {
    const a = document.createElement("a");
    a.href = await toDataURL(img);
    // Name the file after the page identifier in the "pg=" query parameter.
    let name;
    for(let thing of img.split("&")) {
      if(thing.startsWith("pg=")) {
        name = thing.split("=")[1];
        console.log(name);
        break;
      }
    }
    a.download = name;
    // Click a temporary link to trigger the download.
    document.body.appendChild(a);
    a.click();
    document.body.removeChild(a);
  }
}

asd();

Alternatively you may simply run something like this to get the links:

for(let img of imgs) {
  console.log(img);
}

There's stuff you can tweak, of course, if it doesn't quite work for you. Worked fine in my tests.

If you notice a page missing, you should be able to just scroll back to it and then download again to get everything. The first script just keeps collecting pages till you refresh the site. Which also means you should refresh once you are done downloading, as it eats CPU for breakfast.
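
If a quick sanity check helps before downloading, something along these lines lists the page identifiers collected so far (just a convenience built on the same imgs set and "pg=" naming as above, so adjust it if your URLs look different):

// List the collected page identifiers, sorted, so gaps are easier to spot.
console.log(
  [...imgs]
    .map(url => (url.split("&").find(p => p.startsWith("pg=")) || "pg=?").split("=")[1])
    .sort()
);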

Oh and NEVER RUN ANY JAVASCRIPT CODE SOMEONE ON THE INTERNET TELLS YOU TO RUN

[–] antonim@lemmy.dbzer0.com 3 points 1 year ago (1 children)

Well, I may be technologically semi-literate and I may have felt a bit dizzy when I saw actual code in your comment, but I sure as hell will find a way to put it to use, no matter the cost.

You're terrific, man. No idea what else to say.

[–] bela@lemm.ee 2 points 1 year ago

lmk if you run into an issue

This kind of stuff is like an IRL puzzle game. I thought it would be a simple five minute adventure, but of course google has made sure it isn't! I suppose for 3 stars I would have given it to you in a pdf format, but I fear the man who could do that in javascript.
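
For what it's worth, stitching the downloaded images into a PDF in the browser isn't entirely out of reach; a library like jsPDF can do it. The following is a rough sketch only, not something from this thread: it assumes jsPDF's UMD bundle is already loaded on the page, that the imgs set from the first script is still populated, that the scans are JPEGs, and it hard-codes A4 sizing; the imagesToPdf name is made up for illustration.

// Rough sketch (untested against Google Books): build a PDF from the collected
// scans with jsPDF. Assumes window.jspdf is available and imgs holds the URLs.
async function imagesToPdf() {
  const { jsPDF } = window.jspdf;  // UMD global exposed by the jsPDF bundle
  const doc = new jsPDF();         // defaults to portrait A4 in millimetres
  let first = true;
  for (const url of imgs) {
    const blob = await fetch(url).then(r => r.blob());
    // Convert the blob to a data URL so jsPDF can embed it.
    const dataUrl = await new Promise(resolve => {
      const reader = new FileReader();
      reader.onload = () => resolve(reader.result);
      reader.readAsDataURL(blob);
    });
    if (!first) doc.addPage();
    // 210 x 297 mm fills an A4 page; real proportions would need the image size.
    doc.addImage(dataUrl, "JPEG", 0, 0, 210, 297);
    first = false;
  }
  doc.save("book.pdf");
}

imagesToPdf();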