MobileRead Forums - View Single Post

tomsem · 05-02-2024, 08:47 PM

Quote:

Originally Posted by jackm8

I figured that as well, the hard way. I was converting a scan into pdf, did ocr on it, then I converted it into kfx and opened it on Kindle. No ocr, no searchable text. Eventually I figured that the problem was Kindle Create. It simply discards ocr as it creates kfx. I didn't go with Send to Kindle, since I don't like it butchering the quality of my files, so I learned to live with sideloaded KFX files. I get covers. I need to open dictionary separately when needed. I can annotate things via Scribe's pen thingy, but exporting them is a hassle, though. Scribe simply converts note into .png and saves it next to the file, the problem is that user isn't given access to those folders when device is connected to pc. And Adobe loves to make life a misery for people using sideloaded formats, so it refuses to export notes as well. Simplest way, but still a nuisance, is screenshotting the screen with note on it.

It is not an issue with Kindle Create. Kindle Create is a front end to KDP, and produces KPF that you upload to KDP. It does not support production of KFX files locally.

But it so happens you can feed a KPF to Kindle Previewer to convert via its command line, which is what the KFX Output plugin does. The only problem is that Kindle Previewer is not designed to handle fixed-layout content. It's somewhat unexpected that KP even attempts to convert it. And this is referenced in KFX Output's 'Warnings and Limitations':

Quote:

Many books cannot be successfully converted to KFX using the Kindle Previewer, including those using fixed-layout.

The PNG files are just thumbnail images of a single notebook page. They are like any other thumbnail for any item on the Kindle, and are not in any sense 'data'.

You can export NBK files (the actual notebook and annotation data files) to FXL ePub using the KFX Input plugin's command line. The export preserves the scale-independent vector format that pen annotations use by converting pen strokes to SVG objects.
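To illustrate why converting strokes to SVG keeps them scale-independent, here is a minimal sketch in Python. It is not the KFX Input plugin's code and does not read NBK files; the stroke representation (a list of x, y points) and the function names are assumptions made up for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical stroke sample: (x, y) in page units


def stroke_to_svg_path(points: List[Point], width: float = 1.0) -> str:
    """Render one pen stroke as an SVG <path> element (straight segments)."""
    if not points:
        return ""
    x0, y0 = points[0]
    # "M" moves to the first point; each "L" draws a line to the next point.
    d = f"M {x0:g} {y0:g}" + "".join(f" L {x:g} {y:g}" for x, y in points[1:])
    return (f'<path d="{d}" fill="none" stroke="black" '
            f'stroke-width="{width:g}" stroke-linecap="round"/>')


def strokes_to_svg(strokes: List[List[Point]], page_w: float, page_h: float) -> str:
    """Wrap all strokes in one SVG document sized by a viewBox."""
    body = "\n".join(stroke_to_svg_path(s) for s in strokes if s)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {page_w:g} {page_h:g}">\n{body}\n</svg>')


if __name__ == "__main__":
    print(strokes_to_svg([[(0, 0), (10, 5), (20, 0)]], 100, 100))
```

Because the output stores coordinates rather than pixels, a viewer can redraw the strokes at any zoom level without blur, which a PNG thumbnail cannot do.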

I have not experienced any quality degradation with the PDFs I typically send via Send to Kindle, though most of these were never scanned documents.

Scanned documents do not fare well even with OCR of what appear to be good quality scans. OCR just adds text objects positioned over a raster image so there is the illusion of text selection. It cannot make the images of text scale up smoothly; you are stuck with whatever the scan resolution was. I've spent many hours just trying to fix scanned documents to look better with PDF viewers and it's just not possible to achieve a very good result: you really need to re-publish the content for that.
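The "stuck with the scan resolution" point can be put in numbers. This is a rough back-of-the-envelope sketch, assuming zoom simply spreads the same scanned pixels over a larger area; the function name is made up for illustration.

```python
def effective_ppi(scan_dpi: float, zoom: float) -> float:
    """Scanned pixels available per displayed inch at a given zoom factor."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    # Zooming in by a factor of `zoom` stretches each scanned inch over
    # `zoom` displayed inches, so the pixel density drops by that factor.
    return scan_dpi / zoom


if __name__ == "__main__":
    for zoom in (1, 2, 4):
        print(f"300 dpi scan at {zoom}x zoom: {effective_ppi(300, zoom):g} ppi")
```

The OCR text layer does not change this, since it only sits on top of the image for selection and search.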

PDF documents generated directly from text documents do not represent text as raster image, but as vector objects. So you can convert them to whatever resolution you require when rasterizing them for some other format.

From what I can tell, when converting vector objects to a raster image, Send to Kindle sets resolution to the maximum zoom level that Scribe (or Kindle apps generally) supports, and they look perfectly crisp to me.

Raster images in the PDF will not scale smoothly, and it is up to the author to decide what resolution to embed. Often they fall short of print resolution, much less what you can see when zooming in. Ideally maps and charts are in vector format in the PDF. Not only do they consume less space, but they scale smoothly.

Photos usually cannot be vectorized, so they always involve tradeoffs. You cannot embed 1000 dpi images without making the PDF impossibly large.
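For a sense of scale, here is a small sketch of how raw image size grows with resolution. It assumes a full-page, uncompressed 24-bit RGB image on a US Letter page; real PDFs compress images, so actual sizes are smaller, but the growth with the square of the dpi still applies. The function name is hypothetical.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Raw size of a page-sized raster image (no compression assumed)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    for dpi in (150, 300, 1000):
        size_mb = uncompressed_bytes(8.5, 11, dpi) / 1_000_000
        print(f"{dpi} dpi: {size_mb:.1f} MB per page before compression")
```

Going from 300 dpi to 1000 dpi multiplies the pixel count by about 11, which is why authors settle for lower embedded resolutions.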

Despite the name, Acrobat Reader is not a reading app. It is intended for things like e-signing and participating in PDF workflows where it is important to retain and track annotations as part of the document. PDF has a rich set of annotation tools, and most of the annotations would make no sense outside the context of the document itself.

There are many reading apps that support exporting annotations that retain enough context to be meaningful standalone, for example to produce references for a term paper, or to post on Goodreads. Adobe leaves that to others. They're free to use the PDF format for whatever targeted use cases they choose.