Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
1
votes
2
answers
279
views
Extracting table of contents from PDFs
I have a reasonably large personal library with books in various formats. I have tried to organize their metadata, including a text field containing the tables of contents. At the moment I am using the 'Area Text Selection' feature from my document reader to copy the text. Doing this for DJVUs with djview yields nicely formatted tables of contents, like this:
CONTENTS
1. EXPERIMENTS
1.1. The definition of an experiment ..... 1
1.2. Algebras of events as Boolean algebras .... 6
1.3. Operations with experiments ...... 9
1.4. Canonical representation of polynomials of events . . 12
....
I emphasize that all I did was drag my mouse across the page and click "Copy Text". If I try this with a PDF the structure is completely lost and I have to spend some time cleaning up the text selection, moving the page and section numbers around. I might get something like this:
Table of Contents
I
Introduction
1
Introduction
1.1
Table of Contents
1.2
Acknowledgments
1
3
3
6
II
....
I am looking for a PDF reader that can similarly copy the text but with the 'structure' preserved. The fact that DJVU readers have this capability tells me this ought to be possible.
Note: I am not talking about extracting ToCs from the bookmarks: many of my PDFs don't have any. I'd also like to avoid a CLI tool that has to process the entire file: I just want it to pick the text I select, but with the newlines and overall structure intact.
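The dotted-leader lines in the DjVu sample above have a regular shape (section number, title, leader dots, page number), so a selection that comes out in that shape can be split into fields for a metadata record. A minimal Python sketch, assuming exactly that layout; the name `parse_toc_line` and the regular expression are illustrative and not part of any reader or library:

```python
import re

# Section number, title, optional dot leaders, page number.
# The pattern is an assumption based on the sample lines in the question.
TOC_LINE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*?)[\s.]*\s(\d+)$")


def parse_toc_line(line):
    """Return (section, title, page), or None if the line has no page number."""
    match = TOC_LINE.match(line.strip())
    if match is None:
        return None
    section, title, page = match.groups()
    return section, title, int(page)


sample = [
    "1. EXPERIMENTS",
    "1.1. The definition of an experiment ..... 1",
    "1.4. Canonical representation of polynomials of events . . 12",
]
for entry in sample:
    print(parse_toc_line(entry))
```

Chapter headings without a page number, such as `1. EXPERIMENTS`, come back as `None` and would need separate handling. This only helps once the copied text already keeps one entry per line; it does not repair the scrambled PDF selection.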
Luke
(13 rep)
Dec 16, 2024, 03:42 PM
• Last activity: Dec 16, 2024, 05:00 PM
1
votes
1
answers
137
views
ddjvu: using '-quality' option with values higher than 100
I am trying to figure out how to use `ddjvu` to convert DjVu files to PDF. If I use
ddjvu -format=pdf input.djvu output.pdf
the output PDF is very large and consists of lossless images. A warning also appears, which I will come back to later:
> TIFFWriteDirectorySec: Warning, Creating TIFF with legacy Deflate codec identifier, COMPRESSION_ADOBE_DEFLATE is more widely supported.
To make the output file smaller and have it consist of lossy images, `man ddjvu` suggests using the `-quality` option:
> -quality=factor: Enables lossy JPEG compression for TIFF and PDF files. This option only affects images that cannot be encoded using the preferred TIFF/G4 compression. Argument factor is a quantization factor ranging from 25 to 150. See command cjpeg(1) for more information on JPEG quantization factors. Value 80 is a good starting point.
If I use values up to 100 (e.g., 80, 90, or 100),
ddjvu -format=pdf -quality 90 input.djvu output.pdf
the `TIFFWriteDirectorySec` warning doesn't appear. As I understand it, this is because using `-quality` means that the PDF will consist of lossy images, which in turn means that it will consist of JPEGs instead of TIFFs.
But if the value is higher than 100, e.g. 105 or 150, the warning appears again. Why is that?
jsx97
(1347 rep)
Jun 12, 2024, 04:04 PM
• Last activity: Jun 20, 2024, 06:05 PM
1
votes
1
answers
163
views
How to use COMPRESSION_ADOBE_DEFLATE instead of DEFLATE?
When converting to PDF using ddjvu:
ddjvu -format=pdf input.djvu output.pdf
there is a warning:
> TIFFWriteDirectorySec: Warning, Creating TIFF with legacy Deflate codec identifier, COMPRESSION_ADOBE_DEFLATE is more widely supported.
How can I use COMPRESSION_ADOBE_DEFLATE instead of DEFLATE?
I tried
ddjvu -format=pdf -quality=COMPRESSION_ADOBE_DEFLATE input.djvu output.pdf
but although `-quality=deflate` works, `-quality=COMPRESSION_ADOBE_DEFLATE` and its variations (e.g., `AdobeDeflate`) return the error message:
> ddjvu: valid arguments for option '-quality' an integer between 25 and 150.
jsx97
(1347 rep)
Jun 8, 2024, 07:04 AM
• Last activity: Jun 11, 2024, 08:34 AM
0
votes
1
answers
58
views
ddjvu: Does it really support converting to PDF?
ddjvu (DjVuLibre), 3.5.28:
ddjvu --help
returns
-format=FMT Select output format: pbm,pgm,ppm,pnm,rle,tiff.
Note that it doesn't mention PDF. Why, then, does the following command work and convert the input DjVu file to a PDF?
ddjvu -format=pdf -quality=85 input.djvu output.pdf
jsx97
(1347 rep)
Jun 7, 2024, 10:14 AM
• Last activity: Jun 7, 2024, 08:22 PM
0
votes
1
answers
238
views
Determine the paper size of a djvu document
How can I determine the paper size of a djvu document? I tried it with both DjView and Evince, but did not find any mention of the paper size. I wouldn't mind using some CLI tool, but I don't have a clue what to use.
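One way to reason about this: a DjVu page records its size in pixels together with a resolution in dots per inch, so the physical size follows from dividing one by the other. A minimal Python sketch of that arithmetic, assuming the pixel dimensions and dpi have already been read from the file (for instance from the output of `djvudump`, which is an assumption about tooling worth checking against its man page). The function names and the small table of paper formats are illustrative:

```python
MM_PER_INCH = 25.4

# A few common formats in millimetres (short side, long side).
PAPER_SIZES_MM = {
    "A4": (210.0, 297.0),
    "A5": (148.0, 210.0),
    "Letter": (215.9, 279.4),
}


def page_size_mm(width_px, height_px, dpi):
    """Convert pixel dimensions at a given resolution to millimetres."""
    return (width_px / dpi * MM_PER_INCH, height_px / dpi * MM_PER_INCH)


def closest_paper(width_mm, height_mm, tolerance_mm=3.0):
    """Return the name of a matching format, ignoring orientation, or None."""
    short, long_ = sorted((width_mm, height_mm))
    for name, (w, h) in PAPER_SIZES_MM.items():
        if abs(short - w) <= tolerance_mm and abs(long_ - h) <= tolerance_mm:
            return name
    return None


width, height = page_size_mm(2480, 3508, 300)
print(round(width, 1), round(height, 1), closest_paper(width, height))
```

Scanned books often have pages cropped to arbitrary sizes, so a `None` result is to be expected for many files; the tolerance value is a guess, not a standard.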
red_trumpet
(345 rep)
Mar 18, 2020, 10:51 AM
• Last activity: Mar 29, 2024, 08:57 AM
8
votes
1
answers
454
views
How can OCRed text be preserved while converting between djvu and pdf files?
Suppose a pdf file has OCRed text. How can we convert it to a djvu file and pass the OCRed text to the djvu file?
Conversely, what if "pdf" and "djvu" are exchanged in the above?
Thanks!
Tim
(106430 rep)
May 24, 2014, 05:10 AM
• Last activity: Jan 14, 2024, 06:01 PM
4
votes
3
answers
916
views
Override page numbers of a djvu document
I have a djvu scan of a book. Let's consider two cases:
1. I'd like to number the pages `0, 1, 2, ...` (use case: the cover should be page 0)
2. I'd like to number some pages with Roman numerals and some with Arabic numerals, for example: `i, ii, iii, ..., x, 1, 2, 3, ...` (use case: some introductory pages are numbered in Roman numerals in the book)
Is it possible to do it on Linux?
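Both cases come down to assigning a title string to each page position. A minimal Python sketch that builds such a list and writes it out as a command script, assuming `djvused` accepts `select N` followed by `set-page-title "..."` (to the best of my knowledge it does, but check the djvused man page for your DjVuLibre version). The helper names are illustrative:

```python
def to_roman(n):
    """Lower-case Roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_titles(total_pages, roman_pages=0, first_arabic=1):
    """Titles for every page: Roman for the first pages, then Arabic."""
    titles = [to_roman(i) for i in range(1, roman_pages + 1)]
    arabic_count = total_pages - roman_pages
    titles += [str(first_arabic + i) for i in range(arabic_count)]
    return titles


def djvused_script(titles):
    """Script text; the command names are an assumption to verify."""
    lines = []
    for position, title in enumerate(titles, start=1):
        lines.append(f"select {position}")
        lines.append(f'set-page-title "{title}"')
    return "\n".join(lines) + "\n"


print(page_titles(3, roman_pages=0, first_arabic=0))  # case 1
print(page_titles(5, roman_pages=3))                  # case 2
```

If the assumption about the commands holds, the generated text could be saved to a file and passed to `djvused` on a copy of the document. Whether a given viewer displays page titles instead of page numbers is a separate question.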
marmistrz
(2792 rep)
Jun 17, 2015, 08:14 AM
• Last activity: Nov 27, 2023, 10:05 AM
8
votes
2
answers
9562
views
convert djvu to pdf
How do I convert DjVu to PDF?
My current approach is:
djvups x.djvu > x.ps
ps2pdf x.ps
Is there a more efficient and better way to handle this (in terms of output quality and data/metadata loss)?
Grzegorz Wierzowiecki
(14740 rep)
Nov 2, 2011, 05:04 PM
• Last activity: Jul 20, 2023, 10:33 AM
0
votes
0
answers
1206
views
Fixing slow-rendering PDFs on Linux
I've got some large document scans with embedded OCR text on Internet Archive that I'd like to read. Unfortunately the PDF pages render very slowly in my document readers (Okular, Evince, Zathura). I previously used the DJVU files for this reason, but since they stopped creating them I am out of options. I have tried to convert to DJVU myself with `pdf2djvu`, `djvudigital`, some online tools, and even by first going to JPEG, and each time I got very large files, as the programs seem to have trouble separating the foreground and background. So, several questions:
1. How did the Internet Archive team previously produce their DJVUs? Can their process be replicated or approximated?
2. The second link suggests slow PDF rendering has been an issue for a while (at least on Linux). Are there any workarounds, like faster backends? I tried linearizing the files but that didn't improve things.
For testing the issue, consider this volume of Poincaré's collected works.
Kariuki
(1 rep)
Nov 29, 2022, 01:43 PM
4
votes
6
answers
6940
views
How to split each page of a djvu file?
A djvu file has two book pages in each djvu page. I would like to split it so that there is one book page per djvu page. For example,
I was wondering if this can be done by some software, preferably command line utilities? Thanks and regards!
PS: This is a file that can be used for testing.
Tim
(106430 rep)
Dec 6, 2011, 05:22 PM
• Last activity: Apr 27, 2022, 01:42 AM
3
votes
1
answers
411
views
Cli grep through djvu files
How can I grep through djvu files? They are text files with some images in them. Is there an equivalent of the `pdfgrep` tool?
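One possible approach is to dump the hidden text layer with `djvutxt` (part of DjVuLibre) and search the result. A minimal Python sketch, assuming `djvutxt` is on the PATH and the file actually has a text layer; the function names are illustrative:

```python
import re
import subprocess


def djvu_text(path):
    """Return the text layer of a DjVu file as printed by djvutxt."""
    result = subprocess.run(
        ["djvutxt", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def grep_text(text, pattern):
    """Return (line_number, line) pairs whose line matches the pattern."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


# Searching an in-memory sample; a real call would be
# grep_text(djvu_text("book.djvu"), r"algebra") with a hypothetical file name.
print(grep_text("alpha\nBoolean algebra\nbeta\n", r"algebra"))
```

Line numbers here refer to lines of the extracted text, not to pages of the document.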
Pierre B
(2293 rep)
Jul 15, 2017, 12:30 PM
• Last activity: Jun 17, 2021, 09:29 PM
1
votes
1
answers
782
views
How to install recoll dependencies "djvutxt" and "python3:pylzma"?
I installed recoll on my Kubuntu 20.04. Now it says that external apps and commands required for indexing are missing, specifically:
djvutxt (image/vnd.djvu)
python3:pylzma (application/x-7z-compressed)
but I have no idea how to install them. No such packages are shown in muon (my package manager GUI). How can I install them?
Make42
(739 rep)
Dec 16, 2020, 09:42 AM
• Last activity: Dec 16, 2020, 10:57 PM
6
votes
5
answers
4615
views
Good ways for annotating and searching in document (pdf, djvu)
For djvu files, I enjoy reading them in djview, because when I search for some words, it shows where all the results are at a glance and highlights them simultaneously. This is much more convenient than the search functionality in evince for pdf files.
For pdf files, I enjoy using Xournal to annotate them, for example to underline some lines or add text comments.
But for a single file (pdf or djvu), I have to create two files (one in pdf and the other in djvu) and open them in djview and xournal (and maybe also in evince) in order to get the two benefits I outlined above.
I haven't tried many other features of djview, xournal and evince, nor have I tried many other applications yet.
Do you know of convenient ways to achieve what I hope to do, and possibly more that I haven't mentioned yet?
My OS is Ubuntu 12.04.
Tim
(106430 rep)
Mar 12, 2014, 03:23 PM
• Last activity: Dec 2, 2020, 02:50 PM
3
votes
1
answers
678
views
How can I make annotation for djvu file when using Okular?
My OS is `[03/13/2020,14:35:06@~]$ uname -a
Linux debian 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
`
That is, Debian 10 (Buster). On Debian 9, I could annotate djvu files with Okular, but now I cannot. Below is a typical attempt.
Start Okular and open a djvu file named 1.djvu. Use the Highlighter to make some annotations, then close the file 1.djvu. A popup appears: "Do you want to save your changes to "1.djvu" or discard them?" (see screenshot)
I choose "Save". Then another popup, "Warning - Okular", says: "You are about to save changes, but the current file format does not support saving the following elements. Please use the Okular document archive format to preserve them." So I click "Use annotations", and then choose "Save as Okular document archive..." (see screenshot)
Another popup then reads: "After saving, the current document format requires the file to be reloaded. Your undo/redo history will be lost. Do you want to continue?" I choose "Yes" (see screenshot), and the file 1.djvu is saved as 2.djvu. Now I open 2.djvu with Okular, but a popup reads "Could not open file:///.../2.djvu" (see this). How can I solve this problem?
azc
(131 rep)
Mar 13, 2020, 06:58 AM
• Last activity: Nov 23, 2020, 10:17 AM
1
votes
1
answers
83
views
djvulibre-3.5.27 Build Error
I am currently on **Linux Mint 18.3**. I downloaded the package from the Djvulibre [website](http://djvu.sourceforge.net/).
Here is my error:
make[2]: Entering directory '/home/amucs/Downloads/djvulibre-3.5.27/desktopfiles'
PNG 16x16/mimetypes/djvu.png
convert: delegate failed `"rsvg-convert" -o "%o" "%i"' @ error/delegate.c/InvokeDelegate/1310.
convert: unable to open image `/tmp/magick-85016muix9LZjKWV': No such file or directory @ error/blob.c/OpenBlob/2712.
convert: unable to open file `/tmp/magick-85016muix9LZjKWV': No such file or directory @ error/constitute.c/ReadImage/540.
convert: no images defined `16x16/mimetypes/djvu.png' @ error/convert.c/ConvertImageCommand/3210.
Makefile:604: recipe for target '16x16/mimetypes/djvu.png' failed
make: *** [16x16/mimetypes/djvu.png] Error 1
make: Leaving directory '/home/amucs/Downloads/djvulibre-3.5.27/desktopfiles'
Makefile:418: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1
make: Leaving directory '/home/amucs/Downloads/djvulibre-3.5.27'
Makefile:349: recipe for target 'all' failed
make: *** [all] Error 2
CanopusX
(11 rep)
Apr 19, 2018, 10:44 AM
• Last activity: Jan 6, 2019, 01:29 PM
0
votes
1
answers
659
views
lsof doesn't show up my djvu files
I am trying to print the names of the `djvu` files that are currently open in either `okular` or `atril`, but when I run `lsof | grep ".djvu$"` I get no output in the terminal, whereas the same command works for `pdf` files.
Galilean
(574 rep)
Nov 13, 2018, 05:10 PM
• Last activity: Nov 13, 2018, 08:30 PM
1
votes
0
answers
261
views
Create multiple index files for a single indirect DjVu file
Indirect DjVu files split documents into individual pages that can be linked to and loaded individually, yet can still be navigated like a single document. They fill a gap between PDFs and web pages.
I want to create multiple index files for a single indirect DjVu document, in a very similar manner to this (from 2003).
http://www.djvu-soft.narod.ru/planetdjvu/multiple_index_files_for_a_single_indirect_djvu.htm
I want to use only ordinary free (libre) DjVu tools (djvulibre-plugin, pdf2djvu, etc.)
My use case is to convert a library of PDFs to a single indirect DjVu file, then create multiple indexes that show relevant pages suitable for different audiences.
I can get some of the way there by linking to specific pages from a web page, which works very well if djvulibre-plugin is installed. However the user then has to scroll (or jump) to specific pages in the same large set.
I could also build different indirect DjVu documents to match the needs of different audiences, but that would lead to lots of redundancy.
Can multiple indexes be created for a single indirect DjVu document using current free tools?
johntait.org
(1372 rep)
Dec 1, 2015, 12:28 PM
• Last activity: Jun 12, 2018, 10:49 PM
6
votes
3
answers
7802
views
Extract several pages from a djvu file
I have a djvu file of multiple pages. I wonder how to extract a new djvu file that consists of only a subset of multiple pages?
For example, a djvu file has 10 pages, and I would like to extract a new djvu file consisting of pages 3-6 of the original djvu file. Can it be done with some commands of djvulibre, such as djvused, djvm, ...? I am using Ubuntu Linux.
Consider two different cases: extracting without removing the pages from the original djvu file, and extracting with removal.
Thanks!
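One way to think about the "pages 3-6 of 10" example is to work on a copy of the file and delete every page outside the range, highest page number first so that the remaining numbers do not shift. A minimal Python sketch of that bookkeeping, assuming `djvm -d doc.djvu pagenum` deletes a single page as I understand the djvm man page to describe (worth verifying); it only builds the argument lists and does not run anything:

```python
def pages_to_delete(total_pages, first, last):
    """Pages outside [first, last], in descending order."""
    if not 1 <= first <= last <= total_pages:
        raise ValueError("expected 1 <= first <= last <= total_pages")
    unwanted = [p for p in range(1, total_pages + 1) if p < first or p > last]
    return sorted(unwanted, reverse=True)


def djvm_commands(copy_path, total_pages, first, last):
    """Argument lists for deleting the unwanted pages from a copy.

    The 'djvm -d' form is an assumption; copy_path is a hypothetical file name.
    """
    return [
        ["djvm", "-d", copy_path, str(page)]
        for page in pages_to_delete(total_pages, first, last)
    ]


print(pages_to_delete(10, 3, 6))
for command in djvm_commands("copy.djvu", 10, 3, 6):
    print(" ".join(command))
```

Because the deletions happen on a copy, the original document keeps all of its pages; removing pages 3-6 from the original would be the mirror-image list.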
Tim
(106430 rep)
Sep 13, 2011, 03:27 AM
• Last activity: Dec 29, 2016, 08:03 PM
5
votes
1
answers
667
views
How can I determine the page count of djvu documents from the CLI?
Finding the page count of a PDF document from the CLI is as simple as:
pdfinfo file.pdf | grep ^Pages:
How can the same be performed with a djvu file? Without converting it to a pdf and then deleting the pdf file after checking the number of pages, please.
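The `pdfinfo` pipeline above amounts to picking the `Pages:` field out of key/value output. The same step as a minimal Python sketch, run against a made-up sample of `pdfinfo`-style text rather than a real file; the function name is illustrative:

```python
import re


def page_count(pdfinfo_output):
    """Return the integer after 'Pages:' or None if the field is absent."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Hypothetical sample in the layout pdfinfo uses (key, colon, padded value).
sample = "Title:          Example\nPages:          42\nEncrypted:      no\n"
print(page_count(sample))
```

For DjVu, as far as I know `djvused -e n file.djvu` prints the page count as a bare number, which would need no parsing at all; treat that as something to confirm in the djvused man page.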
Quora Feans
(3907 rep)
Aug 20, 2014, 09:24 PM
• Last activity: Sep 30, 2016, 11:12 AM
5
votes
1
answers
2218
views
How to make a djvu file searchable
If I create a new `djvu` file out of `tiff` files, I can use `djvubind`, which makes the `djvu` file searchable using, for example, `tesseract-ocr`.
However, suppose I am given a `djvu` file. How can I make it searchable?
For pdf I know of `pdfsandwich`; is there something similar for djvu?
student
(18865 rep)
Oct 2, 2014, 06:49 PM
• Last activity: Oct 5, 2014, 05:50 PM
Showing page 1 of 20 total questions