Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

1 vote

2 answers

279 views

Extracting table of contents from PDFs

text-processing pdf text-formatting djvu

I have a reasonably large personal library with books in various formats. I have tried to organize their metadata, including a text field containing the tables of contents. At the moment I am using the 'Area Text Selection' feature from my document reader to copy the text. Doing this for DJVUs with...

CONTENTS
1. EXPERIMENTS
1.1. The definition of an experiment ..... 1
1.2. Algebras of events as Boolean algebras .... 6
1.3. Operations with experiments ...... 9
1.4. Canonical representation of polynomials of events . . 12
....

I emphasize that all I did was drag my mouse across the page and click "Copy Text". If I try this with a PDF the structure is completely lost and I have to spend some time cleaning up the text selection, moving the page and section numbers around. I might get something like this:

Table of Contents
I
 Introduction
1
 Introduction
1.1
 Table of Contents
1.2
 Acknowledgments
1
3
3
6
II
....

I am looking for a PDF reader that can similarly copy the text but with the 'structure' preserved. The fact that DJVU readers have this capability tells me this ought to be possible. Note: I am not talking about extracting ToCs from the bookmarks: many of my PDFs don't have any. I'd also like to avoid a CLI tool that has to process the entire file: I just want it to pick the text I select, but with the newlines and overall structure intact.

Luke (13 rep)

Dec 16, 2024, 03:42 PM • Last activity: Dec 16, 2024, 05:00 PM

1 vote

1 answer

137 views

ddjvu: using '-quality' option with values higher than 100

pdf djvu libtiff

I am trying to figure out how to use `ddjvu` to convert DjVu files to PDF. If I use

    ddjvu -format=pdf input.djvu output.pdf

the output PDF is huge and consists of lossless images. A warning also appears, which I will come back to later:

> TIFFWriteDirectorySec: Warning, Creating TIFF with legacy Deflate codec identifier, COMPRESSION_ADOBE_DEFLATE is more widely supported.

To make the output file smaller and have it consist of lossy images, `man ddjvu` suggests using the `-quality` option:

> -quality=factor: Enables lossy JPEG compression for TIFF and PDF files. This option only affects images that cannot be encoded using the preferred TIFF/G4 compression. Argument factor is a quantization factor ranging from 25 to 150. See command cjpeg(1) for more information on JPEG quantization factors. Value 80 is a good starting point.

If I use values up to 100 (e.g., 80, 90, or 100),

    ddjvu -format=pdf -quality 90 input.djvu output.pdf

the TIFFWriteDirectorySec warning doesn't appear. To my understanding, this is because using `-quality` means that the PDF will consist of lossy images, which in turn means that it will consist of JPEGs instead of TIFFs. But if the value is higher than 100, e.g. 105 or 150, the warning appears again. Why is that?
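The documented range for `-quality` can be checked before the tool is ever called. The following is only a sketch of that idea: `ddjvu_pdf_command` is a hypothetical helper name, and the 25 to 150 bounds are taken from the man page excerpt quoted above. It builds the argument list and does not run anything, so it says nothing about why the warning returns above 100.

```python
import shlex


def ddjvu_pdf_command(src, dst, quality=None):
    """Build the argument list for a DjVu-to-PDF conversion with ddjvu.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["ddjvu", "-format=pdf"]
    if quality is not None:
        # Bounds as stated in the quoted man page text (25 to 150).
        if not 25 <= quality <= 150:
            raise ValueError(f"quality must be between 25 and 150, got {quality}")
        cmd.append(f"-quality={quality}")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(shlex.join(ddjvu_pdf_command("input.djvu", "output.pdf", quality=80)))
```

The returned list could be passed to `subprocess.run` on a machine where DjVuLibre is installed.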

jsx97 (1347 rep)

Jun 12, 2024, 04:04 PM • Last activity: Jun 20, 2024, 06:05 PM

1 vote

1 answer

163 views

How to use COMPRESSION_ADOBE_DEFLATE instead of DEFLATE?

pdf djvu libtiff

When converting to PDF using `ddjvu`:

    ddjvu -format=pdf input.djvu output.pdf

there is a warning:

> TIFFWriteDirectorySec: Warning, Creating TIFF with legacy Deflate codec identifier, COMPRESSION_ADOBE_DEFLATE is more widely supported.

How can I use COMPRESSION_ADOBE_DEFLATE instead of deflate? I tried

    ddjvu -format=pdf -quality=COMPRESSION_ADOBE_DEFLATE input.djvu output.pdf

but although `-quality=deflate` works, `-quality=COMPRESSION_ADOBE_DEFLATE` and its variations (e.g., `AdobeDeflate`) return the error message:

> ddjvu: valid arguments for option '-quality' an integer between 25 and 150.

jsx97 (1347 rep)

Jun 8, 2024, 07:04 AM • Last activity: Jun 11, 2024, 08:34 AM

0 votes

1 answer

58 views

ddjvu: Does it really support converting to PDF?

djvu

`ddjvu` (DjVuLibre) 3.5.28:

    ddjvu --help

returns

    -format=FMT       Select output format: pbm,pgm,ppm,pnm,rle,tiff.

Note that it doesn't mention PDF. Why, then, does the following command work and convert the input DjVu file to an output PDF?

    ddjvu -format=pdf -quality=85 input.djvu output.pdf

jsx97 (1347 rep)

Jun 7, 2024, 10:14 AM • Last activity: Jun 7, 2024, 08:22 PM

0 votes

1 answer

238 views

Determine the paper size of a djvu document

djvu

How can I determine the paper size of a djvu document? I tried it with both DjView and Evince, but did not find any mention of the paper size. I wouldn't mind using some CLI tool, but I don't have a clue what to use.
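For a scanned document, the physical page size follows from the pixel dimensions and the scan resolution: size in inches is pixels divided by dpi, and one inch is 25.4 mm. The sketch below only does that arithmetic. It assumes the width, height and dpi of a page have already been obtained some other way (as far as I know, DjVuLibre's `djvudump` reports them for each page, but treat that as an assumption), and `page_size_mm` is a hypothetical helper name.

```python
MM_PER_INCH = 25.4


def page_size_mm(width_px, height_px, dpi):
    """Convert a page's pixel dimensions and resolution to millimetres."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    # pixels / dpi gives inches; multiply by 25.4 to get millimetres.
    width_mm = round(width_px / dpi * MM_PER_INCH, 1)
    height_mm = round(height_px / dpi * MM_PER_INCH, 1)
    return width_mm, height_mm


if __name__ == "__main__":
    print(page_size_mm(2480, 3508, 300))
```

A page of 2480 by 3508 pixels at 300 dpi comes out at roughly 210 by 297 mm, which matches A4.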
                                

red_trumpet (345 rep)

Mar 18, 2020, 10:51 AM • Last activity: Mar 29, 2024, 08:57 AM

8 votes

1 answer

454 views

How can OCRed text be preserved while converting between djvu and pdf files?

pdf djvu

Suppose a PDF file has OCRed text. How can we convert it to a DjVu file and carry the OCRed text over to the DjVu file?

Conversely, how can this be done when converting from DjVu to PDF?

Thanks!

Tim (106430 rep)

May 24, 2014, 05:10 AM • Last activity: Jan 14, 2024, 06:01 PM

4 votes

3 answers

916 views

Override page numbers of a djvu document

djvu numbering documents

I have a djvu scan of a book. Let's consider two cases:

1. I'd like to number the pages `0, 1, 2, ...` (use case: the cover should be page 0)

2. I'd like to number some pages with Roman numerals and some with Arabic numerals, for example: `i, ii, iii, ..., x, 1, 2, 3, ...` (use case: some introductory pages are numbered in Roman numerals in the book)

Is it possible to do this on Linux?
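Both numbering schemes described above amount to producing one label per physical page. The sketch below generates such a list. `to_roman` and `page_labels` are hypothetical helper names, and actually attaching the labels to a DjVu file is left out. As far as I know `djvused` has a `set-page-title` command that could be used for that, but treat this as an assumption to check against its man page.

```python
def to_roman(n):
    """Return n (1..3999) as a lowercase Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("Roman numerals are only defined here for 1..3999")
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def page_labels(total_pages, front_matter=0, first_arabic=1):
    """Labels for every page: Roman for the front matter, Arabic afterwards."""
    if not 0 <= front_matter <= total_pages:
        raise ValueError("front_matter must be between 0 and total_pages")
    labels = [to_roman(i) for i in range(1, front_matter + 1)]
    labels += [str(first_arabic + i) for i in range(total_pages - front_matter)]
    return labels


if __name__ == "__main__":
    print(page_labels(4, first_arabic=0))    # case 1: cover is page 0
    print(page_labels(13, front_matter=10))  # case 2: i..x, then 1, 2, 3
```

Case 1 corresponds to `front_matter=0, first_arabic=0`, and case 2 to a non-zero `front_matter` with the default `first_arabic=1`.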

marmistrz (2792 rep)

Jun 17, 2015, 08:14 AM • Last activity: Nov 27, 2023, 10:05 AM

8 votes

2 answers

9562 views

convert djvu to pdf

pdf conversion djvu

How can I convert DjVu to PDF?

My current approach is:

    djvups x.djvu > x.ps
    ps2pdf x.ps

Is there a more efficient and better way (in terms of output quality and data/metadata loss) to handle this?

Grzegorz Wierzowiecki (14740 rep)

Nov 2, 2011, 05:04 PM • Last activity: Jul 20, 2023, 10:33 AM

0 votes

0 answers

1206 views

Fixing slow-rendering PDFs on Linux

pdf pdftk djvu

I've got some large document scans with embedded OCR text on Internet Archive that I'd like to read. Unfortunately the PDF pages render very slowly in my document readers (Okular, Evince, Zathura). I previously used the DJVU files for this reason, but since they stopped creating them I am out of options. I have tried converting to DJVU myself with pdf2djvu, djvudigital, some online tools, and even by first going to JPEG, and each time I have gotten very large files, as the programs seem to have trouble separating the foreground and background. So, several questions:

1. How did the Internet Archive team previously produce their DJVUs? Can their process be replicated or approximated?
2. The second link suggests slow PDF rendering has been an issue for a while (at least on Linux). Are there any workarounds, like faster backends? I tried linearizing the files but that didn't improve things.

For testing the issue, consider this volume of Poincaré's collected works.

Kariuki (1 rep)

Nov 29, 2022, 01:43 PM

4 votes

6 answers

6940 views

How to split each page of a djvu file?

djvu documents

In a djvu file, each djvu page contains two book pages. I would like to split it so that there is one book page per djvu page.

I was wondering if this can be done by some software, preferably command-line utilities? Thanks and regards!

PS: This is a file that can be used for testing.

Tim (106430 rep)

Dec 6, 2011, 05:22 PM • Last activity: Apr 27, 2022, 01:42 AM

3 votes

1 answer

411 views

Cli grep through djvu files

djvu

How can I grep through djvu files? They are text files with some images in them. Is there an equivalent of the `pdfgrep` tool?
                                

Pierre B (2293 rep)

Jul 15, 2017, 12:30 PM • Last activity: Jun 17, 2021, 09:29 PM

1 vote

1 answer

782 views

How to install recoll dependencies "djvutxt" and "python3:pylzma"?

software-installation python djvu recoll lzma

I installed recoll on my Kubuntu 20.04. Now it says that external apps and commands required for indexing are missing, specifically:

    djvutxt (image/vnd.djvu)
    python3:pylzma (application/x-7z-compressed)

but I have no idea how to install them. No such packages are shown in muon (my package manager GUI). How can I install them?

Make42 (739 rep)

Dec 16, 2020, 09:42 AM • Last activity: Dec 16, 2020, 10:57 PM

6 votes

5 answers

4615 views

Good ways for annotating and searching in document (pdf, djvu)

pdf djvu

For djvu files, I enjoy reading them in djview, because when I search for some words, it can show where all the results are at a glance and highlight them simultaneously. This is much more convenient than the search functionality in evince for pdf files.

For pdf files, I enjoy using Xournal to annotate them, for example to underline some lines or add text comments.

But for a single file (pdf or djvu), I have to create two files (one in pdf and the other in djvu) and open them in djview and xournal (and maybe also in evince) in order to get the two benefits I outlined above.

I haven't tried many other features of djview, xournal and evince, nor have I tried many other applications yet. Do you have any convenient ways to achieve what I hope to do, and possibly more that I haven't mentioned yet?

My OS is Ubuntu 12.04.

Tim (106430 rep)

Mar 12, 2014, 03:23 PM • Last activity: Dec 2, 2020, 02:50 PM

3 votes

1 answer

678 views

How can I make annotation for djvu file when using Okular?

okular djvu

My OS is

    [03/13/2020,14:35:06@~]$ uname -a
    Linux debian 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

That is, Debian 10 (Buster). On Debian 9, I could annotate djvu files with Okular, but now I cannot. Below is a typical attempt.

Start Okular and open a djvu file named 1.djvu. Use the Highlighter to make some annotations, then close the file 1.djvu. A popup reads: Do you want to save your changes to "1.djvu" or discard them? (see screenshot)

I choose "Save". Then another popup says: "Warning - Okular: You are about to save changes, but the current file format does not support saving the following elements. Please use the Okular document archive format to preserve them." So I click "Use annotations", and then choose "Save as Okular document archive..." (see screenshot).

Another popup then reads "After saving, the current document format requires the file to be reloaded. Your undo/redo history will be lost. Do you want to continue?" I choose "Yes" (see screenshot), and the file 1.djvu is saved as 2.djvu. Now I open 2.djvu with Okular, but a popup reads "Could not open file:///.../2.djvu". How can I solve this problem?

azc (131 rep)

Mar 13, 2020, 06:58 AM • Last activity: Nov 23, 2020, 10:17 AM

1 vote

1 answer

83 views

djvulibre-3.5.27 Build Error

compiling gcc error-handling djvu

I am currently on **Linux Mint 18.3**. I downloaded the package from the DjVuLibre [website](http://djvu.sourceforge.net/).

Here is my error:

    make[2]: Entering directory '/home/amucs/Downloads/djvulibre-3.5.27/desktopfiles'
    PNG      16x16/mimetypes/djvu.png
    convert: delegate failed `"rsvg-convert" -o "%o" "%i"' @ error/delegate.c/InvokeDelegate/1310.
    convert: unable to open image `/tmp/magick-85016muix9LZjKWV': No such file or directory @ error/blob.c/OpenBlob/2712.
    convert: unable to open file `/tmp/magick-85016muix9LZjKWV': No such file or directory @ error/constitute.c/ReadImage/540.
    convert: no images defined `16x16/mimetypes/djvu.png' @ error/convert.c/ConvertImageCommand/3210.
    Makefile:604: recipe for target '16x16/mimetypes/djvu.png' failed
    make: *** [16x16/mimetypes/djvu.png] Error 1
    make: Leaving directory '/home/amucs/Downloads/djvulibre-3.5.27/desktopfiles'
    Makefile:418: recipe for target 'all-recursive' failed
    make: *** [all-recursive] Error 1
    make: Leaving directory '/home/amucs/Downloads/djvulibre-3.5.27'
    Makefile:349: recipe for target 'all' failed
    make: *** [all] Error 2

                                

CanopusX (11 rep)

Apr 19, 2018, 10:44 AM • Last activity: Jan 6, 2019, 01:29 PM

0 votes

1 answer

659 views

lsof doesn't show up my djvu files

grep lsof djvu

I am trying to print the names of the `djvu` files currently open in either `okular` or `atril`, but when I run `lsof | grep ".djvu$"` I get no output in the terminal, whereas the same command works for `pdf` files.
                                

Galilean (574 rep)

Nov 13, 2018, 05:10 PM • Last activity: Nov 13, 2018, 08:30 PM

1 vote

0 answers

261 views

Create multiple index files for a single indirect DjVu file

djvu

Indirect DjVu files split documents into individual pages that can be linked to and loaded individually, yet can still be navigated like a single document. They fill a gap between PDFs and web pages.

I want to create multiple index files for a single indirect DjVu document, in a manner very similar to this (from 2003).

http://www.djvu-soft.narod.ru/planetdjvu/multiple_index_files_for_a_single_indirect_djvu.htm 

I want to use only ordinary free (libre) DjVu tools (djvulibre-plugin, pdf2djvu, etc.)

My use case is to convert a library of PDFs to a single indirect DjVu file, then create multiple indexes that show relevant pages suitable for different audiences.

I can get some of the way there by linking to specific pages from a web page, which works very well if djvulibre-plugin is installed. However the user then has to scroll (or jump) to specific pages in the same large set.

I could also build different indirect DjVu documents to match the needs of different audiences, but that would lead to lots of redundancy.

Can multiple indexes be created for a single indirect DjVu document using current free tools?

johntait.org (1372 rep)

Dec 1, 2015, 12:28 PM • Last activity: Jun 12, 2018, 10:49 PM

6 votes

3 answers

7802 views

Extract several pages from a djvu file

text-processing djvu

I have a djvu file with multiple pages. How can I extract a new djvu file that consists of only a subset of those pages?

For example, a djvu file has 10 pages, and I would like to extract a new djvu file consisting of pages 3-6 of the original djvu file. Can it be done with some commands of djvulibre, such as djvused, djvm, ...? I am using Ubuntu Linux.

Consider two different cases: extracting without removing the pages from the original djvu file, and extracting with removal.

Thanks!

Tim (106430 rep)

Sep 13, 2011, 03:27 AM • Last activity: Dec 29, 2016, 08:03 PM

5 votes

1 answer

667 views

How can I determine the page count of djvu documents from the CLI?

djvu

Finding the page count of a PDF document from the CLI is as simple as:

    pdfinfo file.pdf | grep ^Pages:

How can the same be performed with a djvu file? Without converting it to a pdf and then deleting the pdf file after checking the number of pages, please.
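One possible approach, sketched here rather than given as a definitive answer, is to ask DjVuLibre's `djvused` for the page count with its `n` command (`djvused -e n file.djvu`), which avoids any conversion. The wrapper below assumes `djvused` is installed and on the `PATH`. `djvu_page_count` is a hypothetical helper name, and its `run` parameter exists only so the function can be tried out without the tool present.

```python
import subprocess


def djvu_page_count(path, run=subprocess.run):
    """Return the number of pages in a DjVu file by calling djvused.

    `run` defaults to subprocess.run and can be swapped for a stub in tests.
    """
    result = run(
        ["djvused", "-e", "n", path],  # the 'n' command prints the page count
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if djvused fails
    )
    return int(result.stdout.strip())
```

Calling `djvu_page_count("file.djvu")` would then play the same role as the `pdfinfo` pipeline above.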

Quora Feans (3907 rep)

Aug 20, 2014, 09:24 PM • Last activity: Sep 30, 2016, 11:12 AM

5 votes

1 answer

2218 views

How to make a djvu file searchable

djvu

If I create a new `djvu` file out of `tiff` files, I can use `djvubind`, which makes the `djvu` file searchable using, for example, `tesseract-ocr`.

However, suppose I am given an existing `djvu` file. How can I make it searchable?

For PDF I know of `pdfsandwich`; is there something similar for djvu?

student (18865 rep)

Oct 2, 2014, 06:49 PM • Last activity: Oct 5, 2014, 05:50 PM
