pdfttotext in pure lua?

Dietmar Segbert
Hello,

is there a module in pure lua, that converts a pdf-file to a text-file?

Thanks and Regards

Dietmar


Re: pdfttotext in pure lua?

Marc Balmer
> Hello,
>
> is there a module in pure lua, that converts a pdf-file to a text-file?

You cannot, in general, convert a PDF file to a text file.

(Oh, and a PDF file is usually a text file.)

> Thanks and Regards
>
> Dietmar


Re: pdfttotext in pure lua?

Vadim A. Misbakh-Soloviov
In reply to this post by Dietmar Segbert
 > is there a module in pure lua, that converts a pdf-file to a text-file?

1) Do you know that converting PDF to text (regardless of Lua) is only
possible if the file actually contains the source text (which is not always
the case)?
2) As far as I know, there is no such module, not only for Lua but for any of
the scripting languages I know.
3) The reverse operation is different: there are some modules for text->PDF.
4) Anyway, the purpose of PDF is *not* to provide a way to get the text back.
It is to make a document look the same everywhere (as if it were printed on
paper).

Re: pdfttotext in pure lua?

Dirk Laurie-2
In reply to this post by Dietmar Segbert
2016-10-23 20:26 GMT+02:00 Dietmar Segbert <[hidden email]>:

> is there a module in pure lua, that converts a pdf-file to a text-file?

I once spent a great deal of time, without getting as far as I wanted
to, on a pure Lua program that produces Markdown starting from the
XML output given by "pdftohtml -xml".

Among the difficulties are: recognizing page headers and footers;
reassembling words hyphenated at the end of a line; handling
footnotes and citations; recognizing tabular input; etc.
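
To make just one of these difficulties concrete, here is a minimal sketch of rejoining words hyphenated at the end of a line. It assumes the text has already been extracted into a Lua string; the function name `dehyphenate` is made up for illustration, and the rule it applies (join only when a lowercase letter follows the break) is a rough heuristic, not a complete solution.

```lua
-- Rejoin words that were split with a hyphen at the end of a line.
-- Heuristic: a letter, a hyphen, optional trailing blanks, a line break,
-- optional leading blanks, then a lowercase letter.
local function dehyphenate(text)
  -- gsub returns two values (string, count); keep only the string.
  local joined = text:gsub("(%a)%-[ \t]*\r?\n[ \t]*(%l)", "%1%2")
  return joined
end
```

Note that this also removes the hyphen from genuine compounds that happen to break at a line end (for example "well-" / "known"), which is part of why the problem is harder than it looks.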

All that makes me doubt very strongly that the desired module exists.

Re: pdfttotext in pure lua?

Nagaev Boris
On Sun, Oct 23, 2016 at 11:03 PM, Dirk Laurie <[hidden email]> wrote:

> 2016-10-23 20:26 GMT+02:00 Dietmar Segbert <[hidden email]>:
>
>> is there a module in pure lua, that converts a pdf-file to a text-file?
>
> I once spent a great deal of time, without getting as far as I wanted
> to, on a pure Lua program that produces Markdown starting from the
> XML output given by "pdftohtml -xml".
>
> Among the difficulties are: recognizing page headers and footers;
> reassembling words hyphenated at the end of a line; handling
> footnotes and citations; recognizing tabular input; etc.
>
> All that makes me doubt very strongly that the desired module exists.
>

Debian has the package poppler-utils [1], which provides the pdftotext utility.

[1] https://packages.debian.org/sid/poppler-utils


--
Best regards,
Boris Nagaev

Re: pdfttotext in pure lua?

Nagaev Boris
In reply to this post by Vadim A. Misbakh-Soloviov
On Sun, Oct 23, 2016 at 10:56 PM, Vadim A. Misbakh-Soloviov
<[hidden email]> wrote:
>  > is there a module in pure lua, that converts a pdf-file to a text-file?
>
> 1) Do you know that converting PDF to text (regardless of Lua) is only
> possible if the file actually contains the source text (which is not always
> the case)?
> 2) As far as I know, there is no such module, not only for Lua but for any of
> the scripting languages I know.

JavaScript has one:
https://www.npmjs.com/package/pdftotextjs

> 3) The reverse operation is different: there are some modules for text->PDF.
> 4) Anyway, the purpose of PDF is *not* to provide a way to get the text back.
> It is to make a document look the same everywhere (as if it were printed on
> paper).
>



--
Best regards,
Boris Nagaev

Re: pdfttotext in pure lua?

Francisco Olarte
On Sun, Oct 23, 2016 at 11:58 PM, Nagaev Boris <[hidden email]> wrote:
> JavaScript has one:
> https://www.npmjs.com/package/pdftotextjs

Had you bothered to read your link target, you would have noticed it is
just a wrapper for poppler. So any language able to call external
programs could be considered to have one (in fact Lua has one,
as it can call the pdf version).
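
As a rough sketch of what "calling the external program" could look like from Lua: the code below shells out to the pdftotext utility from poppler-utils, so it is not a pure Lua solution. It assumes a POSIX shell, a Lua build where io.popen is available, and pdftotext on the PATH; the helper names `shell_quote`, `pdftotext_command` and `pdf_to_text` are made up for illustration.

```lua
-- Sketch only: relies on the external pdftotext program (poppler-utils),
-- so this is NOT pure Lua. Assumes a POSIX shell and a working io.popen.

-- Wrap a string in single quotes so the shell treats it as one argument.
local function shell_quote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Build the command line; "-" as the output file sends the text to stdout.
local function pdftotext_command(path)
  return "pdftotext -layout " .. shell_quote(path) .. " -"
end

-- Run pdftotext and return everything it printed, or nil plus a message.
local function pdf_to_text(path)
  local pipe = io.popen(pdftotext_command(path), "r")
  if not pipe then
    return nil, "could not start pdftotext"
  end
  local text = pipe:read("*a")
  pipe:close()
  if not text or text == "" then
    return nil, "pdftotext produced no text"
  end
  return text
end
```

Usage would be something like `local text, err = pdf_to_text("report.pdf")`. Whether any of this is possible on a given embedded device depends on whether its Lua has io.popen at all and whether pdftotext is installed there.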

I mean, it is not pure JavaScript, and the OP asks for a PURE LUA one.

Francisco Olarte.

Re: pdfttotext in pure lua?

Dietmar Segbert
Hello,

thanks for your answers.

I use the pdftotext utility under Debian.

But I have a little DAISY player, the Milestone 312 ACE, with Lua on board
and a TTS engine from acapella. It can read Word documents and txt files,
so I had the idea of converting PDF to text on that device and feeding
the output to the TTS.

The Milestone 312 ACE is a device made specially for blind and visually
impaired people.

Thanks and regards

Dietmar

P.S.: Sorry for my English.