Dear Mister Noname,
On Tue, 3 Dec 2019, it was written
That's _LOSSY_ JBIG2.
YOU DON'T HAVE TO USE LOSSY MODE!
Don't shout!!
And for the topic: you don't have to use JBIG2. Space isn't really an
issue today for scanned bilevel documents, so you can just stick with TIFF
G4 or PNG.
Christia
On 03/12/2019 20:22, Fred Cisin via cctalk wrote:
Watch out. PDF with OCR can show you a clear and crisp [possibly
wrong] interpretation of the scan, not what the actual scan looked like.
The OCR may well say "0" where the printing says "8" but what your eyes
will see will be the represe
On Tue, 3 Dec 2019, Paul Koning via cctalk wrote:
The trouble (for both of these) is that many of the
users don't know the limitations and blindly use the wrong tools.
"To the man who has a hammer, the whole world looks like a thumb."
(which is an indictment about misuse, not an indictment of h
> JBIG2 .. introduces so many actual factual errors (typically
> substituted letters and numbers)
On Tue, 3 Dec 2019, Noel Chiappa via cctalk wrote:
It's probably worth noting that there are often errors _in the original
documents_, too - so even a perfect image doesn't guarantee no errors
On Tue, Dec 3, 2019 at 10:59 AM Paul Berger via cctalk <
cctalk@classiccmp.org> wrote:
> Is there any way to know what compression was used in a pdf file?
>
There's not necessarily only one. Every object in a PDF file can have its
own selection of compression algorithm.
I don't know of any user-
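As a rough illustration of the per-object point above, here is a minimal Python sketch that counts the /Filter names it can find in a PDF's raw bytes. The function name count_filters is made up for this example. It is a byte scan, not a PDF parser, so it is only an approximation: dictionaries stored inside compressed object streams will not be seen, and a name that appears in ordinary text could be miscounted.

```python
import re
from collections import Counter

# Matches "/Filter /Name" and "/Filter [/Name1 /Name2]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count filter names found in dictionaries stored as plain text.

    Assumption: a raw byte scan is good enough for a first look.
    Dictionaries held inside compressed object streams are invisible here.
    """
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        # The captured group is either a single name or a bracketed array.
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts
```

To try it, read a file in binary mode and pass the bytes in, e.g. count_filters(open("scan.pdf", "rb").read()), where scan.pdf is a placeholder name. Bilevel scans typically show up as CCITTFaxDecode or JBIG2Decode, and JPEG and JPEG2000 images as DCTDecode and JPXDecode.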
> On Dec 3, 2019, at 12:59 PM, Paul Berger via cctalk
> wrote:
>
> ...
> Would TIFF G4 still be preferable to JPEG2000? It would seem I can control
> the compression used by selecting the pdf compatibility level.
JPEG2000 apparently has a lossless mode (says Wikipedia). If so, it would be
On 2019-12-02 4:57 p.m., Eric Smith via cctalk wrote:
On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk
wrote:
When I corresponded with Al Kossow about format several years ago, he
indicated that CCITT Group 4 lossless compression was their standard.
There are newer bilevel encodings t
On 12/3/19 10:30 AM, Eric Smith via cctalk wrote:
PDF was never _intended_ for documents that should undergo any further
processing.
Okay.
Fair rebuttal.
The few things that have been hacked onto it for interactive use are
actually the worst thing about PDF.
My opinion
Okay.
I don't hav
> On Dec 2, 2019, at 11:12 PM, Grant Taylor via cctalk
> wrote:
>
> On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote:
>> In my opinion, PDFs are the last place that computer usable data goes.
>> Because getting anything out of a PDF as a data source is next to impossible.
>> Sure, you, a hu
On Mon, Dec 2, 2019 at 9:06 PM Grant Taylor via cctalk <
cctalk@classiccmp.org> wrote:
> My problem with PDFs starts where most people stop using them.
>
> Take the average PDF of text, try to copy and paste the text into a text
> file. (That may work.)
>
Sure. Now try the same thing with a TI
On Tue, Dec 3, 2019 at 1:50 AM Christian Corti via cctalk <
cctalk@classiccmp.org> wrote:
> *NEVER* use JBIG2! I hope you know about the Xerox JBIG2 bug (e.g. making
>
That's _LOSSY_ JBIG2.
YOU DON'T HAVE TO USE LOSSY MODE!
On Mon, Dec 2, 2019 at 7:08 PM Grant Taylor via cctalk <
cctalk@classiccmp.org> wrote:
> I *HATE* doing anything with PDFs other than reading them.
PDF was never _intended_ for documents that should undergo any further
processing. The few things that have been hacked onto it for interactive
use
On Mon, Dec 2, 2019 at 5:34 PM Guy Dunphy via cctalk
wrote:
> Mentioning JBIG2 (or any of its predecessors) without noting that it is
> completely unacceptable as a scanned document compression scheme,
> demonstrates
> a lack of awareness of the defects it introduces in encoded documents.
>
Perh
> From: Guy Dunphy
> JBIG2 .. introduces so many actual factual errors (typically
> substituted letters and numbers)
It's probably worth noting that there are often errors _in the original
documents_, too - so even a perfect image doesn't guarantee no errors.
The most recent one (of
At 01:20 AM 3/12/2019 -0200, you wrote:
>I cannot understand your problems with PDF files.
>I've created lots and lots of PDFs, with treated and untreated scanned
>material. All of them are very readable and in use for years. Of course,
>garbage in, garbage out. I take the utmost care in my scans t
Actually, we scan to PDF with back OCR, also text, also TIFF, also JPEG. With
the slooowww HP 11x17 scan/fax/print thing I can scan an entire document, then
save 1, save 2, save 3, save 4 without rescanning each time. Ed at SMECC
In a message dated 12/3/2019 2:16:01 AM US Mountain Standard Time,
ccta
very nice file
Yep, we prefer PDF with OCR back stuff. Ed, smecc.org
In a message dated
12/2/2019 8:20:36 PM US Mountain Standard Time, cctalk@classiccmp.org writes:
I cannot understand your problems with PDF files.
I've created lots and lots of PDFs, with treated and untreated scanned
mate
Hi!
On Tue, 2019-12-03 11:34:06 +1100, Guy Dunphy via cctalk
wrote:
> At 01:57 PM 2/12/2019 -0700, you wrote:
> >On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk
> >wrote:
> >
> > > When I corresponded with Al Kossow about format several years ago, he
> > > indicated that CCITT Group 4 loss
On Mon, 2 Dec 2019, Eric Smith wrote:
There are newer bilevel encodings that are somewhat more efficient than G4
(ITU-T T.6), such as JBIG (T.82) and JBIG2 (T.88), but they are not as
widely supported, and AFAIK JBIG2 is still patent encumbered. As a result,
*NEVER* use JBIG2! I hope you know a
On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote:
In my opinion, PDFs are the last place that computer usable data goes.
Because getting anything out of a PDF as a data source is next to
impossible.
Sure, you, a human, can read it and consume the data.
Try importing a simple table from a PDF
On 12/2/19 8:20 PM, Alexandre Souza via cctalk wrote:
I cannot understand your problems with PDF files.
My problem with PDFs starts where most people stop using them.
Take the average PDF of text, try to copy and paste the text into a text
file. (That may work.)
Now try to edit a piece of
I cannot understand your problems with PDF files.
I've created lots and lots of PDFs, with treated and untreated scanned
material. All of them are very readable and in use for years. Of course,
garbage in, garbage out. I take the utmost care in my scans to have good
enough source files, so I can cr
On 12/2/19 5:34 PM, Guy Dunphy via cctalk wrote:
Interesting comments Guy.
I'm completely naive when it comes to scanning things for preservation.
Your comments do pass my naive understanding.
But PDF literally cannot be used as a wrapper for the results,
since it doesn't incorporate the re
At 01:57 PM 2/12/2019 -0700, you wrote:
>On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk
>wrote:
>
>> When I corresponded with Al Kossow about format several years ago, he
>> indicated that CCITT Group 4 lossless compression was their standard.
>>
>
>There are newer bilevel encodings that ar
On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk
wrote:
> When I corresponded with Al Kossow about format several years ago, he
> indicated that CCITT Group 4 lossless compression was their standard.
>
There are newer bilevel encodings that are somewhat more efficient than G4
(ITU-T T.6), s
>My recommendation: use a proper multi-function copier (the big copiers)
>that can also scan to network. I currently use our big Konica-Minolta
I've got a Lexmark X646E full duplex printing/scanner. I'm still learning
how to use it at its max, but I believe I'll scan TONS of documents I have
store
On Wed, Nov 27, 2019 at 2:01 PM Paul Koning wrote:
> Another problem with bilevel scans is that, on some machines at least, they
> can be very noisy. That's what I saw on the copier/scanner at the office.
> For good scans I use gray scale scanning, with post-processing if desired to
> conver
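For anyone wondering what the gray-to-bilevel post-processing step might look like, here is a minimal Python sketch of the simplest approach, a fixed global threshold. This is an assumption for illustration only: the quoted message does not say which method or tool was used, and the name to_bilevel and the default threshold of 128 are made up for this example.

```python
def to_bilevel(gray_rows, threshold=128):
    """Map 8-bit gray samples (0 = black, 255 = white) to 1 (black) or 0 (white).

    gray_rows is a list of rows, each row a list of integer samples.
    Samples darker than the threshold become black; the rest become white.
    """
    return [
        [1 if sample < threshold else 0 for sample in row]
        for row in gray_rows
    ]
```

A fixed threshold is the crudest choice. Its main appeal is that the gray scan can be kept and the cutoff retuned later, which a scan captured directly in bilevel does not allow.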
On Wed, 27 Nov 2019, Paul Koning wrote:
On Nov 27, 2019, at 2:56 PM, Jason T via cctalk wrote:
On Wed, Nov 27, 2019 at 10:12 AM Christian Corti via cctalk
wrote:
My recommendation: use a proper multi-function copier (the big copiers)
that can also scan to network. I currently use our big Koni
> On Nov 27, 2019, at 2:56 PM, Jason T via cctalk wrote:
>
> On Wed, Nov 27, 2019 at 10:12 AM Christian Corti via cctalk
> wrote:
>> My recommendation: use a proper multi-function copier (the big copiers)
>> that can also scan to network. I currently use our big Konica-Minolta
>> bizhub 754.
On Wed, Nov 27, 2019 at 10:12 AM Christian Corti via cctalk
wrote:
> My recommendation: use a proper multi-function copier (the big copiers)
> that can also scan to network. I currently use our big Konica-Minolta
> bizhub 754. Although it's a b/w copier, it can also scan in color. This
These are gr
On Wed, 27 Nov 2019, mloe...@cpumagic.scol.pa.us wrote:
On Wed, 27 Nov 2019, Noel Chiappa via cctalk wrote:
That's what I use too; it has tons of useful features, including being able
to drive my single-sided page-feed scanner and being able to number the
even-sided pages correctly. The one I us
On Wed, 27 Nov 2019, Noel Chiappa via cctalk wrote:
> From: Jay Jaeger
> CCITT Group 4 lossless compression
That's very good indeed. I scan text pages in B+W at slightly less resolution
(engineering prints I do higher, they need it), but compressed they turn out
to be ~50KB per page, or
> From: Jay Jaeger
> CCITT Group 4 lossless compression
That's very good indeed. I scan text pages in B+W at slightly less resolution
(engineering prints I do higher, they need it), but compressed they turn out
to be ~50KB per page, or less - for long documents (e.g. the DOS-11 System
Pro
> As far as multi-page documents, it seems as if my scanner (or its
> software) only does uncompressed TIFF. At bitsaver's recommended 400
> dpi, that means about 4M per page.
If you're on unix of some sort, the libtiff tools can convert these
uncompressed images to G4. The command you'd use woul
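The command in the message above is cut off, so the following is not necessarily what the poster went on to give. As a sketch of one common way to do it, libtiff's tiffcp tool takes "-c g4" to write CCITT Group 4 output. The Python wrapper below just builds and runs that command; the function names and the file names in.tif and out.tif are placeholders, and it assumes tiffcp is installed and the input is already bilevel.

```python
import subprocess


def g4_command(src: str, dst: str) -> list[str]:
    """Build a tiffcp invocation that rewrites src with Group 4 compression."""
    # "-c g4" selects CCITT Group 4 (T.6) output; the input must be bilevel.
    return ["tiffcp", "-c", "g4", src, dst]


def convert_to_g4(src: str, dst: str) -> None:
    """Run tiffcp; raises CalledProcessError if the tool reports failure."""
    subprocess.run(g4_command(src, dst), check=True)
```

Calling convert_to_g4("in.tif", "out.tif") is equivalent to typing tiffcp -c g4 in.tif out.tif at the shell. G4 is lossless, so nothing in the page image should change, only the file size.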
On 11/26/19 7:10 PM, Alexandre Souza wrote:
> Al, is there a "standard" you would recommend us mere mortals to scan and
> archive docs?
I've moved to 600dpi bi-tonal tiffs for all new text work since that is the
maximum resolution my Panasonic KV-S3065 scanner supports. I use a flatbed
at 300
On 11/26/2019 8:52 PM, Alan Perry via cctalk wrote:
>
> I am going through stuff in my office and found that I have some SCSI
> device docs that aren't on bitsavers. As far as multi-page documents, it
> seems as if my scanner (or its software) only does uncompressed TIFF. At
> bitsaver's recommend
On 11/26/19 7:05 PM, Chuck Guzis via cctalk wrote:
On 11/26/19 6:52 PM, Alan Perry via cctalk wrote:
I am going through stuff in my office and found that I have some SCSI
device docs that aren't on bitsavers. As far as multi-page documents, it
seems as if my scanner (or its software) only do
Al, is there a "standard" you would recommend us mere mortals to scan and
archive docs?
---8<---Cut here---8<---
http://www.tabajara-labs.blogspot.com
http://www.tabalabs.com.br
---8<---Cut here---8<---
On Wed, Nov 27, 2019 at 01:07, Al Kossow via cctalk <
cctalk@classiccmp.org> wrote
you can ftp the uncompressed files to me and I'll take care of the conversions
On 11/26/19 6:52 PM, Alan Perry via cctalk wrote:
>
> I am going through stuff in my office and found that I have some SCSI device
> docs that aren't on bitsavers. As far as
> multi-page documents, it seems as if my s
On 11/26/19 6:52 PM, Alan Perry via cctalk wrote:
>
> I am going through stuff in my office and found that I have some SCSI
> device docs that aren't on bitsavers. As far as multi-page documents, it
> seems as if my scanner (or its software) only does uncompressed TIFF. At
> bitsaver's recommended
I am going through stuff in my office and found that I have some SCSI
device docs that aren't on bitsavers. As far as multi-page documents, it
seems as if my scanner (or its software) only does uncompressed TIFF. At
bitsaver's recommended 400 dpi, that means about 4M per page.
What should I