Turning large TIFF file into multi-page output

Discussions on NConvert - the command line tool for image conversion and manipulation

WELL-LOG
Posts: 2
Joined: Fri May 10, 2019 3:47 pm

Turning large TIFF file into multi-page output

Post by WELL-LOG »

I'm trying to convert long images of well logs (height 140,000+ pixels) in TIFF format into multi-page output (TIFF or PDF).
Is this possible in NConvert?
Example input file properties are below.
Thanks

Format : TIFF
Name : tiff
Compression : CCITT Group 4
Width : 6944
Height : 144786
Components per pixel : 1
Bits per component : 1
Depth : 1
# colors : 2
Color model : RGB
Bytes Per Plane : 868
Orientation : Top Left
Xdpi : 400
Ydpi : 400
Page(s) : 1
Info:
Photometric Interpretation: White=0
PhotometricInterpretation: 0
PlanarConfiguration: 1
SamplesPerPixel: 1
Software: Handmade Software, Inc. Image Alchemy v1.7.7
Metadata : ( EXIF )
cday
XnThusiast
Posts: 3985
Joined: Sun Apr 29, 2012 9:45 am
Location: Cheltenham, U.K.

Re: Turning large TIFF file into multi-page output

Post by cday »

WELL-LOG wrote: Fri May 10, 2019 4:18 pm I'm trying to convert long images of well logs (height 140,000+ pixels) in TIFF format into multi-page output (TIFF or PDF).
Is this possible in NConvert?

It should be possible using NConvert in a script, such as a batch file (.bat), that makes multiple crops using a variable that is incremented for each crop, and then creates the required multi-page file. If desired, a loop could be placed around those steps so that multiple input TIFFs are processed into PDFs directly.
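
As a rough illustration of the cropping logic described above, here is a minimal sketch in Python rather than a batch file. It only computes the crop rectangles; each one would then be handed to whatever crop option the conversion tool provides. The function name `crop_boxes` and the 10,000 pixel page height are assumptions made for the example, not values from this thread.

```python
def crop_boxes(width, height, page_height):
    """Return (x, y, w, h) rectangles that cover the image from top to bottom."""
    if page_height <= 0:
        raise ValueError("page_height must be positive")
    boxes = []
    y = 0  # the offset that is incremented for each crop
    while y < height:
        h = min(page_height, height - y)  # the last page may be shorter
        boxes.append((0, y, width, h))
        y += page_height
    return boxes


# Dimensions from the example file; the page height is a hypothetical choice.
boxes = crop_boxes(6944, 144786, 10000)
for page_number, box in enumerate(boxes, start=1):
    print(page_number, box)
```

The last rectangle is simply shorter than the others, which avoids padding the final page.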

Note that the large pixel dimensions of the input file may require the 64-bit version of NConvert, so that enough memory can be addressed when the file is opened. Also, the current version 7.25 of NConvert, and possibly several previous versions, has a bug that affects the creation of multi-page files, so at present an earlier version such as 6.82 would have to be used.
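
To give a feel for the memory concern, the sketch below estimates the uncompressed size of the example image at a few bit depths. Whether NConvert actually expands a 1-bit image to a deeper format internally is an assumption here, not something confirmed in this thread; the helper name `uncompressed_bytes` is hypothetical.

```python
def uncompressed_bytes(width, height, bits_per_pixel):
    """Size of an uncompressed bitmap, with each row padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


# Width and height taken from the example file properties in the first post.
for label, bpp in (("1-bit", 1), ("8-bit", 8), ("24-bit", 24)):
    size = uncompressed_bytes(6944, 144786, bpp)
    print(f"{label}: {size / 2**20:.0f} MiB")
```

At 1 bit per pixel each row is 868 bytes, which matches the "Bytes Per Plane" value reported for the file. At 24 bits per pixel the total exceeds 2 GiB, which is beyond what a 32-bit process can typically allocate in one block.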

I may be able to provide some support if you wish to use the above approach.

Alternatively, the XnView MP GUI program has a split function that could probably be used to create the individual pages, and also a function to create multi-page files, both in the browser. However, that would be a two-step solution and there are no equivalent batch conversion options.
WELL-LOG
Posts: 2
Joined: Fri May 10, 2019 3:47 pm

Re: Turning large TIFF file into multi-page output

Post by WELL-LOG »

Hi,

Checked for likely files overnight ... the objective is to get any digital data out of these image-only files. Our OCR software has a limitation of ~32,000 pixels, so I looked for single-page TIFFs more than 30,000 pixels high.
I have over 100,000 documents, with heights ranging from 30,000 to 399,996 pixels and widths ranging from 2,000 to 4,500 pixels.
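
For illustration, the number of pages each document would need under the OCR limit can be estimated as below. This is a sketch only: it treats 32,000 pixels as a hard maximum page height, which is an assumption about how the limit applies, and the function names are hypothetical.

```python
import math


def pages_needed(height, max_page_height):
    """Smallest number of pages so that no page exceeds max_page_height."""
    return math.ceil(height / max_page_height)


def balanced_page_height(height, max_page_height):
    """Page height that spreads the image evenly over the minimum page count."""
    return math.ceil(height / pages_needed(height, max_page_height))


OCR_LIMIT = 32000  # approximate limit quoted above
for height in (30000, 144786, 399996):
    print(height, pages_needed(height, OCR_LIMIT), balanced_page_height(height, OCR_LIMIT))
```

Using a balanced page height avoids ending up with one very short final page.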

With that quantity of documents I would like to automate this.

In ancient history I have seen this done using Handmade Software's Image Alchemy with the add-on Alchemy Tools.

The workflow was something like:
Image Alchemy -> Convert TIFF to RAW
Alchemy Tools, POSTER.EXE to split into pages based on page width, keeping aspect ratio. This created a number of pages
Alchemy Tools, GLUE.EXE to merge pages
Image Alchemy -> Convert RAW into multi page PDF that was sent off to OCR
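
The "split into pages based on page width, keeping aspect ratio" step can be sketched as follows. How POSTER.EXE actually chose its page size is not stated here, so the A4 portrait ratio (297:210) and the name `page_height_for_width` are assumptions made purely for illustration.

```python
def page_height_for_width(width, ratio_h=297, ratio_w=210):
    """Page height in pixels for a page as wide as the image, at a fixed aspect ratio."""
    return round(width * ratio_h / ratio_w)


# Widths at the two ends of the range mentioned above.
for width in (2000, 4500):
    print(width, page_height_for_width(width))
```

Because the page height follows from the image width, the same rule adapts to every document without fixed pixel values.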

Unfortunately I have lost Alchemy Tools, so any ideas on an alternative would be appreciated.
Thanks
cday
XnThusiast
Posts: 3985
Joined: Sun Apr 29, 2012 9:45 am
Location: Cheltenham, U.K.

Re: Turning large TIFF file into multi-page output

Post by cday »

As an initial response:

I tested a simple proof-of-concept batch file that should be adaptable to your original specification, subject to defining the pixel width of the output PDF pages. I was assuming the pages would be viewed in a viewer, so that the screen aspect ratio and whether you were prepared to scroll across the page would be considerations. You now refer to running the output through OCR software, which should not be a problem but affects the desired page width, as mentioned in your post.

Your updated information, with its wide range of possible image dimensions, would complicate processing: it probably wouldn't be possible to use fixed pixel values in the batch file, so the script would take more time to develop and refine, assuming I have the skills required.

Support on the forum, other than the very limited support that can be provided by Pierre the developer, is entirely voluntary, and in practice very few members do provide any support. I generally deal with NConvert queries as best I can as no-one else responds, but although a long-time Windows user, I never expected to use the command line until I discovered NConvert, and am certainly no expert.

Reading your revised specification quickly, I wasn't entirely clear about some details, but if the PDF output is to be read automatically, that might to some extent ease constraints on the form of the output. It's a sunny day here; I should have more time later if you wish to continue exploring the possibilities. :D

Further thoughts:

Processing a wide range of input parameters could be eased to an extent by the following factors: cropping using percentages rather than exact number of pixels; with regard to OCR, the presumption that text size should relate to the pixel height of the output pages; the fact that for OCR the number of pages and content per page shouldn't be much of a consideration; and the fact that, if necessary, a small number of alternative batch files with different parameters could probably be used to suit the varying forms of the input images.
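
As a sketch of the percentage idea, the same list of percentage crops can be mapped onto images of any height. The names `percent_crops` and `to_pixels` are hypothetical, and how the percentages would be passed to the conversion tool is not shown.

```python
def percent_crops(page_count):
    """Return (top_percent, height_percent) for each of page_count equal pages."""
    step = 100.0 / page_count
    return [(i * step, step) for i in range(page_count)]


def to_pixels(crop, image_height):
    """Convert one percentage crop into a (top, height) pair in pixels."""
    top_pct, height_pct = crop
    top = round(image_height * top_pct / 100)
    bottom = round(image_height * (top_pct + height_pct) / 100)
    return top, bottom - top


crops = percent_crops(13)
for image_height in (30000, 399996):
    print(image_height, [to_pixels(c, image_height) for c in crops][:2])
```

Computing each page's bottom edge from the cumulative percentage keeps the pages contiguous, so rounding never leaves gaps or overlaps.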

Update:

On a hunch, I just tested an idea that might ease creation of a general-purpose batch file, and it seems to work.