High-performance library for creating, modifying and parsing PDF files in C++

Overview

Welcome to PDF-Writer.
A Fast and Free C++ Library for Creating, Parsing and Manipulating PDF Files and Streams.
Documentation is available here.
Project site is here.

If you are looking for a NodeJS module go here.

Update 9/11/2019 Ending Support:
Hi All, after almost 9 years I decided to finish supporting PDFWriter. You may still use the code as is, with the provided license, however I will not be providing answers, solutions, responses etc.
I'd like to thank everyone who used PDFWriter and wish you all the best going forward with your projects.
Gal.

Comments
  • [Question] How to check and add a font into DR dictionary of the interactive form dictionary (AcroForm) in the catalog dictionary?

    I need to create an interactive form (AcroForm) if it does not exist in a PDF. Then, check and add a Font entry in the DR dictionary of the AcroForm dictionary. How to achieve that in this library? AcroForm specs

    opened by magicboker 39
  • [Question] How to re-order pages in a PDF ?

    Thank Gal for the great SDK!

    Does anyone know how to use this SDK to re-order pages in a PDF, e.g. move the 2nd page to the 6th position?

    I already checked the PDF modification sample codes, but they are all about modifying (or adding / deleting) the contents. There were no sample codes to re-order pages.

    opened by magicboker 18
  • [Question] How to correctly skip a failed object copying?

    I need to copy objects from one PDF to another, but sometimes some objects are not "valid" (a problematic PDF file, perhaps generated by a buggy PDF app), e.g. a missing parent node or a missing indirect object. In such cases I want to skip copying the object, but by the time PDFDocumentCopyingContext::CopyObject returns eFailure it has already allocated some object IDs, so the later pdfWriter.EndPDF() always returns eFailure because of unwritten objects in the xref table (it fails in ObjectsContext::WriteXrefTable, line 204). Is there any method in the library that I can call to roll back the state to what it was before the failed CopyObject call?

    opened by magicboker 12
  • Support CID fonts

    Any plans on supporting CID fonts? I cannot decode the fonts atm.

    I know the PDFium lib has support for it for implementation inspiration: https://github.com/documentcloud/pdfium/tree/master/third_party/freetype/src/cid

    opened by viezel 9
  • ANSIFontWriter::WriteWidths crashes

    ANSIFontWriter::WriteWidths crashes when called from CFFANSIFontWriter and no glyphs in the font have actually been used. Specifically, the result from mCharactersVector.begin() can't be dereferenced because it's empty in this case, so an exception is thrown.

    I'm not sure if this issue is confined to ANSIFontWriter or if other types of fonts have an analogous issue.

    Ideally, the font wouldn't be emitted at all since it's not actually used.
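
The failure mode described above, dereferencing `begin()` of an empty container, can be avoided with an explicit emptiness check. The following is a minimal sketch of that guard only. `CodeAndWidth`, `WidthsRange` and `computeWidthsRange` are hypothetical names invented for illustration and are not part of PDF-Writer; the sketch also assumes the used characters are kept sorted by character code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the (character code, width) pairs a font writer collects.
using CodeAndWidth = std::pair<unsigned, int>;

// Result of scanning the used characters: the code range, or "nothing used".
struct WidthsRange {
    bool hasGlyphs;
    unsigned firstCode;
    unsigned lastCode;
};

// Returns the first/last used character codes, or hasGlyphs == false when the
// vector is empty. front()/back() are only touched after the empty() check,
// because calling them (or dereferencing begin()) on an empty vector is
// undefined behaviour.
WidthsRange computeWidthsRange(const std::vector<CodeAndWidth>& usedCharacters) {
    if (usedCharacters.empty())
        return {false, 0, 0};
    // Assumption of this sketch: the vector is sorted by character code.
    assert(usedCharacters.front().first <= usedCharacters.back().first);
    return {true, usedCharacters.front().first, usedCharacters.back().first};
}
```

A caller could use `hasGlyphs == false` as the signal to skip writing the widths array, or the whole font, which matches the "ideally the font wouldn't be emitted" suggestion.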

    opened by softhorizons 9
  • [Question] How to add LZWDecode support?

    A call to copyingContext->AppendPDFPageFromPDF(pageIndex) failed: it calls PDFParser::CreateFilterForStream to parse a PDF page, and the log showed "PDFParser::CreateFilterForStream, supporting only flate decode and ascii 85 decode, failing". I then found that the PDF uses LZWDecode. It seems this SDK doesn't support LZWDecode, am I right? If so, how can I add LZWDecode support myself? PS: if it's not easy to add LZWDecode filter support, how can I copy pages from one PDF to another without involving the unsupported filter?

    opened by magicboker 7
  • How to find the page footer?

    Hi, I am new to this great library. I want to find the page footer of a PDF file and then remove it. After going through the documentation, I didn't find any directly related method. Can you please give me some hints?

    opened by theidexisted 6
  • End of stream isn't taken as a token while sometimes it should be

    Sometimes the end of the stream should be treated as the end of a token. If reading a token fails while the stream has not ended, there is an error; otherwise the read was unsuccessful only because we encountered the end of the stream before reading any data.
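
One possible reading of this behaviour, sketched on a plain `std::istream` rather than on PDF-Writer's own stream and tokenizer classes: end of stream terminates a token the same way whitespace does, and a read fails only when no token characters were collected at all. `TokenResult` and `readToken` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch; not a PDF-Writer type.
struct TokenResult {
    bool ok;            // false only when no data could be read
    std::string token;  // the token text when ok is true
};

// Reads one whitespace-delimited token. Hitting end of stream after at least
// one character still yields a valid token; hitting it before any character
// yields ok == false.
TokenResult readToken(std::istream& in) {
    const int eof = std::char_traits<char>::eof();
    std::string token;
    int c;
    // Skip leading whitespace.
    while ((c = in.peek()) != eof && std::isspace(static_cast<unsigned char>(c)))
        in.get();
    // Collect characters until whitespace or end of stream.
    while ((c = in.peek()) != eof && !std::isspace(static_cast<unsigned char>(c)))
        token.push_back(static_cast<char>(in.get()));
    return {!token.empty(), token};
}
```

For example, reading repeatedly from `std::istringstream("12 endobj")` yields `12`, then `endobj` (terminated by end of stream), and then a result with `ok == false`.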

    opened by hadasg 6
  • Type1ToCFFEmbeddedFontWriter::AddComponentGlyphs() uses the private encoding for dependent glyphs even for glyphs defined using the 'seac' operator

    The 'seac' operator ("Standard Encoding Accented Character") is defined to specify dependent glyphs in the Adobe Standard Encoding rather than the Type 1 font's private encoding, but the implementation of Type1ToCFFEmbeddedFontWriter::AddComponentGlyphs() was only getting the glyph names according to the font's private encoding. In our case, we were getting glyph names defined in the font's private encoding dictionary, but the font did not actually have any charstrings for those glyph names. This caused the recursive call to AddComponentGlyphs() to fail on calling Type1Input::CalculateDependenciesForCharIndex(), and that led to the failure to embed the font, and ultimately the failure to complete the PDF. Or, no glyph would be shown at all in the output PDF. For example, the charstring for "Aring" might use 'seac' with "A" and "ring" as dependent glyphs, specified in the Adobe Standard Encoding (where "ring" has the code point 0xCA). The font's private encoding might have a different name for the code point 0xCA, "eth"... if the font did not have a charstring for "eth", that would lead to the failure described above; and if the font did have a charstring for that character, the wrong glyph would be drawn. Our solution was to get the encoded glyph name explicitly from the StandardEncoding object instead.
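
The "Aring" example can be illustrated with a small lookup. This is a sketch only: it contains a two-entry excerpt of Adobe StandardEncoding (the real table is much larger), and `standardEncodingExcerpt` and `seacComponentName` are hypothetical helpers, not the library's StandardEncoding class or its API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Two-entry excerpt of Adobe StandardEncoding (code -> glyph name).
// Assumption of this sketch: only the codes from the "Aring" example are listed.
const std::map<unsigned char, std::string>& standardEncodingExcerpt() {
    static const std::map<unsigned char, std::string> table = {
        {0x41, "A"},
        {0xCA, "ring"},
    };
    return table;
}

// Hypothetical helper: resolves a 'seac' component code to a glyph name.
// It consults only StandardEncoding and never the font's private encoding,
// which is the point of the fix described above.
std::string seacComponentName(unsigned char code) {
    const auto& table = standardEncodingExcerpt();
    const auto it = table.find(code);
    return it == table.end() ? ".notdef" : it->second;
}
```

With a private encoding that maps 0xCA to "eth", a lookup through the private encoding would return "eth", while `seacComponentName(0xCA)` returns "ring", the glyph the 'seac' operator actually refers to.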

    opened by TheGS 6
  • Failure in JPGParser while parsing specific JPG

    Hi Gal,

    Running the function PDFWriter::CreateImageXObjectFromJPGFile with a specific JPG returns failure. When debugging I realized that the problem is in the function JPEGImageParser::ReadPhotoshopData: it seems that when resolutionBim is not found, more bytes are read than are available. I added some validation tests that seem to solve the problem. Below is the function after my changes:

    EStatusCode JPEGImageParser::ReadPhotoshopData(JPEGImageInformation& outImageInformation,bool outPhotoshopDataOK) {
    EStatusCode status;
    unsigned int intSkip;
    unsigned long toSkip;
    unsigned int nameSkip;
    unsigned long dataLength;
    bool resolutionBimNotFound = true;

    do {
        status = ReadIntValue(intSkip);
        if(status != PDFHummus::eSuccess)
            break;
        toSkip = intSkip-2;
        status = SkipTillChar(scEOS,toSkip);
        if(status != PDFHummus::eSuccess)
            break;
        while(toSkip > 0 && resolutionBimNotFound)
        {
            status = ReadStreamToBuffer(4);
            if(status !=PDFHummus::eSuccess)
                break;
            toSkip-=4;
            if(0 != memcmp(mReadBuffer,sc8Bim,4))
                break; // k. corrupt header. stop here and just skip the next
            status = ReadStreamToBuffer(3);
            if(status !=PDFHummus::eSuccess)
                break;
            toSkip-=3;
            nameSkip = (int)mReadBuffer[2];
            if(nameSkip % 2 == 0)
                ++nameSkip;
            SkipStream(nameSkip);
            toSkip-=nameSkip;
            resolutionBimNotFound = (0 != memcmp(mReadBuffer,scResolutionBIMID,2));
            status = ReadLongValue(dataLength);
            if(status != PDFHummus::eSuccess)
                break;
            toSkip-=4;
            if(resolutionBimNotFound)
            {
                if(dataLength % 2 == 1)
                    ++dataLength;
                toSkip-=dataLength;
                SkipStream(dataLength);
            }
            else
            {
                status = ReadStreamToBuffer(16);
                if(status !=PDFHummus::eSuccess)
                    break;
                toSkip-=16;
                outImageInformation.PhotoshopInformationExists = true;
                outImageInformation.PhotoshopXDensity = GetIntValue(mReadBuffer) + GetFractValue(mReadBuffer + 2);
                outImageInformation.PhotoshopYDensity = GetIntValue(mReadBuffer + 8) + GetFractValue(mReadBuffer + 10);
            }
        }
        if(PDFHummus::eSuccess == status)
            SkipStream(toSkip);
    }while(false);
    outPhotoshopDataOK = !resolutionBimNotFound;
    return status;
    }

    What do you think?

    Attached is the problematic JPG. 32degrees_2

    Thanks, Hadas

    opened by hadasg 6
  • Combine PDFs while resizing pages

    Hello,

    I've got a use case where I need to combine multiple PDFs into one while resizing all to be the same page size. Most sample code seems to do one or the other and I'm wondering if what I'd like to do is even possible.

    So given an array of documents paths, I've been able to combine them using the following code:

    std::string destinationFile = params[1];
    	PDFWriter pdfWriter;
    	PDFPageRange pageRange;
    	PDFParser parser;
    	InputFile pdfFile;
    	EStatusCode status = PDFHummus::eSuccess;
    
    	Php::out << "Creating " << destinationFile << std::endl;
    	pdfWriter.StartPDF("/tmp/file.pdf",ePDFVersion17);
    	
    	for (auto &iter : params[0]) {
    		if (file_exists(iter.second.stringValue())) {
    			pdfWriter.AppendPDFPagesFromPDF(iter.second.stringValue(), pageRange);
    			Php::out << "\tWith: " << iter.second.stringValue() << std::endl;
    		} else {
    			Php::out << "\t " << iter.second.stringValue() << " doesn't exist!" << std::endl;
    			return false;
    		}
    	}
    
    	pdfWriter.EndPDF();
    

    In #29 there is a link to some code about the ModifyingExistingFileContent test. However, I'm having trouble understanding everything it is doing. When I add it to my particular use case (I modify the PDF I just combined from multiple sources), I can get different page sizes; however, it's basically just cropping the page, not resizing the content at all.

    So a few questions:

    • Is what I want to do possible?
    • Do I need to use a PDFWriter instance more than once? Or can I get all source pages and combine them manually while resizing the pages?

    Below is the code that follows the code above: it re-opens the modified file and rewrites the MediaBox size. As I said, it resizes the page, but not the content within it...

    pdfWriter.ModifyPDF(params[1], ePDFVersion17, destinationFile);
    	status = pdfFile.OpenFile("/tmp/file.pdf");
    
    	status = parser.StartPDFParsing(pdfFile.GetInputStream());
    	if (status != PDFHummus::eSuccess) {
    		Php::out << "unable to parse input file" << std::endl;
    	} else {
    		Php::out << "Pages: " << parser.GetPagesCount() << std::endl;
    	}
    
    	PDFDocumentCopyingContext* copyingContext = NULL;
    
    	for (long unsigned int x = 0; x < parser.GetPagesCount(); x++) {
    		// Change 3rd page bbox to landscape by modifying the page object
    		copyingContext = pdfWriter.CreatePDFCopyingContextForModifiedFile();
    		if (!copyingContext) {
    			Php::out << "failed to create copying context for modified file" << std::endl;
    			status = eFailure;
    			return nullptr;
    		}
    		// create a new object for the page, copy all but media box, which will be changed
    		ObjectIDType pageId                     = copyingContext->GetSourceDocumentParser()->GetPageObjectID(x);
    		PDFObjectCastPtr<PDFDictionary> pageObj = copyingContext->GetSourceDocumentParser()->ParsePage(x);
    
    		MapIterator<PDFNameToPDFObjectMap> pageObjIt = pageObj->GetIterator();
    
    		pdfWriter.GetObjectsContext().StartModifiedIndirectObject(pageId);
    		DictionaryContext* modifiedPageObject = pdfWriter.GetObjectsContext().StartDictionary();
    		while (pageObjIt.MoveNext()) {
    			if (pageObjIt.GetKey()->GetValue() != "MediaBox") {
    				modifiedPageObject->WriteKey(pageObjIt.GetKey()->GetValue());
    				copyingContext->CopyDirectObjectAsIs(pageObjIt.GetValue());
    			}
    		}
    		// write new media box
    		modifiedPageObject->WriteKey("MediaBox");
    		pdfWriter.GetObjectsContext().StartArray();
    		pdfWriter.GetObjectsContext().WriteInteger(0);
    		pdfWriter.GetObjectsContext().WriteInteger(0);
    		pdfWriter.GetObjectsContext().WriteInteger(500);
    		pdfWriter.GetObjectsContext().WriteInteger(500);
    		pdfWriter.GetObjectsContext().EndArray();
    		pdfWriter.GetObjectsContext().EndLine();
    
    		pdfWriter.GetObjectsContext().EndDictionary(modifiedPageObject);
    		pdfWriter.GetObjectsContext().EndIndirectObject();
    
    		// cleanup
    		delete copyingContext;
    	}
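
Changing only the MediaBox alters the visible area, which is why the result looks cropped rather than resized. To resize the content as well, the page content has to be drawn under a scaling transformation (in PDF terms, a matrix applied with the `cm` operator). The sketch below covers only the arithmetic for such a matrix and makes no PDF-Writer calls. `Matrix` and `fitPage` are hypothetical names, and the sketch assumes both the source and the target boxes have their origin at (0, 0).

```cpp
#include <algorithm>
#include <cassert>

// The six numbers of a PDF transformation matrix [a b c d e f],
// in the order the `cm` operator expects them.
struct Matrix {
    double a, b, c, d, e, f;
};

// Computes a matrix that uniformly scales a srcW x srcH page so that it fits
// inside a dstW x dstH page, centred, preserving the aspect ratio.
Matrix fitPage(double srcW, double srcH, double dstW, double dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    // Use the smaller ratio so the scaled page fits in both dimensions.
    const double scale = std::min(dstW / srcW, dstH / srcH);
    // Centre the scaled page inside the target box.
    const double tx = (dstW - srcW * scale) / 2.0;
    const double ty = (dstH - srcH * scale) / 2.0;
    return {scale, 0.0, 0.0, scale, tx, ty};
}
```

For a 1000 x 500 source page and the 500 x 500 target used above, this gives a scale of 0.5 and a vertical offset of 125, so the whole source page stays visible instead of being cut off.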
    
    opened by gnat42 5
  • Problem with compile / link in a project with CMake

    Hi,

    I have a problem with compiling this library in a project.

    Undefined symbols for architecture x86_64:
      "PDFWriter::PDFWriter()", referenced from:
          create_pdf() in main.cpp.o
      "PDFWriter::~PDFWriter()", referenced from:
          create_pdf() in main.cpp.o
    ld: symbol(s) not found for architecture x86_64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    ninja: build stopped: subcommand failed.
    

    The CMakeLists.txt

    cmake_minimum_required(VERSION 3.22)
    
    project(proj)
    
    set(CMAKE_CXX_STANDARD 20)
    
    ADD_SUBDIRECTORY(PDFWriter4_0)
    add_executable(proj main.cpp)
    

    The source is in directory ./PDFWriter4_0/ with all the content of release zip

    and the main.cpp:

    #include "PDFWriter4_0/PDFWriter/PDFWriter.h"
    #include "PDFWriter4_0/PDFWriter/PDFPage.h"
    
    using namespace proj;
    
    int main() {
        PDFWriter pdfWriter;
        /*pdfWriter.StartPDF("/Users/stephane/Code/CL/C++/astronumerologie-generate/my_pdf.pdf",ePDFVersion13);
        PDFPage* pdfPage = new PDFPage();
        pdfPage->SetMediaBox(PDFRectangle(0,0,595,842));
    
        /* Add some content for the page */
    
        /*
        pdfWriter.WritePageAndRelease(pdfPage);
        pdfWriter.EndPDF();*/
    }

    I'm a beginner with CMake. Thanks for making this library.

    opened by stephaneworkspace 2
  • Rectangle with overprint

    I need to draw several rectangles with a fill colour and no stroke. Some of them need to be overprinted above the ones below. To do that in a PDF, it seems I need to have two graphics states, and I do not know how to do that with this library. Can someone show me how to add two graphics states to the page resources?

    opened by nosleduc 0
  • docs: fix simple typo, seconary -> secondary

    There is a small typo in PDFWriter/JPEGImageParser.cpp.

    Should read secondary rather than seconary.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 0
  • Undefined behaviour and leak in PDFObject.cpp due to delete void*

    PDFWriter/PDFObject.cpp:58:3: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                    delete it->second;
                    ^      ~~~~~~~~~~
    PDFWriter/PDFObject.cpp:98:2: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
            delete result;
    

    Delete via void* is undefined behaviour, C++ Standard section 5.3.5/3:

    In the first alternative (delete object), if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the dynamic type of the object to be deleted and the static type shall have a virtual destructor or the behavior is undefined. In the second alternative (delete array) if the dynamic type of the object to be deleted differs from its static type, the behavior is undefined

    80) This implies that an object cannot be deleted using a pointer of type void* because void is not an object type.

    Destructor of your original type will not be called:

    delete Abc* delete void*

    I found one place which uses SetMetadata in DecryptionHelper and sets new std::list as metadata which will be leaked because in order to release resources used by std::list you need to call its destructor.
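
One standard C++ way to keep type-erased ownership without `delete` on a `void*` is `std::shared_ptr<void>`, which stores a deleter for the real type at construction time. This is a sketch of that general technique and not a patch for PDFObject.cpp. `Tracked` and `makeErased` are hypothetical names used only to make the destructor call observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Helper type for the demonstration: counts how often its destructor runs.
struct Tracked {
    explicit Tracked(int& counter) : destroyed(counter) {}
    ~Tracked() { ++destroyed; }
    int& destroyed;
};

// Creates an object of type T and returns it as a type-erased owner.
// The shared_ptr control block remembers how to destroy a T, so the correct
// destructor runs even though the static type of the handle is void.
template <typename T, typename... Args>
std::shared_ptr<void> makeErased(Args&&... args) {
    return std::make_shared<T>(std::forward<Args>(args)...);
}
```

Stored this way, a `std::list` used as metadata would have its destructor called when the last owner goes away, so its nodes are released.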

    opened by misos1 0
Releases(4.2)
  • 4.2(Dec 27, 2022)

    Security update for the PDF-Writer project parser. The parser is used for embedding PDF files in a resultant output file or when parsing a PDF for any other reason. The release contains resolution to NPE issues discovered recently and corrections to some other issues found by Fuzz testing the parser.

  • 4.1(Sep 17, 2022)

    A current release to include features since 2018.

    • Plugging some more memory leaks
    • LZW streams support for the parser
    • Enabling M1 compilation when using embedded libs
  • 4.0(Apr 7, 2018)

    • Remove some compilation warnings from Xcode/Linux envs
    • Transparency groups corrections
    • Appending pages (and merging, and making forms) supports inherited resources
    • InputAsciiHexDecode filter support (By Lidia M.)
    • Allow providing predefined Object IDs to forms in various usages
    • Large OTF glyphs error correction
  • 3.9(Sep 23, 2017)

    Add AES support to read, write and modification scenarios. As a result, some bug corrections for reading/modifying scenarios that only come up with PDF versions >= 1.6, which the lib didn't handle before.

    Also - default pdf level is now 1.4, to match PNG support that was recently added and its requirement for transparency.

  • 3.8(Sep 10, 2017)
