Package org.apache.pdfbox.pdmodel
Class PDDocument
java.lang.Object
org.apache.pdfbox.pdmodel.PDDocument
- All Implemented Interfaces:
Pageable, Closeable, AutoCloseable
- Direct Known Subclasses:
ConformingPDDocument
This is the in-memory representation of the PDF document. You must call
close() on this object when you are done using it.
This class implements the Pageable interface, but since PDFBox
version 1.3.0 you should use the PDPageable adapter instead
(see PDFBOX-788).
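The close() requirement above can be met with try-with-resources, since the class is listed as AutoCloseable. The following is only a sketch, assuming Java 7 or later; TrackedResource is a hypothetical stand-in (not a PDFBox class) so that the example compiles without PDFBox on the classpath. In real code the resource would be the PDDocument itself.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseExample {
    // Runs a try-with-resources block and returns the resource afterwards so
    // the caller can check that close() was invoked when the block ended.
    static TrackedResource useAndClose() throws IOException {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource doc = resource) {
            // work with the document here; close() runs even if this throws
        }
        return resource;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("closed: " + useAndClose().isClosed());
    }
}

// Hypothetical stand-in for a document; it only records that close() ran.
class TrackedResource implements Closeable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        closed = true; // a real document would release its resources here
    }
}
```

The same shape applies to any Closeable, which is why the stand-in is enough to show the pattern.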
- Version:
- $Revision: 1.47 $
- Author:
- Ben Litchfield
-
Field Summary
Fields inherited from interface java.awt.print.Pageable
UNKNOWN_NUMBER_OF_PAGES -
Constructor Summary
Constructors
PDDocument() - Constructor, creates a new PDF Document with no pages.
PDDocument(COSDocument doc) - Constructor that uses an existing document.
PDDocument(COSDocument doc, BaseParser usedParser) - Constructor that uses an existing document.
-
Method Summary
Modifier and Type, Method - Description
void addPage(PDPage page) - This will add a page to the document.
void addSignature(PDSignature sigObject, SignatureInterface signatureInterface) - Add a signature.
void addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options) - This will add a signature to the document.
void addSignatureField(List<PDSignatureField> sigFields, SignatureInterface signatureInterface, SignatureOptions options) - This will add a signature field to the document.
void clearWillEncryptWhenSaving() - Deprecated. Do not rely on this method anymore.
void close() - This will close the underlying COSDocument object.
void decrypt(String password) - This will decrypt a document.
void encrypt(String ownerPassword, String userPassword) - This will mark a document to be encrypted.
getCurrentAccessPermission() - Returns the access permissions granted when the document was decrypted.
getDocument() - This will get the low level document.
getDocumentCatalog() - This will get the document catalog.
getDocumentInformation() - This will get the document info dictionary.
getEncryptionDictionary() - This will get the encryption dictionary for this document.
getLastSignatureDictionary() - This will return the last signature.
int getNumberOfPages()
getOwnerPasswordForEncryption() - Deprecated. Do not rely on this method anymore.
int getPageCount() - Deprecated. Use the getNumberOfPages method instead!
getPageFormat(int pageIndex) - Deprecated. Use the PDPageable adapter class.
getPageMap() - This will return the Map containing the mapping from object-ids to pagenumbers.
getPrintable(int pageIndex)
getSecurityHandler() - Get the security handler that is used for document encryption.
getSignatureDictionaries() - Retrieve all signature dictionaries from the document.
getSignatureDictionary() - Deprecated. Use getLastSignatureDictionary() instead.
getSignatureFields() - Retrieve all signature fields from the document.
getUserPasswordForEncryption() - Deprecated. Do not rely on this method anymore.
importPage(PDPage page) - This will import and copy the contents from another location.
boolean isAllSecurityToBeRemoved() - Indicates if all security is removed or not when writing the pdf.
boolean isEncrypted() - This will tell if this document is encrypted or not.
boolean isOwnerPassword(String password) - Deprecated.
boolean isUserPassword(String password) - Deprecated.
static PDDocument load(File file) - This will load a document from a file.
static PDDocument load(File file, RandomAccess scratchFile) - This will load a document from a file.
static PDDocument load(InputStream input) - This will load a document from an input stream.
static PDDocument load(InputStream input, boolean force) - This will load a document from an input stream.
static PDDocument load(InputStream input, RandomAccess scratchFile) - This will load a document from an input stream.
static PDDocument load(InputStream input, RandomAccess scratchFile, boolean force) - This will load a document from an input stream.
static PDDocument load(String filename) - This will load a document from a file.
static PDDocument load(String filename, boolean force) - This will load a document from a file.
static PDDocument load(String filename, RandomAccess scratchFile) - This will load a document from a file.
static PDDocument load(URL url) - This will load a document from a url.
static PDDocument load(URL url, boolean force) - This will load a document from a url.
static PDDocument load(URL url, RandomAccess scratchFile) - This will load a document from a url.
static PDDocument loadNonSeq(File file, RandomAccess scratchFile) - Parses PDF with the new non sequential parser and an empty password.
static PDDocument loadNonSeq(File file, RandomAccess scratchFile, String password) - Parses PDF with the new non sequential parser.
static PDDocument loadNonSeq(InputStream input, RandomAccess scratchFile) - Parses PDF with the new non sequential parser.
static PDDocument loadNonSeq(InputStream input, RandomAccess scratchFile, String password) - Parses PDF with the new non sequential parser.
void openProtection(DecryptionMaterial pm) - Tries to decrypt the document in memory using the provided decryption material.
void print() - This will send the PDF document to a printer.
void print(PrinterJob printJob)
void protect(ProtectionPolicy pp) - Protects the document with the protection policy pp.
boolean removePage(int pageNumber) - Remove the page from the document.
boolean removePage(PDPage page) - Remove the page from the document.
void save(File file) - Save the document to a file.
void save(OutputStream output) - This will save the document to an output stream.
void save(String fileName) - Save the document to a file.
void saveIncremental(InputStream input, OutputStream output) - Save the pdf as incremental for signing.
void saveIncremental(String fileName) - Save the pdf as incremental for signing.
void setAllSecurityToBeRemoved(boolean removeAllSecurity) - Activates/Deactivates the removal of all security when writing the pdf.
void setDocumentId(Long docId)
void setDocumentInformation(PDDocumentInformation info) - This will set the document information for this document.
void setEncryptionDictionary(PDEncryptionDictionary encDictionary) - This will set the encryption dictionary for this document.
boolean setSecurityHandler(SecurityHandler secHandler) - Sets security handler if none is set already.
void silentPrint() - This will send the PDF to the default printer without prompting the user for any printer settings.
void silentPrint(PrinterJob printJob) - This will send the PDF to the default printer without prompting the user for any printer settings.
boolean wasDecryptedWithOwnerPassword() - Deprecated. Use getCurrentAccessPermission instead.
boolean willEncryptWhenSaving() - Deprecated. Do not rely on this method anymore.
-
Constructor Details
-
PDDocument
public PDDocument()
Constructor, creates a new PDF Document with no pages. You need to add at least one page for the document to be valid.
-
PDDocument
Constructor that uses an existing document. The COSDocument that is passed in must be valid.- Parameters:
doc- The COSDocument that this document wraps.
-
PDDocument
Constructor that uses an existing document. The COSDocument that is passed in must be valid.- Parameters:
doc- The COSDocument that this document wraps.
usedParser- the parser which is used to read the pdf
-
-
Method Details
-
getPageMap
This will return the Map containing the mapping from object-ids to pagenumbers.- Returns:
- the pageMap
-
addPage
This will add a page to the document. This is a convenience method that adds the page to the root of the hierarchy and sets the parent of the page to the root.
- Parameters:
page- The page to add to the document.
-
addSignature
public void addSignature(PDSignature sigObject, SignatureInterface signatureInterface) throws IOException, SignatureException
Add a signature.
- Parameters:
sigObject- is the PDSignature model
signatureInterface- is an interface which provides signing capabilities
- Throws:
IOException- if there is an error creating required fields
SignatureException- if something went wrong
-
addSignature
public void addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options) throws IOException, SignatureException
This will add a signature to the document.
- Parameters:
sigObject- is the PDSignature model
signatureInterface- is an interface which provides signing capabilities
options- signature options
- Throws:
IOException- if there is an error creating required fields
SignatureException- if something went wrong
-
addSignatureField
public void addSignatureField(List<PDSignatureField> sigFields, SignatureInterface signatureInterface, SignatureOptions options) throws IOException, SignatureException
This will add a signature field to the document.
- Parameters:
sigFields- are the PDSignatureFields that should be added to the document
signatureInterface- is an interface which provides signing capabilities
options- signature options
- Throws:
IOException- if there is an error creating required fields
SignatureException
-
removePage
Remove the page from the document.- Parameters:
page- The page to remove from the document.- Returns:
- true if the page was found, false otherwise.
-
removePage
public boolean removePage(int pageNumber)
Remove the page from the document.
- Parameters:
pageNumber- zero-based index of the page to remove.
- Returns:
- true if the page was found, false otherwise.
-
importPage
This will import and copy the contents from another location. Currently the content stream is stored in a scratch file, and the scratch file is associated with the document. If you are adding a page to this document from another document and want to copy the contents to this document's scratch file, use this method; otherwise just use the addPage method.
Unlike addPage(org.apache.pdfbox.pdmodel.PDPage), this method does a deep copy. If your page has annotations that link to pages not in the target document, the target document might become huge. To avoid this, delete the page references of such annotations.
- Parameters:
page- The page to import.- Returns:
- The page that was imported.
- Throws:
IOException- If there is an error copying the page.
-
getDocument
This will get the low level document.- Returns:
- The document that this layer sits on top of.
-
getDocumentInformation
This will get the document info dictionary. This is guaranteed not to return null.
- Returns:
- The document's /Info dictionary
-
setDocumentInformation
This will set the document information for this document.- Parameters:
info- The updated document information.
-
getDocumentCatalog
This will get the document catalog. This is guaranteed not to return null.
- Returns:
- The document's /Root dictionary
-
isEncrypted
public boolean isEncrypted()
This will tell if this document is encrypted or not.
- Returns:
- true if this document is encrypted.
-
getEncryptionDictionary
This will get the encryption dictionary for this document. This will still return the parameters if the document was decrypted. If the document was never encrypted then this will return null. As the encryption architecture in PDF documents is pluggable, this returns an abstract class, but the only supported subclass at this time is a PDStandardEncryption object.
- Returns:
- The encryption dictionary (most likely a PDStandardEncryption object)
- Throws:
IOException- If there is an error determining which security handler to use.
-
setEncryptionDictionary
This will set the encryption dictionary for this document.- Parameters:
encDictionary- The encryption dictionary (most likely a PDStandardEncryption object)
- Throws:
IOException- If there is an error determining which security handler to use.
-
getSignatureDictionary
Deprecated. Use getLastSignatureDictionary() instead.
This will return the last signature.
- Returns:
- the last signature as PDSignature.
- Throws:
IOException- if no document catalog can be found.
-
getLastSignatureDictionary
This will return the last signature.
- Returns:
- the last signature as PDSignature.
- Throws:
IOException- if no document catalog can be found.
-
getSignatureFields
Retrieve all signature fields from the document.- Returns:
- a List of PDSignatureFields
- Throws:
IOException- if no document catalog can be found.
-
getSignatureDictionaries
Retrieve all signature dictionaries from the document.- Returns:
- a List of PDSignatures
- Throws:
IOException- if no document catalog can be found.
-
isUserPassword
@Deprecated public boolean isUserPassword(String password) throws IOException, CryptographyException
Deprecated.
This will determine if this is the user password. This only applies when the document is encrypted and uses standard encryption.
- Parameters:
password- The plain text user password.
- Returns:
- true if the password passed in matches the user password used to encrypt the document.
- Throws:
IOException- If there is an error determining if it is the user password.
CryptographyException- If there is an error in the encryption algorithms.
-
isOwnerPassword
@Deprecated public boolean isOwnerPassword(String password) throws IOException, CryptographyException
Deprecated.
This will determine if this is the owner password. This only applies when the document is encrypted and uses standard encryption.
- Parameters:
password- The plain text owner password.
- Returns:
- true if the password passed in matches the owner password used to encrypt the document.
- Throws:
IOException- If there is an error determining if it is the owner password.
CryptographyException- If there is an error in the encryption algorithms.
-
decrypt
This will decrypt a document. This method is provided for compatibility reasons only. Users should use the new security layer instead, especially the openProtection method.
Do not call this method if you have opened your document with one of the loadNonSeq methods.
- Parameters:
password- Either the user or owner password.
- Throws:
CryptographyException- If there is an error decrypting the document.
IOException- If there is an error getting the stream data.
-
wasDecryptedWithOwnerPassword
Deprecated. Use getCurrentAccessPermission instead.
This will tell if the document was decrypted with the master password. This entry is invalid if the PDF was not decrypted.
- Returns:
- true if the pdf was decrypted with the master password.
-
encrypt
public void encrypt(String ownerPassword, String userPassword) throws CryptographyException, IOException
This will mark a document to be encrypted. The actual encryption will occur when the document is saved. This method is provided for compatibility reasons only. Users should use the new security layer instead, especially the openProtection method.
- Parameters:
ownerPassword- The owner password to encrypt the document.
userPassword- The user password to encrypt the document.
- Throws:
CryptographyException- If an error occurs during encryption.
IOException- If there is an error accessing the data.
-
getOwnerPasswordForEncryption
Deprecated. Do not rely on this method anymore.
The owner password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.
- Returns:
- The owner password passed to the encrypt method.
-
getUserPasswordForEncryption
Deprecated. Do not rely on this method anymore.
The user password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.
- Returns:
- The user password passed to the encrypt method.
-
willEncryptWhenSaving
Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.
Internal method to determine if the document will be encrypted when it is saved.
- Returns:
- True if encrypt has been called and the document has not been saved yet.
-
clearWillEncryptWhenSaving
Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.
This should only be called by the COSWriter after encryption has completed.
-
load
This will load a document from a url.- Parameters:
url- The url to load the PDF from.- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from a url. Allows for skipping corrupt pdf objects.
- Parameters:
url- The url to load the PDF from.
force- When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from a url.- Parameters:
url- The url to load the PDF from.
scratchFile- A location to store temp PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from a file.- Parameters:
filename- The name of the file to load.- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from a file. Allows for skipping corrupt pdf objects.
- Parameters:
filename- The name of the file to load.
force- When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from a file.- Parameters:
filename- The name of the file to load.
scratchFile- A location to store temp PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from a file.- Parameters:
file- The name of the file to load.- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from a file.- Parameters:
file- The name of the file to load.
scratchFile- A location to store temp PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from an input stream.- Parameters:
input- The stream that contains the document.- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from an input stream. Allows for skipping corrupt pdf objects.
- Parameters:
input- The stream that contains the document.
force- When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
This will load a document from an input stream.- Parameters:
input- The stream that contains the document.
scratchFile- A location to store temp PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
load
public static PDDocument load(InputStream input, RandomAccess scratchFile, boolean force) throws IOException
This will load a document from an input stream. Allows for skipping corrupt pdf objects.
- Parameters:
input- The stream that contains the document.
scratchFile- A location to store temp PDFBox data for this document.
force- When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
- Returns:
- The document that was loaded.
- Throws:
IOException- If there is an error reading from the stream.
-
loadNonSeq
Parses PDF with the new non sequential parser and an empty password.- Parameters:
file- file to be loaded
scratchFile- location to store temp PDFBox data for this document
- Returns:
- loaded document
- Throws:
IOException- in case of a file reading or parsing error
-
loadNonSeq
public static PDDocument loadNonSeq(File file, RandomAccess scratchFile, String password) throws IOException
Parses PDF with the new non sequential parser, using the given password for decryption.
- Parameters:
file- file to be loaded
scratchFile- location to store temp PDFBox data for this document
password- password to be used for decryption
- Returns:
- loaded document
- Throws:
IOException- in case of a file reading or parsing error
-
loadNonSeq
Parses PDF with the new non sequential parser.- Parameters:
input- stream that contains the document.
scratchFile- location to store temp PDFBox data for this document
- Returns:
- loaded document
- Throws:
IOException- in case of a file reading or parsing error
-
loadNonSeq
public static PDDocument loadNonSeq(InputStream input, RandomAccess scratchFile, String password) throws IOException
Parses PDF with the new non sequential parser.
- Parameters:
input- stream that contains the document.
scratchFile- location to store temp PDFBox data for this document
password- password to be used for decryption
- Returns:
- loaded document
- Throws:
IOException- in case of a file reading or parsing error
-
save
Save the document to a file.- Parameters:
fileName- The file to save as.- Throws:
IOException- If there is an error saving the document.
COSVisitorException- If an error occurs while generating the data.
-
save
Save the document to a file.- Parameters:
file- The file to save as.- Throws:
IOException- If there is an error saving the document.
COSVisitorException- If an error occurs while generating the data.
-
save
This will save the document to an output stream.- Parameters:
output- The stream to write to.- Throws:
IOException- If there is an error writing the document.
COSVisitorException- If an error occurs while generating the data.
-
saveIncremental
Save the pdf as incremental for signing. Use this only for small files because this method temporarily stores the entire file into memory.- Parameters:
fileName- the filename to be used. This should be a copy of the original file.- Throws:
IOException- if something went wrong
COSVisitorException- if something went wrong
-
saveIncremental
public void saveIncremental(InputStream input, OutputStream output) throws IOException, COSVisitorException
Save the pdf as incremental for signing. See the signature examples sources on how to use this.
- Parameters:
input- This must be a FileInputStream or it won't work. It should point to the same file as the output parameter.
output- This must be a FileOutputStream or it won't work. It must be positioned at the end of the file, i.e. it should just have written the original file. The appending constructor of FileOutputStream has been found not to work, so you need to write the whole file yourself.
- Throws:
IOException- if something went wrong
COSVisitorException- if something went wrong
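The stream requirements for saveIncremental(InputStream, OutputStream) can be prepared with plain JDK I/O. The sketch below is an illustration under assumptions, not the code from the PDFBox signature examples: the helper name copyAndKeepOpen and the temporary files are hypothetical. It writes the whole original file through a fresh FileOutputStream (avoiding the appending constructor) and leaves that stream open at the end of the file; the saveIncremental call itself is shown only as a comment.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class IncrementalStreams {
    // Copies the original file into a new FileOutputStream and returns the
    // stream still open, so it is positioned just after the original bytes.
    static FileOutputStream copyAndKeepOpen(Path original, Path copy) throws IOException {
        FileOutputStream output = new FileOutputStream(copy.toFile());
        try {
            Files.copy(original, output);
            output.flush();
            return output;
        } catch (IOException e) {
            output.close(); // do not leak the stream if the copy fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path original = Files.createTempFile("original", ".pdf");
        Path copy = Files.createTempFile("signed", ".pdf");
        Files.write(original, "placeholder bytes".getBytes(StandardCharsets.US_ASCII));
        try (FileOutputStream output = copyAndKeepOpen(original, copy);
             FileInputStream input = new FileInputStream(copy.toFile())) {
            // With PDFBox on the classpath: document.saveIncremental(input, output);
            System.out.println("output position: " + output.getChannel().position());
        } finally {
            Files.delete(original);
            Files.delete(copy);
        }
    }
}
```

Here the input stream is opened on the copy, so both streams refer to the same file, as the parameter descriptions require.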
-
getPageCount
Deprecated. Use the getNumberOfPages method instead!
This will return the total page count of the PDF document. Note: This method is deprecated in favor of the getNumberOfPages method, which is a required method of the Pageable interface. This method will be removed in a future version of PDFBox.
- Returns:
- The total number of pages in the PDF document.
-
getNumberOfPages
public int getNumberOfPages()
- Specified by:
getNumberOfPages in interface Pageable
-
getPageFormat
Deprecated. Use the PDPageable adapter class.
Returns the format of the page at the given index when using a default printer job returned by PrinterJob.getPrinterJob().
- Specified by:
getPageFormat in interface Pageable
- Parameters:
pageIndex- page index, zero-based- Returns:
- page format
-
getPrintable
- Specified by:
getPrintable in interface Pageable
-
print
- Parameters:
printJob- The printer job.
- Throws:
PrinterException- If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.
-
print
This will send the PDF document to a printer. The printing functionality depends on the org.apache.pdfbox.pdfviewer.PageDrawer functionality. The PageDrawer is a work in progress, so some PDFs will print correctly and some will not. This is a convenience method to create the java.awt.print.PrinterJob. PDDocument implements the java.awt.print.Pageable interface and PDPage implements the java.awt.print.Printable interface, so advanced printing can be done by using those interfaces instead of this method.
- Throws:
PrinterException- If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.
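As a rough illustration of working with the java.awt.print interfaces directly, the sketch below walks any Pageable and hands it to a PrinterJob. It uses only JDK classes; java.awt.print.Book stands in for a document here, and the class and method names (PageableWalk, describe, printSilently, twoBlankPages) are hypothetical. With PDFBox, a PDPageable adapter or the document itself would be passed instead, which this sketch does not show.

```java
import java.awt.print.Book;
import java.awt.print.PageFormat;
import java.awt.print.Pageable;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.util.ArrayList;
import java.util.List;

class PageableWalk {
    // Lists the page size reported by the Pageable for each zero-based index.
    static List<String> describe(Pageable pages) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < pages.getNumberOfPages(); i++) {
            PageFormat format = pages.getPageFormat(i);
            lines.add("page " + i + ": " + format.getWidth() + " x " + format.getHeight());
        }
        return lines;
    }

    // Sends the pages to the given job without showing a print dialog.
    static void printSilently(PrinterJob job, Pageable pages) throws PrinterException {
        job.setPageable(pages);
        job.print();
    }

    // Builds a two-page stand-in whose pages draw nothing.
    static Book twoBlankPages() {
        Printable blank = (graphics, format, pageIndex) -> Printable.PAGE_EXISTS;
        Book book = new Book();
        book.append(blank, new PageFormat());
        book.append(blank, new PageFormat());
        return book;
    }

    public static void main(String[] args) {
        for (String line : describe(twoBlankPages())) {
            System.out.println(line);
        }
    }
}
```

printSilently is not called from main because it needs a configured printer; it is included only to show where a Pageable plugs into a PrinterJob.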
-
silentPrint
This will send the PDF to the default printer without prompting the user for any printer settings.
- Throws:
PrinterException- If there is an error while printing.
-
silentPrint
This will send the PDF to the default printer without prompting the user for any printer settings.
- Parameters:
printJob- A printer job definition.
- Throws:
PrinterException- If there is an error while printing.
-
close
This will close the underlying COSDocument object.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException- If there is an error releasing resources.
-
protect
Protects the document with the protection policy pp. This method only marks the document for encryption; the document content is actually encrypted when the document is saved.
- Parameters:
pp- The protection policy.
- Throws:
BadSecurityHandlerException- If there is an error during protection.
-
openProtection
public void openProtection(DecryptionMaterial pm) throws BadSecurityHandlerException, IOException, CryptographyException
Tries to decrypt the document in memory using the provided decryption material.
Do not call this method if you have opened your document with one of the loadNonSeq methods.
- Parameters:
pm- The decryption material (password or certificate).
- Throws:
BadSecurityHandlerException- If there is an error during decryption.
IOException- If there is an error reading cryptographic information.
CryptographyException- If there is an error during decryption.
-
getCurrentAccessPermission
Returns the access permissions granted when the document was decrypted. If the document was not decrypted, this method returns the access permission for a document owner (i.e. can do everything). The returned object is in read-only mode so that permissions cannot be changed. Methods providing access to content should rely on this object to verify whether the current user is allowed to proceed.
- Returns:
- the access permissions for the current user on the document.
-
getSecurityHandler
Get the security handler that is used for document encryption.- Returns:
- The handler used to encrypt/decrypt the document.
-
setSecurityHandler
Sets security handler if none is set already.- Parameters:
secHandler- security handler to be assigned to document- Returns:
- true if security handler was set, false otherwise (a security handler was already set)
-
isAllSecurityToBeRemoved
public boolean isAllSecurityToBeRemoved()
Indicates if all security is removed or not when writing the pdf.
- Returns:
- true if all security shall be removed, otherwise false
-
setAllSecurityToBeRemoved
public void setAllSecurityToBeRemoved(boolean removeAllSecurity)
Activates/Deactivates the removal of all security when writing the pdf.
- Parameters:
removeAllSecurity- remove all security if set to true
-
getDocumentId
-
setDocumentId
-