Package org.apache.pdfbox.pdfparser
Class ConformingPDFParser
java.lang.Object
  org.apache.pdfbox.pdfparser.BaseParser
    org.apache.pdfbox.pdfparser.ConformingPDFParser
Author:
- Adam Nichols
Field Summary
Fields
- inputFile
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser:
DEF, document, ENDOBJ, ENDSTREAM, forceParsing, pdfSource, PROP_PUSHBACK_SIZE
Constructor Summary
Constructors
- ConformingPDFParser
Method Summary
Methods (modifier and type, method, description):
- protected byte consumeWhitespace(): This will read all bytes until a non-whitespace character is found.
- protected byte consumeWhitespaceBackwards(): This will read all bytes (backwards) until a non-whitespace character is found.
- getDocument(): This will get the document that was parsed.
- getObject(long objectNumber, long generation)
- getPDDocument(): This will get the PD document that was parsed.
- boolean isRecursivlyRead()
- void parse(): This will parse the stream and populate the COSDocument object.
- protected COSNumber parseNumber(String number)
- protected long parseTrailerInformation()
- protected COSBase processCosObject(String string)
- protected String readBackwardUntilWhitespace()
- protected byte readByte()
- protected byte readByteBackwards()
- protected COSDictionary readDictionaryBackwards()
- protected int readInt(): This will read an integer from the stream.
- protected String readLine(): This will read a line starting with the byte at offset and going forward until it finds a newline.
- protected String readLineBackwards(): This will read a line starting with the byte at offset and going backwards until it finds a newline.
- protected long readLongBackwards(): This will consume any whitespace, read in bytes until whitespace is found again and then parse the characters which have been read as a long.
- protected COSName readNameBackwards()
- protected COSNumber readNumber(): This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
- protected COSBase readObject(): This actually reads the object data.
- readObject(long objectNumber, long generation): This will read an object from the inputFile at whatever our currentOffset is.
- protected COSBase readObjectBackwards()
- protected String readString(): This will read the next string from the stream.
- protected String readWord()
- void setRecursivlyRead(boolean recursivlyRead)
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser:
clearResources, isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseCOSString, parseDirObject, readExpectedString, readGenerationNumber, readLong, readObjectNumber, readString, readStringNumber, readUntilEndStream, setDocument, skipSpaces
-
Field Details
-
inputFile
-
Constructor Details
-
ConformingPDFParser
Constructs a parser that reads the PDF document from inputFile.
- Parameters:
inputFile - The input stream that contains the PDF document.
- Throws:
IOException - If there is an error initializing the stream.
-
Method Details
-
parse
This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.
- Throws:
IOException - If there is an error reading from the stream or corrupt data is found.
-
getDocument
This will get the document that was parsed. parse() must be called before this method. When you are done with this document, you must call close() on it to release resources.
- Returns:
- The document that was parsed.
- Throws:
IOException - If there is an error getting the document.
-
getPDDocument
This will get the PD document that was parsed. When you are done with this document, you must call close() on it to release resources.
- Returns:
- The document at the PD layer.
- Throws:
IOException - If there is an error getting the document.
-
parseTrailerInformation
- Throws:
IOException, NumberFormatException
-
readByteBackwards
- Throws:
IOException
-
readByte
- Throws:
IOException
-
readBackwardUntilWhitespace
- Throws:
IOException
-
consumeWhitespaceBackwards
This will read all bytes (backwards) until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
- Returns:
- the first non-whitespace character found
- Throws:
IOException - if there is an error reading from the file
-
consumeWhitespace
This will read all bytes until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
- Returns:
- the first non-whitespace character found
- Throws:
IOException - if there is an error reading from the file
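A minimal, self-contained sketch of the two whitespace-skipping methods above, written against an in-memory byte array rather than the parser's input file. The class name WhitespaceScanner and its offset() accessor are hypothetical and not part of PDFBox; the whitespace set is the six bytes the PDF specification treats as white space (NUL, HT, LF, FF, CR, SPACE). Where PDFBox documents an IOException, this sketch throws an unchecked IllegalStateException to stay free of checked exceptions.

```java
/**
 * Hypothetical sketch (not PDFBox code) of forward and backward whitespace
 * skipping over an in-memory byte array.
 */
final class WhitespaceScanner {
    private final byte[] data;
    private int offset; // index of the "current" byte

    WhitespaceScanner(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    int offset() {
        return offset;
    }

    /** The six PDF white-space bytes: NUL, HT, LF, FF, CR and SPACE. */
    static boolean isWhitespace(byte b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /** Moves forward past whitespace; returns the first non-whitespace byte and leaves offset on it. */
    byte consumeWhitespace() {
        while (offset < data.length && isWhitespace(data[offset])) {
            offset++;
        }
        if (offset >= data.length) {
            throw new IllegalStateException("only whitespace until end of data");
        }
        return data[offset];
    }

    /** Moves backward past whitespace; returns the first non-whitespace byte and leaves offset on it. */
    byte consumeWhitespaceBackwards() {
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        if (offset < 0) {
            throw new IllegalStateException("only whitespace until start of data");
        }
        return data[offset];
    }
}
```

Both methods return the current byte unchanged when it is not whitespace, so a caller can look at the next token's first byte without a second read.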
-
readLongBackwards
This will consume any whitespace (moving backwards), read in bytes until whitespace is found again and then parse the characters which have been read as a long. The current offset will then point at the first whitespace character which precedes the number.
- Returns:
- the parsed number
- Throws:
IOException - if there is an error reading from the file
NumberFormatException - if the bytes read cannot be converted to a number
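The backward scan described for readLongBackwards can be sketched as follows. This is a hypothetical stand-alone class (BackwardReader is not a PDFBox type) operating on a byte array; it skips trailing whitespace, walks back to the preceding whitespace, and parses the bytes in between. A natural place for such a scan is the end of a PDF file, where the byte offset after startxref sits just before %%EOF; whether PDFBox uses this method for exactly that is not stated here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not PDFBox code) of reading a long while scanning backwards. */
final class BackwardReader {
    private final byte[] data;
    private int offset;

    BackwardReader(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    int offset() {
        return offset;
    }

    static boolean isWhitespace(byte b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /**
     * Skips whitespace backwards, collects bytes back to the next whitespace and
     * parses them as a long. Afterwards offset points at the whitespace byte that
     * precedes the number (or -1 if the number starts the data).
     *
     * @throws NumberFormatException if the collected bytes are not a valid long
     */
    long readLongBackwards() {
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        int last = offset; // index of the number's last byte
        while (offset >= 0 && !isWhitespace(data[offset])) {
            offset--;
        }
        String token = new String(data, offset + 1, last - offset, StandardCharsets.US_ASCII);
        return Long.parseLong(token);
    }
}
```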
-
readInt
Description copied from class: BaseParser
This will read an integer from the stream.
- Overrides:
readInt in class BaseParser
- Returns:
- The integer that was read from the stream.
- Throws:
IOException - If there is an error reading from the stream.
-
readNumber
This will read in a number and return the COS version of the number (either a COSInteger or a COSFloat).
- Returns:
- the COSNumber which was read/parsed
- Throws:
IOException
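To illustrate the integer-versus-real distinction that readNumber reports, here is a hypothetical stand-alone sketch (PdfNumbers is not a PDFBox class) in which Long stands in for COSInteger and Double for COSFloat. The accepted syntax follows the PDF numeric-object rules as an assumption: an optional sign, digits, and at most one period, with no exponent notation.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not PDFBox code): classifies a numeric token as an
 * integer (Long, standing in for COSInteger) or a real (Double, standing in
 * for COSFloat).
 */
final class PdfNumbers {
    // Optional sign, then digits with an optional '.', or a leading '.' followed by digits.
    // Exponent notation such as 1e5 does not match and is rejected.
    private static final Pattern PDF_NUMBER = Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");

    static Number parseNumber(String token) {
        if (!PDF_NUMBER.matcher(token).matches()) {
            throw new NumberFormatException("not a PDF number: " + token);
        }
        if (token.indexOf('.') >= 0) {
            return Double.parseDouble(token); // real, e.g. 34.5, -.002, 4.
        }
        return Long.parseLong(token); // integer, e.g. 123, -98, +17
    }
}
```

Using the presence of a period as the deciding factor keeps "4." a real and "4" an integer, which is the distinction a caller needs when choosing between the two COS types.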
-
parseNumber
- Throws:
IOException
-
processCosObject
- Throws:
IOException
-
readObjectBackwards
- Throws:
IOException
-
readNameBackwards
- Throws:
IOException
-
getObject
- Throws:
IOException
-
readObject
This will read an object from the inputFile at the current offset (currentOffset). If the object number and generation are not the expected values and this parser is set to throw an exception for non-conforming documents, then an exception will be thrown.
- Parameters:
objectNumber - the object number you expect to read
generation - the generation you expect this object to be
- Returns:
- the object being read.
- Throws:
IOException
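The conformance check described for readObject(long objectNumber, long generation) can be sketched as below. This is a hypothetical helper, not PDFBox code: it assumes the "N G obj" header has already been isolated as a string, and the throwOnMismatch flag stands in for the parser's setting for non-conforming documents. Where PDFBox documents an IOException, the sketch throws an unchecked IllegalArgumentException.

```java
/**
 * Hypothetical sketch (not PDFBox code): checks an indirect-object header such as
 * "12 0 obj" against the object and generation numbers the caller expects.
 */
final class ObjectHeaderCheck {
    /**
     * @return true if the header matches; false on a mismatch when throwOnMismatch is off
     * @throws IllegalArgumentException if the text is not an "N G obj" header, or on a
     *         mismatch when throwOnMismatch is on
     */
    static boolean headerMatches(String header, long expectedObject, long expectedGeneration,
                                 boolean throwOnMismatch) {
        String[] parts = header.trim().split("\\s+");
        if (parts.length != 3 || !parts[2].equals("obj")) {
            throw new IllegalArgumentException("not an object header: " + header);
        }
        // NumberFormatException (a subclass of IllegalArgumentException) if not numeric
        long objectNumber = Long.parseLong(parts[0]);
        long generation = Long.parseLong(parts[1]);
        boolean matches = objectNumber == expectedObject && generation == expectedGeneration;
        if (!matches && throwOnMismatch) {
            throw new IllegalArgumentException("expected " + expectedObject + " " + expectedGeneration
                    + " obj but found " + objectNumber + " " + generation + " obj");
        }
        return matches;
    }
}
```

Returning a boolean in the lenient case lets a caller log or recover from a non-conforming object instead of aborting the whole parse.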
-
readObject
This actually reads the object data.
- Returns:
- the object which is read
- Throws:
IOException
-
readString
This will read the next string from the stream.
- Overrides:
readString in class BaseParser
- Returns:
- The string that was read from the stream.
- Throws:
IOException - If there is an error reading from the stream.
-
readDictionaryBackwards
- Throws:
IOException
-
readLineBackwards
This will read a line starting with the byte at the current offset and going backwards until it finds a newline. This should only be used if we are certain that the data is text, not binary data.
- Returns:
- the string which was read
- Throws:
IOException - if there was an error reading data from the file
-
readLine
This will read a line starting with the byte at the current offset and going forward until it finds a newline. This should only be used if we are certain that the data is text, not binary data.
- Overrides:
readLine in class BaseParser
- Returns:
- the string which was read
- Throws:
IOException - if there was an error reading data from the file
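The forward and backward line readers above can be sketched over a byte array as follows. LineReader is a hypothetical class, not PDFBox code, and it assumes that CR, LF, or the pair CR LF ends a line (the PDF end-of-line markers). In this sketch readLineBackwards returns an empty string if it starts on an end-of-line byte, so a caller reading from the end of a file would typically skip trailing whitespace first.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not PDFBox code) of reading text lines forwards and backwards. */
final class LineReader {
    private final byte[] data;
    private int offset;

    LineReader(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    int offset() {
        return offset;
    }

    private static boolean isEol(byte b) {
        return b == '\n' || b == '\r';
    }

    /** Reads from offset forward to the next CR or LF; leaves offset after the EOL (CR LF counts as one). */
    String readLine() {
        int start = offset;
        while (offset < data.length && !isEol(data[offset])) {
            offset++;
        }
        String line = new String(data, start, offset - start, StandardCharsets.US_ASCII);
        if (offset < data.length && data[offset] == '\r') {
            offset++;
        }
        if (offset < data.length && data[offset] == '\n') {
            offset++;
        }
        return line;
    }

    /** Reads from offset backward to the previous CR or LF; leaves offset on that EOL byte (or -1). */
    String readLineBackwards() {
        int end = offset;
        while (offset >= 0 && !isEol(data[offset])) {
            offset--;
        }
        return new String(data, offset + 1, end - offset, StandardCharsets.US_ASCII);
    }
}
```

Decoding as US-ASCII reflects the documented restriction that these methods are only meant for text, not binary data.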
-
readWord
- Throws:
IOException
-
isRecursivlyRead
public boolean isRecursivlyRead()
- Returns:
- the recursivlyRead flag
-
setRecursivlyRead
public void setRecursivlyRead(boolean recursivlyRead)
- Parameters:
recursivlyRead - the recursivlyRead flag to set
-