Class AbstractBankPDFPage
java.lang.Object
org.apache.pdfbox.contentstream.PDFStreamEngine
org.apache.pdfbox.text.PDFTextStripper
org.apache.pdfbox.text.PDFTextStripperByArea
de.frankmuenster.mahoe.pdfextractor.AbstractBankPDFPage
- Direct Known Subclasses:
AbstractBankPDFPageFirst, SantanderPdfPageEven, SantanderPdfPageOdd, TargoBankPdfPageEven, TargoBankPdfPageOdd
public abstract class AbstractBankPDFPage
extends org.apache.pdfbox.text.PDFTextStripperByArea
Extends PDFTextStripperByArea to add the position of each word to the
result string.
- Author:
- Frank Münster
-
Field Summary
Fields
Modifier and Type     Field             Description
static final String   BESCHREIBUNG      Area name for the description
static final String   BETRAG            Area name for the amount
static final String   BUCHUNGS_DATUM    Area name for the booking date
static final double   CM_TO_PDF         Conversion factor from centimetres to PDFBox units
static final String   FAELLIG_BETRAG    Area name for the amount due
static final String   FAELLIG_BIC       Area name for the recipient BIC of the settlement booking
static final String   FAELLIG_DATUM     Area name for the due date
static final String   FAELLIG_IBAN      Area name for the recipient IBAN of the settlement booking
static final String   FREMDW_BETRAG     Area name for the foreign-currency amount
static final String   FREMDWAEHRUNG     Area name for the foreign currency
static final String   KARTEN_INHABER    Area name for the cardholder
static final String   KARTEN_KONTO      Area name for the card account
static final String   KAUF_DATUM        Area name for the purchase date
static final String   KURS              Area name for the exchange rate
static final String   RECHNUNGS_DATUM   Area name for the invoice date
static final String   SALDO             Area name for the balance
Fields inherited from class org.apache.pdfbox.text.PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output
-
Constructor Summary
Constructors
Modifier    Constructor           Description
protected   AbstractBankPDFPage   Standard constructor; calls the superclass constructor.
-
Method Summary
Modifier and Type, Method, Description
protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
defineRegions
    Defines the regions in a Map.
protected static Rectangle getRectangleFrom(double regionX, double regionY, double regionW, double regionH)
    Returns the region as a Rectangle.
protected abstract double getXPos()
    Returns the expected X position of the booking date.
protected final void setRegion
    Adds the region to this TextStripper, but only if it is not null.
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3)
protected void writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions)
    Overrides the default functionality of PDFTextStripper.
Methods inherited from class org.apache.pdfbox.text.PDFTextStripperByArea
addRegion, extractRegions, getRegions, getTextForRegion, processTextPosition, removeRegion, setShouldSeparateByBeads, writePage
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeText, writeWordSeparator
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
-
Field Details
-
CM_TO_PDF
public static final double CM_TO_PDF
Conversion factor from centimetres to PDFBox units.
-
KARTEN_KONTO
Area name for the card account.
-
KARTEN_INHABER
Area name for the cardholder.
-
RECHNUNGS_DATUM
Area name for the invoice date.
-
SALDO
Area name for the balance.
-
BUCHUNGS_DATUM
Area name for the booking date.
-
KAUF_DATUM
Area name for the purchase date.
-
BESCHREIBUNG
Area name for the description.
-
FREMDWAEHRUNG
Area name for the foreign currency.
-
FREMDW_BETRAG
Area name for the foreign-currency amount.
-
KURS
Area name for the exchange rate.
-
BETRAG
Area name for the amount.
-
FAELLIG_BETRAG
Area name for the amount due.
-
FAELLIG_DATUM
Area name for the due date.
-
FAELLIG_IBAN
Area name for the recipient IBAN of the settlement booking.
-
FAELLIG_BIC
Area name for the recipient BIC of the settlement booking.
-
-
Constructor Details
-
AbstractBankPDFPage
Standard constructor; calls the superclass constructor.
- Throws:
IOException
-
-
Method Details
-
getRectangleFrom
protected static Rectangle getRectangleFrom(double regionX, double regionY, double regionW, double regionH)
Returns the region as a Rectangle, converting centimetres into PDF units.
- Parameters:
regionX
regionY
regionW
regionH
- Returns:
- a Rectangle for the defined region
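The conversion can be illustrated with a minimal sketch. It assumes the default PDF user space of 72 units per inch (2.54 cm), that Rectangle is java.awt.Rectangle, and that values are rounded to whole units; the actual value of CM_TO_PDF and the rounding used by this class are not shown on this page, so RectangleSketch is a hypothetical stand-in rather than the real implementation.

```java
import java.awt.Rectangle;

final class RectangleSketch {
    // Assumption: 72 PDF units per inch and 2.54 cm per inch.
    // The real AbstractBankPDFPage.CM_TO_PDF value is not documented here.
    static final double CM_TO_PDF = 72.0 / 2.54;

    // Converts a region given in centimetres into a Rectangle in PDF units.
    // Rounding to the nearest whole unit is an assumption of this sketch.
    static Rectangle getRectangleFrom(double regionX, double regionY,
                                      double regionW, double regionH) {
        return new Rectangle(
                (int) Math.round(regionX * CM_TO_PDF),
                (int) Math.round(regionY * CM_TO_PDF),
                (int) Math.round(regionW * CM_TO_PDF),
                (int) Math.round(regionH * CM_TO_PDF));
    }

    public static void main(String[] args) {
        // A region one inch (2.54 cm) from the left and top edges,
        // 5.08 cm wide and 1.27 cm high.
        System.out.println(getRectangleFrom(2.54, 2.54, 5.08, 1.27));
    }
}
```

Under these assumptions, one centimetre corresponds to roughly 28.35 PDF units.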
-
defineRegions
Defines the regions in a Map. The region rectangle values must be convertible to PDF units. The following keys must be defined in the Map:
- AbstractBankPDFPage.BUCHUNGS_DATUM
- AbstractBankPDFPage.KAUF_DATUM
- AbstractBankPDFPage.BESCHREIBUNG
- AbstractBankPDFPage.FREMDW_BETRAG
- AbstractBankPDFPage.FREMDWAEHRUNG
- AbstractBankPDFPage.KURS
- AbstractBankPDFPage.BETRAG
- Returns:
- a Map with the regions defined
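A subclass might build the required Map roughly as sketched below. This is an illustration only: the key strings, the column coordinates in centimetres, the conversion factor, and the Map<String, Rectangle> return type are all assumptions, and DefineRegionsSketch with its cm helper is a hypothetical stand-in for a concrete page class.

```java
import java.awt.Rectangle;
import java.util.LinkedHashMap;
import java.util.Map;

final class DefineRegionsSketch {
    // Hypothetical key values; the real constants are defined in AbstractBankPDFPage.
    static final String BUCHUNGS_DATUM = "buchungsDatum";
    static final String KAUF_DATUM = "kaufDatum";
    static final String BESCHREIBUNG = "beschreibung";
    static final String FREMDW_BETRAG = "fremdwBetrag";
    static final String FREMDWAEHRUNG = "fremdwaehrung";
    static final String KURS = "kurs";
    static final String BETRAG = "betrag";

    // Assumed conversion factor (72 units per inch); the real CM_TO_PDF is not shown.
    static final double CM_TO_PDF = 72.0 / 2.54;

    // Hypothetical helper: centimetre values to a Rectangle in PDF units.
    static Rectangle cm(double x, double y, double w, double h) {
        return new Rectangle(
                (int) Math.round(x * CM_TO_PDF), (int) Math.round(y * CM_TO_PDF),
                (int) Math.round(w * CM_TO_PDF), (int) Math.round(h * CM_TO_PDF));
    }

    // Returns one region per required key; all columns share the same vertical band.
    static Map<String, Rectangle> defineRegions() {
        double top = 9.0;     // invented layout values, in centimetres
        double height = 17.0;
        Map<String, Rectangle> regions = new LinkedHashMap<>();
        regions.put(BUCHUNGS_DATUM, cm(1.5, top, 2.0, height));
        regions.put(KAUF_DATUM, cm(3.5, top, 2.0, height));
        regions.put(BESCHREIBUNG, cm(5.5, top, 6.0, height));
        regions.put(FREMDW_BETRAG, cm(11.5, top, 2.0, height));
        regions.put(FREMDWAEHRUNG, cm(13.5, top, 1.0, height));
        regions.put(KURS, cm(14.5, top, 2.0, height));
        regions.put(BETRAG, cm(16.5, top, 3.0, height));
        return regions;
    }
}
```

A LinkedHashMap keeps the columns in left-to-right order, which makes the extracted output easier to read; any Map implementation would satisfy the contract as described.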
-
getXPos
protected abstract double getXPos()
Returns the expected X position of the booking date.
- Returns:
- the expected X position of the booking date
-
setRegion
Adds the region to this TextStripper, but only if it is not null.
- Parameters:
regionName - the name for the region
regions - the rectangle defining the region
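The null guard described above could look like the following sketch. It assumes that the null check applies to the rectangle and replaces the real registration call (PDFTextStripperByArea.addRegion) with a plain Map so the example runs without PDFBox; SetRegionSketch and its registered() accessor are hypothetical.

```java
import java.awt.Rectangle;
import java.util.LinkedHashMap;
import java.util.Map;

final class SetRegionSketch {
    // Stand-in for the stripper's internal region registry.
    private final Map<String, Rectangle> registered = new LinkedHashMap<>();

    // Registers the region under the given name; silently skips null rectangles.
    void setRegion(String regionName, Rectangle region) {
        if (region != null) {
            registered.put(regionName, region);
        }
    }

    // Exposes the registered regions for inspection.
    Map<String, Rectangle> registered() {
        return registered;
    }
}
```

Skipping null rectangles lets a page class leave out areas that do not occur on that page type without special-casing each call.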
-
writeString
protected void writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions) throws IOException
Overrides the default functionality of PDFTextStripper.
- Overrides:
writeString in class org.apache.pdfbox.text.PDFTextStripper
- Throws:
IOException
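How the override adds positions to the result string is not documented on this page. One possible shape is sketched below under stated assumptions: the output format is invented, only the first glyph's coordinates are used, and Glyph is a hypothetical stand-in for org.apache.pdfbox.text.TextPosition so the example runs without PDFBox.

```java
import java.util.List;
import java.util.Locale;

final class WriteStringSketch {
    // Minimal stand-in for TextPosition: one glyph with its page coordinates.
    static final class Glyph {
        final String unicode;
        final float x;
        final float y;

        Glyph(String unicode, float x, float y) {
            this.unicode = unicode;
            this.x = x;
            this.y = y;
        }
    }

    // Appends the position of the first glyph to the text chunk.
    // The "[x=..,y=..]" format is an assumption of this sketch.
    static String annotate(String text, List<Glyph> positions) {
        if (positions.isEmpty()) {
            return text;
        }
        Glyph first = positions.get(0);
        return String.format(Locale.ROOT, "%s[x=%.1f,y=%.1f]", text, first.x, first.y);
    }
}
```

In a real subclass the annotated string would be passed on to the inherited output writer, and the coordinates would come from the TextPosition objects supplied by PDFBox.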
-
showGlyph
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3) throws IOException
- Overrides:
showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
- Throws:
IOException
-
computeFontHeight
protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0) throws IOException
- Throws:
IOException
-