Class AbstractBankPDFPage
java.lang.Object
org.apache.pdfbox.contentstream.PDFStreamEngine
org.apache.pdfbox.text.PDFTextStripper
org.apache.pdfbox.text.PDFTextStripperByArea
de.frankmuenster.mahoe.pdfextractor.AbstractBankPDFPage
Direct Known Subclasses:
AbstractBankPDFPageFirst, SantanderPdfPageEven, SantanderPdfPageOdd, TargoBankPdfPageEven, TargoBankPdfPageOdd
public abstract class AbstractBankPDFPage
extends org.apache.pdfbox.text.PDFTextStripperByArea
Extends PDFTextStripperByArea to add the position of each word to the result string.
Author:
Frank Münster
Field Summary
Fields
Modifier and Type      Field              Description
static final String    BESCHREIBUNG       Area name for the description
static final String    BETRAG             Area name for the amount
static final String    BUCHUNGS_DATUM     Area name for the booking date
static final double    CM_TO_PDF          Conversion factor from centimetres to PDFBox units
static final String    FAELLIG_BETRAG     Area name for the amount due
static final String    FAELLIG_BIC        Area name for the recipient BIC of the settlement booking
static final String    FAELLIG_DATUM      Area name for the due date
static final String    FAELLIG_IBAN       Area name for the recipient IBAN of the settlement booking
static final String    FREMDW_BETRAG      Area name for the foreign-currency amount
static final String    FREMDWAEHRUNG      Area name for the foreign currency
static final String    KARTEN_INHABER     Area name for the cardholder
static final String    KARTEN_KONTO       Area name for the card account
static final String    KAUF_DATUM         Area name for the purchase date
static final String    KURS               Area name for the exchange rate
static final String    RECHNUNGS_DATUM    Area name for the invoice date
static final String    SALDO              Area name for the balance
Fields inherited from class org.apache.pdfbox.text.PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output
Constructor Summary
Constructors
Modifier     Constructor            Description
protected    AbstractBankPDFPage    Standard constructor; calls the superclass constructor.
Method Summary
Modifier and Type             Method    Description
protected float               computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
                              defineRegions    Defines the regions in a Map.
protected static Rectangle    getRectangleFrom(double regionX, double regionY, double regionW, double regionH)    Returns the region as a Rectangle.
protected abstract double     getXPos()    Returns the expected X position of the booking date.
protected final void          setRegion    Adds the region to this text stripper, but only if it is not null.
protected void                showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3)
protected void                writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions)    Overrides the default functionality of PDFTextStripper.
Methods inherited from class org.apache.pdfbox.text.PDFTextStripperByArea
addRegion, extractRegions, getRegions, getTextForRegion, processTextPosition, removeRegion, setShouldSeparateByBeads, writePage
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeText, writeWordSeparator
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
Field Details

CM_TO_PDF
public static final double CM_TO_PDF
Conversion factor from centimetres to PDFBox units.

KARTEN_KONTO
Area name for the card account.

KARTEN_INHABER
Area name for the cardholder.

RECHNUNGS_DATUM
Area name for the invoice date.

SALDO
Area name for the balance.

BUCHUNGS_DATUM
Area name for the booking date.

KAUF_DATUM
Area name for the purchase date.

BESCHREIBUNG
Area name for the description.

FREMDWAEHRUNG
Area name for the foreign currency.

FREMDW_BETRAG
Area name for the foreign-currency amount.

KURS
Area name for the exchange rate.

BETRAG
Area name for the amount.

FAELLIG_BETRAG
Area name for the amount due.

FAELLIG_DATUM
Area name for the due date.

FAELLIG_IBAN
Area name for the recipient IBAN of the settlement booking.

FAELLIG_BIC
Area name for the recipient BIC of the settlement booking.

Constructor Details

AbstractBankPDFPage
Standard constructor; calls the superclass constructor.
Throws:
IOException

Method Details

getRectangleFrom
protected static Rectangle getRectangleFrom(double regionX, double regionY, double regionW, double regionH)
Returns the region as a Rectangle, converting centimetres into PDF units.
Parameters:
regionX, regionY, regionW, regionH
Returns:
a Rectangle for the defined region
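
The centimetre-to-PDF conversion that getRectangleFrom describes can be sketched as follows. This is a minimal illustration, not the class's actual code. It assumes that "PDF measure" means default PDF user-space units (1/72 inch, so 72 / 2.54 units per centimetre) and that the result is a java.awt.geom.Rectangle2D; the real value of CM_TO_PDF and the concrete Rectangle type are not shown on this page. The names CmToPdfSketch, ASSUMED_CM_TO_PDF and rectangleFromCm are hypothetical.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical sketch of a centimetre-to-PDF-unit conversion. */
final class CmToPdfSketch {

    // Assumption: default PDF user space has 72 units per inch and
    // 1 inch = 2.54 cm. The actual CM_TO_PDF constant may differ.
    static final double ASSUMED_CM_TO_PDF = 72.0 / 2.54;

    /** Converts a region given in centimetres into a rectangle in PDF units. */
    static Rectangle2D rectangleFromCm(double xCm, double yCm, double wCm, double hCm) {
        return new Rectangle2D.Double(
                xCm * ASSUMED_CM_TO_PDF,
                yCm * ASSUMED_CM_TO_PDF,
                wCm * ASSUMED_CM_TO_PDF,
                hCm * ASSUMED_CM_TO_PDF);
    }

    public static void main(String[] args) {
        // A region at x = 2 cm, y = 10 cm that is 3 cm wide and 0.5 cm high.
        Rectangle2D region = rectangleFromCm(2.0, 10.0, 3.0, 0.5);
        System.out.println(region);
    }
}
```

Keeping the factor in one constant means page layouts can be measured in centimetres on a printed statement and converted in a single place.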

defineRegions
Defines the regions in a Map. The region rectangle values must be convertible to PDF units. The following keys must be defined in the Map:
AbstractBankPDFPage.BUCHUNGS_DATUM
AbstractBankPDFPage.KAUF_DATUM
AbstractBankPDFPage.BESCHREIBUNG
AbstractBankPDFPage.FREMDW_BETRAG
AbstractBankPDFPage.FREMDWAEHRUNG
AbstractBankPDFPage.KURS
AbstractBankPDFPage.BETRAG
Returns:
a Map with the regions defined
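
A subclass's region map might look roughly like the sketch below. It is only an illustration under stated assumptions: the string keys stand in for the AbstractBankPDFPage constants (whose actual values are not shown on this page), the map value type is assumed to be java.awt.geom.Rectangle2D, and all coordinates are invented. DefineRegionsSketch, REQUIRED_KEYS and hasAllRequiredKeys are hypothetical names.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a region map holding the seven required keys. */
final class DefineRegionsSketch {

    // Stand-ins for the AbstractBankPDFPage constants; the real string
    // values of those constants are an assumption here.
    static final List<String> REQUIRED_KEYS = List.of(
            "BUCHUNGS_DATUM", "KAUF_DATUM", "BESCHREIBUNG",
            "FREMDW_BETRAG", "FREMDWAEHRUNG", "KURS", "BETRAG");

    /** Builds one rectangle per required key, laid out as adjacent columns. */
    static Map<String, Rectangle2D> defineRegions() {
        Map<String, Rectangle2D> regions = new LinkedHashMap<>();
        double x = 40.0;             // invented left edge of the first column
        final double y = 300.0;      // invented vertical position of the row
        final double width = 70.0;   // invented column width
        final double height = 12.0;  // invented row height
        for (String key : REQUIRED_KEYS) {
            regions.put(key, new Rectangle2D.Double(x, y, width, height));
            x += width; // next column starts where this one ends
        }
        return regions;
    }

    /** Returns true if every required key is present in the map. */
    static boolean hasAllRequiredKeys(Map<String, ?> regions) {
        return regions.keySet().containsAll(REQUIRED_KEYS);
    }

    public static void main(String[] args) {
        Map<String, Rectangle2D> regions = defineRegions();
        System.out.println(regions.keySet());
        System.out.println(hasAllRequiredKeys(regions));
    }
}
```

A check such as hasAllRequiredKeys is one way to catch a missing column early, before any page is processed.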

getXPos
protected abstract double getXPos()
Returns the expected X position of the booking date.
Returns:
the expected X position of the booking date

setRegion
Adds the region to this text stripper, but only if it is not null.
Parameters:
regionName - the name for the region
regions - the rectangle defining the region
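
The null guard that setRegion describes can be illustrated with a small stand-in. This sketch does not use PDFBox: a plain map replaces the text stripper, whereas the real method presumably hands non-null regions to the inherited addRegion. The parameter types and the names SetRegionSketch and regions() are assumptions.

```java
import java.awt.geom.Rectangle2D;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in showing a "register only if not null" guard. */
final class SetRegionSketch {

    // Replaces the text stripper's internal region registry in this sketch.
    private final Map<String, Rectangle2D> registered = new LinkedHashMap<>();

    /** Registers the region under the given name; null regions are skipped. */
    void setRegion(String regionName, Rectangle2D region) {
        if (region != null) {
            registered.put(regionName, region);
        }
    }

    /** Read-only view of the regions registered so far. */
    Map<String, Rectangle2D> regions() {
        return Collections.unmodifiableMap(registered);
    }

    public static void main(String[] args) {
        SetRegionSketch stripper = new SetRegionSketch();
        stripper.setRegion("SALDO", null); // ignored
        stripper.setRegion("BETRAG", new Rectangle2D.Double(400, 300, 70, 12));
        System.out.println(stripper.regions().keySet());
    }
}
```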

writeString
protected void writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions) throws IOException
Overrides the default functionality of PDFTextStripper.
Overrides:
writeString in class org.apache.pdfbox.text.PDFTextStripper
Throws:
IOException
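
How a writeString override could attach a position to each word is sketched below without a PDFBox dependency. Everything here is an assumption made for illustration: the Glyph class stands in for the coordinate data a TextPosition carries, and the output format "text[x;y]" is invented, since this page does not document the format the class actually writes. WordPositionSketch and withPosition are hypothetical names.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: append the start position of a word to its text. */
final class WordPositionSketch {

    /** Stand-in for the x/y data of one glyph (not the PDFBox type). */
    static final class Glyph {
        final String unicode;
        final double x;
        final double y;

        Glyph(String unicode, double x, double y) {
            this.unicode = unicode;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Returns the text followed by the position of its first glyph.
     * The "[x;y]" suffix is an invented format, not the class's real output.
     */
    static String withPosition(String text, List<Glyph> glyphs) {
        if (glyphs.isEmpty()) {
            return text; // nothing to annotate
        }
        Glyph first = glyphs.get(0);
        return String.format(Locale.ROOT, "%s[%.1f;%.1f]", text, first.x, first.y);
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = List.of(
                new Glyph("E", 56.5, 120.0),
                new Glyph("U", 62.0, 120.0),
                new Glyph("R", 68.0, 120.0));
        System.out.println(withPosition("EUR", glyphs));
    }
}
```

With positions embedded in the output, a caller can compare a word's x coordinate against an expected column position such as the one getXPos supplies.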

showGlyph
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3) throws IOException
Overrides:
showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
Throws:
IOException

computeFontHeight
protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0) throws IOException
Throws:
IOException