Class AbstractBankPDFPage
java.lang.Object
  org.apache.pdfbox.contentstream.PDFStreamEngine
    org.apache.pdfbox.text.PDFTextStripper
      org.apache.pdfbox.text.PDFTextStripperByArea
        de.frankmuenster.mahoe.pdfextractor.AbstractBankPDFPage
Direct Known Subclasses:
AbstractBankPDFPageFirst, SantanderPdfPageEven, SantanderPdfPageOdd, TargoBankPdfPageEven, TargoBankPdfPageOdd
public abstract class AbstractBankPDFPage extends org.apache.pdfbox.text.PDFTextStripperByArea
Extends PDFTextStripperByArea to add the position of each word to the result string.
Author:
Frank Münster
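The following is a minimal sketch of the idea. It is written without PDFBox so that it runs on its own. The Glyph class is a hypothetical stand-in for PDFBox's TextPosition, whose real getters include getXDirAdj(), getYDirAdj() and getUnicode(). The output format word[x=..;y=..] is also an assumption, because this page does not say how the positions are rendered.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for org.apache.pdfbox.text.TextPosition:
// the position of one glyph and the text it produces.
final class Glyph {
    final float x;
    final float y;
    final String unicode;

    Glyph(float x, float y, String unicode) {
        this.x = x;
        this.y = y;
        this.unicode = unicode;
    }
}

final class WordPositionFormatter {
    // Builds what an override of writeString(text, textPositions) might emit:
    // the word followed by the coordinates of its first glyph.
    // The "word[x=..;y=..]" format is assumed, not taken from the source.
    static String format(String text, List<Glyph> positions) {
        if (positions == null || positions.isEmpty()) {
            return text; // nothing to annotate
        }
        Glyph first = positions.get(0);
        // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM locale
        return String.format(Locale.ROOT, "%s[x=%.1f;y=%.1f]", text, first.x, first.y);
    }
}
```

In a real subclass, the override could pass the formatted string to the inherited single-argument writeString(String) of PDFTextStripper rather than returning it.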
Field Summary
Modifier and Type | Field | Description
static java.lang.String | BESCHREIBUNG | Area name for the description
static java.lang.String | BETRAG | Area name for the amount
static java.lang.String | BUCHUNGS_DATUM | Area name for the booking date
static double | CM_TO_PDF | Conversion factor from centimetres to PDFBox units
static java.lang.String | FAELLIG_BETRAG | Area name for the amount due
static java.lang.String | FAELLIG_BIC | Area name for the recipient BIC of the settlement booking
static java.lang.String | FAELLIG_DATUM | Area name for the due date
static java.lang.String | FAELLIG_IBAN | Area name for the recipient IBAN of the settlement booking
static java.lang.String | FREMDW_BETRAG | Area name for the foreign-currency amount
static java.lang.String | FREMDWAEHRUNG | Area name for the foreign currency
static java.lang.String | KARTEN_INHABER | Area name for the cardholder
static java.lang.String | KARTEN_KONTO | Area name for the card account
static java.lang.String | KAUF_DATUM | Area name for the purchase date
static java.lang.String | KURS | Area name for the exchange rate
static java.lang.String | RECHNUNGS_DATUM | Area name for the invoice date
static java.lang.String | SALDO | Area name for the balance
Constructor Summary
Modifier | Constructor | Description
protected | AbstractBankPDFPage() | Standard constructor; calls the superclass constructor.
Method Summary
Modifier and Type | Method | Description
protected float | computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0) |
protected abstract java.util.Map<java.lang.String,java.awt.Rectangle> | defineRegions() | Defines the regions in a Map.
protected static java.awt.Rectangle | getRectangleFrom(double regionX, double regionY, double regionW, double regionH) | Returns the region as a Rectangle.
protected abstract double | getXPos() | Returns the expected X position of the booking data.
protected void | setRegion(java.lang.String regionName, java.util.Map<java.lang.String,java.awt.Rectangle> regions) | Adds the region to this text stripper, but only if it is not null.
protected void | showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, java.lang.String arg3, org.apache.pdfbox.util.Vector arg4) |
protected void | writeString(java.lang.String text, java.util.List<org.apache.pdfbox.text.TextPosition> textPositions) | Overrides the default functionality of PDFTextStripper.
Methods inherited from class org.apache.pdfbox.text.PDFTextStripperByArea
addRegion, extractRegions, getRegions, getTextForRegion, processTextPosition, removeRegion, setShouldSeparateByBeads, writePage
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeText, writeWordSeparator
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
Field Detail
CM_TO_PDF
public static final double CM_TO_PDF
Conversion factor from centimetres to PDFBox units.
See Also: Constant Field Values

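This worked conversion assumes that the constant follows the PDF convention, in which one user-space unit is 1/72 inch. The actual value appears only under Constant Field Values and is not reproduced here. Under that assumption, the factor and a sample conversion are:

```latex
\text{CM\_TO\_PDF} = \frac{72}{2.54} \approx 28.35,
\qquad
2\,\text{cm} \times \frac{72}{2.54} \approx 56.69\ \text{units}
```
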
KARTEN_KONTO
public static final java.lang.String KARTEN_KONTO
Area name for the card account.
See Also: Constant Field Values

KARTEN_INHABER
public static final java.lang.String KARTEN_INHABER
Area name for the cardholder.
See Also: Constant Field Values

RECHNUNGS_DATUM
public static final java.lang.String RECHNUNGS_DATUM
Area name for the invoice date.
See Also: Constant Field Values

SALDO
public static final java.lang.String SALDO
Area name for the balance.
See Also: Constant Field Values

BUCHUNGS_DATUM
public static final java.lang.String BUCHUNGS_DATUM
Area name for the booking date.
See Also: Constant Field Values

KAUF_DATUM
public static final java.lang.String KAUF_DATUM
Area name for the purchase date.
See Also: Constant Field Values

BESCHREIBUNG
public static final java.lang.String BESCHREIBUNG
Area name for the description.
See Also: Constant Field Values

FREMDWAEHRUNG
public static final java.lang.String FREMDWAEHRUNG
Area name for the foreign currency.
See Also: Constant Field Values

FREMDW_BETRAG
public static final java.lang.String FREMDW_BETRAG
Area name for the foreign-currency amount.
See Also: Constant Field Values

KURS
public static final java.lang.String KURS
Area name for the exchange rate.
See Also: Constant Field Values

BETRAG
public static final java.lang.String BETRAG
Area name for the amount.
See Also: Constant Field Values

FAELLIG_BETRAG
public static final java.lang.String FAELLIG_BETRAG
Area name for the amount due.
See Also: Constant Field Values

FAELLIG_DATUM
public static final java.lang.String FAELLIG_DATUM
Area name for the due date.
See Also: Constant Field Values

FAELLIG_IBAN
public static final java.lang.String FAELLIG_IBAN
Area name for the recipient IBAN of the settlement booking.
See Also: Constant Field Values

FAELLIG_BIC
public static final java.lang.String FAELLIG_BIC
Area name for the recipient BIC of the settlement booking.
See Also: Constant Field Values

Method Detail
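getRectangleFrom
protected static java.awt.Rectangle getRectangleFrom(double regionX, double regionY, double regionW, double regionH)
Returns the region as a Rectangle, converting the values from centimetres into PDF units.
Parameters:
regionX - x coordinate of the region, in centimetres
regionY - y coordinate of the region, in centimetres
regionW - width of the region, in centimetres
regionH - height of the region, in centimetres
Returns:
a Rectangle for the defined region
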
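The following is a hypothetical re-creation of the conversion, not the class's actual code. It assumes that CM_TO_PDF equals 72 / 2.54 (one PDF unit is 1/72 inch) and that values are rounded to the nearest integer, because java.awt.Rectangle stores int coordinates.

```java
import java.awt.Rectangle;

final class RegionMath {
    // Assumption: 1 PDF unit = 1/72 inch, so 1 cm = 72 / 2.54 units.
    static final double CM_TO_PDF = 72.0 / 2.54;

    // Converts a region given in centimetres (x, y, width, height)
    // into a Rectangle measured in PDF units, rounding each value.
    static Rectangle rectangleFromCm(double regionX, double regionY,
                                     double regionW, double regionH) {
        return new Rectangle(
                (int) Math.round(regionX * CM_TO_PDF),
                (int) Math.round(regionY * CM_TO_PDF),
                (int) Math.round(regionW * CM_TO_PDF),
                (int) Math.round(regionH * CM_TO_PDF));
    }
}
```

Rounding is only one choice. The original might truncate instead, or return a java.awt.geom.Rectangle2D.Double to keep fractional precision. PDFTextStripperByArea.addRegion(String, Rectangle2D) accepts either.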
defineRegions
protected abstract java.util.Map<java.lang.String,java.awt.Rectangle> defineRegions()
Defines the regions in a Map. The region rectangle values must be convertible to PDF units. The following keys must be defined in the Map:
AbstractBankPDFPage.BUCHUNGS_DATUM
AbstractBankPDFPage.KAUF_DATUM
AbstractBankPDFPage.BESCHREIBUNG
AbstractBankPDFPage.FREMDW_BETRAG
AbstractBankPDFPage.FREMDWAEHRUNG
AbstractBankPDFPage.KURS
AbstractBankPDFPage.BETRAG
Returns:
a Map containing the defined regions

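A subclass could check its own defineRegions() result against the required keys, for example in a unit test. The helper below is hypothetical and not part of the class. The key strings are placeholders, because the constants' actual values are listed only under Constant Field Values.

```java
import java.awt.Rectangle;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RegionCheck {
    // Placeholder key values standing in for the AbstractBankPDFPage constants.
    static final List<String> REQUIRED_KEYS = List.of(
            "BUCHUNGS_DATUM", "KAUF_DATUM", "BESCHREIBUNG",
            "FREMDW_BETRAG", "FREMDWAEHRUNG", "KURS", "BETRAG");

    // Returns the required keys missing from a defineRegions() result,
    // in the order in which they are listed above.
    static List<String> missingKeys(Map<String, Rectangle> regions) {
        return REQUIRED_KEYS.stream()
                .filter(key -> !regions.containsKey(key))
                .collect(Collectors.toList());
    }
}
```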
getXPos
protected abstract double getXPos()
Returns the expected X position of the booking data.
Returns:
the expected X position of the booking data
setRegion
protected final void setRegion(java.lang.String regionName, java.util.Map<java.lang.String,java.awt.Rectangle> regions)
Adds the region to this text stripper, but only if it is not null.
Parameters:
regionName - the name of the region
regions - the map containing the rectangle that defines the region

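The following is a minimal sketch of the null check. It assumes that setRegion looks up the rectangle under regionName in the map, which the page implies but does not state. RegionRegistry and its addRegion method are hypothetical stand-ins for PDFTextStripperByArea and its real addRegion(String, Rectangle2D), so the sketch runs without PDFBox.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

class RegionRegistry {
    // Stands in for the regions registered with PDFTextStripperByArea.
    final Map<String, Rectangle2D> added = new LinkedHashMap<>();

    // Stand-in for PDFTextStripperByArea.addRegion(String, Rectangle2D).
    void addRegion(String regionName, Rectangle2D rect) {
        added.put(regionName, rect);
    }

    // Looks up the rectangle by name and registers it only when the map
    // holds a non-null rectangle for that name; otherwise does nothing.
    final void setRegion(String regionName, Map<String, Rectangle> regions) {
        Rectangle rect = regions.get(regionName);
        if (rect != null) {
            addRegion(regionName, rect);
        }
    }
}
```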
writeString
protected void writeString(java.lang.String text, java.util.List<org.apache.pdfbox.text.TextPosition> textPositions) throws java.io.IOException
Overrides the default functionality of PDFTextStripper.
Overrides:
writeString in class org.apache.pdfbox.text.PDFTextStripper
Throws:
java.io.IOException

showGlyph
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, java.lang.String arg3, org.apache.pdfbox.util.Vector arg4) throws java.io.IOException
Overrides:
showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
Throws:
java.io.IOException

computeFontHeight
protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0) throws java.io.IOException
Throws:
java.io.IOException