Monday 9 September 2013

How to extract text per column (table) with itextsharp

I am using iTextSharp to extract the text from a PDF file:
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
return currentText;
With that code I can get all of the page's content as plain text, which is great.
But when I try to process it, I can't tell where a specific column starts and
where it ends.
This is the example table:

After extracting the text, I get something like this:
Name
Details
Note
MYNAME
THIS ARE THE
DETAILS
HERE YOU CAN
FIND A NOTE
As you can see, it is hard to know where each column's content starts and
where it ends.
Any ideas?
Thanks a lot.
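
One direction that might be worth trying, shown only as a rough sketch: restrict the extraction to one vertical strip of the page at a time, so that each call returns the text of a single column. The sketch assumes iTextSharp 5.x, where the parser namespace provides RegionTextRenderFilter, FilteredTextRenderListener and LocationTextExtractionStrategy. The class name ColumnTextExtractor, the method GetColumnText and the left/right parameters are made up for illustration, and the column boundaries have to be measured for the actual table.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp.
public static class ColumnTextExtractor
{
    // Returns the text found inside one vertical strip of the page.
    // left and right are x-coordinates in PDF user-space units (points);
    // they are assumptions that must match the real column boundaries.
    public static string GetColumnText(PdfReader pdfReader, int pageNumber,
                                       float left, float right)
    {
        // Use the full page height so every row of the column is covered.
        Rectangle pageSize = pdfReader.GetPageSize(pageNumber);
        Rectangle column = new Rectangle(left, pageSize.Bottom, right, pageSize.Top);

        // Only text whose position intersects the rectangle reaches the strategy.
        RenderFilter[] filters = { new RegionTextRenderFilter(column) };
        ITextExtractionStrategy strategy = new FilteredTextRenderListener(
            new LocationTextExtractionStrategy(), filters);

        return PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
    }
}
```

For a three-column table this would be called once per column, for example GetColumnText(pdfReader, pageNumber, 50f, 200f) for the first column, where 50 and 200 are placeholder coordinates. The filter works on whole text chunks, so a chunk that straddles a boundary may be included in full. The boundaries should therefore sit in the gaps between columns.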
