SiteKickr Web Development

Parsing the PDF

I needed to pull some address data from a source today to build a Google map, but the data was only available as a three-column PDF.

As I toiled over the best way to go about parsing this data, I remembered the good old days of RTF files, and how life was so simple back then!

Converting to RTF is a prime candidate for getting PDF data into a usable format, since an RTF file is plain text with its formatting codes left readable. I proceeded as follows:

  1. Copy all of the PDF data and paste it into Windows WordPad
  2. Save the file as an RTF
  3. Create a script, in the programming language of your choice, that will:
    1. Read the RTF into a variable
    2. Split the contents on the line-break (carriage return) character into an array or list of lines
    3. Iterate over the list, parsing the RTF codes from each line and using them to identify the pieces of information you need
    4. Output the data
  4. In my case, I needed to geocode a bunch of addresses, so I copied and pasted my entire output into the batch geocoder at http://www.findlatitudeandlongitude.com/batch-geocode/
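
Step 3 could be sketched in Python roughly as follows. This is a minimal sketch under a few assumptions, not a definitive implementation: the function names (`read_rtf`, `parse_rtf_line`, `parse_rtf`), the regular expression, and the sample data are all made up for illustration. It also assumes the document body starts after the first `\pard` control word, which is typical of WordPad output, and that a bold code (`\b`) marks a name line, which is purely a hypothetical rule. Escaped characters such as `\'e9` are left untouched.

```python
import re

# Matches an RTF control word such as \par, \b, \fs22 or \b0,
# plus the single optional space that terminates it.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?")


def read_rtf(path):
    """Step 3.1: read the whole RTF file into a variable."""
    with open(path, encoding="cp1252", errors="replace") as handle:
        return handle.read()


def parse_rtf_line(line):
    """Step 3.3: return (control_words, plain_text) for one line of RTF."""
    codes = [m.group(1) + (m.group(2) or "") for m in CONTROL_WORD.finditer(line)]
    text = CONTROL_WORD.sub("", line)
    text = text.replace("{", "").replace("}", "").strip()
    return codes, text


def parse_rtf(rtf_source):
    """Steps 3.2 and 3.3: split into lines and parse each one."""
    # Assumption: everything before the first \pard is header material
    # (font table and so on). If \pard is absent, the whole source is used.
    body = rtf_source.split("\\pard", 1)[-1]
    rows = []
    for line in body.splitlines():
        codes, text = parse_rtf_line(line)
        if text:  # skip lines that held only codes or braces
            rows.append((codes, text))
    return rows


# Made-up sample standing in for a saved WordPad file.
SAMPLE = r"""{\rtf1\ansi{\fonttbl{\f0 Calibri;}}
\viewkind4\uc1\pard\f0\fs22 \b Acme Corp\b0\par
123 Main St\par
}"""

# Step 3.4: output the data, one tab-separated record per line.
for codes, text in parse_rtf(SAMPLE):
    # Hypothetical rule: a bold code on the line means it is a name.
    label = "NAME" if "b" in codes else "ADDR"
    print(label, text, sep="\t")
```

For a real file, replace `SAMPLE` with `read_rtf("addresses.rtf")` (a placeholder filename) and adjust the labelling rule to whatever codes actually distinguish the fields in your data.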