The goal of my current project is to save documents forever. Yes, forever! I know that can sound a bit silly, but the idea really is that these documents should be preserved (and be usable) for generations to come.
So first, a look back at recent history... Think about a document created in 1991, about twenty years ago. There's a very good chance it was made in WordPerfect 5.1. The old-timers reading this blog will no doubt have fond memories of that one... But back to today: how on earth are we going to open that file? Well, it's Google to the rescue!
But I guess you do get my point.
If we've learned anything in the last twenty years, it's that predicting the future of computing is risky business. Here are a couple of gems commonly attributed to Bill Gates that illustrate this rather well:
- 640K ought to be enough for anybody (1981)
- I see little commercial potential for the internet for the next 10 years (1994)
So we're looking at:
These requirements mean that most document formats are eliminated right away. That includes any sort of 'proprietary' format like MS Word and OpenOffice. XML, on the other hand, could have been an option, but unfortunately with XML you lose some important features like page layouts and image data.
PDF or Portable Document Format seems like it could be the right choice. But then you run into the problem of all the different PDF versions...
Fortunately, research on the subject reveals that this has been addressed!
For long-term preservation of documents, PDF/A (where A = Archive), published as the ISO 19005 standard, is the solution! No chance of hidden royalty fees or obscure pending patents. And internationally, it looks like most governments are heading in this direction.
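A PDF/A file is supposed to announce itself in its XMP metadata through two properties, `pdfaid:part` and `pdfaid:conformance`. Here is a rough Python sketch that looks for that declaration in the raw bytes of a file. The function name `declared_pdfa_level` is my own invention, and the sketch assumes the metadata is stored as plain, uncompressed text. Keep in mind that it only reads what the file claims about itself; real validation needs a dedicated PDF/A validator.

```python
import re

# Matches the PDF/A identification properties in XMP metadata, written either
# as elements (<pdfaid:part>1</pdfaid:part>) or as attributes (pdfaid:part="1").
_PDFAID = re.compile(
    rb'pdfaid:(part|conformance)\s*(?:=\s*["\']|>)\s*([0-9A-Za-z]+)'
)


def declared_pdfa_level(data: bytes):
    """Return the declared PDF/A level such as '1b', or None if none is declared.

    Assumption: the XMP packet is readable as plain text in the file bytes.
    This reports a claim made by the file, not verified conformance.
    """
    found = {}
    for key, value in _PDFAID.findall(data):
        # Keep the first occurrence of each property.
        found.setdefault(key.decode("ascii"), value.decode("ascii"))
    if "part" not in found:
        return None
    return found["part"] + found.get("conformance", "").lower()


# A hand-written metadata fragment, for illustration only.
sample = (
    b"<pdfaid:part>1</pdfaid:part>"
    b"<pdfaid:conformance>B</pdfaid:conformance>"
)
print(declared_pdfa_level(sample))  # prints: 1b
```

To try it on a real file you could pass in something like `open("some-file.pdf", "rb").read()`, where the file name is just a placeholder.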
So while PDF/A is the way to go if you want to preserve documents for the next generation, note that there are also different versions of PDF/A in existence. For instance, PDF/A-1 is based on PDF 1.4 and defines two conformance levels: