Choice of Document Formats

About this Document

This is an overview of popular document formats, derived from a reply I wrote to a thread on the Boston Perl Mongers mailing list.

Document Information

Written By:
Shlomi Fish
Finish Date:
Last Updated:


Creative Commons License

This document is Copyright by Shlomi Fish, 2006, and is available under the terms of the Creative Commons Attribution License 3.0 Unported (or at your option any later version of that licence).

To secure additional rights, please contact Shlomi Fish, and see the explicit requirements that are spelt out for abiding by that licence.

The Article Itself

A previous correspondent wrote:

Write it in POD?

I’m not aware of any POD based Wikis, but it doesn’t seem like it would be hard to merge the two approaches, with a “traditional” web-facing wiki front-end that stores things as a POD-like syntax on the back.

This way, you get the collaborative editing and there are already tools out there to convert the POD source to PDF etc.

I think Kwiki has a plugin for POD (Perl’s so-called “Plain Old Documentation”).

Just a note about POD: it is incredibly limited, and some things that you may want to do with it are simply not possible. It is not the only generic format available, however. One natural option is DocBook/XML, which can be translated into HTML as well as PDF, Microsoft Word, LaTeX and other formats. It cannot be translated directly to plain text, but it can be by going through an intermediate format. POD can be translated into DocBook/XML using Pod-DocBook. (Recent update: Pod::PseudoPod::DocBook may be a better choice.)

Don’t use the original module by Alligator Descartes, which is still the default on CPAN as a result of being a “Dead Camel”. It is old and broken, and has been unmaintained for a long time.

Note that the generated DocBook may not be perfectly semantically correct, because DocBook is richer than POD.
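To make the point about DocBook being richer than POD concrete, here is a minimal sketch in Python. It is not how Pod-DocBook or any real converter works: the function name pod_inline_to_docbook and the mapping table are assumptions made up for this illustration, and only four non-nested POD formatting codes are handled.

```python
import re
from xml.sax.saxutils import escape

# Illustrative mapping only (an assumption, not any real converter's table):
# POD formatting code -> (DocBook start tag, DocBook end tag).
POD_TO_DOCBOOK = {
    "B": ('<emphasis role="bold">', "</emphasis>"),
    "I": ("<emphasis>", "</emphasis>"),
    "C": ("<literal>", "</literal>"),
    "F": ("<filename>", "</filename>"),
}

# Matches simple, non-nested codes such as C<ls> or F<notes.txt>.
FORMATTING_CODE = re.compile(r"([BCFI])<([^<>]*)>")


def pod_inline_to_docbook(text):
    """Translate a small subset of POD inline codes to DocBook inline tags."""
    parts = []
    last = 0
    for match in FORMATTING_CODE.finditer(text):
        # Escape the plain text between formatting codes for XML.
        parts.append(escape(text[last:match.start()]))
        start_tag, end_tag = POD_TO_DOCBOOK[match.group(1)]
        parts.append(start_tag + escape(match.group(2)) + end_tag)
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


print(pod_inline_to_docbook("Run C<ls> or call C<my_func> on F<notes.txt>."))
# prints: Run <literal>ls</literal> or call <literal>my_func</literal> on <filename>notes.txt</filename>.
```

Both C<ls> and C<my_func> end up as the generic literal element, whereas someone writing DocBook by hand would probably mark the first as a command and the second as a function. POD simply does not record that distinction, so an automatic converter has nothing to go on.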

Other lightweight markups of this kind, which are mostly plain text with brief style specifiers, can be found in this Linux-elitists thread called “mini-markup language?”. Prominent examples include AsciiDoc, which can be converted to HTML, XHTML and DocBook/XML, and Markdown, which can only be converted to XHTML.

All of them can be converted to HTML, and some of them to DocBook too. One wiki or another is also an option, but note that wikis tend to have incompatible formats, and some may not be able to export DocBook. I like the MediaWiki format, which is an extension of that of UseModWiki (and of its Oddmuse fork, which should be better), but I think that DokuWiki’s format is also quite good. I really dislike the default Kwiki format, and despite the flood of Kwiki plugins, no-one has written a UseModWiki/Oddmuse/MediaWiki-subset format for it yet. I keep intending to do that, but I have not found the time for it yet.

You can also try using XHTML 1.1 with semantic markup as a good generic format.

All that aside, I should note that if you are thinking about using TeX or LaTeX, please reconsider. TeX and LaTeX are very convenient for generating PostScript or PDF, but:

  1. The only thing that can understand TeX is tex. I believe this was said long before Tom Christiansen carried the saying over to the Perl world. It is in fact much more true for TeX than it is for Perl.
  2. Conversion of LaTeX to DocBook or HTML often does not work very well. The tools are often outdated and generate old or invalid HTML, and they often break on more complex LaTeX. TeX and LaTeX are Turing-complete, and the syntax is incredibly problematic.
  3. LaTeX has poor support for hypertext and other PDF niceties.
  4. PDF and PostScript, which are the default and least error-prone TeX output formats, have relatively poor accessibility and internationalisation. For example, as far as I understand, bi-directional text (mixed Arabic-English text, etc.) is rendered visually.
  5. It is easier to convert semantic XHTML or DocBook/XML to LaTeX than the other way around.

LaTeX is much less verbose than DocBook/XML, but I think you can find a less problematic format. It is still excellent for writing texts with lots of mathematical formulae, but it remains a very problematic format. When working with LaTeX, I often get obscure TeX errors from which I cannot immediately tell what exactly went wrong. With DocBook/XML, the tools just report that a tag is missing, or that the order of the tags is incorrect, which takes me much less time to fix.
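As a rough illustration of that kind of error report, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The DocBook-like fragment and the helper name describe_xml_error are made up for this example, and the sketch only checks well-formedness; a real DocBook toolchain would also validate against the DTD or schema, which is what catches tags in the wrong order.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like fragment: the <para> opened on line 3 is never closed.
BROKEN = """<article>
  <title>Choice of Document Formats</title>
  <para>POD is limited.
</article>
"""


def describe_xml_error(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        return line, column, str(err)
    return None


result = describe_xml_error(BROKEN)
if result is not None:
    line, column, message = result
    print(f"line {line}, column {column}: {message}")
```

The parser stops at the closing article tag on line 4 and reports a mismatched tag there, which points almost directly at the unclosed para on the line above.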

Coming full circle now: POD is a good option if it does what you need. The Camel Book and other Perl books were written in POD. I have written documentation for both Perl and non-Perl projects in POD. I also write all my man pages in POD, because nroff scares me.

But if you feel that you want something better, you have many options.

One final note: the last time I checked, DocBook/XML was problematic to use for bi-directional texts, because of problems in either the implementations or the standard. Otherwise, its Unicode support should be very good.

Coverage and Comments

TODO: Fill in.