KDE Developer's Journals

Who cares about document formats?

richard dale's picture

I've loved reading the articles about whether or not the Microsoft OOXML document format should be an ISO standard, as opposed to the ODF ISO standard for word processing documents. In particular, I enjoyed Miguel de Icaza's heroic defence of his position against over 500 rabid anti-Microsoft Slashdot posters. I admire someone who can think for themselves against entrenched opposition (e.g. Richard Stallman or Miguel de Icaza), and I don't actually care whether or not I agree with them.

But I wonder if people will really care much about these document formats in 50 years. They seem to me to be an extension of typewritten 20th century documents, when it used to be important whether or not this or that sentence appeared in bold or underlined. That has nothing to do with the semantics or machine readability of the document. One of my pet hates is word processors, and the culture of sending word processor documents as enclosures in emails, as opposed to putting them online on a wiki or similar. It just really seems out of date, and pisses me off. So why not base your document future on semantic web ideas like RDF, instead of these short lived presentation oriented formats? Instead of bold or italic styles, shouldn't we be more concerned about the foaf (Friend of a Friend) or other ontologies that the document contains?
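
To make the RDF idea a little more concrete, here is a minimal sketch of what a machine readable FOAF description of a document's author could look like, written out as N-Triples from plain Python. The example.org addresses, the names and the helper functions are all hypothetical and purely illustrative; only the FOAF and RDF namespace addresses are the real ones.

```python
# Real vocabulary namespaces; everything else in this sketch is made up.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an address in angle brackets, as N-Triples expects."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def foaf_triples(person_iri, name, knows=()):
    """Describe one person and the people they know as (s, p, o) triples."""
    subject = iri(person_iri)
    triples = [
        (subject, iri(RDF_TYPE), iri(FOAF + "Person")),
        (subject, iri(FOAF + "name"), literal(name)),
    ]
    for friend_iri in knows:
        triples.append((subject, iri(FOAF + "knows"), iri(friend_iri)))
    return triples


def to_ntriples(triples):
    """Serialise triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical author and friend, just to show the shape of the data.
    print(to_ntriples(foaf_triples(
        "http://example.org/people/alice#me",
        "Alice",
        knows=["http://example.org/people/bob#me"],
    )))
```

None of this says anything about bold or italic; it only records who the people are and how they are related, which is the kind of information a program can follow from one document to the next.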

ants.aasma's picture

Who cares about semantic markup?

People write documents to communicate with other people, not computers. They usually don't want to be bothered with formalizing all the semantic meaning in their writing, especially when there is no good way to convey that meaning to the person consuming that writing. This is not taking into account the fact that ontologies build up a whole separate language that the person writing has to learn, and that language doesn't usually (somewhat on purpose) have ways to convey all the nuances, uncertainties and ambiguity of natural language. People want to use rich text because it allows them to embed more meaning into their writing. By forcing them into the framework of semantic markup you in fact force them to dumb down their writing to the level that computers could understand at least a little. I'd say the correct way would be to make computers understand humans better, not teach humans to simplify things for computers.

As for embedding word processor documents in emails and keeping most information more private than is called for, you are right - that should be a crime.

richard dale's picture

Re: Who cares about semantic markup?

Yes, I agree that it will put them off if you expect people to go through extra steps to annotate their documents with semantic info. But suitable tools can be designed to minimize the effort. For instance, the DBpedia RDF database is derived from Wikipedia without the authors needing to do anything special.

Word processing documents don't even contain hyper-links, and so it is not just super sophisticated embedded semantic RDF tags that are missing. Once you get used to expecting links in every document you read, it seems odd when they're missing. The UK Sunday Independent even included 'fake hyperlinks' when it was recently redesigned, although they were not very well received. I think that indicates the way people's expectations of printed media are being changed by their use of the internet.

fprog26's picture

format

well, instead of ODF and OOXML, if they had decided to save files in MHTML or similar,
like the good old 'save as web page' format of office 2000, that would have been a better
start than learning yet another XML format...

For those who do not get the point, see PrinceXML:
http://www.princexml.com/

They use good old HTML+SVG+MathML+CSS print to generate books!

aseigo's picture

yes, we will care. we already do.

> But I wonder if people will really care much
> about these document formats in 50 years.

do we care if we can read documents from 50, 100, 200 or 500 years ago? of course we do. even if the "we" is largely historians, biographers, genealogists, etc. that is important knowledge.

we already have blind spots in our data where companies have ended up with files they couldn't open. i had a friend who used to make the Big Bucks reverse engineering such formats for companies and then transferring the rescued data to new formats they could open.

> sending word processor documents as
> enclosures in emails

agreed. remember, however, that ODF is about much more than word processing documents and even used only for the Right Purposes we'll still have word processing documents.

> future on semantic web ideas like RDF,
> instead of these short lived presentation
> oriented formats

a) i think that's why these formats are going towards XML
b) because people need presentation at some level
c) it's not either/or, it's both/and imho. we need both and they all need to be open, well documented and have open implementations.

richard dale's picture

Re: yes, we will care. we already do.

> do we care if we can read documents from 50, 100, 200 or 500 years ago? of course we do. even if the "we" is largely historians, biographers, genealogists, etc. that is important knowledge.

Of course we care. The title of the blog is a bit misleading - what I really meant was 'who cares about word processing document formats, as opposed to world wide web document formats'.

On the web, it is important to be able to separate content from presentation, and to be able to add semantic markup. Those are exactly the qualities you need for long lived document content.

How long will it be before there is no such thing as a document which isn't on the web? What will be the point then of document formats designed primarily for offline reading?

zander's picture

that means; what is a document format?

The funny thing here is, the OpenDocument format is actually quite a bit more than just a format to have bold/italic/columns/formulas. It actually has a lot of the features that a 'web document' (whatever that is) should have. You can add loads of metadata, you can link, and you can embed and/or link to foreign data.

The only difference with html or rdf is that ODF actually has a lot of features for high quality typesetting as well.

Bottom line is that I'm confused by your splitting the arena of documents based on which protocol you use to load/save them.
ODF (not OOXML) has all the features of separating content from presentation (styles), semantic marking (metadata), interdocument-linking, svg etc etc etc. So it would do very well in your environment of the future.

richard dale's picture

Re: that means; what is a document format?

I've been reading about ODF here and I think you have a point. When I think 'word processing document', it means a 20th century 'data silo' to me. But ODF has xlinks for hyper-links, and it seems it isn't too hard to write an ODF to XHTML converter. So as you say above "..it would do very well in your environment of the future", yes indeed it might. The only thing that would be nice would be for the RDFa specification to be extended so that you could embed non-human readable semantic data in the documents. Then, as well as having meta-data that describes the entire document, you could embed machine readable semantics in it to give you a future proof 'content + presentation + semantic markup' web capable combination.

On the other hand, when I read more about OOXML, I think it does have all the characteristics of a dead legacy format, even if well specified.
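
As a rough illustration of why the xlinks in ODF make a conversion to XHTML look approachable, here is a small Python sketch that pulls the hyperlinks out of an ODF text document and writes them as an XHTML list. It assumes the usual ODF packaging (a zip archive with a content.xml member, where links are text:a elements carrying an xlink:href attribute). The function names extract_links and links_to_xhtml are my own invention, and this is nowhere near a full converter.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# Namespaces used by ODF text content and by XLink.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def extract_links(odt_file):
    """Return (href, link text) pairs found in an ODF text document.

    odt_file may be a path or a file-like object, as accepted by zipfile.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    links = []
    for anchor in root.iter(f"{{{TEXT_NS}}}a"):
        href = anchor.get(f"{{{XLINK_NS}}}href")
        if href:
            links.append((href, "".join(anchor.itertext())))
    return links


def links_to_xhtml(links):
    """Render the pairs as a plain XHTML unordered list."""
    items = [
        f'<li><a href="{escape(href)}">{escape(text)}</a></li>'
        for href, text in links
    ]
    return "\n".join(["<ul>", *items, "</ul>"])
```

A real converter would also have to map paragraphs, styles and metadata, but the links themselves carry over almost unchanged.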
