KDE Developer's Journals

oever's blog

oever's picture

Alpha version of Office Viewer for Nokia N900 available

Today, Nokia released the first public version of the office document viewer for the Nokia N900 phone. It was uploaded to the Maemo repositories. This version supports text documents, spreadsheets and presentations in OpenDocument format (ODF) and Microsoft Office formats. The viewer requires the latest update (PR1.1) to the N900 software. You can install 'Office Viewer' by adding the Maemo Extras-devel repository to your N900 catalogues:

Catalog name: Maemo Extras-devel
Web address: http://repository.maemo.org/extras-devel
Distribution: fremantle
Components: free

Then the application 'freoffice' will be available in the category 'Office'. The install is 9 megabytes.

With the viewer, you can open multiple files at once, open office documents from your e-mail, search in office files and copy and paste from your documents. A very nice feature is the ability to give presentations with the phone. Here are some screen shots of the viewer running on the N900.

Screenshots: Presentation, Spreadsheet, Text Document, Overview

The code for this viewer is available in the KOffice repository. New releases of the viewer will be uploaded to the repository as KOffice progresses towards version 2.2.

The viewer has a simple user interface and responds quickly to user input such as page changing and scrolling.

Strigi 0.7.1

This is just a quick note to tell the world about the newest Strigi release. It has version number 0.7.1 and is the recommended Strigi version for use with KDE 4.4 and Nepomuk.

Go get it.

0.7.1
- Support more fields from ODF documents
- Improved skipping behavior on streams for large files.
- Added album art support.
- Added support for ID3v1 tags.
- Added MP3 stream metadata extraction, UTF-16 support in tags.
- Extended the range of metadata extracted by ID3 analyzer.
- Added a FLAC audio file analyzer.
- Significantly unbreak the PDF analyzer.
- Fix scanning trees where permissions are insufficient to read some parts
- Check for multithreaded version of libxml2
- Require newer CLucene version (0.9.21)

Join us in #strigi for comments and questions.

testing document conversion

Being able to properly read many different file formats is important for the success of KOffice. By 'read', I mean 'convert to ODF', because conversion and reading are strictly separated in KOffice. KWord will convert a .doc file to a .odt file before loading it into the internal rendering and editing structure. There is even a nice separate program called 'koconverter' that can convert files on the command line.

So far, there were no decent tests to avoid regressions in our filters. I have
written a small framework (well, a shell script, but framework sounds better) that makes it simple to write tests. There are a number of tests there now for converting ppt files, but it would be great to have them for other input formats too. And here is where I hope you will help. All you need is a small input file that highlights a feature or problem and a small XSL file. The XSL file contains the test.

Look at this small example. Suppose you have a file; it can be a .doc, a .docx or a file in another office format. The file contains only one image and you want an automated test to verify that the ODF that is created also has exactly one image. The following XSL file tests this:

<?xml version="1.0" encoding="UTF-8"?>
<x:stylesheet
   xmlns:d="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
   xmlns:x="http://www.w3.org/1999/XSL/Transform" version="1.0"
>
<x:template match="/">
  <x:if test="count(//d:image) != 1">
   <x:message terminate="yes">
    Error: there should be exactly one image.
   </x:message>
  </x:if>
</x:template>
</x:stylesheet>

If the number of image elements is not exactly one, the XSL transformation will abort with an error message.

So you see that the framework is written in such a way that writing tests is easy and fast.
When reporting a bug in KOffice or koconverter you can help a lot by writing an XSL for our automated
tests. You will see that this will speed up fixing the bug and it will help
avoid regressions.

This way of testing is a bit unconventional: these are not unit tests but overall
tests. Files are converted to ODF and the output file is checked. So it is not a
small part that is tested, but the complete conversion. A benefit is that the tests are independent of the programs doing the conversion. We just check the result. So the same method could be used on any program that writes ODF files.

Here is how our tests in KOffice work. First we convert the input file to ODF with
koconverter. An ODF file is a zip archive containing many files, and we usually want
to check the content of the XML files. So after conversion with koconverter, the ODF
file is unpacked. Then an XSL transformation is run on the file content.xml.

In XSL one can report errors like this:

  <x:if test="string($style/s:graphic-properties/@d:fill-color) != '#bbe0e3'">
    <x:message terminate="yes">
      Error: draw:fill-color of the second frame should be '#bbe0e3'.
    </x:message>
  </x:if>

(You see that XML does not have to be too verbose.) The prefixes x:, s: and d: in
this snippet stand for http://www.w3.org/1999/XSL/Transform,
urn:oasis:names:tc:opendocument:xmlns:style:1.0 and
urn:oasis:names:tc:opendocument:xmlns:drawing:1.0 respectively.
The test checks if the fill-color for a particular part of the output document
has the correct value. If not, an error message is printed and the
transformation is stopped.

You can replay this example by checking out the tests:

  svn checkout svn://anonsvn.kde.org/home/kde/trunk/tests/kofficetests/
  cd kofficetests/import/powerpoint
  make test

That was the overview of how the tests work. Now let us look at a more complicated test. It has two files: background.ppt and background.xsl. background.ppt is the input file and background.xsl
is the transformation that verifies the output of the conversion.

The file background.ppt has two frames, one of which must have a light blue
(#bbe0e3) background. At the moment the frame gets a background color, but it
is wrong. So when fixing this bug we first formulate what we want the result to be by
writing an XSL file.

One XSL file can contain multiple tests. This test is called
testSolidBackground:

  <x:template name="testSolidBackground">

We assign the second frame in content.xml to a variable:
  <x:variable name="frame"
    select="o:body/o:presentation/d:page/d:frame[position()=2]"/>

Now we find the name of the style for this frame:
  <x:variable name="stylename" select="$frame/@p:style-name"/>

And find the style with that name:
  <x:variable name="style"
    select="o:automatic-styles/s:style[@s:name=$stylename]"/>

Now we do a sanity check: do we even have a second frame?
  <x:if test="count($frame) != 1">
    <x:message terminate="yes">
      Error: there is no second frame on the first slide.
    </x:message>
  </x:if>

And do we even have a style?
  <x:if test="count($style) != 1">
    <x:message terminate="yes">
      Error: there is no style for the second frame.
    </x:message>
  </x:if>

Now we test if the background is 'solid':
  <x:if test="string($style/s:graphic-properties/@d:fill) != 'solid'">
    <x:message terminate="yes">
      Error: draw:fill of the second frame should be solid.
    </x:message>
  </x:if>

And we check the color:
  <x:if test="string($style/s:graphic-properties/@d:fill-color) != '#bbe0e3'">
    <x:message terminate="yes">
      Error: draw:fill-color of the second frame should be '#bbe0e3'.
    </x:message>
  </x:if>

That is all there is to it! If you do not know XSL yet, learning it takes some
effort, but that effort will pay off. Once you have the XSL file you can run
'make test' while fixing the bug. This runs the conversion, unpacks the ODF file
and then applies the test for you.

I hope you will all start using this method for reporting and fixing filter bugs. I will close by starting you off with some links to XSL and XPath.

Getting an energy efficient small server

For mirroring my backup drive, central data store for devices, music playing and a webserver for experiments, I'd like to run a small server at home. I want this server to be energy efficient, easy to modify, robust, silent and run customizable free software. It should have at least 500 GB of storage, but 1 or 1.5 TB is better. You can buy very low-energy computers such as the Fit-PC 2 (6 watts) or the Linutop 2 (8 watts). The yearly energy cost in euros of a machine that runs constantly can be roughly estimated by doubling its power draw in watts, so running a device that constantly uses 8 watts costs about 16 euros a year.
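
The rule of thumb above can be checked with a little arithmetic. The sketch below assumes an electricity price of about 0.23 euro per kWh, which is simply the price that makes the doubling rule come out right and not a figure from an actual bill. The function name yearlyCostEuro is made up for this example.

```javascript
// Estimate the yearly electricity cost of a device that is always on.
// euroPerKWh is an assumption; adjust it to your own tariff.
function yearlyCostEuro(watts, euroPerKWh) {
  var hoursPerYear = 24 * 365;                  // 8760 hours
  var kWhPerYear = watts * hoursPerYear / 1000; // energy used in one year
  return kWhPerYear * euroPerKWh;
}

var assumedPrice = 0.23; // euro per kWh, assumed for illustration

// An 8 watt device: roughly the 16 euro a year from the rule of thumb.
console.log(yearlyCostEuro(8, assumedPrice).toFixed(2));
```

With a different tariff the factor of two changes accordingly, so the doubling rule is only as good as the assumed price.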

Until recently the computer I used most was a Dell Latitude X1 laptop. That machine is now 4.5 years old. At the time, I chose it because it is a laptop with no fan and hence very silent. It is still better than any Atom-based netbook. So I would like to use this laptop as a server. UPS and screen are integrated, which is a nice plus. The machine has a 1.8" disk built in. It is not possible to replace it with a disk of at least 500 GB. I wanted to know the energy cost of adding more storage to the X1. So I did some power measurements with a 2.5" external disk (Toshiba, 160 GB) and a 3.5" external disk (TrekStor, 500 GB). I also measured on my current main laptop, a Lenovo X200s.

Lenovo X200s (console, idle, low brightness unless otherwise specified)
Adapter only: 6 W
Console, low brightness: 19 W
Console, high brightness: 21 W
100% cpu and high brightness: 40 W
mounted 2.5" disk: 24 W
active (dd) 2.5" disk: 28 W
mounted 3.5" disk: 37 W
active (dd) 3.5" disk: 41 W

Dell Latitude X1 (console, idle, low brightness unless otherwise specified)
Adapter only: 0 W
Console, low brightness: 15 W
Console, high brightness: 19 W
100% cpu and high brightness: 23 W
mounted 2.5" disk: 17 W
active (dd) 2.5" disk: 21 W
mounted 3.5" disk: 32 W
active (dd) 3.5" disk: 37 W

The 2.5" disk is powered over USB. The 3.5" disk has a separate adapter, which is included in the power measurements. The device used for the power measurements is a DEM1379.
Compared to the 2.5" drive, the idle 3.5" drive uses 13 to 15 watts more and the active drive uses 13 to 16 watts more. That difference is about as large as the power usage of the entire server. So I am now wondering if there are more energy-efficient external 3.5" drives.
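
The differences can be recomputed from the tables. This is just a small sketch: the numbers are copied from the measurements above, and the property and function names are invented for the example.

```javascript
// Measured power draw in watts, copied from the tables above.
var measurements = {
  'Lenovo X200s':     { idle25: 24, active25: 28, idle35: 37, active35: 41 },
  'Dell Latitude X1': { idle25: 17, active25: 21, idle35: 32, active35: 37 }
};

// Extra draw of the 3.5" disk compared to the 2.5" disk on the same laptop.
function extraDraw(m) {
  return { idle: m.idle35 - m.idle25, active: m.active35 - m.active25 };
}

Object.keys(measurements).forEach(function (name) {
  var extra = extraDraw(measurements[name]);
  console.log(name + ': +' + extra.idle + ' W idle, +' + extra.active + ' W active');
});
```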

Good karma

This weekend I visited my parents-in-law, because my wife's paternal grandmother celebrated her 90th birthday. I noticed that the laptop they use was still running Kubuntu Feisty with OpenOffice 2.2. On this machine, reading emails, managing photos, surfing the internet and working on office documents are most important. Digikam is used for photos. Kmail and konqueror from KDE 3.5 are installed and a mix of OpenOffice and Microsoft Office 97 on wine is in use for editing office documents. In short, a horribly outdated setup that is more than two years old. IT is still moving fast. Feisty was not a long term release and no longer receives updates.

So in a slightly reckless move I decided to update the machine to the latest Kubuntu: Karmic Koala. This meant going to KDE 4.3. To my relief the install went very well. All important settings for digikam and kmail were migrated automatically. Dolphin is really nice and more intuitive for non-professional users. The kwin effects add a nice touch of class (translucent wobbly windows). Plasmoids on the desktop (photo frames and weather forecast) were very well received.

In short: good karma! Thank you very much, Kubuntu team.

(my last two blogs were written on the Nokia N900 which has a good keyboard)

Printing photo albums

One important feature for photo management is missing in the FOSS world: an application for creating photo albums that can be sent away for printing at a printing service. There is however a pretty slick closed source application that works on Linux. It can be found at, for example, Pixum (also in .nl and .de). It is based on Qt 4.4 and installs using a perl script which downloads the artwork and the required libraries. The application is customized for different printing companies, which offer these customized downloads on their websites. Not all of them
offer the Linux or even the Macintosh version. This is a shame and is probably done to limit the number of different questions users might have. A standard for these photo album ordering services would be great, but I'm not holding my breath and will recommend Pixum for now.

Strigi partial port to javascript

You may remember two of my recent blogs. One was about a project to parse powerpoint files and another one was about porting hexdump to the browser.

So how about a combination of those two topics: parsing powerpoint files in the browser. It is quite a feasible task. The powerpoint file format is largely described in an xml schema now. From this schema one would need to generate a parser like the ones that already exist for c++ and java. The parsers for java and c++ are both less than 700 lines of code.

We have not reached that stage yet and I do not have time to implement a powerpoint parser in javascript soon. I have written some requirements for it. To parse the individual data streams in a ppt file, one must parse the OLE2 file format. Currently we use pole for this in c++ and poifs in java. Now I could port either of these libraries to javascript, but there is another nice OLE parser: strigi.

In Strigi, the OLE file format is treated like other container formats such as zip, tar and mime. Porting parts of Strigi to javascript seemed like an interesting challenge. In Strigi, we use low level c++ to ensure speed. Most of the techniques used in the c++ are not available in javascript. So the javascript version is bound to be much slower. Still, I was curious what Strigi would look like in javascript.

And now it is ready. The parts required for reading OLE files have been ported. The result is one html page of 600 lines. It can read ppt files and list the streams in there. When clicking the streams, you see the stream in 'hexdump' style display. The speed is not even that bad. It takes about a second to parse a megabyte of file.
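
To give an idea of where such a parser starts, here is a minimal sketch that checks the OLE2 signature and reads the sector size from the file header. It is not the code from the Strigi port. The function names are invented for this example, and it assumes the file has already been loaded into a string with one byte per character.

```javascript
// The first eight bytes of every OLE2 (compound document) file.
var OLE2_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// Read one byte from a string that holds one byte per character.
function byteAt(data, i) {
  return data.charCodeAt(i) & 0xff;
}

// Return basic header information, or null if this is not an OLE2 file.
function readOle2Header(data) {
  if (data.length < 512) return null; // the header alone is 512 bytes
  for (var i = 0; i < OLE2_SIGNATURE.length; i++) {
    if (byteAt(data, i) !== OLE2_SIGNATURE[i]) return null;
  }
  // The sector shift is a little-endian 16 bit value at offset 30;
  // the sector size is 2 to the power of that value (usually 512).
  var sectorShift = byteAt(data, 30) | (byteAt(data, 31) << 8);
  return { sectorSize: 1 << sectorShift };
}
```

Everything after this point (the allocation table, the directory and the streams themselves) is addressed in units of that sector size.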

Enjoy the demo!
(Firefox 3.5 or a recent WebKit browser required)

Sensors in the N900

Nokia has been kind enough to lend me an awesome N900. This will allow me to test KOffice on the phone. Document loading, parsing and scrolling speed could do with improvements.

Apart from using the N900 for serious things, I've also done a bit of playing with it. Qt has a famous OpenGL demo that shows the Qt logo in a QGLWidget. Instead of controlling the rotation of the object with the scrollbars, the adapted version uses the accelerometers in the device to move the logo.

This was a simple adaptation, because reading the accelerometers is easy:

Nokia-N900-41-10:~# cat /sys/class/i2c-adapter/i2c-3/3-001d/coord
36 -18 -1134
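
Once that line has been read, turning it into numbers takes only a few lines. The sketch below is not taken from the application. It assumes the three values are x, y and z in roughly thousandths of g (the z value near -1000 for a phone lying flat suggests this), the sign convention for the angles is a guess, and the function names are made up.

```javascript
// Parse a line such as "36 -18 -1134" into an object with x, y and z.
function parseCoord(text) {
  var parts = text.trim().split(/\s+/).map(Number);
  if (parts.length !== 3 || parts.some(isNaN)) {
    throw new Error('expected three numbers, got: ' + text);
  }
  return { x: parts[0], y: parts[1], z: parts[2] };
}

// Tilt angles in degrees relative to lying flat with the screen up.
// Assumes gravity points along negative z when the phone lies flat.
function tiltDegrees(a) {
  var toDeg = 180 / Math.PI;
  return {
    aroundX: Math.atan2(a.y, -a.z) * toDeg,
    aroundY: Math.atan2(a.x, -a.z) * toDeg
  };
}

var sample = parseCoord('36 -18 -1134');
console.log(sample, tiltDegrees(sample));
```

Angles like these could be used to set the rotation of the logo instead of the scrollbar values.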

The application source and a debian package for the N900 are now available.

Here is a screenshot of five running instances in the application overview. The five logos all move if you move the phone.
N900 running qtup

Perhaps a Qt on Maemo guru can adapt this program to be a desktop widget with a transparent background.

hexdump in the browser

This morning, I thought: why is XMLHttpRequest for xml and text but not for binary files? It turns out you can use it for binary files, but not in every browser and only for remote files. I've written a small implementation of hexdump. It loads a binary file like this:

function load_binary_resource(url) {  
  // this works in firefox, chromium and arora,
  // but binary files are read only partially in konqueror and opera
  var req = new XMLHttpRequest();  
  req.open('GET', url, false);  
  req.overrideMimeType('text/plain; charset=x-user-defined');  
  req.send(null);
  return req.responseText;  
}

Then you can access each byte with code like this:

  var data = load_binary_resource(file);
  var byte5 = data.charCodeAt(5) & 0xff;

As described here.
With these techniques I made hexdump in the browser.
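
To show how the bytes can become a display, here is a minimal sketch of a formatter. It is not the code behind the demo; hexdump and its bytesPerLine parameter are names chosen for this example. It takes a string as returned by load_binary_resource and masks each character code with 0xff as shown above.

```javascript
// Format a string holding one byte per character as hexdump-style lines:
// offset, hex values, and printable ASCII between bars.
function hexdump(data, bytesPerLine) {
  bytesPerLine = bytesPerLine || 16;
  var lines = [];
  for (var offset = 0; offset < data.length; offset += bytesPerLine) {
    var hex = [];
    var ascii = '';
    var end = Math.min(offset + bytesPerLine, data.length);
    for (var i = offset; i < end; i++) {
      var b = data.charCodeAt(i) & 0xff;
      hex.push((b < 16 ? '0' : '') + b.toString(16));
      // Show printable ASCII as is and everything else as a dot.
      ascii += (b >= 32 && b < 127) ? String.fromCharCode(b) : '.';
    }
    // Pad the hex column so the text column lines up on a short last line.
    var hexColumn = hex.join(' ');
    while (hexColumn.length < bytesPerLine * 3 - 1) hexColumn += ' ';
    var offsetText = ('00000000' + offset.toString(16)).slice(-8);
    lines.push(offsetText + '  ' + hexColumn + '  |' + ascii + '|');
  }
  return lines;
}
```

In a page one could then show the result with something like hexdump(load_binary_resource(url)).join('\n') inside a pre element.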

Pleasantly Producing PowerPoint Parsers

KOffice has the potential to be a widely used office suite. One of the requirements for user adoption is good support for popular file formats, and most presentations are available as PowerPoint presentations. KOffice uses ODF as its native format. There is an import filter for PowerPoint presentations in KOffice, which is currently incomplete. At KO, we are working to improve this situation.

To convert data from one file format to another, you have to understand both formats. ODF is an open standard and rather well documented. About a year ago, Microsoft, after significant political pressure, put documentation for its file formats online. In the header of that documentation, permission is granted to use it to develop software:

Regardless of any other terms that are contained in the terms of use for the Microsoft website that hosts this documentation, you may make copies of it in order to develop implementations of the technologies described in the Open Specifications and may distribute portions of it in your implementations using these technologies or your documentation as necessary to properly document the implementation. You may also distribute in your implementation, with or without modification, any schema, IDL‘s, or code samples that are included in the documentation.

This documentation is available as PDF files. The file describing PPT is 663 pages, the one describing drawings, which are an essential part of presentations, is 620 pages. To implement a parser for all of that is a lot of work. It is an exercise that would have to be undertaken for each language in which one would want to parse these files.

It is easier to convert the documentation to a computer readable format and generate parsers for different situations from that. This is now being done in msoscheme. It comes with a big file called mso.xml which already contains a very large part of the documentation. From this file, a C++ and a Java parser are generated (Java, C++). Both parsers can deserialize ppt files to a runtime representation that can be the start for conversion to e.g. ODF.

A small Qt program called ppttoxml can convert a ppt to an XML representation. This XML representation is easy to read and understand and therefore very helpful for us in improving our current PowerPoint filter.

It would be great to get people from other projects that want to read ppt files on board. It does not matter what programming language or languages you use. You can write a parser generator in less than 700 lines of code.

Here are the commands you need to see what a ppt file looks like on the inside:

git clone git://gitorious.org/msoscheme/msoscheme.git
mkdir msoscheme/cpp/build
cd msoscheme/cpp/build
cmake ..
make
./ppttoxml myfile.ppt myfile.xml