KDE Developer's Journals

What do users want?

tjansen

The main problem is finding out what users want. Did you know whether users would like Exposé, or whether they would use KXmlRpc, before these were implemented? I don't think so. Did users ask for Exposé? Unlikely. Can we know whether WinFS's way of searching files with meta-data queries will replace today's file systems? No. All I know is that it sounds useful, and that if it is a success, KDE will look pretty outdated with its file dialogs.

Unless you limit yourself to cloning proven concepts, you will never know. Every feature comes with a risk: the risk that users won't accept it. There's nothing you can do about that, unless you refuse to innovate. And refusing comes with another risk: the risk that the other product turns out superior. In the end, users don't care that the competition had 9 ideas that failed miserably, as long as it also has one really useful feature. Did the competition use more resources while finding out which idea was the good one? Sure. But which is the better product? The one that contains 9 things that nobody uses and one brilliant feature, or the product that lacks all 10?

Basing software on demand is what the market does anyway, for free software as well as proprietary software. You write software, you release it and hope that people like it. If people buy or use it, the software lives on. If not, it will go away sooner or later. The question is how large a risk you are willing to take. Implementing a feature that no one has done before is high-risk: it's easy to fail and lose all the time and money you have spent. But the potential gain is also high: you could end up with a brilliant feature that stands out against the competition. Cloning a feature that the competition already has and that all your users want is low-risk. It will make your users happier than they would be without it, but it doesn't make your product better than the competition.

So far, free software has mostly succeeded by cloning features from proprietary systems and making them better, more reliable, maybe faster. But if you look at the most successful projects, their competition stood still. Linux was able to succeed with this strategy because of the lack of Unix progress in the last 10 years. Since Apache destroyed the competition, neither Apache nor any other web server has added any noticeable improvement. SQL databases also made no significant progress (if you ignore high-end stuff like clustering), so it was easy for MySQL and others to grab a relatively large share. Browsers have stood almost still since the IE 4.x days, which made it quite easy for Konqueror to compete. All these products took the low-risk path. Their technical achievement may be high, but it was clear from the beginning that the end result would be useful.

This is also one reason for the fast perceived progress of free software. We can laugh about Microsoft's Bob project, or the paper clip. But we can only laugh because they tested for us whether those ideas work, and we learned from their mistakes. That's the comfort you have when you are picking up ideas that others have already tested.

But the next challenge is competing with those systems that are making progress. Most people will not stop using proprietary systems unless the free alternatives are noticeably better. Even when both are comparable in functionality, inertia will win over price and freedom in most cases. And to get better, free software needs many new ideas. There will certainly be many flops among them, and many people will waste their time writing them. But these failures are needed for progress, to find the good ideas.

cmiramon

Metadata and innovation

> Can we know whether WinFS’s way of searching files with meta-data
> queries will replace today’s file systems? No. All I know is that
> it sounds useful, and that if it is a success KDE will look pretty
> outdated with its file dialogs.

Personally, based on my experience, I doubt that keyword queries for files will ever be a must-have feature. I've been using metadata for 10 years in my research work (I'm a medieval historian): I take lecture notes or excerpts from medieval sources in MS Word and conscientiously fill in the keywords, author and other fields for each file.

Do these keywords help me find information again? No. My way of indexing has changed a lot during these 10 years, and it is always much faster to do a full-text search or a grep in Linux to refresh my memory than to try to figure out which keyword might correspond to what I'm looking for. For files whose metadata can be extracted automatically, like pictures or MP3s, there are already standalone applications like JuK, ACDSee, etc. Integrating that into the OS is a marginal improvement. Google (improved full-text search) killed Yahoo (metadata-based), and good full-text search is the way to go.

By the way, why do I have to open a Konsole and fiddle with grep to search my home directory, when searching the Internet is as simple as Alt-F2 gg:"Look this string"? Who is the developer who will implement my wish: Alt-F2 grep:"Look this string", with the results in a nice window where I can sort them chronologically, by MIME type, etc.? Why clone Microsoft when there is so much to clone from Google?
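A minimal sketch of what the back end of such a `grep:` shortcut might do, assuming a plain recursive walk with no index at all: collect every file under a directory whose text contains the search string, then sort the hits by modification date or by a MIME type guessed from the file name. `grep_home`, `sort_hits` and `Hit` are hypothetical names; nothing here is wired into KDE's Alt-F2 dialog.

```python
import mimetypes
import os
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    mtime: float  # last modification time, seconds since the epoch
    mime: str     # guessed from the file name only, not the content


def grep_home(root, needle):
    """Return a Hit for every readable file under `root` whose text contains `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    found = needle in f.read()
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # unreadable or vanished file: skip it silently
            if found:
                mime, _encoding = mimetypes.guess_type(path)
                hits.append(Hit(path, mtime, mime or "unknown"))
    return hits


def sort_hits(hits, key="date"):
    """Newest first for "date"; grouped alphabetically by MIME type for "mime"."""
    if key == "date":
        return sorted(hits, key=lambda h: h.mtime, reverse=True)
    if key == "mime":
        return sorted(hits, key=lambda h: (h.mime, h.path))
    raise ValueError(f"unknown sort key: {key!r}")
```

Because it reads every file on every query, this is the brute-force approach: fine for one directory, but slow for a whole home directory.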

If somebody wants to do something 'Kool' with Reiser4 and KDE, it would be to use the file-as-a-directory concept to attach annotations and diffs to any file. For example, if every file in my /etc were also a directory, /etc/hosts could contain not only the configuration file but also /etc/hosts/annotate, /etc/hosts/diff, /etc/hosts/help and /etc/hosts/backup_this_important_file. Integrating that into the KDE GUI is a nice, innovative task.
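Ordinary filesystems such as ext3 cannot let a file double as a directory, so the following sketch only emulates the idea with a hidden sidecar directory next to each file. The `.name.meta` convention and all function names are hypothetical and are not Reiser4's interface; unlike real file-as-directory support, a sidecar does not travel with the file when it is moved or copied.

```python
import os


def meta_dir(path):
    """Hypothetical sidecar location: /etc/hosts -> /etc/.hosts.meta/"""
    head, tail = os.path.split(os.path.abspath(path))
    return os.path.join(head, "." + tail + ".meta")


def attach(path, aspect, text):
    """Store an aspect such as 'annotate' or 'help' alongside `path`."""
    d = meta_dir(path)
    os.makedirs(d, exist_ok=True)
    with open(os.path.join(d, aspect), "w", encoding="utf-8") as f:
        f.write(text)


def list_aspects(path):
    """Roughly what `ls /etc/hosts/` would show on a file-as-directory filesystem."""
    d = meta_dir(path)
    return sorted(os.listdir(d)) if os.path.isdir(d) else []


def read_aspect(path, aspect):
    """Read back one attached aspect; the original file is never modified."""
    with open(os.path.join(meta_dir(path), aspect), encoding="utf-8") as f:
        return f.read()
```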

luke chatburn

I utterly agree

I completely agree.... Metadata searching is not the way to go. It's a waste of the user's time, and most of all, it is fairly performance-crippling.

Honestly, I don't believe that detailed searching is the way to go as a whole. We're better off aiding the user in creating good file structures and making those the implicit backbone of a search.

Users rarely want to search through all their files for something. They usually have a good idea where it is, and a fair idea of what they called it. Looking in a directory is enough, and it has benefits in locational association with other files.

In my opinion, time is better spent adding non-searchable metadata that the user really needs, for example using Reiser4's subdirectory concept to attach sticky notes to a document that travel with it wherever it goes. That would be great, and a major boost to KDE's existing sticky-note capabilities. It would also help users massively in picking up their train of thought when they open a document again.

Has anyone tried the Google toolbar, by the way? It's great in that it sits beneath IE's address bar, and you can enter a word to search google with, click a button to search your current site for it instead (through Google's cache), or switch on highlight mode, which goes through the current page and highlights every place the word appears. Words in a phrase are highlighted in different colours, too.

In short, highlighting is fab, and a great way of quickly searching a document, or in Konqy's case, a directory.

My one big gripe about file-manager searching as Windows Explorer does it is that it gives you a completely new view containing only the files you asked for, instead of simply showing you which of the existing files are the ones you wanted. That works for some tasks, but not others.

tjansen

I don't

> It’s a waste of the user’s time, and most of all, it is fairly performance-crippling.

Why? Indexing can run as an idle-time process after a file has been modified, and queries in the file browser run against that index. Nothing slows down if you are not using it. It's also not necessary to index system files.
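A sketch of that idle-time indexer, assuming some idle hook calls `refresh()` with the list of files to watch (how those files are discovered, for instance via change notification, is left out). A file is re-read only when its modification time has changed, and `query()` consults only the in-memory inverted index, never the files themselves. `IncrementalIndex` and its methods are hypothetical names.

```python
import os
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


class IncrementalIndex:
    """Inverted index that re-reads a file only when its mtime has changed."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of paths
        self.words_of = {}                # path -> words indexed for it (for removal)
        self.mtime_of = {}                # path -> mtime at last indexing

    def _drop(self, path):
        for w in self.words_of.pop(path, set()):
            self.postings[w].discard(path)
        self.mtime_of.pop(path, None)

    def refresh(self, paths):
        """Index new or modified files, forget deleted ones; return how many were re-read."""
        for path in list(self.mtime_of):
            if path not in paths or not os.path.exists(path):
                self._drop(path)
        reindexed = 0
        for path in paths:
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue
            if self.mtime_of.get(path) == mtime:
                continue  # unchanged since the last pass: no disk read needed
            self._drop(path)
            with open(path, encoding="utf-8", errors="ignore") as f:
                words = {w.lower() for w in WORD.findall(f.read())}
            for w in words:
                self.postings[w].add(path)
            self.words_of[path] = words
            self.mtime_of[path] = mtime
            reindexed += 1
        return reindexed

    def query(self, word):
        """Answered purely from the index; no file is opened here."""
        return sorted(self.postings.get(word.lower(), set()))
```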

> Users rarely want to search through all their files for something. They usually have a good idea where it is, and a fair idea of what they called it.

Do they? Thinking of a few documents on my disk that I wrote in the last few months, I have no idea where they are or what they are called. But I could immediately name two or three keywords that would lead me directly to them, especially if the search engine scored hits in headlines higher than hits in the body text.
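A sketch of headline-weighted scoring, assuming each document exposes a headline and a body. The 5:1 weight ratio and the dict layout are arbitrary illustrative choices, not taken from WinFS or any real search engine.

```python
import re

WORD = re.compile(r"\w+")

# Hypothetical weights: a keyword in the headline counts five times as much
# as the same keyword in the body text.
HEADLINE_WEIGHT = 5.0
BODY_WEIGHT = 1.0


def tokens(text):
    return [w.lower() for w in WORD.findall(text)]


def score(doc, keywords):
    """Sum of weighted keyword occurrences in the headline and the body."""
    headline, body = tokens(doc["headline"]), tokens(doc["body"])
    total = 0.0
    for kw in (k.lower() for k in keywords):
        total += HEADLINE_WEIGHT * headline.count(kw) + BODY_WEIGHT * body.count(kw)
    return total


def rank(docs, keywords):
    """Names of documents with a non-zero score, best score first."""
    scored = [(score(d, keywords), d["name"]) for d in docs]
    return [name for s, name in sorted(scored, key=lambda t: -t[0]) if s > 0]
```

With these weights a single headline hit (5) outranks three body hits (3), which is the behaviour described above.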

> Looking in a directory is enough, and it has benefits in locational association with other files.

For this you need to know both the file name and the directory. And it doesn't help the large share of users who do not understand or use the directory concept (I haven't seen a study on this, but I wouldn't be surprised at all if more people were able to use Google than to create a directory).

> Has anyone tried the Google toolbar, by the way? It’s great in that it sits beneath IE’s address bar, and you can enter a word to search google with

Yup, and WinFS is about doing exactly the same thing for your local file system and the file shares on your LAN, except that it is not limited to HTML but searches through all document types. They even used the Google toolbar as an example to explain it.

sad eagle

re: Losing files.

It certainly does happen to me, but I am not so sure the metadata approach is the right one. Most of the time, when I 'lose' something, I think of things like "where did I save that Qt snapshot I downloaded 5 minutes ago?", or "where did that testcase go? I recall working on it while discussing that bug, on menus and such, with Fredrik and Karol". It's almost as if one needs to keep a log of what is happening and be able to search relative to that (which of course seems highly invasive and privacy-violating). I think the big problem with metadata, though, is that it requires users to do work to get it right, unless it's heavily automated.

datschge

re: Losing files.

Aren't you actually asking for a list of the most recently touched files? It might be an idea to combine (web/local) browsing history, the opening/viewing/editing of files (aka recent documents) and the applications used into one flat history list, ideally annotated with how long each item was in the active window. Now show all that info in, uhm, KOrganizer and you can trace back all your actions perfectly. ;)
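A sketch of that flat history list, assuming each source (browser history, recent documents, application launches) can already be exported as a list of events sorted by start time. The `Event` type is hypothetical, and its `end - start` duration only approximates the time spent in the active window. `heapq.merge` lazily merges the already-sorted streams into one chronological sequence.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    start: float                        # seconds since the epoch; the only sort key
    end: float = field(compare=False)
    source: str = field(compare=False)  # e.g. "web", "document", "app"
    what: str = field(compare=False)    # URL, file name or application name

    @property
    def duration(self):
        return self.end - self.start


def unified_history(*streams):
    """Merge per-source histories, each already sorted by start, into one list."""
    return list(heapq.merge(*streams))
```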

sad eagle

Not quite.

The stuff may have happened a week or two ago, too. And I don't care about most of the files I touched: say my schoolwork is all neatly organized in a nice hierarchical tree (i.e. ~/Academic/CSC/257/Proj3/, etc.). The difficulty, I think, is that the files that are hard to find are exactly the ones where the user was a bit careless about organization, and I don't see how asking users to do more work to provide proper meta info will help in that case (although of course, I am not a typical user).

In fact, what I miss in the file system is not indexing but things like versioning and seamless synchronization between different hosts. E.g. for CS projects, I often develop both on the department machines and on my home machine, so I frequently end up scp'ing tarballs and individual files; and if I am too lazy to use CVS, I may end up with multiple versions of a directory, which makes it much harder to remember which is which 2 years later. Now, multi-machine syncing is probably an issue unique to CS students. But I don't think versioning is, and if one could easily keep revision histories of files transparently, it would probably help with a lot of work. (Can anyone say VAX/VMS?)

tom chance

CVS

A point of interest: some people in the Gentoo community have started using CVS as a meta-filesystem for their home directories, and judging by how they've done it, I wouldn't say it's a million miles away from extending journalling in filesystems like Reiser and ext3.

Unfortunately, none of these ideas are really within the scope of the KDE project, though it would be cool, a few years down the line, to have a filesystem with both hierarchy and powerful search and synchronisation capabilities, as well as accessible CVS-style versioning.

luke chatburn

Well...

The issue isn't the indexing as such (although the idea of losing cycles outside of a user-run process will unnerve people running heavily loaded servers), but the simple fact that in order to search a large expanse of meta-data, sooner or later it has to pass through memory. If you want sane search times and decent indexing speed, you will take up a lot of resources. Moreover, you're running a multi-purpose SQL db with a lot of data in memory constantly.

The alternative is to segment the data into classes of meta-data (e.g. keywords from A-K in one file, L-Z in another, then load the half you need, depending on the keywords). With large numbers of user files indexed, you need a lot of fine-grained data structures on the disk. That's not to say that they need to be files, just data spaces. Either way, you then need to think about how you arrange those on the disk to get maximal read/write performance. Especially for multi-keyword searches, you also become very disk-I/O-bound trying to load and test the data if you don't have it in memory. Pretty soon, you're getting your DB to do complex filesystem construction, except that it is having to read from the disk, test, fail, try again, instead of having the disk already built in a way that suits the task.

That's why, for this purpose, Reiser4 is far, far better. Reiser4 also has the advantages that you can access the meta-data directly from scripts and applications, without heavy rewriting or an API/DCOP intermediary app to translate the calls through your DB app, and that you don't need to write that quasi-DB app at all, which would be a pain. Trust me... MS will be picking the bugs out of/optimising WinFS for the next ten years.
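A sketch of the segmented index described above, assuming a keyword -> file-list index split into two JSON shards: one for keywords starting with A-K and one for everything else. A query loads only the shards its keywords fall into and then intersects the per-keyword file lists. The shard file names `a-k.json`/`l-z.json` and all functions are hypothetical.

```python
import json
import os


def shard_of(word):
    """Keywords starting with a-k go to one shard; everything else to the other."""
    return "a-k" if "a" <= word[0].lower() <= "k" else "l-z"


def write_shards(index, directory):
    """Split a keyword -> [paths] index into one JSON file per shard."""
    shards = {}
    for word, paths in index.items():
        shards.setdefault(shard_of(word), {})[word.lower()] = sorted(paths)
    for name, part in shards.items():
        with open(os.path.join(directory, name + ".json"), "w") as f:
            json.dump(part, f)


def lookup(keywords, directory):
    """Load only the needed shards, AND the postings; return (paths, shards loaded)."""
    needed = {shard_of(k) for k in keywords}
    loaded = {}
    for name in needed:
        try:
            with open(os.path.join(directory, name + ".json")) as f:
                loaded[name] = json.load(f)
        except FileNotFoundError:
            loaded[name] = {}  # no keyword in this range was ever indexed
    result = None
    for k in keywords:
        paths = set(loaded[shard_of(k)].get(k.lower(), []))
        result = paths if result is None else result & paths
    return sorted(result or []), sorted(needed)
```

The trade-off the comment describes shows up directly: a single-keyword query reads half the index, but a query whose keywords span both ranges still has to read everything.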

Don't get me wrong: databases are ideal for large data sets where you are willing to spend resources on them and can dedicate the machine to the task. They're great for that. For a user who only sometimes searches for a file, it really isn't beneficial, and the resource cost is severe.

I'll freely admit that users sometimes need to search for files, but typically they encode key information into the filename and path. Those are, as mentioned in the Reiser papers, forms of meta-data in themselves. True full-text searches are rare enough in most cases that you can take the hit of a few seconds' delay. Searching meta-data headlines is in effect a halfway house between the two, and using Reiser4 meta-data is a halfway house between that and a full search. In general, the speed is *good enough*.

You don't want to be using meta-data searching for the most part within scripts or speed-critical applications anyway (because by using it, you're losing speed and resources), not to mention having to screen the results of a semi-uncertain search with definitive criteria, in which case, you probably could have done better in organising the files in the first place, and that's second nature to scripters.

Ultimately, my point about data organisation is that we can teach users, or we can develop work flows, such as a project-centric approach, where we do the organisation for the user. If we do this, then we not only abstract the user from the file system if they can't understand it, but we also add a solid basis of organisation that really does allow us to do useful things. Suffice it to say, I have a few ideas on the subject, and I'll see if I can get some work done on them when I finish my (unrelated) thesis :)

My point about the Google Toolbar was an aside, unrelated to the remainder of the post. Essentially, full searching is great on clearly delineated groups of data, say a single directory, where I might know I'm looking at all my documents for a month, but I'm not sure which ones refer to a given event. That's great, and a search with criteria is relatively sane. If it works across multiple file types, so much the better. The problem is a search that amounts to "Try and find all references to this in everything I've ever done...!". It's a huge task, even if you can segment the data quickly.

Well, I think I'll leave things at that... Sorry if I antagonised you, but I must say that I am completely convinced that WinFS is a silly idea that won't help users' work flows, productivity or ease of use. Users who really need to find files fast already organise them. It seems like a development that brings bloat and a tonne of conceptual problems, when ultimately users rarely search anyway, and we can easily make those situations rarer. Even if they do occur, if we coax the user properly, we can perform a classic search in a short enough time that the user will be very happy.

tjansen

If you want sane search times

> If you want sane search times and decent indexing speed, you will take up a lot of resources. Moreover, you’re running a multi-purpose SQL db with a lot of data in memory constantly.

Using a SQL db is the WinFS approach, but IMHO it's clumsy and unnecessary. Even worse, the SQL data model does not have much in common with most documents. XQuery is a more natural solution.

Whether it uses a lot of memory or not is an implementation issue. There's no good reason why it should use a lot of memory when it is not in use. There are small DBs, and a file system does not need all the capabilities that Oracle provides. Indexes can be mmapped and are usually read-only. I don't see any dramatic problem.
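To illustrate the mmap point, here is a sketch of a read-only index stored as sorted fixed-width records and binary-searched through `mmap`, so a lookup touches only a handful of pages instead of loading the whole index into memory. The record layout and field widths are assumptions made for this example; a real index would use a more compact format.

```python
import mmap
import os

# Hypothetical fixed-width layout: each record is a NUL-padded term followed
# by a NUL-padded document path, so record i starts at byte i * RECORD.
TERM_BYTES = 24
DOC_BYTES = 40
RECORD = TERM_BYTES + DOC_BYTES


def build_index(pairs, path):
    """Write sorted (term, doc) records once; afterwards the file is only read."""
    with open(path, "wb") as f:
        for term, doc in sorted(set(pairs)):
            t, d = term.encode("utf-8"), doc.encode("utf-8")
            if len(t) > TERM_BYTES or len(d) > DOC_BYTES:
                raise ValueError("field too long for this fixed-width sketch")
            f.write(t.ljust(TERM_BYTES, b"\0") + d.ljust(DOC_BYTES, b"\0"))


def docs_for(term, path):
    """Binary-search the mapped file; the OS pages in only what is touched."""
    if os.path.getsize(path) == 0:
        return []  # mmap refuses to map an empty file
    key = term.encode("utf-8")
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n = len(mm) // RECORD

        def term_at(i):
            return mm[i * RECORD : i * RECORD + TERM_BYTES].rstrip(b"\0")

        lo, hi = 0, n
        while lo < hi:  # find the first record whose term is >= key
            mid = (lo + hi) // 2
            if term_at(mid) < key:
                lo = mid + 1
            else:
                hi = mid
        docs = []
        while lo < n and term_at(lo) == key:  # collect all records for this term
            start = lo * RECORD + TERM_BYTES
            docs.append(mm[start : start + DOC_BYTES].rstrip(b"\0").decode("utf-8"))
            lo += 1
        return docs
```

Sorting the Python strings and comparing the UTF-8 bytes during the search agree, because UTF-8 preserves code-point order.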

> Pretty soon, you’re getting your DB to do complex filesystem construction, except that it is having to read from the disk, test, fail, try again, instead of having the disk already built in a way that suits the task.

If I understand it correctly, Reiser4 can be the DB; it just needs to be accessed by a query-language engine, which could run in-process. (Whether Reiser4 is good enough for a complex query language may be a different question; I haven't seen any APIs for Reiser4 yet.)

> You don’t want to be using meta-data searching for the most part within scripts or speed-critical applications anyway (because by using it, you’re losing speed and resources), not to mention having to screen the results of a semi-uncertain search with definitive criteria,

You are assuming that there is no way to access a file other than through meta-data. No one said there isn't also a 'regular' file path. Even WinFS has one; only Gnome Storage seems to lack it.

> My point about the Google Toolbar was an aside, unrelated to the remainder of the post.

Yes, but it is a good example of how useful keyword-based searches can be. Since Google came along, I don't browse through API documentation anymore; I just type 'QString' into my browser and it shows me the documentation. A meta-file system allows the same for my local file system. I expect to type "knot slp paper" into my file browser and get my Nove Hrady paper. I don't want to remember its file name or directory, and even if I knew them I wouldn't want to click through the file browser or type the path name, not even with tab completion. That's too slow compared to Google-style keyword browsing.
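A sketch of that kind of Google-style query, assuming an inverted index (word -> set of files) already exists: every query word must match, and the smallest posting set is intersected first so the work stays proportional to the rarest word. The function names, file names and contents used with it are hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


def build_postings(docs):
    """docs maps path -> text; returns word -> set of paths containing it."""
    postings = defaultdict(set)
    for path, text in docs.items():
        for word in WORD.findall(text.lower()):
            postings[word].add(path)
    return postings


def keyword_query(postings, query):
    """Every query word must match; intersect starting from the rarest word."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    sets = sorted((postings.get(w, set()) for w in words), key=len)
    result = set(sets[0])
    for s in sets[1:]:
        result &= s
    return sorted(result)
```

With three documents of which only one contains all of knot, slp and paper, `keyword_query(postings, "knot slp paper")` returns just that one path, without the user knowing its name or directory.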

> Sorry if I antagonised you, but I must say that I am completely convinced that WinFS is a silly idea that won’t help users’ work flows, productivity or ease of use.

You haven't antagonised me :)
We'll see, either 6 months after the release of Longhorn or when there is a free equivalent, whichever comes first...

luke chatburn

:)

My point is just that Reiser4 will let you do something close to this... :) You can do meta-data searches in Reiser, let it do the hard work and integrate it better. Meta-data searching complements full-data searching in OS capabilities, but it is usually rare enough in real use that you don't need a permanent overhead. It just takes a little longer when you actually want to do it.

In fact, it might be interesting to do a meta-data search followed by a secondary full-text search of the results. So: "Look for all files headlined 'letters to Sally' and find the ones mentioning 'my dog Jeff'". In which case, you're looking at those files at the FS level anyway :)
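A sketch of that two-stage search, assuming each file has a cheap-to-read headline field (standing in for Reiser4-style meta-data) and a `load_body` callable that performs the expensive full read. Only files whose headline matches are ever opened for the full-text pass. `FileEntry` and `two_stage_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileEntry:
    name: str
    headline: str                 # cheap: stored as meta-data
    load_body: Callable[[], str]  # expensive: reads the whole file


def two_stage_search(entries, headline_phrase, body_phrase):
    """Meta-data filter first, then full-text search of the survivors only."""
    hp, bp = headline_phrase.lower(), body_phrase.lower()
    candidates = [e for e in entries if hp in e.headline.lower()]
    return [e.name for e in candidates if bp in e.load_body().lower()]
```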

I just think that the idea that we all need meta-data searching is a little overrated, and MS are clutching at straws if they think that this will make a big difference to people's lives. We can do it better with Reiser4; it'll also take less programming work and be more stable, with lower overhead when not in use :)
