What's a good solution for file-tagging in Linux?

I've just released an alpha of my new program that attempts to provide this functionality. It currently meets some, but not all, of your requirements. It may be of interest to you anyway. It provides a command-line tool for tagging and a virtual file-system for browsing (where tags are represented by directories).

http://www.tmsu.org/

any file readable by the user can be tagged freely

Yes.

a user can search for files matching one or several tags

Yes. Either via the command-line tool or by browsing the tag directories in the virtual file-system.

files can be moved around without losing the previously associated tags

No. However, the application stores fingerprints of the tagged files, which it uses to help identify moved files. A 'repair' command is provided that will update the paths of moved files. (Obviously this mechanism breaks down if a file is both moved and modified.)
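
To illustrate the general idea of fingerprint-based repair, here is a minimal sketch. It is not TMSU's actual code: the choice of SHA-256 and the names `fingerprint` and `find_moved` are assumptions made for the example.

```python
import hashlib
import os


def fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_moved(known, search_root):
    """Map each missing path in `known` to a file with the same fingerprint.

    `known` is a dict of {old_path: fingerprint} recorded at tagging time.
    """
    # Only paths that no longer exist need repairing.
    missing = {fp: old for old, fp in known.items() if not os.path.exists(old)}
    moved = {}
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            if not os.path.isfile(candidate):
                continue  # skip broken symlinks and special files
            old = missing.get(fingerprint(candidate))
            if old is not None and old not in moved:
                moved[old] = candidate
    return moved
```

As the answer notes, a file that was both moved and modified has a new fingerprint, so a lookup like this cannot find it.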

the system could be backed up easily

Yes. It's a simple SQLite 3 database file.

no dependencies on any desktop environment

Yes. There are no dependencies, and because it can be run as a virtual file-system, the tags can be browsed as a file-system from any program that supports symbolic links.

if any gui is involved, there must be a cli fallback

No GUI at present.


It's not clear what kind of searching you want. If you want it to work anywhere in Unix, rather than just your home directory, and you only want to do pathname-based searches, the following scheme is workable, with a little bit of shell hackery, and using the standard locatedb:

  1. Each directory that contains at least one tagged file needs a standard subdirectory, say .path-tags;
  2. Each file $FILE in the directory that carries a tag $TAG (which should not contain the char _) has a link $TAG_$FILE -> ../$FILE inside .path-tags.
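
The two rules above can be sketched as follows. This is only one way to read the scheme: the function name `tag_file` is made up for the example, and I am assuming the links live inside the .path-tags subdirectory.

```python
import os

TAG_DIR = ".path-tags"  # the subdirectory name suggested in rule 1


def tag_file(path, tag):
    """Create DIR/.path-tags/TAG_FILE -> ../FILE for the file at `path`."""
    if "_" in tag or "/" in tag:
        raise ValueError("tag must not contain '_' or '/'")
    directory, name = os.path.split(os.path.abspath(path))
    tag_dir = os.path.join(directory, TAG_DIR)
    os.makedirs(tag_dir, exist_ok=True)
    link = os.path.join(tag_dir, f"{tag}_{name}")
    if not os.path.islink(link):
        # A relative target keeps the link valid when the directory moves.
        os.symlink(os.path.join("..", name), link)
    return link
```

Because the link target is relative, the whole directory can be moved or renamed without breaking its tags.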

I leave the details of the locate-tag script to you; it should be a two- or three-liner, using only the locate command and shell hackery. (If you're interested, I could write one.)
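
For what it's worth, here is a rough sketch of such a lookup in Python rather than shell. The names `tagged_targets` and `locate_tag` are hypothetical, and I am assuming that `locate` treats a pattern without wildcards as a substring match and that its database is reasonably fresh.

```python
import os
import subprocess

TAG_DIR = ".path-tags"


def tagged_targets(paths, tag):
    """Turn tag-link paths into the paths of the files they point at."""
    prefix = tag + "_"
    targets = []
    for path in paths:
        tag_dir, link_name = os.path.split(path)
        directory, dir_name = os.path.split(tag_dir)
        # Keep only DIR/.path-tags/TAG_FILE entries and map them to DIR/FILE.
        if dir_name == TAG_DIR and link_name.startswith(prefix):
            targets.append(os.path.join(directory, link_name[len(prefix):]))
    return targets


def locate_tag(tag):
    """Ask locate for tag links and map them back to the tagged files."""
    result = subprocess.run(
        ["locate", f"/{TAG_DIR}/{tag}_"],
        capture_output=True, text=True, check=False,
    )
    return tagged_targets(result.stdout.splitlines(), tag)
```

Since tags may not contain _, the prefix `work_` cannot accidentally match a longer tag such as `workshop`.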

Some of the KDE chaps talked about this sort of scheme for metadata, although I don't recall the details.

It should also be possible to do more sophisticated, content-examining tests based on this scheme with a similar script wrapped around find.

Thoughts on updated requirements

  1. any file readable by the user can be tagged freely - Yes, should be no problem
  2. a user can search for files matching one or several tags - Likewise
  3. files can be moved around without losing the previously associated tags - The directories they inhabit can be freely moved about, but if the file is moved out of its directory, we are in trouble. If the tags took the form $TAG_$INODE_$FILE and we had an efficient way to find which paths have a given inode, then we could do this, losing tags only when a file moves to a different filesystem. Copying files might cause some trouble, and this is clearly more complicated than my original suggestion.
  4. the system could be backed up easily - not essentially difficult.
  5. no dependencies on any desktop environment - none
  6. if any gui is involved, there must be a cli fallback - that's where we live!
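
The $TAG_$INODE_$FILE naming from point 3 might look like this in practice. It is only a sketch: the helper names are invented, and `find_by_inode` walks the whole tree, which is exactly the inefficient lookup that point 3 says we would want to avoid.

```python
import os


def inode_link_name(path, tag):
    """Build the $TAG_$INODE_$FILE link name for an existing file."""
    if "_" in tag:
        raise ValueError("tag must not contain '_'")
    inode = os.stat(path).st_ino
    return f"{tag}_{inode}_{os.path.basename(path)}"


def parse_link_name(link_name):
    """Split a $TAG_$INODE_$FILE name into (tag, inode, file name)."""
    tag, inode, name = link_name.split("_", 2)
    return tag, int(inode), name


def find_by_inode(root, inode):
    """Return paths under `root`, on root's filesystem, with this inode."""
    device = os.stat(root).st_dev
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            info = os.lstat(candidate)
            if info.st_ino == inode and info.st_dev == device:
                matches.append(candidate)
    return matches
```

Inode numbers are only unique within one filesystem, hence the device check; a copy gets a fresh inode, which is why copying does not carry the tags along.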

Postscript: The "reverse-inode-lookup" file described by the link (2) you showed me in your answer to (1) can be used to give some additional infrastructure. We can run a service on the reverse lookup file, which checks that each inode given in the filename of a tag matches the inode of the file (if any) the tag points to. If there is no match, then the required surgery can be performed (does the inode still exist? where is it?), with the reverse lookup file being either mutated or regenerated and the tag symlinks updated.
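
The checking half of such a service could be sketched like this. It assumes the $TAG_$INODE_$FILE link names discussed above and leaves out the reverse lookup file and the repair step entirely; `stale_tag_links` is a name made up for the example.

```python
import os


def stale_tag_links(tag_dir):
    """Return tag links whose target is missing or has a different inode."""
    stale = []
    for entry in sorted(os.listdir(tag_dir)):
        link = os.path.join(tag_dir, entry)
        parts = entry.split("_", 2)
        if not os.path.islink(link) or len(parts) != 3 or not parts[1].isdigit():
            continue  # not a tag link in the assumed naming scheme
        try:
            actual = os.stat(link).st_ino  # follows the symlink to its target
        except OSError:
            stale.append(link)  # target is no longer where the link points
            continue
        if actual != int(parts[1]):
            stale.append(link)  # a different file now sits at that path
    return stale
```

Every link returned here would then need the "surgery" described above.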

I anticipate one tricky case: what if the tagged file is not where the tags say it should be, the reverse lookup file says it still exists, but the prodigal file is not where the lookup file says it is, the lookup file being out of date? There are a few ways to handle this case, none obviously ideal. Apart from this, this whole task seems to be the kind of thing Perl is well-suited to...


Nobody has mentioned it yet, but you should definitely look at extended file attributes. ext4, for example, supports them, and the getfattr and setfattr tools let you read and set them. Of course, you will have to write some shell scripts to search for files tagged with sometag. As for the requirements listed in the question, the answer to all of them is "Yes". Just bear in mind that this approach depends on the file system.
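
As a rough illustration of the same idea from Python instead of getfattr/setfattr, here is a sketch using `os.getxattr` and `os.setxattr`, which are available on Linux only. The attribute name `user.tags` and the comma-separated encoding are assumptions of mine, not any standard, so tags containing commas would not work with it.

```python
import os

TAG_ATTR = "user.tags"  # hypothetical attribute name in the user namespace


def encode_tags(tags):
    """Serialise a collection of tags as sorted, comma-separated bytes."""
    return ",".join(sorted(set(tags))).encode("utf-8")


def decode_tags(raw):
    """Parse the bytes stored in the attribute back into a set of tags."""
    return {tag for tag in raw.decode("utf-8").split(",") if tag}


def get_tags(path):
    """Return the tags on `path`, or an empty set if none can be read."""
    try:
        return decode_tags(os.getxattr(path, TAG_ATTR))
    except OSError:
        return set()  # attribute not set, or xattrs unsupported here


def add_tag(path, tag):
    """Add one tag to the file's extended attribute."""
    os.setxattr(path, TAG_ATTR, encode_tags(get_tags(path) | {tag}))


def find_tagged(root, tag):
    """Walk `root` and return the files that carry `tag`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if tag in get_tags(path):
                hits.append(path)
    return hits
```

The attribute stays with the file when it is renamed or moved within the same file system, but tools that copy or archive files may drop it unless told to preserve extended attributes.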