How can a big table be treated as a database?

One way to avoid the a_ /; test[a] syntax is to write the tests with the field names as strings and use replacement rules to insert each row's values. For this to work you need to build the rules from the first row of your table, which holds the field names. Here is a simple implementation:

 SetAttributes[queryCriteria, HoldAll]
 queryCriteria[theTable_, query_] := Function[{entry}, 
 Unevaluated[query] /. (Rule @@@ Transpose[{theTable[[1]], entry}]), HoldAll]

 Select[theTable, queryCriteria[theTable, "color" == "blue" && "size" > 10]]
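
To see what the replacement does for a single row, here is a minimal sketch. The contents of theTable below are hypothetical sample data (the original table is not shown here); the only assumption is that the first row holds the field names "id", "color", "size" and "flavor".

```mathematica
(* Hypothetical sample data; the first row holds the field names *)
theTable = {
   {"id", "color", "size", "flavor"},
   {1, "blue", 12, "cherry"},
   {2, "red", 15, "lemon"},
   {3, "blue", 8, "mint"}};

(* Rules pairing each field name with the value in the first data row *)
rules = Rule @@@ Transpose[{theTable[[1]], theTable[[2]]}]
(* {"id" -> 1, "color" -> "blue", "size" -> 12, "flavor" -> "cherry"} *)

(* The held query with the field names replaced by that row's values *)
substituted = Hold["color" == "blue" && "size" > 10] /. rules
(* Hold["blue" == "blue" && 12 > 10] *)

(* Releasing the hold evaluates the test for this row *)
ReleaseHold[substituted]
(* True *)
```

The query has to stay unevaluated until the replacement has happened, because "color" == "blue" on its own compares two different strings and evaluates to False. That is why queryCriteria carries the HoldAll attribute.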

Personally I would prefer not to pass theTable to the query function constructor, since conceptually you shouldn't need a table to define a query. It is needed during construction only because the field names are listed in the table's first row. A nice way around this is to treat a query as an independent entity that doesn't touch the table until it is used in Select. This can be done by defining an upvalue for Select on the query wrapper. To resemble your included example, I use where as the name for the query:

 SetAttributes[where, HoldAll]
 Select[table_, where[query_]] ^:= Select[table, queryCriteria[table, query]]

So that the query can be written:

 Select[theTable, where["color" == "blue" && "size" > 10]]

These are all just ways of doing the same thing with different syntax, however. I would expect performance to become the more important issue with big databases.


Don't reinvent the wheel: If you need a database you should be aware of the SQLite access readily built into Mathematica, though unfortunately undocumented:

db = Database`OpenDatabase[FileNameJoin[{$TemporaryDirectory, "mma-temp-db.sqlite"}]];

Database`QueryDatabase[db, 
    "CREATE TABLE stuff(id INTEGER PRIMARY KEY,color TEXT,size REAL,flavor TEXT)"
];

Database`QueryDatabase[db, "BEGIN"];

Scan[
  Database`QueryDatabase[db, 
    ToString@StringForm[
      "INSERT into stuff(color,size,flavor) VALUES ('`1`',`2`,'`3`')",
      Sequence @@ #
  ]] &,
  theTable[[2 ;; -1, 2 ;; 4]]
];

Database`QueryDatabase[db, "END"];

Database`QueryDatabase[db,"SELECT * FROM stuff WHERE color = 'blue' AND size > 10"]

Database`CloseDatabase[db]

And in case you care more about speed than persistence, you can use an in-memory database:

Database`OpenDatabase[":memory:"]

For details, just look at the SQLite documentation; there is plenty of good material available.

EDIT: As murta mentioned in his comment, it is also possible to use SQLite with the officially supported and documented DatabaseLink`. Version 10 includes a corresponding driver; for earlier versions a SQLite JDBC driver has to be installed manually. As far as I can tell, the Database`* functions are a very lightweight approach, most probably calling the SQLite libraries directly, while DatabaseLink` goes through Java/JLink/JDBC, which is rather heavyweight but of course also has its advantages. Also from murta, here is the above example using DatabaseLink`:

Needs["DatabaseLink`"]
conn=OpenSQLConnection[JDBC["SQLite",FileNameJoin[{$TemporaryDirectory,"testBase.sqlite"}]]];
SQLExecute[conn,"CREATE TABLE stuff(id INTEGER PRIMARY KEY,color TEXT,size REAL,flavor TEXT)"];
SQLInsert[conn,"stuff",{"color","size","flavor"},theTable[[2;;-1,2;;4]]];
SQLExecute[conn,"SELECT * FROM stuff WHERE color = 'blue' AND size > 10"]
CloseSQLConnection[conn]

For an in-memory version use: conn=OpenSQLConnection[JDBC["SQLite(Memory)","jdbc:sqlite::memory:"]];
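
As a side note, DatabaseLink` also lets you state the condition as a Mathematica expression or pass the values as parameters instead of pasting them into the SQL string. The following is a minimal sketch, assuming version 10 or later (so that the SQLite driver is available) and using hypothetical sample data for theTable:

```mathematica
Needs["DatabaseLink`"]

(* Hypothetical sample data; the first row holds the field names *)
theTable = {
   {"id", "color", "size", "flavor"},
   {1, "blue", 12, "cherry"},
   {2, "red", 15, "lemon"},
   {3, "blue", 8, "mint"}};

(* In-memory SQLite database, filled as in the example above *)
conn = OpenSQLConnection[JDBC["SQLite(Memory)", "jdbc:sqlite::memory:"]];
SQLExecute[conn,
  "CREATE TABLE stuff(id INTEGER PRIMARY KEY,color TEXT,size REAL,flavor TEXT)"];
SQLInsert[conn, "stuff", {"color", "size", "flavor"}, theTable[[2 ;; -1, 2 ;; 4]]];

(* Condition written as an expression on SQLColumn objects *)
byExpr = SQLSelect[conn, "stuff", {"id", "color", "size"},
   SQLColumn["color"] == "blue" && SQLColumn["size"] > 10]

(* Same query as an SQL string with `1`, `2` parameter slots *)
byString = SQLExecute[conn,
   "SELECT * FROM stuff WHERE color = `1` AND size > `2`", {"blue", 10}]

CloseSQLConnection[conn]
```

With this sample data both queries should return only the row for the blue item of size 12. The parameter form also avoids quoting problems when a value contains an apostrophe.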

Just for completeness: as far as I can remember, all versions of DatabaseLink` have included drivers for HSQL, which provides functionality similar to SQLite. Since version 10 there are also drivers for H2 and Derby, which claim similar functionality as well.

EDIT: Since version 11.1 the Database` functions have been removed, so for any version newer than 11.0 one has to use the DatabaseLink` approach. As those versions come with the SQLite driver, you can still access SQLite databases out of the box.


I had forgotten about this question, the answer linked by @Leonid, @jVincent's answer, etc., and last week I found myself with the same "need".

I'll just post what I used, since it's no extra work, in case it still helps someone.

Speed wasn't a concern, so I have no clue how much time this wastes.

LabeledMatrix[cs_, mat_][cols : {__String}, funQ_] := 
    LabeledMatrix[cs, mat][cols, funQ, cols];

Normal[LabeledMatrix[_, mat_, ___]] ^:= mat

(lm : LabeledMatrix[cs_, mat_?MatrixQ])[cols : {__String}, funQ_, showCols : {___String}] :=
    Extract[mat[[All, label2Position[lm, showCols]]], 
      Position[LabeledMatrix[cs, mat][cols], {i___} /; funQ[i], {1}]];

LabeledMatrix[cs_, mat_?MatrixQ][cols : {__String}, All] := 
    LabeledMatrix[cs, mat][cols];
(lm : LabeledMatrix[cs_, mat_?MatrixQ])[cols : {__String}] := 
    mat[[All, label2Position[lm, cols]]];

SetAttributes[label2Position, Listable];
label2Position[LabeledMatrix[cols_List, ___], lab_] := 
    First@Flatten@Position[cols, lab, {1}, 1];

There's basically no error checking, formatting rules, etc.

Usage

LabeledMatrix is a wrapper. It takes, as a first argument, the names of the columns, and as a second, the data matrix.

lm = LabeledMatrix[
   {"ID", "Person", "Age"},
   {{4, "Peter", 23}, {5, "Mary", 33}, {55, "John", 23}}];

Say you want the "Person" and "Age" columns:

lm[{"Person", "Age"}]

(* {{"Peter", 23}, {"Mary", 33}, {"John", 23}} *)

The first argument (unless you use the three-argument form) is the list of columns you want as output.

If you give a second argument, it is a predicate function used to filter rows. The arguments passed to that function are the values of the columns named in the first argument, in that order. Example:

lm[{"Age", "ID"}, #2 > #1 &]

(* {{23, 55}} *)

If you supply a third argument, it is the list of columns returned; the first argument then only names the inputs to the predicate. So, say you want the IDs of the people aged under 30:

lm[{"Age"}, # < 30 &, {"ID"}]

(* {{4}, {55}} *)

A second argument of All is the same as giving no second argument. Normal gives the data matrix. First (or some other convenience function you may want to create) gives the names of the columns.
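
For illustration, a short sketch of those two accessors on the example matrix from above. The Normal definition is repeated from the code above so that the block runs on its own; First needs no definition because the column names are simply the first argument of the wrapper.

```mathematica
(* Repeated from above so that this block is self-contained *)
Normal[LabeledMatrix[_, mat_, ___]] ^:= mat

lm = LabeledMatrix[
   {"ID", "Person", "Age"},
   {{4, "Peter", 23}, {5, "Mary", 33}, {55, "John", 23}}];

(* Column names: the first argument of the wrapper *)
First[lm]
(* {"ID", "Person", "Age"} *)

(* Data matrix, via the upvalue defined for Normal *)
Normal[lm]
(* {{4, "Peter", 23}, {5, "Mary", 33}, {55, "John", 23}} *)
```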