Querying a CSV file

I think OpenOffice.org Database can do what you want. It works like this.

  1. Start Open Office.org Database, it shows the "Database Wizard"

  2. Select "Connect to an existing database: Text"

    enter image description here

  3. Specify path to text files as well as details like separator character etc.

    enter image description here

  4. Create and execute Queries

    enter image description here

If you have ever worked with Microsoft Access you will find the GUI familiar.


If you can do without a GUI there are always the traditional UNIX commands. I use them a lot to do simple queries to (small) CSV files. Here is how it works:

clause      operation   command
-------------------------------
from             join     `join`
where     restriction     `grep`
order by           --     `sort`
group by  restriction      `awk`
having    restriction     `grep`
select     projection      `cut`
distinct  restriction     `uniq`
limit     restriction     `head`
offset    restriction     `tail`

Have you tried LogParser?

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.

Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.

A tutorial (and a another one) on using the SQL like query language with CSV files I found using google.

Example Query:

logparser -i:CSV "SELECT TOP 10 Time, Count INTO c:\logparser\test\Chart.GIF 
FROM c:\logparser\test\log.csv ORDER by Time DESC" -charttype:bar3d

You can use ODBC to query text files:

Accessing Text Files using ODBC Data Provider

Note that you don't need MS Access for this to work, the tutorial in the above link just uses MS Access to create the text file, but as you already have a text file, scroll down halfway, and start the tutorial where you see the title Accessing a Text File.

Update: I created a DSN on a .csv file myself to be able to create this step by step tutorial... here it comes:

  • Make sure your .csv file is in its own directory without anything else.
  • Open the "ODBC Data Source Administrator" (start - control panel - administrative tools - Data Sources (ODBC)).
  • Go to the File DSN tab and click on "Add...".
  • Choose "Microsoft Text Driver (*.txt, *.csv) from the list and click "Next >".
  • Give a name for your file data source (e.g. "test") and click "Next >".
  • Click "Finish" (After this, a dialog will appear where the "Data source name" and "Description" fields are indeed greyed out. This is normal. No worries.
  • Uncheck the "Use Current Directory" checkbox. The "Select Directory" button will be enabled.
  • Click the "Select Directory" button and nagivate to the folder in which you placed your .csv file in the first step.
  • Click on the "Options>>" button.
  • Click on the "Define Format..." button.
  • In the left "Tables" list, select your .csv file and click on the "Guess" button. (This will analyse your csv file and create an appropriate field for each column in your .csv file.)
  • Go through the generated columns (F1,F2,...) in the right list, give them meaningful names and set the appropriate data type (sometimes the guessing is not always right).
  • Once everything is set up right, click "OK" (2 times).

At this point you should have a file DSN with which you can access your .csv file through ODBC. If you inspect your folder where the .csv file is placed, you'll see a schema.ini file, which contains the config you just created. When you have multiple .csv files, each one corresponds with a table and each table will have a [filename.csv] block in the schema.ini file in which the different columns are defined... You can also create/change that schema.ini file directly in a text editor in stead of using the GUI described above.

As for your extra question "how to connect to this ODBC provider using a query tool":
I have a tool which I wrote myself long time ago which is not eligible for publication. But a quick Google search came up with odbc-view, a freeware tool which does what you want.
I downloaded and installed the tool.
After starting the tool:

  • Click on "DataSource...".
  • Select your File Data Source which you created previously (e.g. "test").
  • In the query pane type "select * from [filename.csv]".
  • Click "Execute".

You should see the contents of your .csv file in the lower pane now.
Hope this helps... Let me know how you do or if you need further assistance.

Tags:

Csv

Query