Extract filename without extension from the absolute location

Edit:

Even though my first solution answered the question as asked, I saw @DavidBoho 's answer and he made several good points. He suggested that if the filename is my_file.tar.gz then the return value should be my_file.tar and also that my solution would fail in the event that the file had no extension at all. All of the code here is available on this fiddle.

Given the table and data as follows:

CREATE TABLE with_filename
(
  file_id INTEGER,
  file_name VARCHAR (256)
);

populate:

INSERT INTO with_filename
VALUES
(1, '/users/mcm1/ualaoip2/vmm/file1.pdf'),
(2, '/users/mcm1/ualaoip2/vmm/file2.py'),
(3, '/users/mcm1/ualaoip2/vmm/file3.pdf'),
(4, '/users/mcm1/ualaoip2/vmm/file4.c'),
(5, '/users/mcm1/ualaoip2/vmm/file5.java'),
(6, '/users/mcm1/ualaoip2/vmm/file6.class'),
(7, '/users/mcm1/ualaoip2/vmm/file7'),
(8, '/users/mcm1/ualaoip2/vmm/file8.tar.gz'),
(9, '/users/mcm1/my_prog.cpp');

My original solution:

SELECT LEFT(
      RIGHT(file_name, POSITION('/' IN REVERSE(file_name)) - 1),
                       POSITION('.' IN 
      RIGHT(file_name, POSITION('/' IN REVERSE(file_name)) - 1)) - 1
) AS my_file
FROM with_filename;

gives the result:

my_file
file1
file2
file3
file4
file5
file6
file       -- << should be file7
file8      -- << should be file8.tar
my_prog

After reading @DavidBoho's post, he used the SPLIT_PART function to resolve the problems with files 7 & 8 - see the fiddle. I decided to look again at my own SQL and I came up with this (perhaps more traditional?):

SELECT 
      REPLACE(SUBSTRING(file_name, (LENGTH(file_name) + 2) - POSITION('/' IN REVERSE(file_name))),
      RIGHT(file_name, POSITION('.' IN LEFT(REVERSE(file_name), POSITION('/' IN REVERSE(file_name)) - 1))),
      '') AS the_files
    FROM with_filename

Result:

the_files
file1
file2
file3
file4
file5
file6
file7
file8.tar
my_prog

which is also the correct answer!

While I was grasping for a solution, I became interested in regular expressions as a means of solving this problem. Even though I (we) was/were able to solve this using "traditional" SQL, it became clear to me that regexes are extremely powerful and even though SQL is now Turing complete, it could rapidly become extremely convoluted for relatively simple string manipulation problems, so I decided to investigate.

I found two regex solutions - in fairness, I can't claim to have done this myself, the solutions are as a result of a question I asked on StackOverflow. So, the regex solutions are as follows:

The best one is this one -

SELECT
    file_name,
    REGEXP_REPLACE(file_name, '^.*/([^/]*?)(\.[^/.]+)?$', '\1') AS filename
FROM with_filename;

There is a second one, but IMHO (and that of the original author) it's a bit of a hack - it involves two nested REGEXP_REPLACEs

SELECT
    file_name,
    REGEXP_REPLACE(REGEXP_REPLACE(file_name, '^.*/(.*)$', '\1'), '\.[^.]+$', '') AS filename
FROM with_filename

Finally, there may be a solution possible using the UNNEST and the STRING_TO_ARRAY functions together - I came up with this code:

SELECT fn,  
    LEFT(fn, POSITION('.' IN fn) - 1) AS lef
FROM with_filename w,
  UNNEST(STRING_TO_ARRAY(w.file_name, '/')) AS fn
GROUP BY fn
HAVING COUNT(fn) = 1
ORDER BY lef;

which gives the result:

fn  lef
file7   file       -- << should be file7
file1.pdf   file1
file2.py    file2
file3.pdf   file3
file4.c     file4
file5.java  file5
file6.class     file6
file8.tar.gz    file8  -- << should be file8.tar
my_prog.cpp     my_prog

I tried lots of different permutations with this, but couldn't get it to work. Would be grateful for any input! :-)

Another interesting function is REGEXP_SPLIT_TO_TABLE.

SELECT 
  fn, 
  COUNT(fn)
FROM
(
  SELECT REGEXP_SPLIT_TO_TABLE(w.file_name, '/') AS fn
  FROM with_filename w
) AS sq
GROUP BY fn
HAVING COUNT(fn) = 1
ORDER BY fn

Result:

fn  count
file1.pdf   1
file2.py    1
file3.pdf   1
file4.c     1
file5.java  1
file6.class     1
file7   1
file8.tar.gz    1
my_prog.cpp     1

Again, this might be worth pursuing - didn't have time.

Example DML/DDL

CREATE TABLE with_filename
(
  file_id INTEGER,
  file_name VARCHAR (256)
);

INSERT INTO with_filename
VALUES
(1, '/users/mcm1/ualaoip2/vmm/file1.pdf'),
(2, '/users/mcm1/ualaoip2/vmm/file2.py'),
(3, '/users/mcm1/ualaoip2/vmm/file3.pdf'),
(4, '/users/mcm1/ualaoip2/vmm/file4.c'),
(5, '/users/mcm1/ualaoip2/vmm/file5.java'),
(6, '/users/mcm1/ualaoip2/vmm/file6.class'),
(7, '/users/mcm1/ualaoip2/vmm/file7'),
(8, '/users/mcm1/ualaoip2/vmm/file8.tar.gz'),
(9, '/users/mcm1/my_prog.cpp');

Tags:

Substring