What data analysis software should a supervisor recommend to their research students?

Teaching students how to program should be a priority in research, at least in the sciences. It doesn't matter what STEM field you are in: you will almost certainly need to deal with data, and using "black box" software only teaches you to do whatever the software tells you.

I've seen people report results using the mean and standard deviation for non-normal distributions, simply because they didn't know how to plot their distribution and relied on black box software. I've seen people rename a folder of 500 data files one file at a time. These are worse than "Attack ships on fire off the shoulder of Orion".
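To make the point about non-normal data concrete, here is a minimal Python sketch using only the standard library. The sample values are made up for illustration, and the function names `summarize` and `text_histogram` are hypothetical. It is one possible way to look at a distribution before choosing summary statistics, not a definitive recipe.

```python
import statistics


def summarize(values):
    """Return mean/standard deviation alongside median and quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


def text_histogram(values, bins=5):
    """Return one text line per bin so the shape is visible without a plotting library."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    return [f"{lo + i * width:8.2f} | {'#' * c}" for i, c in enumerate(counts)]


# Made-up, strongly skewed sample (illustrative only, not real measurements).
sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 40]

summary = summarize(sample)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
print("\n".join(text_histogram(sample)))
```

With a single large outlier, the mean is pulled well above the median, and the histogram shows why: almost all values sit in the first bin. For data like this, the median and quartiles are usually the more honest summary.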

Since nowadays almost all research in STEM is performed using computers, understanding computers is a must.

My recommendation is to have people learn MATLAB if you have access to it, or Python if you don't have access to MATLAB or you are an open source / free software supporter. Both of these languages are designed to be very high level and do not require "advanced" computer science skills (such as inheritance in OOP, or pointers in C)*. Both languages are widely used, and there are numerous free online courses for learning them on Coursera, Codecademy, Udacity, edX or any other online learning platform.

Learning to code at a basic level should take less than 2 months, even while the student is also doing other things, and it can save them thousands of hours of tedious work.
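As an illustration of the kind of tedious work a few lines of code can replace, here is a rough Python sketch for renaming a whole folder of data files in one go. The function name `rename_data_files`, the `sample_001.csv` naming scheme and the `.csv` extension are all assumptions made for the example. It defaults to a dry run so that nothing is renamed until you have checked the plan.

```python
import tempfile
from pathlib import Path


def rename_data_files(folder, prefix="sample", suffix=".csv", dry_run=True):
    """Rename files with the given suffix to prefix_001, prefix_002, ... in sorted order.

    Returns the list of (old_name, new_name) pairs. With dry_run=True nothing is changed.
    """
    folder = Path(folder)
    files = sorted(p for p in folder.iterdir() if p.is_file() and p.suffix == suffix)
    plan = [
        (old, old.with_name(f"{prefix}_{number:03d}{suffix}"))
        for number, old in enumerate(files, start=1)
    ]
    # Conservative check: refuse to run if any target name is already taken.
    for old, new in plan:
        if new != old and new.exists():
            raise FileExistsError(f"{new.name} already exists")
    if not dry_run:
        for old, new in plan:
            if new != old:
                old.rename(new)
    return [(old.name, new.name) for old, new in plan]


if __name__ == "__main__":
    # Demonstration in a throwaway directory, so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("run B.csv", "run A.csv", "notes.txt"):
            (Path(tmp) / name).write_text("placeholder")
        for old_name, new_name in rename_data_files(tmp, dry_run=False):
            print(f"{old_name} -> {new_name}")
```

The same loop works unchanged for 5 files or 500, which is the whole point. Running it with `dry_run=True` first is a cheap way to catch a wrong pattern before any file is touched.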

Let me repeat the key message: we need researchers with programming skills. It's an incredibly important skill for performing research in the 21st century.

While this answer mainly focuses on STEM, basic programming is also useful in other fields that use statistics.

*Of course, knowing about those helps.


I would like for students to rapidly acquire flexible, durable analysis skills.

Your criteria are quite stringent! I think you are going to have to compromise at some point along the line. If you want them to acquire the skills rapidly, then they are probably going to have to use menu-based software, which will be limited in its flexibility. The long-term and more flexible solution would be for the students to learn statistical programming, but that of course has a steep learning curve.

In my opinion, R has a lot of advantages. [I imagine you have already come across it, so I may be stating the obvious here and you may have a good reason for ruling it out, but...]

  • It is free and open source, and therefore once learnt, the skill can be taken anywhere.

  • It's massively flexible when you take into account all of the add-on packages.

  • Students can "ease" into it using R Commander, which gives a menu-based interface but also outputs the corresponding code.

  • It is popular and therefore very well resourced.

The best compromise that I can think of would be to start the students off using the menu-based R Commander package, but encourage them to inspect and customise the code where possible. If you are not able to give training yourself, it would probably be a good idea to arrange for someone else (either someone in your department or a paid external trainer) to give a course. There are lots of good self-learning resources available, but a course ought to speed up the learning process. When they see how powerful the software is, it is likely to encourage them to put in the time and effort to learn to use it well.


We're talking about students here, not currently practicing researchers, so my comment is really about future trends rather than the current state of play, which I believe the other answers address.

I believe that in the future, more and more people will be expected to know how to program if they are going to do any kind of data analysis. Perhaps not on the more theoretical side, but since you said your students are doing practical work, I will assume that is not an issue. Tools like R and Matlab are good places to start if you are unfamiliar with programming and want to get something done right now. Honestly, though, since the barrier to entry for programming in fully fledged general-purpose programming languages is so low these days (and expected to get lower), I see no reason not to point students towards a full programming language and the modules they might want for the statistical analyses relevant to their field.

Whilst R and Matlab are fine choices, personally, I would introduce my students to something like Python and the excellent modules in its ecosystem, which cover the data analysis that can be done in R/Matlab. Python has a very gradual learning curve at the beginner end of the spectrum, while at the other end advanced programmers can write code that approaches the speed of C if they take advantage of the newer, optimized interpreters. These 2 pros, plus the plethora of modules for doing any kind of analysis/plotting R or Matlab can do, are what have made Python the de facto language of choice in my field (Bioinformatics), and likely a powerful tool under your students' belts going forward with whatever they decide to pursue in life.

Of course, there are other languages out there, such as Java, Julia, Rust, etc. However, I would rather teach those as second or third languages, once students have a strong foundation in Python.

For the record, I'm not saying "teach them Python", I'm saying just make them aware of its existence.