What Is the Difference Between Interior Point Methods, Active Set Methods, Cutting Plane Methods and Proximal Gradient Methods?

Although these are all optimization algorithms, they tend to be used in different contexts. Note, you requested a lot of technical information that I don't remember off the top of my head, but perhaps the following will get you started.

Interior Point and Active Set Methods

Both of these algorithms are used to solve optimization problems with inequalities. Generally speaking, they are used in conjunction with other algorithms for solving problems of the type: $$ \min\limits_{x\in X} \{ f(x) : h(x)\geq 0\} $$ or $$ \min\limits_{x\in X} \{ f(x) : g(x)=0,h(x)\geq 0\} $$ That said, they go about handling the inequality in different ways.

In an active set method, we largely ignore an inequality constraint until we generate an iterate that would violate it. At that point, we stop momentarily and treat the blocking inequality constraint as an equality constraint. Those inequality constraints that we treat as equality constraints are called active. We then continue with the algorithm, but we play a game to ensure that our new iterates lie in the nullspace of the total derivative of the active constraints. Really, unless we want to generate some kind of hateful method, we just assume that the $h$ are affine, because then moving in the nullspace of the total derivative means moving in the nullspace of the linear operator that represents $h$, which keeps the active constraints satisfied exactly. Now, at some point, the algorithm may want to move off of an inequality, so we also need a mechanism to recognize when this occurs. When it does, the offending inequality constraint becomes inactive and we largely ignore it again.

The simplex method is an example of an active set method specialized for linear programs. Generally speaking, these algorithms tend to be very reliable and robust. That said, excessive pivoting, which means adding and removing inequality constraints from the active set, can dramatically slow down performance. In a nonlinear setting, working in the nullspace of the derivative of $h$ can be a pain. Every time we modify the active set, we end up changing the Hessian and the gradient. This complicates iterative system solvers since we constantly modify the problem. In addition, we have to solve a linear system to get onto the active set, which can be expensive. There are some tricks with updating QR factorizations that help this process, and these are discussed in Nocedal and Wright's book Numerical Optimization on page 478.
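The bookkeeping described above can be sketched on a deliberately easy problem. The following is a minimal illustration, not a production solver: it assumes a separable quadratic objective $\sum_i \frac{1}{2}q_ix_i^2 + c_ix_i$ with $q_i>0$ and the simple bounds $x\geq 0$, so that the step on the free variables has a closed form and no factorization is needed. The function name `active_set_bound_qp` and its arguments are made up for this example.

```python
def active_set_bound_qp(q, c, x0, max_iter=50, tol=1e-12):
    """Minimize sum(0.5*q[i]*x[i]**2 + c[i]*x[i]) subject to x >= 0.

    Illustrative assumptions: q[i] > 0 (diagonal Hessian) and x0 >= 0.
    Returns the solution and the sorted list of active bounds.
    """
    n = len(q)
    x = list(x0)
    # Bounds currently treated as equalities x[i] = 0.
    active = {i for i in range(n) if x[i] == 0.0}
    for _ in range(max_iter):
        grad = [q[i] * x[i] + c[i] for i in range(n)]
        # Newton step on the free variables; active variables do not move.
        p = [0.0 if i in active else -grad[i] / q[i] for i in range(n)]
        if max(abs(pi) for pi in p) <= tol:
            # Stationary on the current face. For the bound x[i] >= 0 the
            # multiplier is grad[i]; a negative one means we should leave it.
            worst = min(active, key=lambda i: grad[i], default=None)
            if worst is None or grad[worst] >= -tol:
                return x, sorted(active)
            active.remove(worst)  # constraint becomes inactive
            continue
        # Longest step in [0, 1] that keeps the free variables nonnegative.
        alpha, blocking = 1.0, None
        for i in range(n):
            if i not in active and p[i] < 0.0 and -x[i] / p[i] < alpha:
                alpha, blocking = -x[i] / p[i], i
        x = [x[i] + alpha * p[i] for i in range(n)]
        if blocking is not None:
            x[blocking] = 0.0      # land exactly on the bound
            active.add(blocking)   # blocking bound becomes an equality
    raise RuntimeError("active-set iteration did not converge")


if __name__ == "__main__":
    # Unconstrained minimizer is (1, -2), so the bound on x[1] must end up active.
    print(active_set_bound_qp([2.0, 2.0], [-2.0, 4.0], [2.0, 2.0]))
    # Starting on both bounds, the method has to release the bound on x[0].
    print(active_set_bound_qp([2.0, 2.0], [-2.0, 4.0], [0.0, 0.0]))
```

Starting from the interior, the second bound blocks the first step and is added to the active set. Starting from the corner, the negative multiplier on the first bound is what tells the method to drop it. With a general Hessian, the closed-form step would be replaced by a solve in the nullspace of the active constraints, which is where the cost mentioned above comes from.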

Interior point methods typically refer to primal-dual interior point methods. Well, there's probably a better name, since sometimes people use primal-only or dual-only methods. Part of the confusion over the name is that there are several kinds of interior point methods: primal-dual, reflective (from Coleman and Li), and even something like Zoutendijk's feasible direction method is an interior point method. In more common usage, interior point methods attack the optimality conditions
\begin{align*} \nabla f(x) + g^\prime(x)^*y - h^\prime(x)^*z &= 0\\ g(x) &= 0\\ h(x)z &= 0\\ h(x) &\geq 0\\ z &\geq 0 \end{align*}
by perturbing the complementary slackness condition and requiring strict inequality. This gives
\begin{align*} \nabla f(x) + g^\prime(x)^*y - h^\prime(x)^*z &= 0\\ g(x) &= 0\\ h(x)z &= \mu e\\ h(x) &> 0\\ z &> 0 \end{align*}
where $e$ is the identity element (the vector of all ones for componentwise inequalities) and $\mu>0$ is the barrier parameter, which is carefully reduced to 0.

The other way to derive this is to replace the inequality constraint with a log barrier function in the objective. That gives us two different ways to visualize the method. Personally, I prefer to just think of the perturbed system, which is then fed to Newton's method. Most people prefer to think of how we modify the objective to represent the inequality constraints. Really, the easy problem to graph is $\min\{x : x\geq 0\}$. With a log barrier, we change the objective $f(x)=x$ into $f(x)-\mu\log(x)$. If you graph that latter function, it'll give an idea of why it works: the barrier blows up as $x\to 0^+$, and the minimizer sits at $x=\mu$, which slides toward the true solution $x=0$ as $\mu\to 0$.

Anyway, interior point methods tend to work very efficiently and can solve many large scale problems, or really even small scale ones, faster than active set methods. Simply, rather than figuring out how to creatively pivot, we figure out how to creatively manipulate and manage the barrier parameter. That said, it can be a royal pain to do so. In addition, odd things can happen if we get too close to the boundary of the feasible set. For that reason, interior point methods are hard to warm start: if we have to repeatedly solve an optimization problem where one problem is a perturbed version of another, the previous solution sits on or near the boundary and makes a poor starting point, so interior point methods can be less effective. In that case, an active set method can be preferable. All that said, the real advantage, in my opinion, that interior point methods have over active set methods is that the Hessian and gradient are only manipulated once per optimization iteration and not every time we hit the boundary. For nonlinear problems, this can be a big deal.
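For the toy problem $\min\{x : x\geq 0\}$, the log barrier idea can be checked numerically. The sketch below minimizes $x-\mu\log(x)$ with Newton's method for a decreasing sequence of barrier parameters, reusing each solution as the next starting point. The function names, the schedule for $\mu$, and the crude step halving that keeps iterates strictly positive are all choices made for this illustration; a real interior point code would use a more careful step rule.

```python
def barrier_minimizer(mu, x0=1.0, tol=1e-10, max_iter=100):
    """Minimize phi(x) = x - mu*log(x) over x > 0 with safeguarded Newton steps."""
    x = x0
    for _ in range(max_iter):
        grad = 1.0 - mu / x          # phi'(x)
        if abs(grad) < tol:
            return x
        hess = mu / (x * x)          # phi''(x) > 0 for x > 0
        step = -grad / hess          # Newton direction
        # Halve the step until the new iterate is strictly feasible.
        t = 1.0
        while x + t * step <= 0.0:
            t *= 0.5
        x += t * step
    raise RuntimeError("Newton iteration did not converge")


def follow_barrier_path(mus=(1.0, 0.1, 0.01, 0.001)):
    """Solve a sequence of barrier problems, warm starting each from the last."""
    x = 1.0
    path = []
    for mu in mus:
        x = barrier_minimizer(mu, x0=x)
        path.append((mu, x))
    return path


if __name__ == "__main__":
    for mu, x in follow_barrier_path():
        print(f"mu = {mu:g}, minimizer = {x:.10f}")
```

Each barrier problem has its minimizer at $x=\mu$, so the printed iterates approach the solution $x=0$ from the interior as $\mu$ shrinks. Note that the first full Newton step after each reduction of $\mu$ would leave the feasible set, which is a small-scale version of why managing the barrier parameter and the step length takes care.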

Cutting Plane Methods

Generally, these methods are used to assist in solving mixed integer linear programs (MILPs). In a MILP, we have a linear objective, linear equality constraints, linear inequality constraints, and integer constraints on the variables. Though there are many different algorithms for solving these formulations, the traditional, robust method is an algorithm called branch and cut. Essentially, solving a MILP is hard, so we relax it into a linear program, which, if we're minimizing, gives a lower bound. For example, if we had a MILP like $$ \min\{d^Tx : Ax=b, Bx\geq c, x\in\{0,1\}^n\} $$ we can relax it into $$ \min\{d^Tx : Ax=b, Bx\geq c, 0\leq x\leq 1\} $$ whose optimal value is a lower bound on that of the original problem, since every point feasible for the MILP is also feasible for the relaxation. Now, the branch piece of branch and cut fixes variables to integer values and then tries to bound the resulting subproblems. It's called branch because we normally track these decisions with a tree where one branch fixes a variable one way and another branch fixes it another way. In addition, we add new inequality constraints to the relaxations to strengthen them and give a better bound. These new constraints are called cutting planes. Essentially, we add inequality constraints that are redundant for the original MILP but may not be redundant for the relaxations. In fact, if we knew all the linear constraints necessary to represent the convex hull of the feasible set, we'd immediately be able to solve the MILP, since a linear program over the convex hull has an optimal solution at a vertex, and every vertex of the convex hull is a feasible point of the MILP. Of course, we don't know these constraints, so we try to be smart and add cutting planes like Gomory cuts. Long story short, cutting plane methods try to approximate some feasible set with new inequality constraints. Most often, we see this in MILPs, but there are other places where they arise.
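To make the idea of a cut concrete, here is a tiny worked example of my own choosing (it is not taken from any particular solver or reference). Take the MILP $$ \min\{-x_1-x_2 : -2x_1-2x_2\geq -3,\ x\in\{0,1\}^2\} $$ The feasible integer points are $(0,0)$, $(1,0)$ and $(0,1)$, so the optimal value is $-1$. The relaxation $$ \min\{-x_1-x_2 : -2x_1-2x_2\geq -3,\ 0\leq x\leq 1\} $$ has optimal value $-3/2$, attained for instance at $(1,1/2)$, which is a valid but weak lower bound. Dividing the constraint by $2$ gives $x_1+x_2\leq 3/2$, and since $x_1+x_2$ is an integer at every feasible point of the MILP, we can round the right-hand side down to get $$ x_1+x_2\leq 1 $$ This inequality is redundant for the MILP but cuts off the fractional point $(1,1/2)$. Adding it to the relaxation gives the optimal value $-1$, attained at the integer point $(1,0)$, so here a single cut of this rounding (Chvátal-Gomory) type closes the gap. In this small case the cut together with $x\geq 0$ describes the convex hull of the feasible integer points exactly; in general many rounds of cuts and branching are needed.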

Proximal Point Methods

In truth, I'm not super familiar with these algorithms and their properties. As such, the snarky answer is that they're the algorithms that no one really used until compressed sensing became popular. Generally speaking, they're called proximal because there's a penalty term that keeps us close to a previous point. For example, in $\textrm{prox}(y) = \arg\min_x\{f(x) + \lambda \|x-y\|^2\}$, the term $\lambda\|x-y\|^2$ keeps us in proximity to $y$. Anyway, someone else should answer this one. Snarkiness aside, since the compressed sensing craze, they have popped up in a variety of useful applications.
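As one concrete instance of the prox definition above, take $f(x)=|x|$ in one dimension, which is my choice for illustration and the kind of term that shows up in compressed sensing. Setting the derivative of $|x|+\lambda(x-y)^2$ to zero gives a closed form known as soft thresholding, with threshold $1/(2\lambda)$ under the scaling used here (other texts scale the quadratic term differently, which changes the threshold). The function names below are hypothetical, and the grid search is only there as a sanity check on the closed form.

```python
def prox_abs(y, lam):
    """argmin over x of |x| + lam * (x - y)**2 for lam > 0, in closed form."""
    # From the optimality condition sign(x) + 2*lam*(x - y) = 0.
    threshold = 1.0 / (2.0 * lam)
    if y > threshold:
        return y - threshold
    if y < -threshold:
        return y + threshold
    return 0.0  # small inputs are pulled all the way to zero


def prox_abs_brute_force(y, lam, half_width=5.0, steps=100001):
    """Grid search over the same objective, used only to check prox_abs."""
    best_x, best_val = None, float("inf")
    for k in range(steps):
        x = -half_width + 2.0 * half_width * k / (steps - 1)
        val = abs(x) + lam * (x - y) ** 2
        if val < best_val:
            best_x, best_val = x, val
    return best_x


if __name__ == "__main__":
    for y in (3.0, 0.5, -3.0):
        print(y, prox_abs(y, 0.5), prox_abs_brute_force(y, 0.5))
```

The result stays close to $y$ because of the quadratic term, but it is shrunk toward zero by $f$, and set exactly to zero when $|y|$ is below the threshold. That exact zeroing is what makes this particular prox useful for promoting sparsity.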