When teaching, which question types are effective for stimulating deep processing of the material in students?

This question is far too big to answer reasonably, particularly without knowing the subject matter and the tasks that matter to you. In terms of theories with solid evidence, the ones I find useful as guidelines are:

  • Chi's Active-Constructive-Interactive framework (now usually discussed as ICAP). However, it sounds like your system does not naturally lend itself to constructive or interactive activities. This is unfortunate, because I've seen studies showing learning gains for things as simple as quasi-interactive tasks, like letting people keep answering a multiple-choice question for partial credit after a wrong answer (e.g., wrong answers trigger hints that prompt further reflection); see the sketch after this list for what that retry-with-hints pattern can look like.
  • 25 Learning Principles: More a list of known effects than a theory, but still useful to know. The primary issue with some items on this list is that it is hard (maybe impossible) to optimize for all of them when designing a course. Also, certain content fits certain approaches better, either because of the type of knowledge or because of teachers'/students' expectations (e.g., embedding math into stories might be pedagogically valid, but jarring/disliked due to unfamiliarity).
  • Aligning Media to Cognitive Tasks: While there is a broad literature on this (Mayer obviously being the big US name), Schnotz's work highlights the importance of picking multimedia that naturally communicates the information you are trying to convey. For example, it is incredibly hard to verbally describe an individual chair (a picture is worth a thousand words), but equally hard to show enough pictures to convey the simple generalization that "most chairs have 4 legs." This intuition implies two things. First, much of the "learning styles" literature reflects preferences and shouldn't be mistaken for what actually leads to better learning (the content's affordances will almost always dominate the choice of approach). Second, it highlights the need to align your learning tasks to the modality that makes the right information salient and accessible, at least initially. Later, if the authentic application of those skills involves tasks where the information is not salient or easily accessible, tasks that represent that greater complexity are also useful. This is one incarnation of the general strategy of scaffolding, where tasks are simplified/supported initially but increasingly approximate unassisted performance of the target cognitive tasks. This has implications for question sequencing.
  • Multiple Representations: Related to both of the above, it is important to test the same information in multiple ways. This has a lot of backing behind it, which is why it features heavily in the US Common Core for math. Two major mechanisms are suspected: a) multiple pathways/cues for recalling the same information, and b) understanding of the abstract relationships/processes rather than brittle, task-specific procedures tied to one representation. This would not generally extend to merely varying the input controls, though, since those are likely to be a last step after the desired learning-relevant processing is done. Put another way, if your question produces knowledge so brittle that learners fall on their face when given a checkbox instead of a multiple-choice item, they probably won't be able to apply that knowledge in practice anyway.
  • Generative: Also noted in some of the above, and relevant to the question types you list, open response is different and generally better. This is particularly relevant because many of the above types (e.g., multiple choice) are actually very hard to build correctly, with most teachers using a strategy like: one right answer, one obviously wrong/off-topic answer, and two variants of the right answer with flaws. Such items are very prone to test-gaming unless you start from open responses and, for example, build an item with one right answer against distractors drawn from common misconceptions. Gaming behaviors are known to reduce learning, probably because learners spend their time thinking about how to game the answer rather than actually processing the domain-relevant information.
  • Comprehensive Testing: Testing on the full set of knowledge covered so far has also been shown to increase learning gains. If I recall correctly, one major report called this the single simplest and most effective way to increase course-level learning gains, though I don't have the citation off the top of my head. I also cannot recall whether the implication was that students simply studied more, or whether it was more of a repeated-practice effect.
  • Question Taxonomies: There are also a few question taxonomies to look into, with some indications that deeper questions (e.g., causal reasoning) result in deeper learning. However, deeper learning is not necessarily more learning. The relationship can be quite complex, and if all you really want is recall (e.g., the end task is recall), then deep questions may not be an efficient way to get there. As always, aligning practice to your target task is important: if your ultimate task will be shallow, shallow practice probably works. There might be some value in "multiple representations of different shallow practice" using MC questions vs. Y/N vs. checkboxes, maybe, but that seems like such an uncommon situation that I am genuinely not sure whether anyone studies it.
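
To make the retry-with-hints idea from the first bullet concrete, here is a minimal sketch of one way it could look. This is an illustrative assumption on my part, not the API of CTAT, ASSISTments, or any other real system: the names (`Option`, `RetryItem`, `grade`) and the partial-credit scheme are invented for the example. It also folds in the point above about building distractors from common misconceptions, so that each wrong answer can trigger a targeted hint rather than a generic "try again."

```python
# Hypothetical sketch: a multiple-choice item that allows retries for
# partial credit, with distractors tagged by the misconception they
# represent so a wrong answer surfaces a targeted hint.
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Option:
    text: str
    correct: bool = False
    misconception_hint: str = ""  # shown if this distractor is chosen


@dataclass
class RetryItem:
    prompt: str
    options: list[Option]
    # Illustrative partial-credit scheme: full credit on the first try,
    # then decaying credit on later attempts.
    credit_per_attempt: list[float] = field(default_factory=lambda: [1.0, 0.5, 0.25])

    def grade(self, attempts: list[int]) -> tuple[float, list[str]]:
        """Score a sequence of chosen option indices; return (credit, hints shown)."""
        hints: list[str] = []
        for attempt_number, choice in enumerate(attempts):
            option = self.options[choice]
            if option.correct:
                credit = (self.credit_per_attempt[attempt_number]
                          if attempt_number < len(self.credit_per_attempt) else 0.0)
                return credit, hints
            # Wrong answer: surface the hint tied to that misconception.
            hints.append(option.misconception_hint or "Not quite -- reread the prompt.")
        return 0.0, hints


if __name__ == "__main__":
    item = RetryItem(
        prompt="Which factor most strongly predicts deeper processing of a question?",
        options=[
            Option("The input control used (radio button vs. checkbox)",
                   misconception_hint="Input controls mostly change how you enter "
                                      "an answer, not how you think about it."),
            Option("Whether the task requires generating or explaining an answer",
                   correct=True),
            Option("How many options the question has",
                   misconception_hint="Option count affects guessing odds more "
                                      "than the depth of processing."),
        ],
    )
    # Simulate a learner who picks option 0 first, then the correct option 1.
    credit, hints = item.grade([0, 1])
    print(f"credit={credit}")  # -> credit=0.5 (correct on the second attempt)
    for hint in hints:
        print("hint:", hint)
```

The scoring mechanics are the least important part here; the design choice that matters for learning is that a wrong answer routes the learner back into thinking about the content (via the misconception-specific hint) instead of simply ending the interaction.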

In general, I would say I don't see tons of difference between any of the closed-response questions (Y/N, MC, checkboxes, ranking, Likert). Yes/No is a bit flimsier than the others because you only need to evaluate one assertion, but I'd still say you'd do similar thinking when given the same question regardless of how you enter the answer. I'm not aware of any research showing substantial differences between these formats off the top of my head, and even if differences were observed, there's a good chance they're not useful differences. As with the learning-styles literature noted earlier, just because you can find a difference when everything else is held equal (controlled) does not mean the difference is useful in practice, because you generally have much stronger factors to manipulate. Open response is different because you get fewer cues (e.g., you can't just rely on recognition and instead need recall). There is some research indicating it is better, with the same caveat about what "better" means (e.g., it might be deeper but less efficient, if all you truly need is recall).

I will note that there are a few systems out there for making interactive questions for courses. CMU has the CTAT system. WPI maintains the ASSISTments system. Both have various levels of retry/hinting support for interactivity, among other types of adaptation. There are also similar systems for open-response formats (i.e., tutorial dialogues), but I don't currently know of any good ones that can yet be easily and freely embedded by instructors into courseware and that also have professional support staff available.

Finally, if it really is the input controls that you're most interested in, you might find more literature in fields like HCI or marketing survey design. Both of those fields focus more on the affordances and optimization of input mechanisms, while the learning sciences and education are typically more interested in the content of the questions (beyond the notable concerns about choice-based questions and about simple active-learning tasks in general). But personally, I think the big gains aren't in those minor tweaks.