Second argument of BeginPackage with nested package loading

I've certainly encountered this behavior before. While I can't speak authoritatively, I think it is as designed, although it does introduce a certain inconsistency. I also think this issue results from a clash of cultures: the end-user-oriented one from the earlier days of Mathematica, and the one coming from standard software-engineering practices.

A bit of history, and the package extension model

I think it may help to go a little into the history of the package mechanism.

On one hand, private import via Needs inside a package wasn't always available: in early versions of Mathematica, such an import would still keep the context of the loaded package on the $ContextPath after the package had been loaded, even if Needs was called in the private section of the package.

On the other hand, to me it looks like early package development practices had, in particular, these features:

  • (Deeply) nested package dependencies were not very common
  • In many cases, one wanted to extend some of the built-in packages, giving the end user the functionality of one of the core packages extended with some additional functionality, rather than build a package based on other packages but expose only that package's interface.
  • The end-user typically was not supposed to know anything about package development and such. In other words, the majority of end-users were considered non-programmers (which was IMO quite justified).

So, the main goals of the second argument of BeginPackage were, I think, these:

  • The usual one - make the functionality from packages listed in the second argument of BeginPackage available for the implementation of a given package
  • Provide a formal declarative syntax to specify package dependencies
  • Provide a way to expose an extended package "MyPackage`", that would also make additional packages available to the end-users without a need to call Needs on them separately.

While the first two goals are really necessary and are present in one form or another in most programming languages, the last one is different. The inconsistency it causes is exactly what you noted: if we think of a package as an extension of one or more other packages, then it should always behave like that, no matter how we load it.
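To make the inconsistency concrete, here is a minimal sketch. The packages A`, B` and C` and their functions are hypothetical, and they are defined inline (rather than in files) so that the code can be pasted into a fresh session. It records the $ContextPath seen while C` is being defined:

```mathematica
(* Hypothetical package B`, defined inline so that Needs finds it in $Packages. *)
BeginPackage["B`"];
bFun::usage = "bFun[x] adds 1 to x.";
Begin["`Private`"];
bFun[x_] := x + 1;
End[];
EndPackage[];

(* A` lists B` in the second argument of BeginPackage. *)
BeginPackage["A`", {"B`"}];
aFun::usage = "aFun[x] doubles bFun[x].";
Begin["`Private`"];
aFun[x_] := 2 bFun[x];
End[];
EndPackage[];

(* C` lists only A`. Save the context path visible to the code of C`. *)
BeginPackage["C`", {"A`"}];
cFun::usage = "cFun[x] calls aFun[x].";
pathInsideC = $ContextPath;
Begin["`Private`"];
cFun[x_] := aFun[x];
End[];
EndPackage[];

(* Inspect what C` could see while it was loading. *)
C`pathInsideC
MemberQ[C`pathInsideC, "B`"]
```

Given the behavior discussed above, I would expect the recorded path to contain C`, A` and System` but not B`, even though loading A` directly at top level does leave B` available to the user. Details may differ between versions.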

Why I think the current behavior is better

From the developer's perspective, I'd argue that the current behavior is better than if BeginPackage was loading all those packages and their dependencies on the $ContextPath. Here are the reasons:

  • Information-hiding. It is always better to keep only as much information available to the client code, as it needs, and no more.
  • Level of control. By restricting the set of loaded packages to only those listed in the second argument of BeginPackage (but not their dependencies), the system allows the developer to second-guess the developers of the packages s/he uses to build the package at hand. The developer can then control exactly which packages are on the $ContextPath during the loading of their own code.

    If the packages were loaded automatically with all their dependencies, the developer of a given package would need to care about name clashes with all those dependent packages, and perhaps remove them from the $ContextPath. However, in general this is not even possible, since s/he is not supposed to know all those dependencies of dependencies.

  • Nesting ambiguity. If the dependencies of dependencies were loaded too, then which packages do we keep on the $ContextPath after the package loads? Where do we stop in this nesting? We may end up adding a bunch of contexts to the $ContextPath, which would be really bad.

Implications for package design

I think the guiding principles should be separation of responsibility and information-hiding.

The fact that your package A depends on packages B and C is the internal business of your package, about which the users of A could not care less. So, there are two different cases here:

  • You do want to expose some of the functionality of B and / or C to the end user of your package A

    As I said before, this really doesn't sound like the right approach to me, in particular because, preferably, for any functionality there should be a single source that provides it and is responsible for it. Of course, duplication of code is the worst, but duplication of the interface is almost as bad.

    However, if you really need to, there is always delegation: create your own wrappers around those functions from B and C that you need, and expose these wrappers as a part of the interface of A. This means that you now take responsibility for those functions as a part of A's interface - which is IMO the right thing to do. If something later changes in the functionality of B or C, it will be your responsibility to maintain the consistency of the interface of A, so you don't put that burden on the users of A.

  • You don't require the end user of A to have access to B and C - then you don't need anything besides the standard single-package development machinery.
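The delegation option from the first case could look roughly like this. It is only a sketch: B`, A`, bFun and aIncrement are made-up names, and B` is defined inline so that the example is self-contained.

```mathematica
(* Hypothetical dependency B`, defined inline for the sake of the example. *)
BeginPackage["B`"];
bFun::usage = "bFun[x] adds 1 to x.";
Begin["`Private`"];
bFun[x_] := x + 1;
End[];
EndPackage[];

(* A` imports B` privately and exposes its own wrapper instead of B`bFun. *)
BeginPackage["A`"];
aIncrement::usage = "aIncrement[x] is a wrapper owned by A and delegates to bFun from B.";
Begin["`Private`"];
Needs["B`"];                 (* private import: B` is visible only while A` loads *)
aIncrement[x_] := bFun[x];   (* delegation to the implementation in B` *)
End[];
EndPackage[];

aIncrement[1]   (* 2 *)
```

Users of A` then call only aIncrement. If bFun ever changes, the wrapper is the single place where A` has to adapt.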

If your package contains several sub-packages, you simply load those in your Kernel/init.m. This does not change the fact that you expose a single interface. If you want to make those sub-packages relatively independent, and packages in their own right, this may require using some delegation / wrappers inside your main package, as a price for it - as I mentioned above. In most cases, however, I wouldn't do it, but would let those users who need that functionality load those packages (B and / or C in this example) separately, on their own.

Summary

Putting it more bluntly: the extension package idea (leaving contexts on the $ContextPath after the package has been loaded) may be OK from the end-user usability point of view, but is IMO a failed concept for software development.

The advantage of such an approach is that it is friendly to non-programmers. The disadvantage is that it does not nest, and therefore also does not scale. From the software engineering perspective, a package cannot and should not take any responsibility for other packages, and should only be responsible for its own interface. In addition, the "extension package" model makes it harder to enforce information hiding and encapsulation - another two of the fundamental software engineering principles.

So, here I agree with Kuba, in that I tend to use the second argument of BeginPackage rarely. The only scalable model of imports is private imports, which is also what all other programming languages I am familiar with use.


What features I/we/developers need:

  1. allowing (only) MyPackage` to use particular non System` packages

    e.g. using GeneralUtilities` functions after BeginPackage["MyPackage`"]

  2. allowing user to use particular non System` packages (only) after loading MyPackage`

    e.g. After << MyPackage` I would like to be able to call PrintDefinitions from GeneralUtilities` .

  3. Both, 1. and 2. simultaneously

Which one of them is available with the second argument of BeginPackage?

None...

Since the needed context is still exported after EndPackage[], the "(only)" part of point 1. is not fulfilled.

The second one (2.) isn't fully available either, because a user who wants to create a NewPackage` that needs MyPackage` will have to load GeneralUtilities` anyway, as you have already shown. So thanks, but I don't need a separate scheme to achieve what is essentially the same thing.

The second argument of BeginPackage helps with neither 1. nor 2., so it obviously can't help with both.

That's why I'm not using it at all.

Moreover, I'd expect the second argument of BeginPackage to give me (1) or (3). It doesn't, and I find that unintuitive.

Ok, but what do you do then?

To solve our 3 problems:

  1. Put Needs["GeneralUtilities`"] after BeginPackage["MyPackage`"]

  2. Put Needs["GeneralUtilities`"] before BeginPackage["MyPackage`"] or even better, in init.m before Get["MyPackage`MyPackage`"]

  3. Put them in both those places.

Straightforward, readable, always working.
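As a rough sketch, option 3 (both places) could be laid out like this. The file paths and the overall directory layout are assumptions for illustration; only MyPackage`, GeneralUtilities` and the Get call come from the points above. Drop one of the two Needs calls to get option 1 or option 2.

```mathematica
(* ---- MyPackage/Kernel/init.m (assumed location) ---- *)

(* Option 2: put GeneralUtilities` on the user's $ContextPath. *)
Needs["GeneralUtilities`"];
Get["MyPackage`MyPackage`"];


(* ---- MyPackage/MyPackage.m (assumed location) ---- *)

BeginPackage["MyPackage`"];

(* Option 1: make GeneralUtilities` visible to the code of MyPackage` only. *)
Needs["GeneralUtilities`"];

(* usage messages for the exported symbols of MyPackage` go here *)

Begin["`Private`"];
(* definitions that use GeneralUtilities` symbols go here *)
End[];

EndPackage[];
```

The two Needs calls serve different audiences: the one in init.m runs at top level and so affects the user's session, while the one after BeginPackage only affects the context path that the package code sees during loading.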