When programming in large code bases it can happen inadvertently that cycles of imports are created. In JavaScript, import statements trigger loading and initialization of the specified file directly. In case there is a dependency cycle, files might only be initialized partially and hence errors might occur later during runtime. In this post we present how N4JS detects and avoids these cases by showing validation errors in the source code.
Introduction
Let's start with the most simple example in JavaScript to illustrate the essential problem.
console.log(s); // prints 'test'?
export const s = "test";
Executing the two-liner above results in:
ReferenceError: Cannot access 's' before initialization.
This is quite obvious and wouldn't surprise anyone. It is obvious because the read access is stated right before the definition of the constant s in the same file. However, it wouldn't be very obvious anymore when both the read access and the definition of s happen in separate files. Let's split up the example into the files F1.mjs and F2.mjs.
F1.mjs
import * as F2 from "./F2.mjs";
export const s = "test";
console.log(F2.s); // prints undefined?
F2.mjs
import * as F1 from "./F1.mjs";
export const s = F1.s;
Executing this example results in a similar error:
ReferenceError: s is not defined.
And again the cause for the error is an access to a not yet initialized variable. As a side note: Modifying the variable to be a 'var' instead of a 'const' would fix the error and return the print-out "undefined". This is due to hoisting of var symbols, but is still not the intended result which would be the print-out "test".
So far, both of the examples either give an unintended result or a compile time error is shown. Errors like these can only be identified after they actually happened during tests or production. While the two-liner example seems way too obvious to actually occur often in practice, the second case can easily hide in projects of many files and imports. A further difference is that even if the first example results in a runtime error, it can usually easily be identified and fixed. The second example however can span across many files by increasing the size of the cycle of import statements and therefore is hard to find and fix.
Two important properties of the execution semantics of JavaScript in Node.js can be witnessed here:
(1) In case a file m is started or imported that imports another file m', a subsequent import back to file m will be skipped. As a result, file m' might be only initialized partially when accessing not yet initialized elements from m.
(2) There is an exception to (1) regarding functions. Since functions are hoisted, they do not have to be reached by the control flow to get initialized. Hoisting will initialize them immediately so that they can be called from any location.
Let's look at a third example which reveals a similar case of reference errors. This time the error occurs depending on the entry point of the program. Have a look at the two files below which either result in the print-out "test" or "undefined" depending on which file was the entry point for node.js. Starting with file G1.mjs causes the execution to follow the green indicators and yields "test" whereas starting with file G2.mjs follows the red indicators and yields "undefined".
These kinds of errors might not be of interest when implementing a stand-alone application since these programs usually have a single and well known entry point only. Yet, cycles can occur also in parts of the program and then the entry point is determined by the order of import statements. Moreover, when writing libraries and exposing an API that spans across several files, the entry point can differ a lot and is defined by the library's user. Hence, in case of an unfortunate setup of files and import statements, a library might suffer from unexpected behavior depending on which part of its API was called first.
Also note that all the examples stated their imports at the top and all other statements below. When mixing import statements or dynamic imports with other code, it is even easier to create reference errors.
Validations in N4JS
One of the goals for N4JS is to provide both many handy and powerful language constructs along with type safety and strong validations. The reason behind the latter one is to prevent especially those errors to happen at runtime that are hard to find and hard to reproduce. Migrated to N4JS, the second example would show validation errors at the references to F1.s and F2.s due to the dependency cycle. The approach to detect these cases is explained in the following paragraphs by first laying out the terminology, reasoning about the general problem afterwards, and then defining the error cases in N4JS.
Terminology
Top level elements are those AST elements of a JavaScript file that are direct children of the root element such as import statements, const or class declarations, and others. Some top level elements can contain expressions or statements such as initializers of consts or extends clauses of classes. These initializers are executed when loading a file. A reference located in such initializers to a top level element (imported or not) is called load time reference.
In addition to compile time and runtime, the term load time is used to refer to the first phase of runtime during which all import statements and top level elements of the started JavaScript file are executed. In this regard we assume that initialization is performed during load time. In a separate step later, some specific calls to the API of imported files would perform the actual requested functionality.
A dependency between two files consist of an import statement and may have imported elements that may be used in the same file. Dependencies with unused imported elements are called unused imports, and those without imported elements are called bare imports. The target of a dependency is the imported file and also the imported element (except for bare imports). There exists at least one dependency for each import statement, and for each code reference to an imported (top level) element. Dependencies are differentiated into three kinds:
Compile time dependencies arise from all non-unused import statements. Runtime dependencies are the subset of compile time dependencies that is necessary at runtime only, i.e. it does not include unused imports or imports used for type information. (We assume that unused imports do not have intended side effects like bare imports have.) Load time dependencies are the subset of runtime dependencies with load time references.
A dependency cycle exists when traversing import statements of one file to the imported files will eventually lead to one of the already visited files. Note that the term dependency cycle refers to files and not necessarily to imported elements. Dependency cycles are differentiated as follows: Compile time dependency cycles are those relying on compile time dependencies. Runtime dependency cycles rely on runtime dependencies and are of special interest later. Load time dependency cycles rely on load time dependencies and are evaluated to errors in N4JS.
Reasoning
To get a clearer understanding, it is important to know the impact of dependency cycles in a program. An inherent property of dependency cycles is that at least one of the import statements during load time gets skipped since it would load a file that is already processed. In a cycle free program, all import statements of all files can be understood as a directed graph of files connected by import statements that define a partial load order. It is usually harmless that the total order of loading files depends on the entry point, i.e. which file is imported first or started the program, since it complies to that partial order. However, in case of dependency cycles the graph contains a cycle which will be broken up at load time to re-establish a directed graph and partial order. That means that the loading of at least one file of each cycle will be skipped because it is already being loaded. Other files that depends on that skipped import might be initialized partially only. Consequently, the entry point, e.g. the order of import statements, impacts whether a file is initialized partially or completely after its import statement was executed. Sorting import statements is a very common IDE feature and usually deemed to be innocent of causing runtime errors. Yet this assumption does not necessarily hold if the program contains dependency cycles.
We learned that partial initialization occurs if a load time initializer accesses a reference to a not yet initialized element of a skipped file. Probably that not yet initialized element will be initialized later during load time, but harm was already done since the current file had read the wrong value. Where exactly did the problem occur? References to not yet initialized elements can be located not only directly in load time initializers but can also be at locations reachable transitively e.g. by calling other functions in between starting from the initializer. Determining all reachable references from load time initializers which potentially access not yet initialized values can only be done by an expensive analysis that is usually imprecise due to over-approximation. In many cases it is even impossible due to reflective calls, dynamic loading etc. However, a simpler way to rule out accesses to partial initialized elements is to make a clear cut and forbid any expressions or statements in load time initializers that cannot be evaluated at compile time, e.g. function calls. On the downside, this strictness also reduces some programming freedom and even rules out legal load time references that would not cause runtime errors.
To summarize the approach: Either runtime dependency cycles need to be removed or - if that is not possible - load time initializers need to be restricted to not reference potentially skipped files.
A very interesting situation is when a runtime dependency cycle C contains a file m that has a load time initializer with a dependency d to file m'. This means that the cycle becomes a cycle that has a correct and an incorrect way of loading its files: Due to load time dependency d file m' must be loaded without being skipped. Still, at least one other import must be skipped to break the cycle. To make sure that loading of m' is not skipped, m' must not be the entry point of the cycle C. Choosing another entry point e.g. file m will result in partially loading m first, loading the rest of the cycle C including m' completely until another import to m is skipped. In other words: A load time dependency to a file m' within a cycle C constrains m' to never be the entry point into C. This situation is illustrated in the figure below.
The figure above shows the third example with additional information about its dependencies and cycles. As you can see there exists a runtime dependency cycle (indicated in blue), since the two files reference each other in runtime import statements. Also indicated in orange there exists a load time dependency because the reference to G2.s is located in an expression of a top level element that is evaluated during load time. Hence, this dependency imposes the constraint that the entry point to the third example must be G1.
In contrast note that the second example has a load time dependency cycle due to the two load time dependencies created by the accesses to F1.s and F2.s.
Error cases
Four types of errors are indicated in different situations regarding load time dependencies. Based on a source code analysis, runtime dependency cycles and references located in top level elements are detected first.
(1) Given this information, load time dependency cycles can be identified and be evaluated to errors. These errors are attached to the references of the load time dependencies.
Three other types of errors occur if and only if there exists a runtime dependency cycle C of modules m and m' (and maybe including others).
(2) Any load time reference in C that references a top level element in C (imported ones or in the same file) is marked with an error. This includes all load time dependencies. The reason to forbid any references to e.g. local or imported functions from C is that these may reach and access partially initialized variables. In N4JS there is one exception to that rule: extends clauses of classes. Load time references are still allowed here and not causing problems because extends clauses in N4JS are already restricted to references to other classes only (and not arbitrary expressions like in JavaScript). Note that ordinary dependencies (i.e. that do not have references in load time code) are still allowed, e.g. within the body of methods.
(3) Any dependency d to a module m' is marked with an error if and only if there exists a load time dependency to m' already. In other words: There may be no other dependency in C to m' if d is a load time dependency. In case an importing module m* is not in C a dependency to m' is allowed.
(4) However, when importing m' from m*, it is mandatory to also import another module m prior to import m*. Otherwise, an error is shown. The import of m prior to the import of m' will ensure that loading of m' is not skipped.
When programming with N4JS and errors like that occur, there are two ways to solve them. First and best solution is to remove the dependency cycle, which in many cases is a code smell already. This can be done by breaking the cycle or merging two or more files or file parts that mutually depend on each other. In case that is not possible, removing some load time dependencies is necessary. However, keep in mind that any load time dependency in a dependency cycle will impose a runtime execution order on the importing file to be loaded always prior to the imported file.
H1.n4js
import * as H2 from "H2";
class C extends H2.C {} // no error (3) here
H2.n4js
import "H1";
export public class C {}
The last example shows a case similar to the third example: The are two files that have a runtime dependency cycle. Additionally, there is a load time dependency created by the extends clause that references H2.C. Note that the third example produces the validation error (3) at the load time reference G2.s because we disallow all non-compile time expressions or statements in load time initializers. This shows where simplifications of our approach might be improved in the future. Since we leave an exception to error (3) in case the load time dependency is an extends clause, the last example shows no errors in N4JS.
Conclusion
The core problem are read accesses to variables that are not yet initialized. While these kinds of problems are relatively obvious and easy to find when happening in a single file, it is much harder to detect them when they are caused due to dependency cycles of two or more files. For the single file case, several IDEs and languages already provide validations and put error markers to read accesses of undefined symbols, such as VSCode for TypeScript. By introducing the validations described in this blog post, N4JS also can rule out initialization errors due to dependency cycles from happening. Unfortunately, in some cases this approach is too strict but we hope to relax some of the restrictions to improve the compromise of program safety and programming freedom.
by Marcus Mews