August 20, 2021

Dependency Cycles During Load Time

When programming in large code bases it can happen inadvertently that cycles of imports are created. In JavaScript, import statements trigger loading and initialization of the specified file directly. In case there is a dependency cycle, files might only be initialized partially and hence errors might occur later during runtime. In this post we present how N4JS detects and avoids these cases by showing validation errors in the source code.


Introduction

Let's start with the most simple example in JavaScript to illustrate the essential problem.


console.log(s); // prints 'test'?
export const s = "test";

Executing the two-liner above results in: 

ReferenceError: Cannot access 's' before initialization.

This is quite obvious and wouldn't surprise anyone. It is obvious because the read access is stated right before the definition of the constant s in the same file. However, it wouldn't be very obvious anymore when both the read access and the definition of s happen in separate files. Let's split up the example into the files F1.mjs and F2.mjs.

F1.mjs


import * as F2 from "./F2.mjs";
export const s = "test";
console.log(F2.s); // prints undefined?

F2.mjs


import * as F1 from "./F1.mjs";
export const s = F1.s;

Executing this example results in a similar error:

ReferenceError: s is not defined.

And again the cause for the error is an access to a not yet initialized variable. As a side note: Modifying the variable to be a 'var' instead of a 'const' would fix the error and return the print-out "undefined". This is due to hoisting of var symbols, but is still not the intended result which would be the print-out "test".

So far, both of the examples either give an unintended result or a compile time error is shown. Errors like these can only be identified after they actually happened during tests or production. While the two-liner example seems way too obvious to actually occur often in practice, the second case can easily hide in projects of many files and imports. A further difference is that even if the first example results in a runtime error, it can usually easily be identified and fixed. The second example however can span across many files by increasing the size of the cycle of import statements and therefore is hard to find and fix.

Two important properties of the execution semantics of JavaScript in Node.js can be witnessed here:

(1) In case a file m is started or imported that imports another file m', a subsequent import back to file m will be skipped. As a result, file m' might be only initialized partially when accessing not yet initialized elements from m.

(2) There is an exception to (1) regarding functions. Since functions are hoisted, they do not have to be reached by the control flow to get initialized. Hoisting will initialize them immediately so that they can be called from any location.

Let's look at a third example which reveals a similar case of reference errors. This time the error occurs depending on the entry point of the program. Have a look at the two files below which either result in the print-out "test" or "undefined" depending on which file was the entry point for node.js. Starting with file G1.mjs causes the execution to follow the green indicators and yields "test" whereas starting with file G2.mjs follows the red indicators and yields "undefined".

These kinds of errors might not be of interest when implementing a stand-alone application since these programs usually have a single and well known entry point only. Yet, cycles can occur also in parts of the program and then the entry point is determined by the order of import statements. Moreover, when writing libraries and exposing an API that spans across several files, the entry point can differ a lot and is defined by the library's user. Hence, in case of an unfortunate setup of files and import statements, a library might suffer from unexpected behavior depending on which part of its API was called first.

Also note that all the examples stated their imports at the top and all other statements below. When mixing import statements or dynamic imports with other code, it is even easier to create reference errors.


Validations in N4JS

One of the goals for N4JS is to provide both many handy and powerful language constructs along with type safety and strong validations. The reason behind the latter one is to prevent especially those errors to happen at runtime that are hard to find and hard to reproduce. Migrated to N4JS, the second example would show validation errors at the references to F1.s and F2.s due to the dependency cycle. The approach to detect these cases is explained in the following paragraphs by first laying out the terminology, reasoning about the general problem afterwards, and then defining the error cases in N4JS.


Terminology

Top level elements are those AST elements of a JavaScript file that are direct children of the root element such as import statements, const or class declarations, and others. Some top level elements can contain expressions or statements such as initializers of consts or extends clauses of classes. These initializers are executed when loading a file. A reference located in such initializers to a top level element (imported or not) is called load time reference.

In addition to compile time and runtime, the term load time is used to refer to the first phase of runtime during which all import statements and top level elements of the started JavaScript file are executed. In this regard we assume that initialization is performed during load time. In a separate step later, some specific calls to the API of imported files would perform the actual requested functionality.

A dependency between two files consist of an import statement and may have imported elements that may be used in the same file. Dependencies with unused imported elements are called unused imports, and those without imported elements are called bare imports. The target of a dependency is the imported file and also the imported element (except for bare imports). There exists at least one dependency for each import statement, and for each code reference to an imported (top level) element. Dependencies are differentiated into three kinds:

Compile time dependencies arise from all non-unused import statements. Runtime dependencies are the subset of compile time dependencies that is necessary at runtime only, i.e. it does not include unused imports or imports used for type information. (We assume that unused imports do not have intended side effects like bare imports have.) Load time dependencies are the subset of runtime dependencies with load time references.

A dependency cycle exists when traversing import statements of one file to the imported files will eventually lead to one of the already visited files. Note that the term dependency cycle refers to files and not necessarily to imported elements. Dependency cycles are differentiated as follows: Compile time dependency cycles are those relying on compile time dependencies. Runtime dependency cycles rely on runtime dependencies and are of special interest later. Load time dependency cycles rely on load time dependencies and are evaluated to errors in N4JS.


Reasoning

To get a clearer understanding, it is important to know the impact of dependency cycles in a program. An inherent property of dependency cycles is that at least one of the import statements during load time gets skipped since it would load a file that is already processed. In a cycle free program, all import statements of all files can be understood as a directed graph of files connected by import statements that define a partial load order. It is usually harmless that the total order of loading files depends on the entry point, i.e. which file is imported first or started the program, since it complies to that partial order. However, in case of dependency cycles the graph contains a cycle which will be broken up at load time to re-establish a directed graph and partial order. That means that the loading of at least one file of each cycle will be skipped because it is already being loaded. Other files that depends on that skipped import might be initialized partially only. Consequently, the entry point, e.g. the order of import statements, impacts whether a file is initialized partially or completely after its import statement was executed. Sorting import statements is a very common IDE feature and usually deemed to be innocent of causing runtime errors. Yet this assumption does not necessarily hold if the program contains dependency cycles.

We learned that partial initialization occurs if a load time initializer accesses a reference to a not yet initialized element of a skipped file. Probably that not yet initialized element will be initialized later during load time, but harm was already done since the current file had read the wrong value. Where exactly did the problem occur? References to not yet initialized elements can be located not only directly in load time initializers but can also be at locations reachable transitively e.g. by calling other functions in between starting from the initializer. Determining all reachable references from load time initializers which potentially access not yet initialized values can only be done by an expensive analysis that is usually imprecise due to over-approximation. In many cases it is even impossible due to reflective calls, dynamic loading etc. However, a simpler way to rule out accesses to partial initialized elements is to make a clear cut and forbid any expressions or statements in load time initializers that cannot be evaluated at compile time, e.g. function calls. On the downside, this strictness also reduces some programming freedom and even rules out legal load time references that would not cause runtime errors.

To summarize the approach: Either runtime dependency cycles need to be removed or - if that is not possible - load time initializers need to be restricted to not reference potentially skipped files.

A very interesting situation is when a runtime dependency cycle C contains a file m that has a load time initializer with a dependency d to file m'. This means that the cycle becomes a cycle that has a correct and an incorrect way of loading its files: Due to load time dependency d file m' must be loaded without being skipped. Still, at least one other import must be skipped to break the cycle. To make sure that loading of m' is not skipped, m' must not be the entry point of the cycle C. Choosing another entry point e.g. file m will result in partially loading m first, loading the rest of the cycle C including m' completely until another import to m is skipped. In other words: A load time dependency to a file m' within a cycle C constrains m' to never be the entry point into C. This situation is illustrated in the figure below.

The figure above shows the third example with additional information about its dependencies and cycles. As you can see there exists a runtime dependency cycle (indicated in blue), since the two files reference each other in runtime import statements. Also indicated in orange there exists a load time dependency because the reference to G2.s is located in an expression of a top level element that is evaluated during load time. Hence, this dependency imposes the constraint that the entry point to the third example must be G1.

In contrast note that the second example has a load time dependency cycle due to the two load time dependencies created by the accesses to F1.s and F2.s.

Error cases

Four types of errors are indicated in different situations regarding load time dependencies. Based on a source code analysis, runtime dependency cycles and references located in top level elements are detected first.

(1) Given this information, load time dependency cycles can be identified and be evaluated to errors. These errors are attached to the references of the load time dependencies.

Three other types of errors occur if and only if there exists a runtime dependency cycle C of modules m and m' (and maybe including others).

(2) Any load time reference in C that references a top level element in C (imported ones or in the same file) is marked with an error. This includes all load time dependencies. The reason to forbid any references to e.g. local or imported functions from C is that these may reach and access partially initialized variables. In N4JS there is one exception to that rule: extends clauses of classes. Load time references are still allowed here and not causing problems because extends clauses in N4JS are already restricted to references to other classes only (and not arbitrary expressions like in JavaScript). Note that ordinary dependencies (i.e. that do not have references in load time code) are still allowed, e.g. within the body of methods.

(3) Any dependency d to a module m' is marked with an error if and only if there exists a load time dependency to m' already. In other words: There may be no other dependency in C to m' if d is a load time dependency. In case an importing module m* is not in C a dependency to m' is allowed.

(4) However, when importing m' from m*, it is mandatory to also import another module m prior to import m*. Otherwise, an error is shown. The import of m prior to the import of m' will ensure that loading of m' is not skipped.

When programming with N4JS and errors like that occur, there are two ways to solve them. First and best solution is to remove the dependency cycle, which in many cases is a code smell already. This can be done by breaking the cycle or merging two or more files or file parts that mutually depend on each other. In case that is not possible, removing some load time dependencies is necessary. However, keep in mind that any load time dependency in a dependency cycle will impose a runtime execution order on the importing file to be loaded always prior to the imported file.

H1.n4js


import * as H2 from "H2";
class C extends H2.C {} // no error (3) here

H2.n4js


import "H1";
export public class C {}

The last example shows a case similar to the third example: The are two files that have a runtime dependency cycle. Additionally, there is a load time dependency created by the extends clause that references H2.C. Note that the third example produces the validation error (3) at the load time reference G2.s because we disallow all non-compile time expressions or statements in load time initializers. This shows where simplifications of our approach might be improved in the future. Since we leave an exception to error (3) in case the load time dependency is an extends clause, the last example shows no errors in N4JS.


Conclusion

The core problem are read accesses to variables that are not yet initialized. While these kinds of problems are relatively obvious and easy to find when happening in a single file, it is much harder to detect them when they are caused due to dependency cycles of two or more files. For the single file case, several IDEs and languages already provide validations and put error markers to read accesses of undefined symbols, such as VSCode for TypeScript. By introducing the validations described in this blog post, N4JS also can rule out initialization errors due to dependency cycles from happening. Unfortunately, in some cases this approach is too strict but we hope to relax some of the restrictions to improve the compromise of program safety and programming freedom.


by Marcus Mews

September 08, 2020

N4JS goes LSP

A few weeks ago we started to publish a VSCode extension for N4JS to the VSCode Marketplace. This was one of the last steps on our road to support LSP-based development tools. We chose to make this major change because of several reasons that affected both users and developers of N4JS.

An N4JS project in VSCode with the N4JS language extension


Our language extension for N4JS is hosted at the Microsoft VSCode Marketplace and will be updated regularly by our Jenkins jobs. Versions will be kept in sync with the language version, compiler version and version of the N4JS libraries to avoid incompatible setups. At the moment, the LSP server supports all main features of the language server protocol (LSP) such as validation, content assist, outline view, jump to definition and implementation, open symbol, the rename refactoring and many more. In addition, it will also generate output files whenever a source change is detected. We therefore heavily improved the incremental LSP builder of the Xtext framework and plan to migrate back those changes to the Xtext repository. For the near future we plan to work on stability, performance and also to support some of the less frequently used LSP features.


When looking back, development of N4JS has been based on the Xtext framework from the start and thus it was straightforward to build an Eclipse-based IDE as our main development tool. Later on, we also implemented a headless compiler used for manual and automated testing from the command line. The development of the compiler already indicated some problems stemming from the tight integration of the Eclipse and the Xtext frameworks together with our language specific implementations. To name an example, we had two separate builder implementations: one for the IDE and the other for the headless compiler. Since the Eclipse IDE is using a specific workspace and project model, we also had two implementations for this abstraction. Another important problem we faced with developing an Eclipse-based IDE was that at some points we had to implement UI tests using the SWTBot framework. For us, SWTBot tests turned out to be very hard to develop, to maintain, and to keep from becoming flaky. Shifting to LSP-based development tools, i.e. the headless compiler and an LSP server, allows us to overcome the aforementioned problems.


Users of N4JS now have the option to either use our extension for VSCode or integrate our LSP server into their favorite IDE themselves, even into the Eclipse IDE. They also benefit from more lightweight tools regarding disk size and start-up performance, as well as a better integration into well-known tools from the JavaScript development ecosystem.



September 30, 2019

Instanceof Type Guards in N4JS

Statically typed languages like Java use instanceof checks to determine the type of an object at runtime. After a successful check, a type cast needs to be done explicitly in most of those languages. In this post we present how N4JS introduced type guards to perform these type casts implicitly. 

No error due to implicit cast in successful instanceof type guard

The example above shows that strict type rules on the any instance a causes errors to show up when accessing the unknown property pX. However, after asserting that a is an instance of X, the property pX can be accessed without errors. A separate type cast is unnecessary, since type inference now also considers instanceof type guard information.


Hover information on variable access of a shows the inferred type

The resulting type is the intersection type of the original type (which is here any) and of all type guards that must hold on a specific variable access (which is here only type X). Keeping the original types any or Object is not necessary and could be optimised later. In case the original type is different, it is necessary to include it in the resulting intersection type. The reason is that the type guard could check for an interface only. If so, property accesses to properties of the original types would cause errors.


Re-definition of a type guarded variable

Two distinct differences between type guards and type declarations are (1) their data flow nature and (2) their read-only effects. Firstly, when redefining (in the sense of the data flow) a variable, the type guard information gets lost. Consequently, subsequent accesses to the variable will no longer benefit from the type guard, since the type guard was invalidated by the re-definition. Secondly, only the original type information is considered for a redefinition. That means that the type guard does not change the expected type and, hence, does not limit the set of types that can be assigned to a type guarded variable.


Further examples for instanceof type guards in N4JS

Data flow analysis is essential for type guards and has been presented in a previous post. Based upon this information, type information for each variable access is computed. Since also complicated data flows are handled correctly, such as in for loops or short circuit evaluation, type guard information is already available in composed condition expressions (see function f3 and f5 above). Aside from being able to nest instanceof type guards (see function f4 above), they also can be used as a filter at the beginning of a function (see function f6 above) or inside a loop: Negating a type guard and then exiting the function or block leaves helpful valid type guard information on all the remaining control flow paths.

by Marcus Mews

August 29, 2019

Redux App Development and Testing in N4JS (Chess Game Part 2)

In large applications, Redux - an implementation of Flux architecture created by Facebook - is often used to organise application code by using a strict data flow in one direction only. Redux is UI agnostic, and can be used in conjunction with any UI library. As a continuation of our chess game tutorial with React, we show how to extract the entire program state out of React components, store it with Redux, and test it with N4JS. The full tutorial is available at eclipse.org/n4js and the sources can be found at github.com/Eclipse/n4js-tutorials.


The first part of the chess game tutorial discussed how to develop a chess game app with React and JSX in N4JS. We have stored the program state - which for instance contains information about the locations of all chess pieces - in the state of the React components directly. As applications become larger, however, the mix of program state and UI makes the application hard to comprehend and difficult to test. To address these issues, we extract the program state from the UI components in the second part of the tutorial.

When using React with Redux, we store the application state in Redux store instead of the state of React components. As a result, React components become stateless UI elements and simply render the UI using the data retrieved from the Redux store. In a Redux architecture, data flows strictly in one direction. The diagram below graphically depicts the action/data flow in a React/Redux app.

Strict data flow of flux architecture application


The action/data flow in the diagram can be roughly understood as follows:
  • When a user interaction is triggered on the React component (e.g. button clicked, text field edited etc.), an action is created. The action describes the changes needed to be updated in the application state. For instance, when a text field is edited, the action created may contain the new string of the text field.
  • Then the action is dispatched to the Redux store whereby the Redux store stores the application state, usually as a hierarchical tree of state.
  • The reducers take the action and the current application state and create an updated application state.
  • If the changes in the application state are to a certain React component, they are forwarded into the component in form of props. The change in props causes the component to re-render.

In the second part of the tutorial we further elaborate on the interaction of React and Redux and migrate the original chess non-Redux app. The tutorial explains the role of the reducer, and how the game state is stored and maintained in the Redux store. Based on storing the game using Redux, the tutorial shows how to test the game application with the N4JS test library Mangelhaft, by for instance checking that valid move destination squares are updated after a chess piece was selected.

Note that the way of testing the game logics is completely UI-agnostic and no React components are involved at all. This is thanks to the decoupling of game logics from UI with the help of Redux.


by Minh Quang Tran

July 16, 2019

React App Development in N4JS (Chess Game Part 1)

React is a popular JavaScript library created by Facebook widely used for developing web user interfaces. In this post we introduce a tutorial on how to develop a chess game based on React, JSX and N4JS. The full tutorial is available (and playable) at eclipse.org/n4js and the sources can be found at github.com/Eclipse/n4js-tutorials.


Chess game implemented in N4JS with React

The chess game app implements the following requirements:
  • When the chess application is started, a chess board of 8x8 squares shall be showed containing 16 white pieces and 16 black pieces in their initial positions.
  • A player in turn shall be able to use the mouse to pick one of the pieces that she/he wants to move. A picked piece shall be clearly recognisable. Moreover, to aid players, especially beginners, whenever a piece is picked, all possible valid destination squares shall be visually highlighted as well.
  • In addition to the game board, there shall be a game information area that shows which player is in turn. Moreover, the game information area shall show a complete history of the game protocolling each move made by the players. As a bonus, jumping back to a previous state of the history shall be possible.

In the tutorial you will learn how to use npm, webpack and React to develop a web application with N4JS and the N4JS IDE. Most of the tutorial will elaborate on specific parts of the implementation and explain for example the graphical representation of the chess board and chess pieces, and how to use React to model the UI. Also, it will explain the game logic, i.e. how possible moves for the different piece types are computed, how the turn history is maintained, and how the end of the game (i.e. a win situation) is detected. In the end, the tutorial will make suggestions on how to improve the chess game e.g. by adding support for the en passant move.

Have fun with implementing this game!

by Minh Quang Tran

March 22, 2019

Automated rename refactoring in N4JS IDE

Refactoring is probably one of the most important tools for us, software developers since we constantly need to change the structure of the code to improve the code quality or to prepare the code for new features etc. The most used refactoring operation is arguably rename refactoring. Find and replace could be used for renaming but the risk of renaming unrelated names is pretty high.

N4JS IDE provides a powerful way of automatically renaming a definition and all its references with a comparable user experience as rename refactoring of Eclipse Java Development Tool (JDT).  The slogan is: I want to rename this thing, do it for me however you like but please in a safe manner so that I can move on! This will exactly be your experience with rename refactoring in N4JS IDE.

Simple rename example

Let's have look at a simple example to see how rename refactoring works in N4JS IDE in action. Assume that we have an N4JS file with the following content.


When the cursor is at A of the constructor new A()and we press Cmd + Shift + R to rename A to B, the rename refactoring suggests that it would rename A to B at 3 different locations. After entering the new name B and pressing enter, the class A and all its usages are renamed to B, fully automatically :-)

Name conflicts detection 

Renaming an element may cause name conflicts. The rename refactoring in N4JS IDE provides comprehensive checks for detecting name conflicts. If the new name would cause name conflicts, the rename refactoring is disallowed.

In the example above,  renaming class A to class C would cause a name conflict because in the script scope the name C already exists. The rename refactoring provided by N4JS IDE can recognize this conflict and shows an error message.



In a large code base, these checks are a true life saver. Imagine having to manually verify these kinds of name conflicts across hundred of files.

Additionally, N4JS IDE's rename refactoring is capable of recognizing name conflicts when renaming

  • members of classifier
  • formal parameters of a function or method
  • field of a structural type
  • enum literal
  • local variable, constant
  • global variable, constant
  • etc.

Rename composed members

N4JS language supports composed elements. Renaming a composed element is somewhat special.




In this example, ab.foo is a composed member because ab is of the intersection type A & B which is composed of both A and B. Renaming ab.foo would rename all the definitions that contribute to the creation of ab.foo as well as all references of these definitions.

Preview of changes

When you start rename refactoring operation, you have the possibility to see the preview of changes before actually executing the operation.



Note that the preview shows the changes in each file in a very recognizable manner.

Undo changes

After the rename refactoring, if you feel regret and would like to undo the operation, simply press Cmd + Z. The undo will undo all the changes in affected files previously done by the rename refactoring.

Current limitations

As the time of this writing, the rename refactoring in N4JS IDE still has several limitations:
  • Renaming alias is not supported
  • Checking name conflicts do not take into account shadowing

By Minh Quang Tran

October 17, 2018

Short-Circuit Evaluation in N4JS

Short-circuit evaluation is a popular feature of many programming languages and also part of N4JS. In this post, we show how the control-flow analysis of the N4JS-IDE deals with short-circuit evaluation, since it can have a substantial effect on the data flow and execution of a program.



Short circuit evaluation is a means to improve runtime performance when evaluating boolean expressions. This improvement is a result of skipping code execution. The example above shows an if-statement whose condition consists of two boolean expressions that combine the values of 1, 2 and 3, and its control flow graph. Note that the number literals are placeholders for more meaningful subexpressions.

First the logical and, then the logical or gets evaluated: (1 && 2) || 3. In case the expression 1 && 2 evaluates to true, the evaluation of the subclause 3 will be skipped and the evaluation of the entire condition results to true. This skipping of nested boolean expressions is called short circuit evaluation.

However, instead of skipping expression 3, expression 2 might be skipped. In case condition 1 does not hold, the control flow will continue with condition 3 right away. This control flow completely takes places within the if-condition, whereas the former short circuit targets the then block.

The reasoning behind short circuit evaluation is that the skipped code does not affect the result of the whole boolean expression. If the left hand side of the logical or expression evaluates to true, the whole or expression also does. Only if the left hand side is false, the right hand side will be evaluated. Complementary, the right hand side of a logical and expression is skipped in case the left hand side evaluates to false.


Side Effects


Risks of short circuit evaluation might arise in case a subexpression has side effects: These side effects will not occur if the subexpression is skipped. However, a program that relies on side effects of expressions inside an if-condition can be called fragile (or adventurous). In any case it is recommended to write side-effect free conditions.


Have a look at the example above. In case variable i has a value of zero, the right hand side expression i++ is executed, otherwise, it is skipped. The side effect here is the post-increment the value of i. If the value of i is other than zero, this value will be printed out. Otherwise, the value will be incremented but not printed. The control flow shows this behavior with the edge starting at i and targeting the symbol console.


Loops


Loop conditions also benefit from short circuit evaluation. This is important to know when reasoning about the all possible control flow paths through the loop: Each short circuit will introduce another path. Combining all of them makes data flow in loops difficult to understand in case of side effects in the subconditions.


Creative use of short circuit evaluation


Misusing short circuit evaluation can mimic if-statements by using expressions but without using the language feature of conditional expressions (i.e. condition() ? then() : else()). This could be used when if-statements should be executed e.g. when passing arguments to method calls, or when computing the update part of for-loops.





The picture above shows the two versions: the first uses an if-statement and the second uses an  expression statement. These two statements call the functions condition, then and end. Depending on the return value of condition, the function then is executed or not. Consequently, the printouts are either "condition then end" or "condition end", depending on the control flow.

The corresponding control flows are depicted on the right: The upper three lines refer to the if-statement, and the lower three lines to the expression statement. They reveal that the expression statement behaves similar to the if-statement. Note that the control flow edge in the last line that skips the nodes end and end() is never traversed since the logical or expression always evaluates to true.

The interested reader would find more details about the N4JS flow graphs and their implementation in the N4JS Design Document, Chapter: Flow Graphs.


by Marcus Mews