A new proposal within the OpenJDK community aims to improve Java application startup times by introducing an API for saving and restoring the state of the Java runtime. The project, called CRaC (Coordinated Restore at Checkpoint), was proposed by Anton Kozlov, a senior software engineer at Azul Systems. Its goal is to let Java applications bypass the lengthy startup and warm-up phases. The runtime state is saved at a checkpoint, and new instances of the application start from that saved state instead of from scratch, which can significantly reduce startup time.
The CRaC project would implement an API that coordinates the Java application and the runtime whenever state is saved and restored. The proposal considers several mechanisms for the saving itself, such as virtual machine snapshots, container snapshots, and the CRIU (Checkpoint/Restore In Userspace) project on Linux. With any of these, the Java runtime could save its state at a chosen checkpoint and later resume execution from it, avoiding the delays usually associated with a cold start.
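To make the coordination idea concrete, the following is a minimal, self-contained Java sketch of how such an API might look from the application's side. It is not the CRaC API: the names `Resource`, `Context`, `beforeCheckpoint`, `afterRestore`, and `checkpointRestore` are hypothetical, and the actual saving mechanism (a VM or container snapshot, or CRIU) is replaced by a simple callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a checkpoint/restore coordination API.
// All names here are illustrative; this is not the CRaC API itself.
public class CoordinationSketch {

    /** A component that must react when the runtime state is saved or restored. */
    interface Resource {
        void beforeCheckpoint() throws Exception; // release or quiesce state that cannot be saved
        void afterRestore() throws Exception;     // re-acquire or refresh that state
    }

    /** Holds registered resources and drives the notification sequence. */
    static final class Context {
        private final List<Resource> resources = new ArrayList<>();

        void register(Resource resource) {
            resources.add(resource);
        }

        void checkpointRestore(Runnable saveState) throws Exception {
            // Notify in reverse registration order before saving.
            for (int i = resources.size() - 1; i >= 0; i--) {
                resources.get(i).beforeCheckpoint();
            }
            // Placeholder for the real mechanism (VM/container snapshot, CRIU, ...).
            saveState.run();
            // Notify in registration order once execution resumes.
            for (Resource resource : resources) {
                resource.afterRestore();
            }
        }
    }

    /** Registers two resources and returns the order in which events happened. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Context context = new Context();
        for (String name : List.of("pool", "cache")) {
            context.register(new Resource() {
                @Override public void beforeCheckpoint() { log.add(name + ":before"); }
                @Override public void afterRestore() { log.add(name + ":after"); }
            });
        }
        try {
            context.checkpointRestore(() -> log.add("snapshot"));
        } catch (Exception e) {
            throw new IllegalStateException("checkpoint failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running `main` prints `[cache:before, pool:before, snapshot, pool:after, cache:after]`. Notifying in reverse order before the checkpoint and in forward order after the restore is a choice made for this sketch, so that components registered later (which may depend on earlier ones) are quiesced first and brought back last.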
While this approach promises performance gains, the proposal acknowledges several implementation challenges. One is keeping the saved state valid when the execution environment changes between checkpoint and restore, for example when the application is restored on a different machine. Another arises when multiple instances are started from the same saved state: each begins with identical data, such as the same identifiers or random seeds, so their execution is no longer unique. The proposal suggests mitigating both problems by making Java applications aware of when their state is saved and restored, so that they can adapt to a changed environment and ensure that each instance behaves independently.
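Both problems can be illustrated with a hypothetical restore-aware component. In this Java sketch, a restore is modeled by copying a saved object's fields unchanged, and the environment is an injected `Supplier<String>`. The names `InstanceIdentity`, `fromSnapshot`, and `afterRestore` are illustrative assumptions, not part of any published API.

```java
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical sketch: per-instance state that is refreshed after a restore.
public class RestoreAwareSketch {

    static final class InstanceIdentity {
        private final Supplier<String> environment; // stands in for reading the host environment
        private String instanceId;
        private String host;

        InstanceIdentity(Supplier<String> environment) {
            this.environment = environment;
            refresh();
        }

        // Models a restore: the new instance starts with the saved fields, unchanged.
        private InstanceIdentity(InstanceIdentity saved, Supplier<String> environment) {
            this.environment = environment;
            this.instanceId = saved.instanceId;
            this.host = saved.host;
        }

        static InstanceIdentity fromSnapshot(InstanceIdentity saved, Supplier<String> environment) {
            return new InstanceIdentity(saved, environment);
        }

        /** Would be invoked by the runtime once execution resumes from the saved state. */
        void afterRestore() {
            refresh();
        }

        private void refresh() {
            instanceId = UUID.randomUUID().toString(); // unique per running instance
            host = environment.get();                  // re-read instead of trusting the snapshot
        }

        String instanceId() { return instanceId; }
        String host() { return host; }
    }

    public static void main(String[] args) {
        InstanceIdentity saved = new InstanceIdentity(() -> "build-host");
        InstanceIdentity a = InstanceIdentity.fromSnapshot(saved, () -> "node-a");
        InstanceIdentity b = InstanceIdentity.fromSnapshot(saved, () -> "node-b");
        System.out.println("before hook, same id: " + a.instanceId().equals(b.instanceId()));
        a.afterRestore();
        b.afterRestore();
        System.out.println("after hook, same id: " + a.instanceId().equals(b.instanceId())
                + ", hosts: " + a.host() + ", " + b.host());
    }
}
```

Until `afterRestore` runs, both copies share the identifier and host baked into the snapshot; the restore notification is what gives each instance its own ID and the host it is actually running on.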
To address these challenges, the proposal outlines a flexible API that can work with different underlying mechanisms for saving and restoring state. It also stresses building safety checks into both the API and the runtime, so that a checkpoint is refused whenever saving the state could lead to incorrect behavior after restore. These checks are meant to prevent errors and keep the feature reliable across different environments and use cases.
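A safety check of this kind could be as simple as refusing to checkpoint while handles that cannot survive a restore are still open. The Java sketch below assumes such a rule purely for illustration; `HandleRegistry` and `CheckpointRefusedException` are hypothetical names, and a runtime could also detect open files or sockets on its own rather than relying on the application to report them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical safety check: refuse to save state that would be wrong after a restore.
public class CheckpointSafetySketch {

    /** Signals that checkpointing now could produce incorrect behavior after restore. */
    static final class CheckpointRefusedException extends Exception {
        CheckpointRefusedException(String message) {
            super(message);
        }
    }

    /** Tracks handles (files, sockets, ...) that cannot be carried across a restore. */
    static final class HandleRegistry {
        private final Set<String> open = new LinkedHashSet<>();

        void opened(String handle) { open.add(handle); }

        void closed(String handle) { open.remove(handle); }

        /** Saves state only if no unsafe handle is still open; otherwise refuses. */
        void checkpoint(Runnable saveState) throws CheckpointRefusedException {
            if (!open.isEmpty()) {
                throw new CheckpointRefusedException("cannot checkpoint, open handles: " + open);
            }
            saveState.run();
        }
    }

    public static void main(String[] args) {
        HandleRegistry registry = new HandleRegistry();
        registry.opened("socket:db");
        try {
            registry.checkpoint(() -> System.out.println("state saved"));
        } catch (CheckpointRefusedException e) {
            System.out.println("refused: " + e.getMessage());
        }
        registry.closed("socket:db"); // e.g. the connection is closed in a before-checkpoint hook
        try {
            registry.checkpoint(() -> System.out.println("state saved"));
        } catch (CheckpointRefusedException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Failing loudly before the snapshot is taken is the point of the design: a refused checkpoint costs only a retry, whereas a snapshot holding a dead socket would fail later, on every instance restored from it.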