Why are four Jenkins build nodes suddenly having fatal problems finding things on the PATH?

0 votes
0 answers
61 views
In our overall organization, we run Jenkins 2.303.1 on-prem, with thousands of builds a day. The project I work on uses one Jenkins master and a set of about ten build nodes. We build a few hundred Maven/Java/Spring applications with similar architectures. In the build process, we have a "tools image" that contains java, mvn, and some other tools.

Yesterday, we updated the build process to reference a newer version of the tools image that has some additional tools we need. Shortly after that update, we noticed that builds on four of the build nodes were all failing in the same way, with approximately this command line and output:

    + bash -o pipefail -c mvn -U -s ... -Duser.home=/ clean compile test-compile 2>&1 | tee mvn.out
    The JAVA_HOME environment variable is not defined correctly, this environment variable is needed to run this program.

Note that this command is run by a "sh" pipeline step. The error message comes from inside the "mvn" script, and it occurs when the script finds that $JAVA_HOME/bin/java doesn't exist.

I then added several "sh" calls before this step to show the output of the following:

* which java
* which mvn
* ls -lt $JAVA_HOME/bin/java

On the "bad" nodes, the first two commands both print an empty string. That means that neither "java" nor "mvn" is found in the PATH, or that they are not executable. On the "good" nodes, they print the expected locations of the "java" and "mvn" executables. The output from the third command is this:

    -rwxr-xr-x. 1 root root 12768 Oct 17 21:48 /opt/java/openjdk/bin/java

I also added "env" output before this. It shows that JAVA_HOME is "/opt/java/openjdk" and that PATH contains the bin directories of both the mvn and java distributions.

This evidence shows several facts that just don't make sense together:

* The "mvn" script is clearly complaining that $JAVA_HOME/bin/java does not exist, but the sh output clearly shows that it does.
* The "which mvn" output says that "mvn" is not found in the PATH, but the bash command line above executes just "mvn" without an absolute path, so the only way it could get to it is through the PATH. It clearly is finding it, otherwise that error message would not be printed from inside the "mvn" script.

I've tried to compare several aspects of the builds running on the "good" nodes with the ones running on the "bad" nodes. For instance, I copied the list of env vars from both and compared them, and there were no significant differences.

We tried restarting the bad build nodes. We tried purging the entire local docker cache and restarting docker. Neither of these steps made a difference.

I'm looking for any ideas of possible areas to explore to explain this problem. We've had several people staring at this for quite a while now, including one person who maintains the Jenkins build nodes, one person who maintains the tools image, and several others with considerable experience. We are all drawing a blank here.
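Because "which" prints nothing in both the "not in the PATH" case and the "not executable" case, a more verbose check may help tell the two apart on a "bad" node. The following is only a sketch: the function name explain_path_lookup and its output labels are hypothetical and not part of our pipeline. It walks the PATH one directory at a time and reports, for a given command name, whether the directory is missing, the file is missing, or the file exists but fails the shell's -x test.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (name and labels are illustrative only).
# For each PATH entry, report why the given command does or does not
# resolve there. Returns 0 if at least one executable candidate was found.
# Note: POSIX sh has no "local", so the variables below are global.
explain_path_lookup() {
    cmd=$1
    found=1
    rest=$PATH:                      # trailing colon so the last entry is processed
    while [ -n "$rest" ]; do
        dir=${rest%%:*}              # text before the first colon
        rest=${rest#*:}              # text after the first colon
        [ -n "$dir" ] || dir=.       # an empty PATH entry means the current directory
        candidate=$dir/$cmd
        if [ ! -d "$dir" ]; then
            echo "missing-dir: $dir"
        elif [ ! -e "$candidate" ]; then
            echo "no-file: $candidate"
        elif [ ! -x "$candidate" ]; then
            echo "not-exec: $candidate"
        else
            echo "ok: $candidate"
            found=0
        fi
    done
    return "$found"
}
```

In a pipeline this could be called as "explain_path_lookup java" and "explain_path_lookup mvn" inside the same "sh" step that runs the build, so that it sees the same environment. A "not-exec" line for a file whose ls output shows execute permission would point at the -x test itself behaving differently on that node, as opposed to a PATH problem.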
Asked by David M. Karr (1173 rep)
Jan 19, 2024, 09:09 PM
Last activity: Jan 19, 2024, 09:30 PM