Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Redesign Jenkinsfiles #12000

Merged
merged 8 commits into from
Aug 8, 2018
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
684 changes: 294 additions & 390 deletions Jenkinsfile

Large diffs are not rendered by default.

57 changes: 14 additions & 43 deletions ci/Jenkinsfile_docker_cache
Original file line number Diff line number Diff line change
Expand Up @@ -22,34 +22,18 @@

// timeout in minutes
total_timeout = 300
git_timeout = 15
// assign any caught errors here
err = null

// initialize source codes
def init_git() {
deleteDir()
retry(5) {
try {
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of
// retries as this will increase the amount of requests and worsen the throttling
timeout(time: git_timeout, unit: 'MINUTES') {
checkout scm
sh 'git submodule update --init --recursive'
sh 'git clean -x -d -f'
}
} catch (exc) {
deleteDir()
error "Failed to fetch source codes with ${exc}"
sleep 2
}
}
node('restricted-mxnetlinux-cpu') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is inconsistent with the constant labels style from the other files, why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I missed replacing it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's explained below:

// Loading the utilities requires a node context unfortunately

We can not use the constants at that point because the utility to load a constant has not been loaded at that point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a node to define some constants? Could we have a separate file for them and load it before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately yes. It's not possible to load other groovy scripts without being in a node context

// Loading the utilities requires a node context unfortunately
checkout scm
utils = load('ci/Jenkinsfile_utils.groovy')
}
utils.assign_node_labels(linux_cpu: 'restricted-mxnetlinux-cpu', linux_gpu: 'restricted-mxnetlinux-gpu', linux_gpu_p3: 'restricted-mxnetlinux-gpu-p3', windows_cpu: 'restricted-mxnetwindows-cpu', windows_gpu: 'restricted-mxnetwindows-gpu')


try {
utils.main_wrapper(
handler: {
stage("Docker cache build & publish") {
node('restricted-mxnetlinux-cpu') {
node(NODE_LINUX_CPU) {
ws('workspace/docker_cache') {
timeout(time: total_timeout, unit: 'MINUTES') {
init_git()
Expand All @@ -58,24 +42,11 @@ try {
}
}
}

// set build status to success at the end
currentBuild.result = "SUCCESS"
} catch (caughtError) {
node("restricted-mxnetlinux-cpu") {
sh "echo caught ${caughtError}"
err = caughtError
currentBuild.result = "FAILURE"
}
} finally {
node("restricted-mxnetlinux-cpu") {
// Only send email if master failed
if (currentBuild.result == "FAILURE") {
emailext body: 'Generating the Docker Cache has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[DOCKER CACHE FAILED] Run ${BUILD_NUMBER}', to: '${EMAIL}'
}
// Remember to rethrow so the build is marked as failing
if (err) {
throw err
}
,
failure_handler:
{
if (currentBuild.result == "FAILURE") {
emailext body: 'Generating the Docker Cache has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[DOCKER CACHE FAILED] Run ${BUILD_NUMBER}', to: '${EMAIL}'
}
}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline

151 changes: 151 additions & 0 deletions ci/Jenkinsfile_utils.groovy
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
// -*- mode: groovy -*-

// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

// initialize source codes
def init_git() {
deleteDir()
retry(5) {
try {
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of
// retries as this will increase the amount of requests and worsen the throttling
timeout(time: 15, unit: 'MINUTES') {
checkout scm
sh 'git submodule update --init --recursive'
sh 'git clean -d -f'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was -x -d -f before, but it would be better it would be -xdff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
} catch (exc) {
deleteDir()
error "Failed to fetch source codes with ${exc}"
sleep 2
}
}
}

def init_git_win() {
deleteDir()
retry(5) {
try {
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of
// retries as this will increase the amount of requests and worsen the throttling
timeout(time: 15, unit: 'MINUTES') {
checkout scm
bat 'git submodule update --init --recursive'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this check if there's an error? I found that bat doesn't. powershell does to some degree. Why don't we use PS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. The scope of this PR is restructuring the Jenkinsfiles and not a refactor of what we do in there.

bat 'git clean -d -f'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
} catch (exc) {
deleteDir()
error "Failed to fetch source codes with ${exc}"
sleep 2
}
}
}

// pack libraries for later use
def pack_lib(name, libs=mx_lib) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from where is the default value? this was confusing, but not it's more, every file defines its own defaults? I would rather make it explicit for every case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll make it explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

sh """
echo "Packing ${libs} into ${name}"
echo ${libs} | sed -e 's/,/ /g' | xargs md5sum
"""
stash includes: libs, name: name
}

// unpack libraries saved before
def unpack_lib(name, libs=mx_lib) {
unstash name
sh """
echo "Unpacked ${libs} from ${name}"
echo ${libs} | sed -e 's/,/ /g' | xargs md5sum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we ever had problems that the files were not what we expected? Not sure who introduced this md5sum step. It adds complexity to stash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't recall having any issues with that. But let's move that to a later discussion

"""
}

def publish_test_coverage() {
// Fall back to our own copy of the bash helper if it failed to download the public version
sh '(curl --retry 10 -s https://codecov.io/bash | bash -s -) || (curl --retry 10 -s https://s3-us-west-2.amazonaws.com/mxnet-ci-prod-slave-data/codecov-bash.txt | bash -s -)'
}

def collect_test_results_unix(original_file_name, new_file_name) {
if (fileExists(original_file_name)) {
// Rename file to make it distinguishable. Unfortunately, it's not possible to get STAGE_NAME in a parallel stage
// Thus, we have to pick a name manually and rename the files so that they can be stored separately.
sh 'cp ' + original_file_name + ' ' + new_file_name
archiveArtifacts artifacts: new_file_name
}
}

def collect_test_results_windows(original_file_name, new_file_name) {
// Rename file to make it distinguishable. Unfortunately, it's not possible to get STAGE_NAME in a parallel stage
// Thus, we have to pick a name manually and rename the files so that they can be stored separately.
if (fileExists(original_file_name)) {
bat 'xcopy ' + original_file_name + ' ' + new_file_name + '*'
archiveArtifacts artifacts: new_file_name
}
}


def docker_run(platform, function_name, use_nvidia, shared_mem = '500m') {
def command = "ci/build.py --docker-registry ${env.DOCKER_CACHE_REGISTRY} %USE_NVIDIA% --platform %PLATFORM% --docker-build-retries 3 --shm-size %SHARED_MEM% /work/runtime_functions.sh %FUNCTION_NAME%"
command = command.replaceAll('%USE_NVIDIA%', use_nvidia ? '--nvidiadocker' : '')
command = command.replaceAll('%PLATFORM%', platform)
command = command.replaceAll('%FUNCTION_NAME%', function_name)
command = command.replaceAll('%SHARED_MEM%', shared_mem)

sh command
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace



def assign_node_labels(args) {
NODE_LINUX_CPU = args.linux_cpu
NODE_LINUX_GPU = args.linux_gpu
NODE_LINUX_GPU_P3 = args.linux_gpu_p3
NODE_WINDOWS_CPU = args.windows_cpu
NODE_WINDOWS_GPU = args.windows_gpu
}

// assign any caught errors here
err = null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the variable outside the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy & paste -> moved it inside

def main_wrapper(args) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this main_wrapper? Can we add a documentation string? is not clear without greping around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀🤔

// hander: Core logic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling. Plus if we're going to have two handlers I'd give them both descriptive names. i.e. failure_handler, build_handler. I'm also not sure about the name handler, since we're only handling errors in one case and not responding to explicit events. Maybe build and handle_error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, renamed

// failure_handler: Failure handler

try {
// Call actual handler
args['handler']()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make the strings constants in this function (as you have in the other Jenkinsfile)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these strings are only used once, does it make sense to make them constants? It would then be along the lines of

FAILURE_HANDLER_ARG = 'failure_handler'
...
args[FAILURE_HANDLER_ARG]()

Given that situation, do you think it adds value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People often do, but I particularly don't find it super valuable. I think it's a good idea to extract constants when it's used twice, let's do that.


// set build status to success at the end
currentBuild.result = "SUCCESS"
} catch (caughtError) {
node(NODE_LINUX_CPU) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify this? It looks strange, why call bash to do echo? We can print from groovy right? Why match this particular node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question

sh "echo caught ${caughtError}"
err = caughtError
currentBuild.result = "FAILURE"
}
} finally {
node(NODE_LINUX_CPU) {
// Call failure handler
args['failure_handler']()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is in a finally. Why not call this from the catch? Then we wouldn't need the check in each failure_handler to see if it's result failed. We also wouldn't have to store the err and then re-throw it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. In the past, we had nested functions which were expected to continue although they had a failure and then exit at a later point in time. I don't think this applies anymore.

Could we save this up for a refactor? I'm trying not to make any behavioural modifications here.


// Remember to rethrow so the build is marked as failing
if (err) {
throw err
}
}
}
}
return this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline at end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

54 changes: 13 additions & 41 deletions docs/Jenkinsfile
Original file line number Diff line number Diff line change
Expand Up @@ -22,32 +22,18 @@

// timeout in minutes
max_time = 60
// assign any caught errors here
err = null

// initialize source code
def init_git() {
deleteDir()
retry(5) {
try {
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of
// retries as this will increase the amount of requests and worsen the throttling
timeout(time: 15, unit: 'MINUTES') {
checkout scm
sh 'git submodule update --init --recursive'
sh 'git clean -d -f'
}
} catch (exc) {
deleteDir()
error "Failed to fetch source codes with ${exc}"
sleep 2
}
}
node('restricted-mxnetlinux-cpu') {
// Loading the utilities requires a node context unfortunately
checkout scm
utils = load('ci/Jenkinsfile_utils.groovy')
}
utils.assign_node_labels(linux_cpu: 'restricted-mxnetlinux-cpu', linux_gpu: 'restricted-mxnetlinux-gpu', linux_gpu_p3: 'restricted-mxnetlinux-gpu-p3', windows_cpu: 'restricted-mxnetwindows-cpu', windows_gpu: 'restricted-mxnetwindows-gpu')

try {
utils.main_wrapper(
handler: {
stage('Build Docs') {
node('restricted-mxnetlinux-cpu') {
node(NODE_LINUX_CPU) {
ws('workspace/docs') {
init_git()
timeout(time: max_time, unit: 'MINUTES') {
Expand All @@ -58,24 +44,10 @@ try {
}
}
}

// set build status to success at the end
currentBuild.result = "SUCCESS"
} catch (caughtError) {
node("restricted-mxnetlinux-cpu") {
sh "echo caught ${caughtError}"
err = caughtError
currentBuild.result = "FAILURE"
}
} finally {
node("restricted-mxnetlinux-cpu") {
// Only send email if master failed
if (currentBuild.result == "FAILURE") {
emailext body: 'Generating the website has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[WEBSITE FAILED] Build ${BUILD_NUMBER}', to: '${EMAIL}'
}
// Remember to rethrow so the build is marked as failing
if (err) {
throw err
}
,
failure_handler: {
if (currentBuild.result == "FAILURE") {
emailext body: 'Generating the website has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[WEBSITE FAILED] Build ${BUILD_NUMBER}', to: '${EMAIL}'
}
}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Loading