Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Option for JNLP: Pass Slave Connection Arguments to Entrypoint #596

Merged
merged 1 commit into from
Jan 16, 2018

Conversation

odavid
Copy link

@odavid odavid commented Dec 28, 2017

Hi,

I am using the docker jnlp slave to to build docker images by attaching the /var/run/docker.sock to the slave container.
In order for it to work, I created an image derived from jenkins/jnlp-slave.

Basically the user that runs the entrypoint is 'root' and during startup, it checks for the GID that owns the /var/run/docker.sock on the host. Then, it changes the 'jenkins' user to be member of that group.

# To enable docker cloud based on docker socket,
# we need to add jenkins user to the docker group
if [ -S /var/run/docker.sock ]; then
  DOCKER_SOCKET_OWNER_GROUP_ID=$(stat -c %g /var/run/docker.sock)
  groupadd -for -g ${DOCKER_SOCKET_OWNER_GROUP_ID} docker
  id jenkins -G -n | grep docker || usermod -aG docker jenkins
fi

At the end, the entrypoint replaces the user to be 'jenkins' and passes the arguments it got back to the original entrypoint

...
exec gosu jenkins "jenkins-slave" "$@"

It works very well with kubernetes-plugin and amazon-ecs-plugin, however, with docker-plugin it does not work in lot of situations.

After investigation, I noticed that the problem is the way the docker-plugin runs the entrypoint with /bin/sh in detached mode and then exec the slave.jar.
Although I understand the reasoning behind that - (i.e to be able to inject the slave.jar), it causes the 'jenkins' user to start the java process before it was a member of the docker group.

Pseudo Example:

docker run -t -d -v /var/run/docker.sock:/var/run/docker.sock --name my-jenkins-slave odavid/jenkins-jnlp-slave /bin/sh
docker exec -t -u jenkins my-jenkins-slave bash -c id

# can result (depends on how fast it ran...)
# uid=10000(jenkins) gid=10000(jenkins) groups=10000(jenkins)

docker run -t -d --name my-jenkins-slave -v /var/run/docker.sock:/var/run/docker.sock odavid/jenkins-jnlp-slave bash -c id
# will always result
# uid=10000(jenkins) gid=10000(jenkins) groups=50(docker),10000(jenkins)

In order to solve that, I wanted to have an additional option (while keeping the current behaviour as the default one):

  • Pass Slave Connection Arguments to Entrypoint - When this option is enabled, the plugin will only run the container without any "docker exec" processes and pass the slave.jar arguments to the entrypoint. Using this way can help people do some initial preparation within the entrypoint before running the slave.jar. This also simplifies the execution and make it aligned with with kubernetes-plugin and amazon-ecs-plugin.

In order to implement this feature, I had to re-factor the code in the following steps:

  • Make the ability for the Launcher to start a container and not just get a container Id
  • Move the beforeContainerCreated(), beforeContainerStarted(), afterContainerStarted() to a dedicated DockerContainerLifecycleHandler class, so it will be used as a callback
  • Move the docker container start to a dedicated class: i.e. DockerContainerExecuter
  • Enable the CreateContainerCmd to be manipulated with a reference to the SlaveComputer before the container starts, so I can pass the computer.JnlpMac and computer.name to the entrypoint
  • In order to keep everything work with Serialization, I needed also to pass the template to the createLauncher and use DockerTemplate.createContainerCmd within the launcher.launch()
  • Since the containerId is not available during the laucher object creation, I had to change the unique slave name to a different unique

I will be very happy if I can this PR approved, since currently I have only 2 options:

  • Use root user instead of jenkins - this makes it hard with cache volumes. Also I don't really like root to be the process owner.
  • Use jenkins-swarm or other static slaves (There are several places I work who do not have kubernetes or ECS)

I would really appericiate if you can approve this PR.

Best,
Ohad

@ndeloof
Copy link
Contributor

ndeloof commented Jan 9, 2018

The reason launcher is executed in an exec is not to copy the remoting.jar (this can happen before container is started) but to ensure we can reconnect after jenkins is restarted, typically within a pipeline.

I personally dislike this approach for two reasons:

  1. using gosu require to initially run container as root
  2. exposing /var/run/docker.sock inside container is a major security issue for underlying docker infrastructure.

That being said I understand the intent here.

Copy link
Contributor

@ndeloof ndeloof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a high level point of view.
Some notes:

  • assuming a JNLP agent will reconnect to master automatically if the later get restarted, running remoting.jar as main command should be the standard behaviour, not just an option. As we already require the entrypoint to run /bin/sh as a custom command there's no risk for backward compatibility issue.
  • I'd like we keep the slave name being docker-{containerId}. For sure we can't predict the assigned ID before container is created, and we need agent name to create container. But we also can also create container with an explicit name and use this name for API calls. Just need to define a unique ID generator, maybe using hex value of system.nanotime, or better: UUIDs.

@@ -10,5 +10,14 @@
<f:textbox/>
</f:entry>

<f:entry title="${%Jenkins URL}" field="jenkinsUrl">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the goal of this parameter. Jenkins external URL can be configured globaly, and tunnel option is designed to cover reverse-proxy/firewall constraints

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ndeloof,
I've added that parameter (actually just exposed it, since it was there before for the sake of Unit Tests), just because it existed in both kubernetes-plugin and aws-ecs-plugin, so I thought it would be nice as standard way to override the current Jenkins URL.
It is not related to this pull request, so I can remove it if you think it is not needed.
As a side comment, sometime you don't have a real DNS name, but just an IP and you don't want it to be defined in the global JENKINS configuration, since it is affecting other plugins. IMHO, have this parameter as an optional one would be nice, but again, I can remove it if think it should be removed.

@ndeloof
Copy link
Contributor

ndeloof commented Jan 9, 2018

@odavid
Copy link
Author

odavid commented Jan 9, 2018

@ndeloof, I believe that hex value of system.nanotime would be enough here and will serves us well.
In any case, names-generator can be an overkill for this purpose. I do think that we can add the template name as a prefix for the slave name.

Copy link
Contributor

@ndeloof ndeloof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me there's many unnecessary refactoring and intermediate objects introduced here, which makes revue harder. DockerTransientNode can keep track of the target container ID/name. Also, java JNLP agent as main process should just be the default in replacement for exec-based mechanism, which would make this even simpler


private final String containerId;


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do not remove this useful information. Just make this non-final and assign as container has been created

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored.

// if something went wrong, cleanup aborted container
client.removeContainerCmd(containerId).withForce(true).exec();
throw e;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to move this code away and introduce DockerContainerLifecycleHandler(which makes DockerComputerConnector role unclear). Just compute containerName, create DockerTransientNode object, then enforce here CreateContainerCmd.name is set to this unique ID and pass node object to connector methods for launcher setup.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this code to the DockerComputerConnector, since it must be called from the launcher itself and cannot run before the launcher is being created. In this iteration, I tried to keep the original code as it was when I could (sorry for the many refactoring - did not try to make your review hard, it was just multiple iterations).
Also removed the DockerContainerLifecycleHandler and restored the original methods inside the DockerComputerConnector)

super("docker-" + containerId.substring(0,12), remoteFS, launcher);

public DockerTransientNode(@Nonnull String uniqueName, String workdir, ComputerLauncher launcher) throws Descriptor.FormException, IOException {
super("docker-" + uniqueName, workdir, launcher);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't hurt to also store containerName as an attribute

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if(!passSlaveConnectionArgs){
launchAndInjectJar(computer, listener, handler, api, containerExecuter, workdir, cmd);
}else{
launchStandardJNLPImage(computer, listener, handler, api, containerExecuter, workdir, cmd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just replace the exec-based implementation with java remoting.jar set as main process in container

<f:textbox/>
</f:entry>

<f:entry title="${%Pass Slave Connection Arguments to Entrypoint}" field="passSlaveConnectionArgs">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed, just make this the default behaviour for JNLP connector

@@ -0,0 +1,17 @@
<div>
<p>
<span class="info">Prerequisites:</span> Docker image must derive from <a href="https://github.com/jenkinsci/docker-jnlp-slave/">jenkins/jnlp-slave</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong. This isn't a MUST requirement. Just need java available on arbitrary image.
(also not required assuming you make "passSlaveConnectionArgs" the standard behaviour, without such an option)

@@ -34,6 +37,7 @@
private String user;
private final JNLPLauncher jnlpLauncher;
private String jenkinsUrl;
private boolean passSlaveConnectionArgs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary, just make this the standard behaviour

@ndeloof
Copy link
Contributor

ndeloof commented Jan 15, 2018

please drop "passSlaveConnectionArgs" option to just make this the default mechanism, and please also squash your commits so we can have a more readable history for those changes.

@odavid odavid force-pushed the jnlp-simple-slave branch from d1fd9c9 to d5a8346 Compare January 15, 2018 23:59
@@ -412,35 +400,40 @@ public void onNext(PullResponseItem item) {

}

@Restricted(NoExternalUse.class)
public DockerTransientNode provisionNode(DockerAPI api, TaskListener listener) throws IOException, Descriptor.FormException, InterruptedException {
public class ContainerCommandCreator implements Serializable{
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ndeloof, I had to add this ContainerCommandCreator, since the CreateContainerCmd is not Serializable and therefore when passing it to the anonymous DelegatingComputerLauncher(new JNLPLauncher())) through the createLauncher method, Jenkins will fail serialization.
So the trick was to do it lazily using the DockerTemplate which is serializable. So this class is a holder for the uniqueContainerName and when needed, the it creates the CreateContainerCmd and fills it will all the required data from the DockerTemplate instance.

@odavid
Copy link
Author

odavid commented Jan 16, 2018

Hi,
I've dropped the passSlaveConnectionArgs and made it the default one.
Also squashed my commits into one.
Best regards,
Ohad

@ndeloof ndeloof merged commit 4ae1f17 into jenkinsci:master Jan 16, 2018
@odavid
Copy link
Author

odavid commented Jan 16, 2018

Thank you!

@odavid odavid deleted the jnlp-simple-slave branch January 16, 2018 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants