Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(repo/config-migration): error fetching config file when repo cache is enabled #16664

Conversation

Gabriel-Ladzaretti
Copy link
Collaborator

@Gabriel-Ladzaretti Gabriel-Ladzaretti commented Jul 20, 2022

Changes

Use platform.readLocalFile to fetch repo config file.
currently we are using fs.getLocalFile which fails when repoCache=enabled. which in turn causes migration PR to not be created/autoclosed.

DEBUG: MigratedDataFactory.getAsync() Error initializing renovate MigratedData
{
  "err": {
    "message": "Expected a string",
    "stack": "TypeError: Expected a string\n    at module.exports (/home/ubuntu/renovateapp/node_modules/detect-indent/index.js:134:9)\n    at Function.build (/home/ubuntu/renovateapp/node_modules/renovate/dist/workers/repository/config-migration/branch/migrated-data.js:80:56)\n    at async Function.getAsync (/home/ubuntu/renovateapp/node_modules/renovate/dist/workers/repository/config-migration/branch/migrated-data.js:54:26)\n    at async finaliseRepo (/home/ubuntu/renovateapp/node_modules/renovate/dist/workers/repository/finalise/index.js:18:36)\n    at async Object.renovateRepository (/home/ubuntu/renovateapp/node_modules/renovate/dist/workers/repository/index.js:58:13)\n    at async renovateRepository (/home/ubuntu/renovateapp/app/worker/index.js:309:26)\n    at async /home/ubuntu/renovateapp/app/worker/index.js:569:5"
  }
}
DEBUG: checkConfigMigrationBranch()
DEBUG: checkConfigMigrationBranch() Config does not need migration
...
INFO: Autoclosing PR
{
  "branchName": "renovate/migrate-config",
  "prNo": 13,
  "prTitle": "Configure Renovate"
}
DEBUG: updatePr(13, Configure Renovate - autoclosed, body)
DEBUG: PR updated
{
  "pr": 13
}
DEBUG: Deleted remote branch
{
  "branchName": "renovate/migrate-config"
}

Context

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please tick one)

I have verified these changes via:

  • Code inspection only, or
  • Newly added/modified unit tests, or
  • No unit tests but ran on a real repository, or
  • ran on a real repository

@Gabriel-Ladzaretti Gabriel-Ladzaretti changed the title buf(repo/config-migration): error fetching config file when repo cache is enabled fix(repo/config-migration): error fetching config file when repo cache is enabled Jul 20, 2022
@rarkins
Copy link
Collaborator

rarkins commented Jul 20, 2022

These functions sound confusingly similar. Should we be looking deeper at why we have both and why they don't behave the same?

@Gabriel-Ladzaretti
Copy link
Collaborator Author

Gabriel-Ladzaretti commented Jul 20, 2022

These functions sound confusingly similar. Should we be looking deeper at why we have both and why they don't behave the same?

on top of that, im not sure why fs.readLocalFile fails when repo cache is enabled though.
i mean we do clone the repository so the file should exist 🤔

@Gabriel-Ladzaretti
Copy link
Collaborator Author

this workaround is getting the file from github instead of getting it locally.
I will try to figure out why fs is failing when repo cache is enabled.

@Gabriel-Ladzaretti
Copy link
Collaborator Author

So the problem was that whenever repo cache is enabled, the repo content isnt being saved under the regular file based cache but rather in the repoCache. thats why fs.readLocalFile returns null and in turn, detectIndent throws as it is expecting a string as an arg but gets null.

@rarkins
Copy link
Collaborator

rarkins commented Jul 20, 2022

We shouldn't trigger a clone just for the purpose the checking migration though. This would be really wasteful.

@Gabriel-Ladzaretti
Copy link
Collaborator Author

Gabriel-Ladzaretti commented Jul 20, 2022

Its not a full clone, when it comes to github for example its only one additional api call.
we are getting a json response with the raw content encoded in base64, we decode it and thats it.

image

@rarkins
Copy link
Collaborator

rarkins commented Jul 20, 2022

We don't want to be cloning or using an extra API call every run if it's only for the purpose of this migration PR and such a PR already exists and doesn't need updating. Ie I prefer to make our cache smarter instead of consuming extra GitHub calls

@Gabriel-Ladzaretti
Copy link
Collaborator Author

current repo cache already have the config name, is saving its content an option?

@rarkins
Copy link
Collaborator

rarkins commented Jul 20, 2022

Sure, if that can avoid unnecessary cloning or API calls

@viceice
Copy link
Member

viceice commented Jul 20, 2022

instead of saving the config to the cache, we should simply check if current branch sha is the same as before and skip the whole migration check, as the branch is already up to date.
We maybe need to store the renovate version too, so we can detect that some config has changed. Or add a new manual version flag to invalidate when we add a new migration.

@Gabriel-Ladzaretti Gabriel-Ladzaretti marked this pull request as draft July 20, 2022 12:31
@rarkins
Copy link
Collaborator

rarkins commented Jul 20, 2022

As long as it works in both cache and non cache scenarios

@Gabriel-Ladzaretti
Copy link
Collaborator Author

instead of saving the config to the cache, we should simply check if current branch sha is the same as before and skip the whole migration check, as the branch is already up to date.

Isnt the branch sha is changed on new commits to it? if so, we shouldnt expect too many sha changes to the migration branch.
what about comparing current parent sha to the saved parent sha in the branch cache?

@rarkins
Copy link
Collaborator

rarkins commented Jul 20, 2022

It's only the default branch sha which should matter, because it holds the config. And if it has changed then we would have cloned in most cases.

Could you summarise again what exactly causes this problem? Eg is it that we haven't fully synced got on some runs, or something else?

@Gabriel-Ladzaretti
Copy link
Collaborator Author

Gabriel-Ladzaretti commented Jul 20, 2022

The root of it is that this feature was written with the onboarding PR as it's model, therefore repository cache was not taken into consideration and honestly i wasnt fully aware of it yet.

The current main logic is as follows:
we use the already existing migrateConfig to determine if the config needs migration, if it does not - null is returned (to the caller of MigratedDataFactory.getAsync).

In MigratedDataFactory we have -

// get migrated config
const { isMigrated, migratedConfig } = migrateConfig(configFileParsed);
if (!isMigrated) {
return null;
}

This null indicates that the config does not need migration

export async function checkConfigMigrationBranch(
config: RenovateConfig,
migratedConfigData: MigratedData | null
): Promise<string | null> {
logger.debug('checkConfigMigrationBranch()');
if (!migratedConfigData) {
logger.debug('checkConfigMigrationBranch() Config does not need migration');
return null;
}

These two function calls are invoked here -

if (config.configMigration) {
const migratedConfigData = await MigratedDataFactory.getAsync();
const migrationBranch = await checkConfigMigrationBranch(
config,
migratedConfigData!
); // null if migration not needed

So the main problem is that it is assumed that the repository config file always exists, which it doesnt when repo cache is enable, so in turn fs.readLoaclFile fails and causes exception in detectIndent which causes null to be return -> which mistakably treated as if the repo does not need migration and basically the branch is getting pruned as "it shouldn't exist" and the PR is auto closed.

The suggested solution, as per this discussion would be to cache this branch, and check the parent sha to determine whether to try migration and fetch the raw config or not.


Above is the PR closing cause.
the reopening occurs whenever the repo was cloned before finalizing it's run, in this case fs.readRawFile succeeds and a migration is triggered.

@Gabriel-Ladzaretti
Copy link
Collaborator Author

Also, the migration flow happens after the cache is set, therefore the branch data isnt saved.
Unlike the onboarding PR which does get cached.

cache is set here -

await ensureOnboardingPr(config, packageFiles, branches);
const res = await updateRepo(config, branches);
setMeta({ repository: config.repository });
addSplit('update');
await setBranchCache(branches);

migration done in the repo finalize stage -

if (config.configMigration) {
const migratedConfigData = await MigratedDataFactory.getAsync();
const migrationBranch = await checkConfigMigrationBranch(
config,
migratedConfigData!
); // null if migration not needed
if (migrationBranch) {
branchList.push(migrationBranch);
await ensureConfigMigrationPr(config, migratedConfigData!);
}
MigratedDataFactory.reset();
}
await repositoryCache.saveCache();

@Gabriel-Ladzaretti
Copy link
Collaborator Author

In the current state, we cant use repository cache in the migration flow.
This is because the cache is set before the finalize flow which contains the migration, therefore we cant cache the migration branch.

As i see it, we have three options:

  1. Move the migration flow to the worker's main flow, just before the cache is set
  2. Save a snapshot of the cache before it is being overwritten + alter the cache after it is being set in order to save the migration branch data
  3. Set the repository cache in the finalize stage

1,2 are doable, 2 requires cache manipulation. 3 i assume would break quite a few things.

To sum it up, can we move the migration flow into the repo worker's main flow? Note that we call renovateRepository recursively after the cache is set.

WDYT?

once cache is available, we can compare parent Sha in order to determine if new migration is required.

config = await initRepo(config);
addSplit('init');
const { branches, branchList, packageFiles } = await extractDependencies(
config
);
if (
GlobalConfig.get('dryRun') !== 'lookup' &&
GlobalConfig.get('dryRun') !== 'extract'
) {
await ensureOnboardingPr(config, packageFiles, branches);
const res = await updateRepo(config, branches);
setMeta({ repository: config.repository });
addSplit('update');
await setBranchCache(branches);
if (res === 'automerged') {
if (canRetry) {
logger.info('Renovating repository again after automerge result');
const recursiveRes = await renovateRepository(repoConfig, false);
return recursiveRes;
}
logger.debug(`Automerged but already retried once`);

@rarkins
Copy link
Collaborator

rarkins commented Jul 24, 2022

Is (1) a small change? I'm not sure I can fully visualize the impact but if it's simple to show in a PR then let's just take a look

@Gabriel-Ladzaretti
Copy link
Collaborator Author

Is (1) a small change? I'm not sure I can fully visualize the impact but if it's simple to show in a PR then let's just take a look

This is all there is to it.

now we cache the branch and can utilize the repository cache within the migration flow.

@rarkins
Copy link
Collaborator

rarkins commented Jul 24, 2022

That looks like an elegant solution to me. Do you have any concerns or doubts about it?

@Gabriel-Ladzaretti
Copy link
Collaborator Author

That looks like an elegant solution to me. Do you have any concerns or doubts about it?

Currently the only thing i can think of is that this flow might be run more than once in a current run due to the recursive call to renovateRepository.

that said, thats not much different than running self hosted with cache disable, what i mean is that we detect that the branch is not modified and we basically skip it. same goes for cache, if we utilize it correctly this shouldnt be a problem.

other than that the logic stays the same, we just change execution point

@rarkins
Copy link
Collaborator

rarkins commented Jul 24, 2022

Recursive call within Renovate OSS or within the app's wrapper?

@Gabriel-Ladzaretti
Copy link
Collaborator Author

Gabriel-Ladzaretti commented Jul 24, 2022

OSS -

declaration -

// istanbul ignore next
export async function renovateRepository(
repoConfig: RenovateConfig,
canRetry = true
): Promise<ProcessResult | undefined> {

recursive call -

if (res === 'automerged') {
if (canRetry) {
logger.info('Renovating repository again after automerge result');
const recursiveRes = await renovateRepository(repoConfig, false);
return recursiveRes;
}
logger.debug(`Automerged but already retried once`);

rubber ducking here, seems that the recursive depth is 2 at most due to the canRetry arg

@viceice
Copy link
Member

viceice commented Jul 24, 2022

and it's only done on automerge

@Gabriel-Ladzaretti
Copy link
Collaborator Author

Gabriel-Ladzaretti commented Aug 6, 2022

The base-branch SHA comparison won't help us as much. I've tested it and I think it won't work as easily, as the parentSha saved for a given branch is the SHA at the moment of commit to the new branch.
Therefore, all it takes to be out of sync in this approach is a single commit to the base branch.
This obviously happens quite a bit, while the config is unchanged most of the time, but in this scenario we will try to migrate on every run which will lead for some extra api/fs work.

A possible solution might be in the form of

as the migration flow is already dependent on detectRepoFileConfig which itself fetches the config, therefore caching and returning an additional raw data from it saves us doing so again and we are not reliant on the repo cache at all.

This is done regardless of repo cache.

@Gabriel-Ladzaretti
Copy link
Collaborator Author

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants