feat: add workflow and script to check edit links on docs #3557

Open
wants to merge 7 commits into
base: master
49 changes: 49 additions & 0 deletions .github/workflows/check-edit-links.yml
@@ -0,0 +1,49 @@
name: Weekly Link Checker

on:
  schedule:
    - cron: '0 0 * * 0' # Runs every week at midnight on Sunday
  workflow_dispatch:

jobs:
  check-links:
    name: Run Link Checker and Notify Slack
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Run link checker
        id: linkcheck
        run: |
          npm run test:editlinks | tee output.log

      - name: Extract 404 URLs from output
        id: extract-404
        run: |
          ERRORS=$(sed -n '/URLs returning 404:/,$p' output.log)
          echo "errors<<EOF" >> $GITHUB_OUTPUT
          echo "$ERRORS" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT

      - name: Notify Slack
        if: ${{ steps.extract-404.outputs.errors != '' }}
        uses: rtCamp/action-slack-notify@v2
        env:
          SLACK_WEBHOOK: ${{ secrets.WEBSITE_SLACK_WEBHOOK }}
          SLACK_TITLE: 'Edit Links Checker Errors Report'
          SLACK_MESSAGE: |
            🚨 The following URLs returned 404 during the link check:
            ```
            ${{ steps.extract-404.outputs.errors }}
            ```
          MSG_MINIMAL: true
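
The `sed -n '/URLs returning 404:/,$p'` call in the extraction step prints everything from the first line matching the marker through the end of the log. A minimal JavaScript sketch of the same slicing, useful for reasoning about what ends up in the Slack message; `extract404Section` is a hypothetical helper and is not part of this PR:

```javascript
// Hypothetical helper: mirrors `sed -n '/URLs returning 404:/,$p'`.
// Returns the text from the first line containing the marker to the end of the log,
// or an empty string when the marker never appears.
function extract404Section(log, marker = 'URLs returning 404:') {
  const lines = log.split('\n');
  const start = lines.findIndex((line) => line.includes(marker));
  return start === -1 ? '' : lines.slice(start).join('\n');
}

// Made-up log content for illustration only.
const sampleLog = [
  'Processing batch 1/1',
  'URLs returning 404:',
  '- https://example.com/missing.md generated from docs/missing.md'
].join('\n');

console.log(extract404Section(sampleLog));
console.log(JSON.stringify(extract404Section('All URLs are valid.'))); // prints ""
```

An empty result corresponds to the case where the `if` condition on the `Notify Slack` step evaluates to false, so no message is sent.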
4 changes: 4 additions & 0 deletions components/layout/DocsLayout.tsx
@@ -33,6 +33,10 @@ interface IDocsLayoutProps {
  */
 function generateEditLink(post: IPost) {
   let last = post.id.substring(post.id.lastIndexOf('/') + 1);
+
+  if (last.endsWith('.mdx')) {
+    last = last.replace('.mdx', '.md');
+  }
   const target = editOptions.find((edit) => {
     return post.slug.includes(edit.value);
   });
4 changes: 2 additions & 2 deletions config/edit-page-config.json
@@ -1,7 +1,7 @@
 [
   {
     "value": "/tools/generator",
-    "href": "https://github.com/asyncapi/generator/tree/master/docs"
+    "href": "https://github.com/asyncapi/generator/tree/master/apps/generator/docs"
   },
   {
     "value": "reference/specification/",
@@ -19,4 +19,4 @@
     "value": "reference/extensions/",
     "href": "https://github.com/asyncapi/extensions-catalog/tree/master/extensions"
   }
-]
+]
11 changes: 8 additions & 3 deletions jest.config.js
@@ -4,7 +4,12 @@ module.exports = {
   coverageReporters: ['text', 'lcov', 'json-summary'],
   coverageDirectory: 'coverage',
   collectCoverageFrom: ['scripts/**/*.js'],
-  coveragePathIgnorePatterns: ['scripts/compose.js', 'scripts/tools/categorylist.js', 'scripts/tools/tags-color.js'],
+  coveragePathIgnorePatterns: [
+    'scripts/compose.js',
+    'scripts/tools/categorylist.js',
+    'scripts/tools/tags-color.js',
+    'scripts/markdown/check-editlinks.js'
Member Author

I have added the new script to Jest's coverage ignore list so that CI passes. I will create a new good first issue for other contributors to add tests.

Member

I think we should make it a practice to add tests along with the code, so don't add the file here. Instead, add the relevant tests for the file.

Member Author

Uhm, okay. Will add them.

+  ],
   // To disallow netlify edge function tests from running
-  testMatch: ['**/tests/**/*.test.*', '!**/netlify/**/*.test.*'],
-};
+  testMatch: ['**/tests/**/*.test.*', '!**/netlify/**/*.test.*']
+};
1 change: 1 addition & 0 deletions package.json
@@ -24,6 +24,7 @@
     "generate:tools": "node scripts/build-tools.js",
     "test:netlify": "deno test --allow-env --trace-ops netlify/**/*.test.ts",
     "test:md": "node scripts/markdown/check-markdown.js",
+    "test:editlinks": "node scripts/markdown/check-editlinks.js",
     "dev:storybook": "storybook dev -p 6006",
     "build:storybook": "storybook build"
   },
170 changes: 170 additions & 0 deletions scripts/markdown/check-editlinks.js
@@ -0,0 +1,170 @@
const fs = require('fs').promises;
const path = require('path');
const fetch = require('node-fetch-2');
const editUrls = require('../../config/edit-page-config.json');

const ignoreFiles = [
  'reference/specification/v2.x.md',
  'reference/specification/v3.0.0-explorer.md',
  'reference/specification/v3.0.0.md'
];

/**
 * Introduces a delay in the execution flow
 * @param {number} ms - The number of milliseconds to pause
 */
async function pause(ms) {
  return new Promise((res) => {
    setTimeout(res, ms);
  });
}

/**
 * Process a batch of URLs to check for 404s
 * @param {object[]} batch - Array of path objects to check
 * @returns {Promise<string[]>} Array of URLs that returned 404
 */
async function processBatch(batch) {
  return Promise.all(
    batch.map(async ({ filePath, urlPath, editLink }) => {
      try {
        if (!editLink || ignoreFiles.some((ignorePath) => filePath.endsWith(ignorePath))) return null;

        const response = await fetch(editLink, { method: 'HEAD' });
        if (response.status === 404) {
          return { filePath, urlPath, editLink };
        }
        return null;
      } catch (error) {
        console.error(`Error checking ${editLink}:`, error.message);
        return editLink;
      }
    })
  );
}
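
In `processBatch`, a 404 resolves to a `{ filePath, urlPath, editLink }` object, while the `catch` branch resolves to the bare `editLink` string, so a caller that reads `url.editLink` or `url.filePath` gets `undefined` for failed requests. A sketch of one way to keep the result shape uniform, assuming the fetch function is injected so it can be stubbed; `checkLink` and the `reason` field are hypothetical and not part of this PR, and the `ignoreFiles` check is omitted for brevity:

```javascript
// Hypothetical variant of the per-URL check: always resolves to null or an object.
// `fetchFn` is injected (an assumption for testability); the PR calls node-fetch-2 directly.
async function checkLink({ filePath, urlPath, editLink }, fetchFn) {
  if (!editLink) return null;
  try {
    const response = await fetchFn(editLink, { method: 'HEAD' });
    return response.status === 404 ? { filePath, urlPath, editLink, reason: '404' } : null;
  } catch (error) {
    // Same object shape as the 404 case, with the failure reason attached.
    return { filePath, urlPath, editLink, reason: error.message };
  }
}

// Stubbed fetch functions standing in for real network calls.
const notFound = async () => ({ status: 404 });
const broken = async () => {
  throw new Error('network down');
};
const entry = { filePath: 'docs/a.md', urlPath: 'a', editLink: 'https://example.com/a.md' };

checkLink(entry, notFound).then((result) => console.log(result.reason));
checkLink(entry, broken).then((result) => console.log(result.reason));
```

With a uniform shape, the reporting code can treat network errors and 404s the same way, or filter on `reason` if only 404s should fail the run.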

/**
 * Check all URLs in batches
 * @param {object[]} paths - Array of all path objects to check
 * @returns {Promise<string[]>} Array of URLs that returned 404
 */
async function checkUrls(paths) {
  const result = [];
  const batchSize = 5;

  for (let i = 0; i < paths.length; i += batchSize) {
    console.log(`Processing batch ${Math.floor(i / batchSize) + 1}/${Math.ceil(paths.length / batchSize)}`);
    const batch = paths.slice(i, i + batchSize);
    const batchResults = await processBatch(batch);
    await pause(1000);
Comment on lines +58 to +59
@coderabbitai coderabbitai bot Jan 8, 2025

🛠️ Refactor suggestion

Consider refactoring to avoid 'await' inside a loop

Static analysis tools have flagged the use of await inside a loop on lines 58-59. Using await in a loop can lead to sequential execution and potential performance issues. Consider refactoring the code to process batches concurrently.

Apply this diff to refactor the code:

-  for (let i = 0; i < paths.length; i += batchSize) {
-    console.log(`Processing batch ${Math.floor(i / batchSize) + 1}/${Math.ceil(paths.length / batchSize)}`);
-    const batch = paths.slice(i, i + batchSize);
-    const batchResults = await processBatch(batch);
-    await pause(1000);
-
-    // Filter out null results and add valid URLs to results
-    result.push(...batchResults.filter((url) => url !== null));
-  }
+  const batches = [];
+  for (let i = 0; i < paths.length; i += batchSize) {
+    const batch = paths.slice(i, i + batchSize);
+    batches.push(batch);
+  }
+
+  console.log(`Processing ${batches.length} batches concurrently...`);
+  const batchResultsArray = await Promise.all(
+    batches.map(async (batch) => {
+      const batchResults = await processBatch(batch);
+      // Introduce a pause between batches if necessary
+      await pause(1000);
+      return batchResults.filter((url) => url !== null);
+    })
+  );
+
+  // Flatten the array of results
+  result.push(...batchResultsArray.flat());

🧰 Tools

🪛 eslint

[error] 58-58: Unexpected await inside a loop. (no-await-in-loop)
[error] 59-59: Unexpected await inside a loop. (no-await-in-loop)

Member Author

We already use this logic in our Dashboard script and it works well. Let's keep a uniform style.

@anshgoyalevil Understood, I'll respect the existing pattern used in the Dashboard script. Thanks for the clarification!

(`・ω・´)ゞ


✏️ Learnings added
Learnt from: anshgoyalevil
PR: asyncapi/website#3557
File: scripts/markdown/check-editlinks.js:58-59
Timestamp: 2025-01-08T15:15:00.620Z
Learning: In the AsyncAPI codebase, batch processing operations (like in the Dashboard script and check-editlinks.js) follow a sequential pattern using await in loops, which is the preferred approach for maintaining consistency across the codebase.

Member

@anshgoyalevil The pattern used in the Dashboard script is not appropriate. I think CodeRabbitAI has suggested a more configurable approach where you can change the batch size using a variable. Please try to implement this suggestion.

Member Author

@akshatnema I don't get it. CodeRabbit is not suggesting adding a variable for a configurable batch size. In fact, we already have one.


    // Filter out null results and add valid URLs to results
    result.push(...batchResults.filter((url) => url !== null));
  }

  return result;
}
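
The thread above turns on what the loop is for. With `await` inside the loop, at most `batchSize` requests are in flight at once and each batch is followed by a one-second pause. Wrapping all batches in a single `Promise.all` would start every request immediately, so the pause would no longer space them out. A sketch of the sequential pattern with the batch size and delay exposed as parameters; `chunk`, `runInBatches`, and the option names are hypothetical:

```javascript
// Hypothetical helper: split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical runner: batches run one after another, items within a batch run concurrently.
async function runInBatches(items, worker, { batchSize = 5, delayMs = 1000 } = {}) {
  const results = [];
  for (const batch of chunk(items, batchSize)) {
    // eslint-disable-next-line no-await-in-loop -- sequential on purpose, to throttle requests
    results.push(...(await Promise.all(batch.map(worker))));
    // eslint-disable-next-line no-await-in-loop
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return results;
}

// Stand-in worker that doubles a number instead of making a request.
runInBatches([1, 2, 3, 4, 5, 6, 7], async (n) => n * 2, { batchSize: 3, delayMs: 10 }).then((doubled) =>
  console.log(doubled)
);
```

If the lint rule is kept enabled, an inline disable with a short justification documents that the sequencing is intentional rather than an oversight.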

/**
 * Determines the appropriate edit link based on the URL path and file path
 * @param {string} urlPath - The URL path to generate an edit link for
 * @param {string} filePath - The actual file path
 * @param {object[]} editOptions - Array of edit link options
 * @returns {string|null} The generated edit link or null if no match
 */
function determineEditLink(urlPath, filePath, editOptions) {
  // Remove leading 'docs/' if present for matching
  const pathForMatching = urlPath.startsWith('docs/') ? urlPath.slice(5) : urlPath;

  const target =
    editOptions.find((edit) => pathForMatching.includes(edit.value)) || editOptions.find((edit) => edit.value === '');

  if (!target) return null;

  // Handle the empty value case (fallback)
  if (target.value === '') {
    return `${target.href}/docs/${urlPath}.md`;
  }

  // For other cases with specific targets
  return `${target.href}/${path.basename(filePath)}`;
}
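
A short worked example may help when reviewing the matching rules in `determineEditLink`. The function body below is copied from the diff so that the example runs on its own. The option list is hypothetical: the first entry mirrors the config diff, and the fallback `href` is invented for illustration.

```javascript
const path = require('path');

// Copy of determineEditLink from the diff above, so the example is self-contained.
function determineEditLink(urlPath, filePath, editOptions) {
  const pathForMatching = urlPath.startsWith('docs/') ? urlPath.slice(5) : urlPath;
  const target =
    editOptions.find((edit) => pathForMatching.includes(edit.value)) || editOptions.find((edit) => edit.value === '');
  if (!target) return null;
  if (target.value === '') {
    return `${target.href}/docs/${urlPath}.md`;
  }
  return `${target.href}/${path.basename(filePath)}`;
}

// Hypothetical options: the fallback href is a placeholder, not the real config value.
const sampleOptions = [
  { value: 'reference/extensions/', href: 'https://github.com/asyncapi/extensions-catalog/tree/master/extensions' },
  { value: '', href: 'https://example.com/fallback' }
];

// Specific match: the link is the target href plus the file's base name.
console.log(
  determineEditLink('reference/extensions/x-parser', '/repo/markdown/docs/reference/extensions/x-parser.md', sampleOptions)
);
// Fallback match: the link is built from the URL path instead.
console.log(determineEditLink('concepts/server', '/repo/markdown/docs/concepts/server.md', sampleOptions));
```

Two details are worth keeping in mind when reading the output. Because `includes('')` is true for every string, an entry with an empty `value` matches any path in the first `find`, so its position in the array decides whether later, more specific entries are ever reached. Similarly, a `value` with a leading slash such as `/tools/generator` only matches when the `urlPath` passed in actually contains that slash.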

/**
 * Recursively processes markdown files in a directory to generate paths and edit links
 * @param {string} folderPath - The path to the folder to process
 * @param {object[]} editOptions - Array of edit link options
 * @param {string} [relativePath=''] - The relative path for URL generation
 * @param {object[]} [result=[]] - Accumulator for results
 * @returns {Promise<object[]>} Array of objects containing file paths and edit links
 */
async function generatePaths(folderPath, editOptions, relativePath = '', result = []) {
  try {
    const files = await fs.readdir(folderPath);

    await Promise.all(
      files.map(async (file) => {
        const filePath = path.join(folderPath, file);
        const relativeFilePath = path.join(relativePath, file);

        // Skip _section.md files
        if (file === '_section.md') {
          return;
        }

        const stats = await fs.stat(filePath);

        if (stats.isDirectory()) {
          // Process directory
          await generatePaths(filePath, editOptions, relativeFilePath, result);
        } else if (stats.isFile() && file.endsWith('.md')) {
          // Process all markdown files (including index.md)
          const urlPath = relativeFilePath.split(path.sep).join('/').replace('.md', '');
          result.push({
            filePath,
            urlPath,
            editLink: determineEditLink(urlPath, filePath, editOptions)
          });
        }
      })
    );

    return result;
  } catch (err) {
    console.error(`Error processing directory ${folderPath}:`, err);
    throw err;
  }
}
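
`generatePaths` derives `urlPath` by converting platform separators to `/` and then calling `.replace('.md', '')`. With a string pattern, `replace` removes the first `.md` it finds, not necessarily the extension. A sketch of the same derivation with the extension anchored to the end of the path; `toUrlPath` is a hypothetical helper, and the separator is passed in as an assumption to keep the example platform-independent:

```javascript
// Hypothetical helper: turn a relative markdown file path into a URL path.
// Only a trailing ".md" is stripped; earlier occurrences are left alone.
function toUrlPath(relativeFilePath, sep = '/') {
  return relativeFilePath.split(sep).join('/').replace(/\.md$/, '');
}

console.log(toUrlPath('tools/generator/usage.md'));
console.log(toUrlPath('reference\\specification\\v3.0.0.md', '\\'));
// A directory name containing ".md" (made up for illustration) stays intact.
console.log(toUrlPath('guides/file.md-notes/index.md'));
```

For file names that contain `.md` only as their extension, both versions produce the same result, so this matters only for unusual paths.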

async function main() {
  const editOptions = editUrls;

  try {
    const docsFolderPath = path.resolve(__dirname, '../../markdown/docs');
    const paths = await generatePaths(docsFolderPath, editOptions);
    console.log('Starting URL checks...');
    const invalidUrls = await checkUrls(paths);

    if (invalidUrls.length === 0) {
      console.log('All URLs are valid.');
      process.exit(0);
Member

We shouldn't call process.exit like this. Restructure the conditional so that there is a single if block that handles only the invalidUrls case.

    }

    console.log('\nURLs returning 404:\n');
    invalidUrls.forEach((url) => console.log(`- ${url.editLink} generated from ${url.filePath}\n`));
    console.log(`\nTotal invalid URLs found: ${invalidUrls.length}`);

    if (invalidUrls.length > 0) {
      process.exit(1);
    }
  } catch (error) {
    console.error('Failed to check edit links:', error);
    process.exit(1);
  }
}
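
The review comment on `main` asks for a single conditional around the failure case instead of early `process.exit` calls. One possible shape, sketched under the assumption that reporting is split into a pure function; `summarize` is a hypothetical name and not part of this PR:

```javascript
// Hypothetical helper: build the report lines and the exit code from the list of invalid entries.
function summarize(invalidUrls) {
  if (invalidUrls.length === 0) {
    return { exitCode: 0, lines: ['All URLs are valid.'] };
  }
  return {
    exitCode: 1,
    lines: [
      // Keep this marker text: the workflow's sed expression searches for it.
      'URLs returning 404:',
      ...invalidUrls.map((url) => `- ${url.editLink} generated from ${url.filePath}`),
      `Total invalid URLs found: ${invalidUrls.length}`
    ]
  };
}

// Made-up entry for illustration only.
const report = summarize([{ editLink: 'https://example.com/a.md', filePath: 'docs/a.md' }]);
report.lines.forEach((line) => console.log(line));
// In main(), one option is `process.exitCode = report.exitCode;` instead of process.exit(),
// which lets Node finish pending output before the process ends.
```

Keeping the formatting in a pure function also makes it straightforward to cover with a unit test, which fits the request in the `jest.config.js` thread to add tests alongside the script.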

if (require.main === module) {
  main();
}

module.exports = { generatePaths, determineEditLink, main };