
[Fabric E2E Sample] Adding CI pipelines #1013

Open
wants to merge 1 commit into base: feat/e2e-fabric-dataops-sample-v0-2
Conversation

camaderal

Type of PR

  • Code changes
  • Test changes
  • CI-CD changes

Purpose

Added the following CI pipelines:

  • devops/azure-pipelines-ci-artifacts.yml
    • Triggered on commits to the main branch. This lints and tests the Python modules, then publishes the Fabric environment and ADLS artifacts.
    • Steps
      • Publish Fabric Environment Config Artifacts
        • Lint and run tests for the custom libraries (Python modules)
        • Build Fabric Environment Config Artifacts
          • Output:
          fabric_env
          |_environment.yaml
          |_custom_libraries
             |_ddo_transform_standardize.py
             |_ddo_transform_transform.py
             |_otel_monitor_invoker.py
          
      • Publish ADLS Artifacts
        • Create the application.cfg from template
        • Build ADLS Artifacts
          • Output:
          adls
          |_config
             |_application.cfg
             |_lakehouse_ddls.yaml
          |_reference
             |_dim_date.csv
             |_dim_time.csv
          
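The "Create the application.cfg from template" step above could be sketched as a simple placeholder substitution. This is a hypothetical illustration; the actual template keys and templating mechanism used by the pipeline may differ.

```python
# Hypothetical sketch: render application.cfg from a template by substituting
# environment-specific values. Keys and section names here are assumptions.
from string import Template

def render_config(template_text: str, values: dict) -> str:
    """Substitute $PLACEHOLDER tokens in the template with concrete values."""
    return Template(template_text).substitute(values)

template = "\n".join([
    "[storage]",
    "account_name = $STORAGE_ACCOUNT_NAME",
    "container_name = $STORAGE_CONTAINER_NAME",
])

rendered = render_config(template, {
    "STORAGE_ACCOUNT_NAME": "mystorageaccount",
    "STORAGE_CONTAINER_NAME": "main",
})
print(rendered)
```

`Template.substitute` raises `KeyError` on a missing placeholder, which is useful here: a misconfigured variable group fails the build immediately instead of producing a half-rendered config.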
  • devops/azure-pipelines-ci-qa.yml:
    • Triggered on PRs against the main branch. This tests the Python modules, creates an ephemeral workspace, and runs the workspace tests.
    • Steps
      • BuildLibraries
        • Lint and run tests for the custom libraries (Python modules)
      • BuildFabric
        • Interactive Azure CLI login (most Fabric APIs require a user identity, so an interactive login is needed)
        • Get Access Tokens for Fabric, Azure Management, Azure Storage
        • Build Workspace
          • Create a new feature workspace if $FABRIC_WORKSPACE_NAME-$PR_ID doesn't exist.
            • The idea is that the pipeline creates only one workspace per PR. Cleanup of the feature workspace is done when the PR is merged.
          • Add workspace role assignments for group admin
          • Provision Workspace Identity
          • Sync newly created workspace to the feature branch.
          • Create a custom Spark pool named $FABRIC_CUSTOM_POOL_NAME if it doesn't exist, and assign it to the workspace.
            • Spark pool details are derived from the fabric/fabric_environment/spark_pool_settings.yml file of the feature branch.
          • Create an ADLS storage container named feature-$PR_ID if it doesn't exist.
          • Add role assignment to the feature workspace identity for the ADLS storage account.
          • Create the ADLS Cloud connection with name $FABRIC_CONNECTION_NAME-$PR_ID if it doesn't exist.
          • Add connection role assignments for group admin
          • Create the ADLS shortcut if it doesn't exist.
        • Update environment
          • Check staged and published compute settings and libraries.
          • If there are unpublished changes, publish the environment. (The first time the pipeline runs for a PR, it updates the Spark pool and all libraries are unpublished, so it publishes the environment. For subsequent runs, it shouldn't re-publish the environment.)
          • If the PR changes the environment config files or custom libraries, the environment is re-published on every pipeline run.
        • Update ADLS Config files
          • Create the application.cfg from template
          • Upload the following to the feature-$PR_ID storage container
        adls
        |_config
           |_application.cfg
           |_lakehouse_ddls.yaml
        |_reference
           |_dim_date.csv
           |_dim_time.csv
        
        • Test Fabric Workspace
          • Run workspace tests (run notebooks, pipelines, etc.)
            • FABRIC_WORKSPACE_NAME will be the feature workspace, $FABRIC_WORKSPACE_NAME-$PR_ID
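The create-if-missing pattern used for the feature workspace (and similarly for the custom pool, container, connection, and shortcut) can be outlined as below. `list_workspaces` and `create_workspace` are hypothetical stand-ins for the real Fabric REST API calls; only the $FABRIC_WORKSPACE_NAME-$PR_ID naming convention comes from the PR description.

```python
# Sketch of idempotent per-PR workspace creation. The API calls are injected
# so the pattern is visible without the real Fabric REST client.
def ensure_feature_workspace(base_name, pr_id, list_workspaces, create_workspace):
    """Create the per-PR workspace only if it does not already exist."""
    name = f"{base_name}-{pr_id}"
    existing = {ws["displayName"] for ws in list_workspaces()}
    if name not in existing:
        create_workspace(name)
    return name

# Usage with an in-memory fake in place of the Fabric API:
store = [{"displayName": "ws-dev"}]
created = []
name = ensure_feature_workspace("ws-dev", "42",
                                lambda: store,
                                lambda n: created.append(n))
# name == "ws-dev-42"; created == ["ws-dev-42"]
```

Because the check is by display name, rerunning the pipeline for the same PR reuses the existing workspace instead of creating a duplicate, which matches the one-workspace-per-PR idea above.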
  • devops/azure-pipeline-ci-qa-cleanup:
    • Should be triggered when a PR against the main branch is completed/abandoned. This removes the resources created by the azure-pipelines-ci-qa pipeline. I still haven't figured out how to trigger this, but I am considering service hooks.
    • Steps
      • CleanupWorkspace
        • Interactive Azure CLI login (most Fabric APIs require a user identity, so an interactive login is needed)
        • Get Access Tokens for Fabric, Azure Management, Azure Storage
        • Remove storage container, storage role assignments, ADLS cloud connection and workspace
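Since the cleanup pipeline must delete exactly the resources the QA pipeline created, deriving the per-PR resource names in one place keeps the two in sync. The naming patterns below come from the PR description; the helper itself is hypothetical.

```python
# Hypothetical helper shared by the QA and cleanup pipelines: derive all
# per-PR resource names from the PR id and the base names in the variable group.
def pr_resource_names(pr_id, workspace_base, connection_base):
    return {
        "workspace": f"{workspace_base}-{pr_id}",    # $FABRIC_WORKSPACE_NAME-$PR_ID
        "connection": f"{connection_base}-{pr_id}",  # $FABRIC_CONNECTION_NAME-$PR_ID
        "container": f"feature-{pr_id}",             # feature-$PR_ID storage container
    }
```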

Added a setup repository script

  • This is a temporary script to set up the repo files in Azure DevOps to test the pipelines.
  • Note that I moved some files so uploading the files is easier:
    • config/fabric_environment/environment.yaml ⇨ fabric/fabric_environment/environment.yaml
    • config/fabric_environment/*.py ⇨ libraries/src/
    • src/test/* ⇨ libraries/test/
  • I also modified the deploy script to sync the Fabric workspace to $GIT_DIRECTORY_NAME/fabric/workspace instead.

Does this introduce a breaking change? If yes, details on what can break

No.

Author pre-publish checklist

  • Added test to prove my fix is effective or new feature works
  • No PII in logs
  • Made corresponding changes to the documentation

Validation steps

To test the pipelines, your Git repository needs to be set up in a certain way. Here are the steps to set it up.

  • Generate your Azure DevOps credentials, then add the env variables GIT_USERNAME and GIT_TOKEN. Other than that, make sure these env variables are also set:

    • GIT_ORGANIZATION_NAME
    • GIT_PROJECT_NAME
    • GIT_REPOSITORY_NAME
    • GIT_BRANCH_NAME
    • GIT_DIRECTORY_NAME
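A small pre-flight check like the following (a hypothetical sketch, not part of the PR) can fail fast if any of the environment variables listed above is missing before running the setup script:

```python
# Hypothetical pre-flight check for the env variables that
# scripts/setup_repository.py depends on.
import os

REQUIRED = [
    "GIT_USERNAME", "GIT_TOKEN",
    "GIT_ORGANIZATION_NAME", "GIT_PROJECT_NAME",
    "GIT_REPOSITORY_NAME", "GIT_BRANCH_NAME", "GIT_DIRECTORY_NAME",
]

def missing_env_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing env variables: {', '.join(missing)}")
```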
  • Run scripts/setup_repository.py. This adds all the necessary files to the GIT_DIRECTORY_NAME path in the Azure DevOps repo. It also creates the fabric/workspace folder where the Fabric workspace will be synced. The file structure should look like this (minus the fabric/workspace part, which is generated later):
    (screenshot: resulting repository file structure)

  • Either:

    • Clean deploy the fabric workspace with the instructions from the README.
    • Sync your existing Fabric workspace to the fabric/workspace folder in the Azure DevOps repo.
  • Create the variable group. Here are the required values:

    • WORKING_DIR: Folder path where you committed your repository. Same as GIT_DIRECTORY_NAME in the env variables
    • SUBSCRIPTION_ID : Same as the SUBSCRIPTION_ID in the env variables
    • RESOURCE_GROUP_NAME: Same as the RESOURCE_GROUP_NAME in the env variables
    • STORAGE_ACCOUNT_NAME: Created storage account name
    • STORAGE_ACCOUNT_ROLE_DEFINITION_ID: Role definition ID given to the Fabric workspace to access the storage account
    • STORAGE_CONTAINER_NAME: Default is "main"
    • KEYVAULT_NAME: Created key vault name
    • ORGANIZATIONAL_NAME: Same as the GIT_ORGANIZATION_NAME in the env variables
    • PROJECT_NAME: Same as the GIT_PROJECT_NAME in the env variables
    • REPO_NAME: Same as the GIT_REPOSITORY_NAME in the env variables
    • FABRIC_WORKSPACE_GROUP_ADMIN : Principal Id of the Group Admin for the workspace and cloud connection
    • FABRIC_WORKSPACE_NAME: Created workspace name
    • FABRIC_CAPACITY_NAME: Created capacity name
    • FABRIC_ENVIRONMENT_NAME: Created environment name
    • FABRIC_LAKEHOUSE_NAME: Created lakehouse name
    • FABRIC_SHORTCUT_NAME: Default is "sc-adls-main"
    • FABRIC_CUSTOM_POOL_NAME: Created custom pool name.
    • FABRIC_CONNECTION_NAME: Created ADLS Cloud Connection name.
  • Modify the following values in the pipelines and commit them.

    • <VARIABLE_GROUP_NAME>: the variable group you created
    • <DEV_BRANCH_NAME>: the branch name on which you want the pipeline to be triggered.
  • Set up the pipeline, then run it.

  • Known issue:

    • In the CI QA pipeline, there is a part of the build workspace script where a new storage container is created and then a shortcut is created. Sometimes the container creation or role assignment takes time to propagate, and the shortcut creation then fails with the message below. Without changing anything, you can rerun the pipeline and it should work the second time:
    Exception: [Error] Failed to create shortcut 'sc-adls-main': 400 - {"requestId":"XXX","errorCode":"BadRequest","moreDetails":[{"errorCode":"RequestBodyValidationFailed","message":"Unauthorized. Access to target location https://xxx.blob.core.windows.net/feature-XXX// denied."}],"message":"The request could not be processed due to missing or invalid information"}
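Until the propagation delay is understood, the shortcut-creation step could be wrapped in a retry as a workaround, matching the observation that a plain rerun succeeds. This is a generic sketch, not code from the PR; the function being retried stands in for the real shortcut API call.

```python
# Sketch of a retry workaround for the transient 400 "Access to target
# location ... denied" error while container/role assignments propagate.
import time

def with_retries(fn, attempts=3, delay_seconds=30, sleep=time.sleep):
    """Call fn(), retrying on exception with a fixed delay between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

In the pipeline this would wrap only the shortcut-creation call; a better long-term fix is probably to poll the role assignment until it is effective before creating the shortcut.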

Issues Closed or Referenced

@camaderal camaderal changed the title Adding CI pipelines [Fabric E2E Sample] Adding CI pipelines Jan 10, 2025
@camaderal camaderal self-assigned this Jan 10, 2025
@camaderal camaderal added the e2e: fabric Related with E2E Fabric Sample label Jan 10, 2025
@camaderal camaderal linked an issue Jan 10, 2025 that may be closed by this pull request
Successfully merging this pull request may close these issues: Create Basic AzDO CI pipeline