Self-Hosted Pixie Install Script #238

Closed
zasgar opened this issue May 4, 2021 · 22 comments

Comments

@zasgar
Member

zasgar commented May 4, 2021

Is your feature request related to a problem? Please describe.
We would like to have an install experience for the self-hosted version of Pixie that is as easy to use as the one hosted on withpixie.ai.

Additional context
Our team has been busy this month open sourcing Pixie's source code, docs, website, and other assets. We are also actively applying to be a CNCF sandbox project!

One of our last remaining items is to publish an install script to deploy a self-hosted version of Pixie.

Who offers a hosted version of Pixie?

New Relic currently offers a 100% free hosted version of Pixie Cloud. This hosting has no contingencies and will be offered indefinitely to the Pixie community. All the code used for hosting is open source, including our production manifest files.

What will the Self-Hosted install script do?

The Self-Hosted install script will deploy Pixie Cloud so that you can use Pixie without any external dependencies. This is the exact version of Pixie Cloud we deploy, so it will behave exactly like the hosted version, but you will need to manage and configure it yourself.

What is the timeline? 

Good question. :) We had planned to open source this script by 5/4. Unfortunately, we didn’t make it. We need more time to ensure that the Pixie Cloud deploy script will be just as easy to install Pixie Cloud as it is to install the hosted version of Pixie (in < 2 minutes!).

But I really want to run a Self-Hosted Pixie...now!

Technically you can build and run a self-hosted Pixie using Skaffold. Check out:

https://github.com/pixie-labs/pixie/blob/main/skaffold/skaffold_cloud.yaml
https://github.com/pixie-labs/pixie/tree/main/k8s/cloud
https://github.com/pixie-labs/pixie/tree/main/k8s/cloud_deps

These directions are not fully documented, and the team is focusing on delivering the self-hosted install script quickly. We'll keep iterating on the documentation to make the project more open-source friendly.

@shellfu

shellfu commented May 21, 2021

Just curious how this is going :)

@zasgar
Member Author

zasgar commented May 27, 2021

We are planning to discuss this in our Pixienaut meeting today!

It's most of the way there; with some finishing touches it should be good to go by next week. It's mostly usable today except for a small UI bug.

@htroisi
Contributor

htroisi commented May 27, 2021

As of today, there are two ways to get Pixie:

  1. With the fully-managed Community Cloud for Pixie (free forever, hosted by New Relic). Pixie Community Cloud makes getting started with Pixie even easier and faster. You can find the Community Cloud Quick Start guide here.

  2. With the Self-managed Pixie Cloud run on your own infrastructure. The Self-managed Cloud option requires you to manage your own certificates, set up DNS, and manage authentication. You can find the Self-managed Pixie Cloud Quick Start guide here.

The Self-managed Pixie Cloud install script is an Alpha release. Please report any bugs you encounter. Known issues include:

  • Self-managed Pixie Cloud has only been tested on GKE. There is a known GCS bug that prevents it from being deployed to Minikube.
  • Update: Self-managed Pixie Cloud should now work with the Live UI.

@narioinc

@htroisi @zasgar

First, thank you so much to the Pixie team for creating such an awesome tool!

While following the self-install scripts to take Pixie dev (self-hosted) for a spin, I found that when running on microk8s or minikube for testing, you have to work through a few errors about PVs not being present (especially when running on a localhost machine with LocalStorage support). I was able to resolve those by quickly creating the appropriate k8s PV resources, but I ended up with two secrets that could not be found:

  1. pl-elastic-es-http-certs-public - needed by the auth server (I think; I forgot to note it down)
  2. pl-elastic-es-elastic-user - needed by the indexer server

I know this is early, and I definitely don't want to file issues against the self-hosted scripts, but I thought I'd bring this to your attention. I did manage to work around the pl-elastic-es-http-certs-public secret by creating a certs secret with that name and adding certs created with mkcert. However, for the elastic user secret, I'm not sure which key-value pairs it contains. I took a quick look at Elastic's documentation to see if I could roughly recreate it, but didn't make much progress.

I suspect that the create_cloud_secrets script may need changes to create all the necessary secrets. Thanks!!! :)

@vihangm
Member

vihangm commented Jun 2, 2021

@narioinc
Thanks for testing this out. We have noticed issues with minikube and newer versions of Kubernetes, and others in the community seem to have reported this as well (see kubernetes/minikube#7828 (comment)).
Setting --kubernetes-version=v1.16.1 when creating the minikube cluster seems to help for now; hopefully this is fully addressed in newer versions of k8s/minikube.
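A minimal sketch of the workaround above, assuming bash and a local `minikube` binary. The function name `start_pixie_test_cluster` is hypothetical; only the `--kubernetes-version=v1.16.1` flag comes from this thread, and that pin reflects mid-2021 and may no longer be needed with current minikube releases.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: start a local minikube cluster pinned to the
# Kubernetes version suggested in this thread (v1.16.1) as a workaround
# for the minikube/newer-Kubernetes issues tracked in kubernetes/minikube#7828.
start_pixie_test_cluster() {
  minikube start --kubernetes-version=v1.16.1
}
```

Source the file from a bash script and call `start_pixie_test_cluster`. A cluster that already runs a newer Kubernetes version may need to be removed with `minikube delete` first, since minikube does not downgrade an existing cluster in place.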

As for the secrets for elastic, those are created by the elastic operator.
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-security.html#k8s-authentication
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-tls-certificates.html
If you give it time to settle those should get created by the operator, and the services that need them will come up after that.

In general I'd recommend waiting between the deploy of cloud_deps and cloud to make sure all the resources created by cloud_deps are up and ready before the rest of cloud comes up.
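One way to script that wait, sketched under assumptions: bash, `kustomize` and `kubectl` on the PATH, the `plc` namespace and the `k8s/cloud_deps/public` and `k8s/cloud/public` kustomize directories mentioned elsewhere in this thread, and the two Elastic secret names reported above used as the readiness signal. The helpers `wait_for_secret` and `deploy_pixie_cloud` are hypothetical, not part of the Pixie repo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll every 10 seconds until the named secret exists
# in the namespace, giving up after timeout_s seconds (default 600).
wait_for_secret() {
  local ns="$1" name="$2" timeout_s="${3:-600}"
  local waited=0
  until kubectl get secret "$name" --namespace="$ns" >/dev/null 2>&1; do
    if (( waited >= timeout_s )); then
      echo "timed out waiting for secret ${ns}/${name}" >&2
      return 1
    fi
    sleep 10
    waited=$(( waited + 10 ))
  done
}

# Hypothetical helper: deploy cloud_deps, wait for the secrets the Elastic
# operator generates (names taken from this thread), then deploy cloud.
deploy_pixie_cloud() {
  local ns="plc"
  kustomize build k8s/cloud_deps/public | kubectl apply -f - --namespace="$ns"
  wait_for_secret "$ns" pl-elastic-es-elastic-user
  wait_for_secret "$ns" pl-elastic-es-http-certs-public
  kustomize build k8s/cloud/public | kubectl apply -f - --namespace="$ns"
}
```

Waiting on these two secrets targets exactly the resources the auth and indexer servers were reported to fail on; other dependencies (Postgres, NATS, etc.) may still need their own readiness checks.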

@shellfu

shellfu commented Jun 19, 2021

A couple of issues I have noticed so far when installing on Digital Ocean.

  1. StorageClass "standard" does not exist on DigitalOcean and must first be created.
  2. The 100M size in k8s/cloud_deps/public/postgres/postgres_persistent_volume.yaml does not work on DigitalOcean, where the minimum volume size appears to be 1Gi. The value is set here:
    https://github.com/pixie-labs/pixie/blob/main/k8s/cloud_deps/public/postgres/postgres_persistent_volume.yaml#L14
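For the first item, a sketch that writes a StorageClass named `standard` backed by DigitalOcean's Block Storage CSI driver (`dobs.csi.digitalocean.com`), assuming that driver is installed, as it is on DigitalOcean Kubernetes clusters. The function name and output filename are hypothetical, and the `volumeBindingMode` and `allowVolumeExpansion` settings are illustrative choices rather than values taken from Pixie's manifests.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a StorageClass named "standard" backed by the
# DigitalOcean Block Storage CSI driver, so that manifests requesting
# storageClassName "standard" can bind on DigitalOcean.
write_do_standard_storageclass() {
  local out="${1:-standard-storageclass.yaml}"
  cat > "$out" <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: dobs.csi.digitalocean.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
}
```

Apply the generated file with `kubectl apply -f standard-storageclass.yaml`. The second item still requires raising the 100M request in the Postgres manifest to at least 1Gi.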

These two items were easy enough to overcome. Next, when the create-admin-job job starts, it fails with the following error:
time="2021-06-19T19:57:16Z" level=fatal msg="Unable to create admin user" func=main.main file="src/cloud/jobs/create_admin_user/main.go:102" error="rpc error: code = InvalidArgument desc = identity provider must not be empty"

Looking at the test, I can see that the value being passed is github:
https://github.com/pixie-labs/pixie/blob/1dd207fed8f51e4626b3ff59f6b263583215c60e/src/cloud/profile/controller/server_test.go#L79

You can also see in the create_admin_user source that it neither sets an identity provider nor offers a flag to pass one.
https://github.com/pixie-labs/pixie/blob/main/src/cloud/jobs/create_admin_user/main.go#L91-L96

Other than modifying the source code and hosting my own image for the create admin job, I am not sure how to provide the required information to the Go binary.

Everything else APPEARS to be working, but since I cannot login to look I am not 100% certain.

@vihangm
Member

vihangm commented Jun 28, 2021

@shellfu Thanks for testing out Digital Ocean and letting us know your findings.

The latest images for create-admin-job should be fixed and now set an appropriate value for IdentityProvider. Feel free to try out an install again and let us know if you run into any issues.

@shellfu

shellfu commented Jul 8, 2021

The self-install has a lot of issues in a variety of environments. I have tested in Digital Ocean, and am currently attempting to install in a self-hosted K8s cluster.

Executing the following results in an error unless dev_dns_updater is patched to include the OIDC auth provider:

./dev_dns_updater --domain-name="dev.withpixie.dev"  --kubeconfig=$HOME/.kube/config --n=plc
FATA[0000] Could not create k8s clientset                error="no Auth Provider found for name \"oidc\""

Is there an update on the self-install script that you guys are working on?

In addition, in an environment where IPv6 is disabled, you need to modify the openresty container to comment out the listen entries in nginx.conf to avoid this error:

Address family not supported by protocol

I can only get around the above by modifying the proxy_server_image and hosting it myself.
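A sketch of the nginx.conf change described above, assuming the entries that fail are IPv6 listeners of the form `listen [::]:PORT ...;` (the actual openresty config in the proxy image may differ). The helper `disable_ipv6_listen` is hypothetical; it comments out matching lines, leaves IPv4 `listen` lines and already-commented lines alone, and prints the result to stdout so it can be baked into a rebuilt proxy image as the comment describes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: comment out IPv6 listen directives such as
# "listen [::]:8080;" so nginx does not fail with
# "Address family not supported by protocol" on hosts without IPv6.
# Indentation is preserved; the edited config is written to stdout.
disable_ipv6_listen() {
  local conf="$1"
  sed -E 's/^([[:space:]]*)(listen[[:space:]]+\[::\])/\1# \2/' "$conf"
}
```

For example, `disable_ipv6_listen nginx.conf > nginx.conf.ipv4` produces an edited copy without touching the original file.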

The only remaining issue is that the nginx config in the above container seems to cause infinite redirects, so the UI cannot be displayed and the password cannot be changed. Everything else appears to be online.

@shellfu

shellfu commented Jul 26, 2021

Hey guys, has any more testing or thoughts gone into the self installation script noted in the first comment?

Everything from the manual install seems ok but each environment does require tweaks to the k8s manifests to properly install.

I still cannot get past the issue above, but the product appears operational.

Is there a way to change the password without the web UI? Can this be done directly in the database? I'd like to at least use the CLI tools to start viewing some of this on my clusters, but so far I have not been able to take Pixie for a proper spin.

@HighWatersDev

HighWatersDev commented Jul 29, 2021

Hi,
I attempted a self-hosted install on AKS and ran into similar issues. In particular, I had to change the PVC to use azurefile. When attempting to redeploy, I get the following error:
initdb: could not change permissions of directory "/var/lib/postgresql/data": Operation not permitted
Update:
I was able to solve this by using the Azure Disk storage class, setting storageClassName: default on AKS.
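A sketch of that storage-class switch, assuming the Postgres manifest currently requests `storageClassName: standard` (the class the DigitalOcean report above says is expected) and GNU or BSD `sed`. The helper `set_storage_class` is hypothetical, and which manifest carries the class is an assumption to verify against the repo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rewrite "storageClassName: standard" to the given
# class (e.g. "default", the Azure Disk class on AKS) in a manifest.
# Edits the file in place and keeps the original as <manifest>.bak.
set_storage_class() {
  local manifest="$1" new_class="$2"
  sed -E -i.bak \
    "s/^([[:space:]]*storageClassName:[[:space:]]*)standard[[:space:]]*$/\1${new_class}/" \
    "$manifest"
}
```

For example: `set_storage_class k8s/cloud_deps/public/postgres/postgres_persistent_volume.yaml default`. An Azure Disk-backed class avoids the `initdb` chmod failure that the comment hit with azurefile.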

I would still like to know whether it's possible to run the self-hosted option in production mode.

@zhantaof
Copy link

zhantaof commented Aug 9, 2021

Hi, I was reading through the docs and was wondering if the Self-managed Pixie Cloud is able to generate the API key that is necessary to integrate with New Relic One?

I came across this article that states "Auto-telemetry with Pixie is New Relic One's integration of Community Cloud for Pixie". It does not mention the Self-managed Pixie Cloud option so it isn't very clear if possible or not.
Link to article: https://docs.newrelic.com/docs/auto-telemetry-pixie/pixie-data-security-overview/

@zasgar
Member Author

zasgar commented Oct 24, 2021

Closing this as completed.

@zasgar zasgar closed this as completed Oct 24, 2021
@shellfu

shellfu commented Oct 25, 2021

"Pixie Cloud deploy script will be just as easy to install Pixie Cloud as it is to install the hosted version of Pixie "

Is this considered complete, even with the issues multiple community members have faced? I'm curious about the motivation: the tool is good, but a lot of orgs simply do not allow G Suite, and the provided instructions contain no helpful troubleshooting for issues commonly encountered during the install process.

I will try the most recent install scripts today and see what I encounter. At the very least, I would like to see a list of commonly encountered issues and how to address them.

@zasgar
Member Author

zasgar commented Oct 25, 2021

Hi @shellfu, the self-hosted version works with username/password.

I was just grooming issues over the weekend and was under the impression that this has been resolved. If you find any issues in the latest pass can you please ping this issue again and we can re-open it? We would like to get this to be as seamless as possible, but the differences in environments do make it slightly more challenging.

@zasgar zasgar reopened this Oct 25, 2021
@zasgar
Member Author

zasgar commented Oct 25, 2021

@nserrino verified there is an issue.

@nserrino

Currently getting this error:

nserrino: ~/pixie main ⚡
$ kustomize build k8s/cloud_deps/public/ | kubectl apply -f - --namespace=plc

namespace/elastic-system created
error: error validating "STDIN": error validating data: [ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[0]): unknown field "jsonPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition, ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[0]): missing required field "JSONPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition, ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[1]): unknown field "jsonPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition, ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[1]): missing required field "JSONPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition, ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[2]): unknown field "jsonPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition, ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[2]): missing required field "JSONPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition, ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[3]): unknown field "jsonPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition, ValidationError(CustomResourceDefinition.spec.versions[0].additionalPrinterColumns[3]): missing required field "JSONPath" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceColumnDefinition]; if you choose to ignore these errors, turn validation off with --validate=false

@nserrino

@vihangm I believe you addressed the above error

@nserrino nserrino removed their assignment Oct 29, 2021
@zasgar zasgar added help wanted Extra attention is needed and removed help wanted Extra attention is needed labels Nov 1, 2021
@maxdml

maxdml commented Dec 2, 2021

One thing we had to do for the self-hosted install was to remove a number of checks from our Gatekeeper policies (for example, so that NodePort services are allowed). We also pinned gcr.io/pixie-oss/pixie-dev/cloud/job/create_admin_job_image to a specific tag so that it doesn't use the "latest" tag.
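A sketch of pinning the create-admin job image, assuming the `kustomize` CLI and that you run it in the kustomization directory that includes the job (which directory that is in the Pixie tree is an assumption to verify). `kustomize edit set image NAME:TAG` records an `images:` override that replaces the tag on every reference to that image name; the helper `pin_create_admin_image` is hypothetical, and the tag is left to the caller.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pin the create-admin job image to an explicit tag
# instead of "latest" by running `kustomize edit set image NAME:TAG`
# inside the given kustomization directory.
pin_create_admin_image() {
  local kustomize_dir="$1" tag="$2"
  local image="gcr.io/pixie-oss/pixie-dev/cloud/job/create_admin_job_image"
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$kustomize_dir" && kustomize edit set image "${image}:${tag}" )
}
```

After running it, the directory's kustomization.yaml gains an `images:` entry with `newTag` set, so subsequent `kustomize build` output references the pinned tag.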

Still in progress: trying to solve all the "TLS handshake errors" spamming the vzconn server logs :-)

@wsszh

wsszh commented Jun 15, 2022

Hi, I'm trying to install self-hosted Pixie on a self-hosted k8s cluster at my company, and I hit the same problem described in issue #347. Since I'm using a real k8s cluster rather than minikube, I don't know how to fix it. Is it possible to change the services from LoadBalancer to NodePort?
I also followed the "Self-Hosted Pixie Install Guides" in the docs and noticed that k8s/cloud/public/kustomization.yaml uses overlays/exposed_services_ilb, but there's no ingress-related YAML. I'm confused...
@zasgar @htroisi @vihangm @nserrino @sethtroisi
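A sketch addressing the NodePort question, assuming bash, `kubectl`, and the `plc` namespace used elsewhere in this thread. The hypothetical helper `switch_lb_services_to_nodeport` patches every `LoadBalancer` service in the namespace to `NodePort` rather than naming specific Pixie services. As the Gatekeeper comment above notes, admission policies may block NodePort services, and a service that sets LoadBalancer-only fields may need those cleared before the API server accepts the change.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: switch every Service of type LoadBalancer in the
# namespace (default "plc") to type NodePort with a strategic-merge patch.
switch_lb_services_to_nodeport() {
  local ns="${1:-plc}"
  local svc
  # kubectl JSONPath filter: print one LoadBalancer service name per line.
  for svc in $(kubectl get services --namespace="$ns" \
      -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.name}{"\n"}{end}'); do
    kubectl patch service "$svc" --namespace="$ns" -p '{"spec":{"type":"NodePort"}}'
  done
}
```

Afterwards, `kubectl get services --namespace=plc` shows the assigned node ports, which can then be reached through any node's IP.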

@aimichelle
Member

Closing this, as the issues with the self-hosted install script have been fixed.

@1006er

1006er commented Aug 10, 2022

Hi, I'm trying to install self-hosted pixie on self-hosted k8s cluster in my company, and I find the same problem as issue #347 posted. As I'm using a real k8s not like minikube, I don't know how to fix it. Is it possible to change the service from loadbalancer to nodeport? And I follow the "Self-Hosted Pixie Install Guides" in the doc, and noticed that in k8s/cloud/public/kustomization.yaml it uses overlays/exposed_services_ilb, but there's no ingress related yaml. I'm confused... @zasgar @htroisi @vihangm @nserrino @sethtroisi

Hi, I'm having the same problem, and I wonder whether you have found a solution yet.

@TBBle

TBBle commented Aug 10, 2022

Since this is a closed issue, I suggest opening a new issue for documenting/supporting non-minikube setups without functional LoadBalancer support. It won't get the attention it needs here.

ksrikanthreddy40 added a commit to ksrikanthreddy40/pixie that referenced this issue Oct 6, 2023
for the issue described in pixie-io#238 (comment)

Signed-off-by: ksrikanthreddy40 <[email protected]>