Cloud native install
Once you have created a namespace and added all of the required secrets, you can begin the deployment of Collibra DQ.
Install the Web, Agent, and metastore components. Collibra DQ is inaccessible until you manually add an Ingress or another type of externally accessible service.
All of the following examples pull containers directly from the Collibra DQ secured container registry. In most cases, InfoSec policies require that containers are sourced from a private container repository controlled by the local Cloud Ops team. In that case, make sure to add
--set global.image.repo=</url/of/private-repo>
so that you use only approved containers.
helm upgrade --install --namespace <namespace> \
--set global.version.owl=<owl-version> \
--set global.version.spark=<owl-spark-version> \
--set global.configMap.data.license_key=<owl-license-key> \
--set global.web.service.type=ClusterIP \
--set global.web.usageMeter.pendo.accountId=<your-license-name> \
<deployment-name> \
/path/to/chart/owldq
The metastore container must start first because the other containers write data to it. On your initial deployment, the other containers might start before the metastore is ready and fail.
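For reference, one way to make the Web accessible after this minimal install is to place an Ingress in front of the web service. The sketch below is an assumption, not part of the chart: the service name (owldq-web), port (9000), hostname, and Ingress class are placeholders that you should adjust to match your release and cluster.
# Hypothetical example: expose the Collibra DQ web service through an Ingress.
# Service name, port, host, and class are placeholders; check the deployed services first.
kubectl create ingress owldq-web \
  --namespace <namespace> \
  --class=nginx \
  --rule="dq.example.com/*=owldq-web:9000"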
Perform the minimal install and add a preconfigured NodePort or LoadBalancer service to provide access to the Web.
A LoadBalancer service type requires that the Kubernetes platform is integrated with a Software Defined Network solution. This will generally be true for the Kubernetes services offered by major cloud vendors. Private cloud platforms more commonly use Ingress controllers. Check with the infrastructure team before attempting to use LoadBalancer service type.
helm upgrade --install --namespace <namespace> \
--set global.version.owl=<owl-version> \
--set global.version.spark=<owl-spark-version> \
--set global.configMap.data.license_key=<owl-license-key> \
--set global.web.service.type=<NodePort || LoadBalancer> \
<deployment-name> \
/path/to/chart/owldq
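After the release comes up, the externally reachable address can be read from the service. This is a general sketch; the exact service name depends on your deployment name.
# List services in the namespace. For a LoadBalancer service, the EXTERNAL-IP column
# shows the address once it is provisioned; for a NodePort service, combine any node IP
# with the assigned node port.
kubectl get services --namespace <namespace> -o wide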
Perform the install with external service but with SSL enabled.
Ensure you have already deployed a keystore containing a key to the target namespace with a secret name that matches the
global.web.tls.key.secretName
argument (owldq-ssl-secret by default). Also, ensure that the secret's key name matches the global.web.tls.key.store.name
argument (dqkeystore.jks by default).
helm upgrade --install --namespace <namespace> \
--set global.version.owl=<owl-version> \
--set global.version.spark=<owl-spark-version> \
--set global.configMap.data.license_key=<owl-license-key> \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.web.tls.enabled=true \
--set global.web.tls.key.secretName=owldq-ssl-secret \
--set global.web.tls.key.alias=<key-alias> \
--set global.web.tls.key.type=<JKS || PKCS12> \
--set global.web.tls.key.pass=<keystore-pass> \
--set global.web.tls.key.store.name=keystore.jks \
<deployment-name> \
/path/to/chart/owldq
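For reference, the keystore secret assumed by the example above can be created ahead of time along these lines. The secret name must match global.web.tls.key.secretName, the file key must match global.web.tls.key.store.name, and the local path is a placeholder.
# Create the keystore secret in the target namespace before running the install above.
# The file key name (keystore.jks) must match the global.web.tls.key.store.name value.
kubectl create secret generic owldq-ssl-secret \
  --namespace <namespace> \
  --from-file=keystore.jks=/path/to/keystore.jks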
Perform the install with external service and Spark History Server enabled. In this example, the target log storage system is GCS.
helm upgrade --install --namespace <namespace> \
--set global.version.owl=<owl-version> \
--set global.version.spark=<owl-spark-version> \
--set global.configMap.data.license_key=<owl-license-key> \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.spark_history.enabled=true \
--set global.spark_history.logDirectory=gs://logs/spark-history/ \
--set global.spark_history.service.type=<NodePort || LoadBalancer> \
--set global.cloudStorage.gcs.enableGCS=true \
<deployment-name> \
/path/to/chart/owldq
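Once the Spark History Server pod is running, its UI can be reached through the history service. The service name below is a placeholder; 18080 is the default Spark History Server port.
# Forward the Spark History Server port to your workstation, then open http://localhost:18080
kubectl port-forward --namespace <namespace> service/<spark-history-service> 18080:18080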
Perform the install with external service and Spark History Server enabled. In this example, the target log storage system is S3.
For Collibra DQ to be able to write Spark logs to S3, make sure that an Instance Profile IAM Role with access to the log bucket is attached to all nodes serving the target namespace.
helm upgrade --install --namespace <namespace> \
--set global.version.owl=<owl-version> \
--set global.version.spark=<owl-spark-version> \
--set global.configMap.data.license_key=<owl-license-key> \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.spark_history.enabled=true \
--set global.spark_history.logDirectory=s3a://logs/spark-history/ \
--set global.spark_history.service.type=<NodePort || LoadBalancer> \
--set global.cloudStorage.s3.enableS3=true \
<deployment-name> \
/path/to/chart/owldq
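As a rough sketch of the bucket permissions involved, an inline policy similar to the following could be attached to the node instance profile role. The role name, policy name, and bucket ARN are placeholders for your environment; the bucket here matches the logs bucket used in the example above.
# Hypothetical example: grant the node instance profile role access to the Spark log bucket.
aws iam put-role-policy \
  --role-name <node-instance-role> \
  --policy-name dq-spark-history-logs \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::logs", "arn:aws:s3:::logs/*"]
    }]
  }'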
Perform the install with external service and an external metastore, for example AWS RDS, Google Cloud SQL, or PostgreSQL on its own instance.
Collibra DQ currently supports PostgreSQL 9.6 and newer.
helm upgrade --install --namespace <namespace> \
--set global.version.owl=<owl-version> \
--set global.version.spark=<owl-spark-version> \
--set global.configMap.data.license_key=<owl-license-key> \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.metastore.enabled=false \
--set global.configMap.data.metastore_url=jdbc:postgresql://<host>:<port>/<database> \
--set global.configMap.data.metastore_user=<user> \
--set global.configMap.data.metastore_pass=<password> \
<deployment-name> \
/path/to/chart/owldq
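As an optional sanity check, you can verify that the external metastore is reachable from inside the cluster before pointing Collibra DQ at it. The image tag and connection values below are placeholders.
# Run a throwaway PostgreSQL client pod and issue a trivial query against the external metastore.
kubectl run psql-check --rm -it --restart=Never \
  --namespace <namespace> \
  --image=postgres:13 -- \
  psql "postgresql://<user>:<password>@<host>:<port>/<database>" -c "SELECT 1;"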
This section provides the most common commands to run when troubleshooting a Collibra DQ environment deployed on Kubernetes. For a basic overview of Kubernetes and other relevant background, refer to the official Kubernetes documentation.
### Provide documentation on syntax and flags in the terminal
kubectl help
### List the available Kubernetes API resources and the verbs they support
kubectl api-resources -o wide
### Get Pods, their names & details in all Namespaces
kubectl get pods -A -o wide
### Get all Namespaces in a cluster
kubectl get namespaces
### Get Services in all Namespaces
kubectl get services -A -o wide
### List all deployments in all namespaces:
kubectl get deployments -A -o wide
### List Events sorted by timestamp in all namespaces
kubectl get events -A --sort-by=.metadata.creationTimestamp
### Get logs from a specific pod:
kubectl logs [my-pod-name]
### If the Kubernetes Metrics Server is installed,
### the top command allows you to see the resource consumption for nodes or pods
kubectl top node
kubectl top pod
### If the Kubernetes Metrics Server is NOT installed, use
kubectl describe nodes | grep Allocated -A 10
### Get current-context
kubectl config current-context
### See all configs for the entire cluster
kubectl config view
### Check to see if I can read pod logs for current user & context
kubectl auth can-i get pods --subresource=log
### Check to see if I can do everything in my current namespace ("*" means all)
kubectl auth can-i '*' '*'