Collibra DQ User Guide
2022.10


DQ Agent Configuration Guide
We've moved! To improve customer experience, the Collibra Data Quality User Guide has moved to the Collibra Documentation Center as part of the Collibra Data Quality 2022.11 release. To ensure a seamless transition, dq-docs.collibra.com will remain accessible, but the DQ User Guide is now maintained exclusively in the Documentation Center.

How to Install a New DQ Agent

Setting up a DQ Agent using setup.sh as part of DQ package

Use the setup.sh script located in /opt/owl/ (or whichever Base Path your installation used). The following example installs a DQ Agent against a Postgres server running on localhost at port 5432, using the database postgres and the username/password combination postgres/password:
# PATH TO DIR THAT CONTAINS THE INSTALL DIR
export BASE_PATH=/opt
# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl
# DQ Metadata Postgres Storage settings
export METASTORE_HOST=localhost
export METASTORE_PORT=5432
export METASTORE_DB=postgres
export METASTORE_USER=postgres
export METASTORE_PASSWORD=password
cd $INSTALL_PATH
# Install DQ Agent only
./setup.sh \
-owlbase=$BASE_PATH \
-options=owlagent \
-pguser=$METASTORE_USER \
-pgpassword=$METASTORE_PASSWORD \
-pgserver=${METASTORE_HOST}:${METASTORE_PORT}/${METASTORE_DB}
The setup script will automatically generate the /opt/owl/config/owl.properties file and encrypt the provided password.
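As a quick sanity check after the script finishes, the generated file can be inspected for the datasource keys. The sketch below is only an illustration: missing_props is a hypothetical helper (not part of the DQ package), and the assumption that setup.sh writes the same spring.datasource.* keys listed in the manual setup section is not confirmed by this guide.

```shell
#!/bin/sh
# missing_props FILE KEY...
# Hypothetical helper: prints every KEY that has no "KEY=" line in FILE.
# If FILE is unreadable, every KEY is reported as missing.
missing_props() (
  file=$1
  shift
  for key in "$@"; do
    found=0
    if [ -r "$file" ]; then
      while IFS= read -r line || [ -n "$line" ]; do
        case $line in
          "$key="*) found=1 ;;   # literal prefix match on "KEY="
        esac
      done < "$file"
    fi
    if [ "$found" -eq 0 ]; then
      printf '%s\n' "$key"
    fi
  done
)

# Assumption: these keys are expected in the generated file.
PROPS_FILE="${INSTALL_PATH:-/opt/owl}/config/owl.properties"
missing_props "$PROPS_FILE" \
  spring.datasource.url \
  spring.datasource.username \
  spring.datasource.password
```

An empty result means every listed key was found; any key that is printed is absent from the file.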

Setting up a DQ Agent manually

  • Passwords for the DQ Metadata Postgres Storage should be encrypted before being stored in the /opt/owl/config/owl.properties file.
# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl
cd $INSTALL_PATH
# Encrypt DQ Metadata Postgres Storage password
./owlmanage.sh encrypt=password
owlmanage.sh generates an encrypted string from the plain-text password input. Use the encrypted string in the /opt/owl/config/owl.properties configuration file to avoid exposing the DQ Metadata Postgres Storage password.
  • To complete the Owl Agent configuration, edit the /opt/owl/config/owl.properties configuration file with the basic agent values:
vi $INSTALL_PATH/config/owl.properties
  • Add the following properties:
spring.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
spring.datasource.username={METASTORE_USER}
spring.datasource.password={METASTORE_PASSWORD}
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver
spring.agent.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
spring.agent.datasource.username={METASTORE_USER}
spring.agent.datasource.password={METASTORE_PASSWORD}
spring.agent.datasource.driver-class-name=org.postgresql.Driver
  • Restart the web app
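The manual steps above can be sketched as a small script that prints the property block with the placeholders filled in, ready to be reviewed and pasted into owl.properties. This is a minimal sketch, not a supported tool: render_agent_props is a hypothetical helper, and ENCRYPTED_VALUE stands in for whatever string owlmanage.sh printed for your password.

```shell
#!/bin/sh
# render_agent_props HOST PORT DB USER ENCRYPTED_PASSWORD
# Hypothetical helper: prints the eight datasource properties from this guide
# with the placeholders replaced by the given values.
render_agent_props() (
  host=$1 port=$2 db=$3 user=$4 enc_password=$5
  url="jdbc:postgresql://${host}:${port}/${db}"
  cat <<EOF
spring.datasource.url=${url}
spring.datasource.username=${user}
spring.datasource.password=${enc_password}
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver
spring.agent.datasource.url=${url}
spring.agent.datasource.username=${user}
spring.agent.datasource.password=${enc_password}
spring.agent.datasource.driver-class-name=org.postgresql.Driver
EOF
)

# Preview the block for the local metastore used in the setup.sh example.
# ENCRYPTED_VALUE is a placeholder, not real owlmanage.sh output.
render_agent_props localhost 5432 postgres postgres ENCRYPTED_VALUE
```

Printing to standard output rather than appending to the file directly keeps the step reviewable and avoids duplicating keys if the script is run twice.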

How To Configure Agent via UI

  • Log in to DQ Web and navigate to the Admin Console.
Fig 1: Home Page
  • From the Admin Console, click on the Remote Agent tile.
Fig 2: Admin Console
  • Identify the row with the agent to edit.
Fig 3: Agent Management Table
  • Click on the pencil icon to edit.
Fig 4: DQ Agent with default values
When you add a new DB Connection, you must give the DQ Agent permission to run DQ Jobs on that connection.
In the Agent Management table (Fig 3), select the chain link icon next to the DQ Agent to link it to a DB Connection. A modal appears in which you grant the agent permission to run DQ Jobs by DB Connection name (Fig 5). The left panel lists the DB Connection names that have not been linked to the DQ Agent. The right panel lists the DB Connection names that the agent has permission to run DQ Jobs on.
Double-click a DB Connection name to move it from left to right. In Fig 5, the DB Connection named "metastore" is being added to the DQ Agent. Click the "Update" button to save the new list of DB Connections.
Fig 5: Adding DB Connection named "metastore" to the DQ Agent
Fig 6: How to add all connections to the selected DQ Agent

Agent Configuration Parameters

| Parameter | Description |
|---|---|
| Is Local | For Hadoop only. |
| Is Livy | Deprecated. Not used. |
| Base Path | The installation folder path for DQ. All other paths in the DQ Agent are relative to this installation path. This is the location that was set as OWL_BASE in the Full Standalone Setup and other installation setups, followed by the owl/ folder. For example, if the setup command was export OWL_BASE=/home/centos, then the Base Path in the Agent configuration should be set to /home/centos/owl/. Default: /opt/owl/ |
| Owl Core JAR | The file path to the DQ Core jar file. Default: <Base Path>/owl/bin/ |
| Owl Core Logs | The folder path where DQ Core logs are stored. Logs from DQ Jobs are stored in this folder. Default: <Base Path>/owl/log |
| Owl Web Logs | The folder path where DQ Web logs are stored. Logs from the DQ Web App are stored in this folder. Default: <Base Path>/owl/log |
| Owl Script | The file path to the DQ execution script owlcheck.sh. This script runs a DQ Job from the command line without using an agent. Using owlcheck.sh to run DQ Jobs is superseded by the DQ Agent execution model. Default: <Base Path>/owl/bin/owlcheck |
| Deploy Deployment Mode | The Spark deployment mode, either Client or Cluster. |
| Default Master | The Spark Master URL copied from the Spark cluster verification screen (spark://...). |
| Default Queue | The default resource queue for YARN. |
| Dynamic Spark Allocation | Deprecated. Not used. |
| Spark Conf Key | Deprecated. Not used. |
| Spark Conf Value | Deprecated. Not used. |
| Number of executor(s) | The default number of executors allocated per DQ Job when using this Agent to run a DQ Scan. |
| Executer Memory (GB) | The default RAM per executor allocated per DQ Job when using this Agent to run a DQ Scan. |
| Number of Core(s) | The default number of cores per executor allocated per DQ Job when using this Agent to run a DQ Scan. |
| Driver Memory (GB) | The default driver RAM allocated per DQ Job when using this Agent to run a DQ Scan. |
| Free Form (Appended) | Other spark-submit parameters to append to each DQ Job when using this Agent to run a DQ Scan. |
Fig 2: Edit DQ Agent mode in D
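To make the Spark-related parameters more concrete, the sketch below shows one plausible way they could translate into standard spark-submit options. The mapping is an assumption for illustration only: this guide does not describe how the agent assembles its command, and build_spark_flags is a hypothetical helper. The flags themselves (--master, --deploy-mode, --queue, --num-executors, --executor-memory, --executor-cores, --driver-memory) are regular spark-submit options.

```shell
#!/bin/sh
# build_spark_flags DEPLOY_MODE MASTER QUEUE NUM_EXECUTORS EXECUTOR_MEM_GB \
#                   EXECUTOR_CORES DRIVER_MEM_GB [FREE_FORM]
# Hypothetical helper: prints spark-submit flags for the agent defaults.
# An empty QUEUE or FREE_FORM is skipped.
build_spark_flags() (
  deploy_mode=$1 master=$2 queue=$3
  num_executors=$4 executor_mem_gb=$5 executor_cores=$6 driver_mem_gb=$7
  free_form=${8:-}

  flags="--master ${master} --deploy-mode ${deploy_mode}"
  if [ -n "$queue" ]; then
    flags="${flags} --queue ${queue}"          # YARN resource queue
  fi
  flags="${flags} --num-executors ${num_executors}"
  flags="${flags} --executor-memory ${executor_mem_gb}g"
  flags="${flags} --executor-cores ${executor_cores}"
  flags="${flags} --driver-memory ${driver_mem_gb}g"
  if [ -n "$free_form" ]; then
    flags="${flags} ${free_form}"              # Free Form (Appended)
  fi
  printf '%s\n' "$flags"
)

# Example values only; spark-master:7077 is a made-up host.
build_spark_flags client spark://spark-master:7077 "" 2 4 2 2 \
  "--conf spark.sql.shuffle.partitions=50"
```

Memory values are given in GB in the Agent form, so the sketch appends the g suffix that spark-submit expects.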

Setting up an HA Group

If you have multiple DQ Agents, you can establish them as an HA Group. When doing so, make sure every DQ Agent in the group has the same DB Connections linked to it.
  • Click the "AGENT GROUPS (H/A)" tab, name your HA Group, and add the Agents you'd like to participate in the group. NOTE: HA Groups execute jobs in a round-robin fashion.
  • Once the Agents have been registered and associated with DB Connections, users can execute a job via the Explorer page.
Fig 7: Executing an Ad Hoc job via DQ Web Explorer
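The round-robin behavior noted above can be illustrated with a short sketch. It shows only the general technique of cycling through group members in order; how DQ actually tracks the next agent is not described in this guide, and pick_agent and the agent names are hypothetical.

```shell
#!/bin/sh
# pick_agent JOB_INDEX AGENT...
# Hypothetical helper: prints the agent that would take the job with the
# given 0-based index when jobs are dealt out in round-robin order.
pick_agent() (
  job_index=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "pick_agent: no agents in group" >&2
    exit 1
  fi
  # Drop (job_index mod agent count) names so the chosen agent becomes $1.
  shift $(( job_index % $# ))
  printf '%s\n' "$1"
)

# Four consecutive jobs against a two-agent group.
for job_index in 0 1 2 3; do
  pick_agent "$job_index" agent-a agent-b
done
```

With two agents in the group, the loop prints agent-a, agent-b, agent-a, agent-b, which is also why every agent in the group needs the same DB Connections: any job may land on any member.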

Diagram

Fig 1: High level depiction of DQ Agents using CDH, HDP, and EMR within a single DQ Web App
Fig 1 provides a high-level depiction of how agents work within DQ. Job execution is driven by DQ Jobs that are written to an agent_q table inside the DQ Metadata Postgres Storage (the Owl-Postgres database in Fig 1) via the Web UI or a REST API endpoint. Each running agent queries the Owl-Postgres table every 5 seconds and executes the DQ Jobs it is responsible for. For example, the EMR agent Owl-Agent3 in Fig 1 only executes DQ Jobs scheduled to run on EMR.
When an agent picks up a DQ Job, it launches the job either locally on the agent node itself or on a cluster as a Spark job (if the agent is set up as an edge node of a cluster). Wherever the job launches, its results are written back to the DQ Metadata Storage (the Owl-Postgres database). The results are then displayed in the DQ Web UI, exposed through the REST API, and available for direct SQL queries against the Owl-Postgres database.
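The polling model described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the agent's implementation: fetch_jobs and run_job are hypothetical stand-ins (the real agent reads the agent_q table, whose columns are not documented here), and poll_queue takes a poll count so the example terminates.

```shell
#!/bin/sh
# fetch_jobs: stand-in for a query against agent_q that returns the jobs
# assigned to this agent, one job id per line. The ids are made up.
fetch_jobs() {
  printf '%s\n' "job-1" "job-2"
}

# run_job: stand-in for launching a DQ Job locally or on a cluster.
run_job() {
  printf 'running %s\n' "$1"
}

# poll_queue MAX_POLLS INTERVAL_SECONDS
# Polls MAX_POLLS times, sleeping INTERVAL_SECONDS between polls.
poll_queue() (
  max_polls=$1 interval=$2
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    fetch_jobs | while IFS= read -r job_id; do
      if [ -n "$job_id" ]; then
        run_job "$job_id"
      fi
    done
    polls=$(( polls + 1 ))
    if [ "$polls" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
  done
)

# One poll with the 5-second interval mentioned in the text.
poll_queue 1 5
```

A real agent would loop indefinitely and mark each job as picked up so that no other agent runs it; both details are omitted here because the guide does not specify them.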