2021.10
DQ Agent Configuration Guide

Agent Configuration Parameters

Is Local: For Hadoop only.
Is Livy: Deprecated. Not used.
Base Path: The installation folder path for DQ. All other paths in the DQ Agent are relative to this installation path. This is the location that was set as OWL_BASE in the Full Standalone Setup (and other installation setups), followed by the owl/ folder. For example, if the setup command was export OWL_BASE=/home/centos, then the Base Path in the Agent configuration should be set to /home/centos/owl/. Default: /opt/owl/
Owl Core JAR: The file path to the DQ Core jar file. Default: <Base Path>/owl/bin/
Owl Core Logs: The folder path where DQ Core logs are stored. Logs from DQ Jobs are stored in this folder. Default: <Base Path>/owl/log
Owl Web Logs: The folder path where DQ Web logs are stored. Logs from the DQ Web App are stored in this folder. Default: <Base Path>/owl/log
Owl Script: The file path to the DQ execution script owlcheck.sh, which runs a DQ Job from the command line without using an agent. Using owlcheck.sh to run DQ Jobs is superseded by the DQ Agent execution model. Default: <Base Path>/owl/bin/owlcheck
Deploy Mode: The Spark deployment mode; one of Client or Cluster.
Default Master: The Spark Master URL copied from the Spark cluster verification screen (spark://...).
Default Queue: The default resource queue for YARN.
Dynamic Spark Allocation: Deprecated. Not used.
Spark Conf Key: Deprecated. Not used.
Spark Conf Value: Deprecated. Not used.
Number of Executor(s): The default number of executors allocated per DQ Job when using this Agent to run a DQ Scan.
Executor Memory (GB): The default RAM per executor allocated per DQ Job when using this Agent to run a DQ Scan.
Number of Core(s): The default number of cores per executor allocated per DQ Job when using this Agent to run a DQ Scan.
Driver Memory (GB): The default driver RAM allocated per DQ Job when using this Agent to run a DQ Scan.
Free Form (Appended): Other spark-submit parameters appended to each DQ Job when using this Agent to run a DQ Scan.
Fig 2: Edit DQ Agent mode in D
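The Spark-related fields above correspond to standard spark-submit flags. As a rough illustration only (every value below, the Spark master host, and the DQ Core jar path are hypothetical, not taken from a real Agent), a DQ Job launched with these defaults would resemble:

```shell
# Hypothetical mapping of Agent defaults onto spark-submit flags.
# All values are illustrative; the DQ Core jar name is assumed.
DEPLOY_MODE=cluster                          # Deploy Mode
MASTER_URL="spark://spark-master:7077"       # Default Master (assumed host)
NUM_EXECUTORS=3                              # Number of Executor(s)
EXECUTOR_MEMORY=4g                           # Executor Memory (GB)
EXECUTOR_CORES=2                             # Number of Core(s)
DRIVER_MEMORY=4g                             # Driver Memory (GB)
FREE_FORM="--conf spark.yarn.queue=default"  # Free Form (Appended)

# Assemble the invocation (echoed here rather than executed)
CMD="spark-submit --deploy-mode $DEPLOY_MODE --master $MASTER_URL"
CMD="$CMD --num-executors $NUM_EXECUTORS --executor-memory $EXECUTOR_MEMORY"
CMD="$CMD --executor-cores $EXECUTOR_CORES --driver-memory $DRIVER_MEMORY"
CMD="$CMD $FREE_FORM /opt/owl/bin/owl-core.jar"  # jar path assumed
echo "$CMD"
```

Free Form parameters land at the end, so they can add or override --conf entries per Agent.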

How to Install a New DQ Agent

Setting up a DQ Agent using setup.sh as part of DQ package

Use the setup.sh script located in /opt/owl/ (or the Base Path your installation used). The example code block below installs a DQ Agent against a Postgres server running on localhost port 5432, with database postgres and the Postgres username/password combo postgres/password.
# PATH TO DIR THAT CONTAINS THE INSTALL DIR
export BASE_PATH=/opt

# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl

# DQ Metadata Postgres Storage settings
export METASTORE_HOST=localhost
export METASTORE_PORT=5432
export METASTORE_DB=postgres
export METASTORE_USER=postgres
export METASTORE_PASSWORD=password

cd $INSTALL_PATH

# Install DQ Agent only
./setup.sh \
-owlbase=$BASE_PATH \
-options=owlagent \
-pguser=$METASTORE_USER \
-pgpassword=$METASTORE_PASSWORD \
-pgserver=${METASTORE_HOST}:${METASTORE_PORT}/${METASTORE_DB}
The setup script will automatically generate the /opt/owl/config/owl.properties file and encrypt the provided password.
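As a quick sanity check, you can confirm the datasource entries exist in the generated file. A guarded sketch (the file will only be present on a host where setup.sh has actually run):

```shell
# Guarded sanity check: look for the spring datasource entries that
# setup.sh writes into owl.properties (path from the setup above)
CONF=/opt/owl/config/owl.properties
if [ -f "$CONF" ]; then
  CONF_STATE=present
  grep 'spring.datasource' "$CONF"
else
  CONF_STATE=missing
  echo "no $CONF on this host; run setup.sh first"
fi
```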

Setting up a DQ Agent manually

    Passwords to the DQ Metadata Postgres Storage should be encrypted before being stored in the /opt/owl/config/owl.properties file.
# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl

cd $INSTALL_PATH

# Encrypt DQ Metadata Postgres Storage password
./owlmanage.sh encrypt=password
owlmanage.sh will generate an encrypted string for the plain-text password input. The encrypted string can be used in the /opt/owl/config/owl.properties configuration file to avoid exposing the DQ Metadata Postgres Storage password.
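For example, the password properties would then carry the encrypted string instead of plain text (placeholder shown below; your encrypted value will differ):

```properties
spring.datasource.password=<encrypted-string-from-owlmanage>
spring.agent.datasource.password=<encrypted-string-from-owlmanage>
```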
    To complete the Owl Agent configuration, edit the /opt/owl/config/owl.properties configuration file with basic agent values:
vi $INSTALL_PATH/config/owl.properties
    and add the following properties:
spring.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
spring.datasource.username={METASTORE_USER}
spring.datasource.password={METASTORE_PASSWORD}
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver

spring.agent.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
spring.agent.datasource.username={METASTORE_USER}
spring.agent.datasource.password={METASTORE_PASSWORD}
spring.agent.datasource.driver-class-name=org.postgresql.Driver
    Restart the web app
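A dry-run sketch of that restart, assuming this installation's owlmanage.sh accepts per-component stop=/start= arguments (the commands are printed rather than executed):

```shell
# Dry run: print the commands used to bounce the DQ Web app so the edited
# owl.properties is reloaded. stop=owlweb / start=owlweb are assumed to be
# supported by this installation's owlmanage.sh.
INSTALL_PATH=/opt/owl
RESTART_CMDS="$INSTALL_PATH/owlmanage.sh stop=owlweb
$INSTALL_PATH/owlmanage.sh start=owlweb"
echo "$RESTART_CMDS"   # run these on the DQ host
```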

How To Configure Agent via UI

    Log in to DQ Web and navigate to the Admin Console.
Fig 1: Home Page
    From the Admin Console, click on the Remote Agent tile.
Fig 2: Admin Console
    Identify the row with the agent to edit.
Fig 3: Agent Management Table
    Click on the pencil icon to edit.
Fig 4: DQ Agent with default values

How To Link DB Connection to Agent via UI

When you add a new Database Connection, the DQ Agent must be given permission to run DQ Jobs via that connection.
From Fig 3, select the chain-link icon next to the DQ Agent to link it to a DB Connection. A modal will appear (Fig 5) for granting that agent permission to run DQ Jobs by DB Connection name. The left-side panel lists the DB Connection names that have not been linked to the DQ Agent. The right-side panel lists the DB Connection names that have permission to run DQ Jobs.
Double-click a DB Connection name to move it from left to right. In Fig 5, the DB Connection named "metastore" is being added to the DQ Agent. Click the "Update" button to save the new list of DB Connections.
Fig 5: Adding DB Connection named "metastore" to the DQ Agent
Fig 6: How to add all connections to the selected DQ Agent

Setting up an HA Group

If you have multiple DQ Agents, you can establish them as an HA Group. When doing so, make sure all DQ Agents in the group have the same connections established to them.
    Click on the "AGENT GROUPS (H/A)" tab, name your HA Group, and add the Agents you'd like to participate in the Group. NOTE: HA Groups execute jobs in a round-robin fashion.
    Once the Agents have been registered and associated with DB Connections, users can execute a job via the Explorer page.
Fig 7: Executing an Ad Hoc job via DQ Web Explorer

Diagram

Fig 1: High level depiction of DQ Agents using CDH, HDP, and EMR within a single DQ Web App
Fig 1 provides a high-level depiction of how agents work within DQ. Job execution is driven by DQ Jobs that are written to an agent_q table inside the DQ Metadata Postgres Storage (the Owl-Postgres database in Fig 1) via the Web UI or a REST API endpoint. Each available, running agent queries the Owl-Postgres table every 5 seconds to execute the DQ Jobs that agent is responsible for. For example, the EMR agent Owl-Agent3 in Fig 1 only executes DQ Jobs scheduled to run on EMR.
When an agent picks up a DQ Job to execute, it launches the job either locally on the agent node itself or on a cluster as a Spark job (if the agent is set up as an edge node of a cluster). Wherever the job launches, the results of the DQ Job are written back to the DQ Metadata Storage (Owl-Postgres database). The results are then displayed in the DQ Web UI, exposed via REST API, and available for direct SQL query against the Owl-Postgres database.
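Because the queue lives in an ordinary Postgres table, it can be inspected directly with psql. A dry-run sketch (connection values are the examples from this guide; the agent_q column layout is not documented here, so SELECT * is used rather than guessing column names):

```shell
# Dry run: print a psql command to peek at queued DQ Jobs in agent_q.
# Host/port/db/user are the example values used earlier in this guide.
METASTORE_HOST=localhost
METASTORE_PORT=5432
METASTORE_DB=postgres
METASTORE_USER=postgres
QUERY="SELECT * FROM agent_q LIMIT 10;"
PSQL_CMD="psql -h $METASTORE_HOST -p $METASTORE_PORT -U $METASTORE_USER -d $METASTORE_DB -c \"$QUERY\""
echo "$PSQL_CMD"   # run this on a host with access to the metastore
```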