Collibra DQ User Guide
2022.10

Rules (user-defined)

Apply custom monitoring with SQL
We've moved! To improve customer experience, the Collibra Data Quality User Guide has moved to the Collibra Documentation Center as part of the Collibra Data Quality 2022.11 release. To ensure a seamless transition, dq-docs.collibra.com will remain accessible, but the DQ User Guide is now maintained exclusively in the Documentation Center.

SQL Rule Engine

Introduction

Collibra Data Quality takes a strong stance that data should first be profiled, auto-discovered, and learned before basic rules are applied. This methodology commonly eliminates thousands of rules that would never need to be written, and it evolves naturally over time. However, there are still many cases that call for a simple rule, a complex rule, or a domain-specific rule. Simply search for any dataset and add a rule. You can use the optional Column Name, Category, and Description fields to add metadata to your rules for future reporting.
Customized discovery routines can be run using the rule library together with data concepts and semantics.
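As an illustration, a Simple rule is a condition that each row must satisfy, while a Freeform rule is a full SELECT whose result set becomes the break records. The dataset and column names below are hypothetical:

```sql
-- Simple rule: a condition; rows that fail it become break records
email IS NOT NULL AND email LIKE '%@%'

-- Freeform rule: a full SELECT against the dataset; the rows it
-- returns are counted as break records
SELECT * FROM @dataset_name WHERE amount < 0
```

In Freeform rules the scanned dataset is referenced by its name (shown here as @dataset_name).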

Query Builder

The query builder helps generate SQL for more complex rules. You can apply it to one or two tables (Table A on the left and Table B on the right), and it can help build up multi-part conditions.
(Optional) Start by searching for Table B on the right to set a key for the join condition.
Input conditions and click SQL Statement to generate example syntax.
As with any SQL generator, there are limitations for more complex scenarios.
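For example, a two-table condition built in the query builder might generate SQL along these lines (table and key names are illustrative), flagging Table A rows that have no match in Table B:

```sql
SELECT A.*
FROM @orders A
LEFT JOIN @customers B
  ON A.customer_id = B.customer_id
WHERE B.customer_id IS NULL
```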

Break Records

Storing break records is only available for the Freeform and Simple rule types. Rules from the rule library use one of these types as well.
Enable additional storage with the -linkid flag, which allows you to store complete sets of break records. See the linkid section for more details.
Stat, Native, and Data Type (global) rules are not eligible for storing exception records.
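As a sketch, the flag is passed alongside the other DQ job options; the option values below are illustrative, with -linkid naming the key column that ties each stored break record back to its source row:

```shell
# hypothetical job options; -linkid enables full break-record storage
-ds public.orders -rd "2022-10-01" \
-q "select * from public.orders" \
-linkid order_id
```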

Quick Tips

If joining more than one data source, make sure both sets of drivers are in the -lib directory, or separately supply a -libsrc flag pointing to the appropriate directory or JAR file location. Versions later than 2021.11 use the -addlib flag for additional directories to add to the classpath.
Native SQL rules use your database's native syntax. The score is the number of break records divided by the number of rows in the scope (the query supplied with -q) of the defined DQ job.
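For instance, a native rule pushed down to an Oracle source might look like the sketch below (the schema, table, and use of NVL are assumptions about the source database). If it returned 200 break records against a job scope of 10,000 rows, the rule would score as 200 / 10,000 = 2%.

```sql
-- runs in the source database's own dialect (Oracle-style NVL here)
SELECT * FROM sales.orders WHERE NVL(amount, 0) < 0
```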

Spark SQL

For the complete list of Spark SQL operators and functions available, see https://spark.apache.org/docs/latest/api/sql/index.html
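For example, a Freeform rule can lean on built-in Spark SQL functions such as regexp_extract and datediff (the dataset and column names here are illustrative):

```sql
SELECT *
FROM @dataset_name
WHERE regexp_extract(phone, '^[0-9]{10}$', 0) = ''
   OR datediff(current_date(), to_date(created_at)) > 365
```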