Collibra DIC Integration
Powered By GitBook
Validating Data Movement
Validate Data Integrity Between Distinct Storage Systems

Record-for-Record Reconciliation

When you’re copying or moving data between distinct storage systems such as multiple HDFS clusters or between non-HDFS storage and cloud storage, it’s a good idea to perform some type of validation to guarantee data integrity. This validation is essential to be sure data wasn’t altered during transfer.
Detect potential data corruption caused, for example, by older versions of drivers, parsing errors, connection limits, noisy network links, memory errors on server computers and routers along the path, or software bugs (such as in a library that customers use).

Common Data Copying/Movement Scenarios

    Landing, Loading, Persisting third-party files
      Landing daily files.
      Loading daily files into staging location.
      Finally, persisting data in lake or warehouse.
    Cloud Migrations
      Between existing database storage to optimized cloud storage formats.
      Between local file systems and cloud relational database
    Data Lake or Data Warehouse
      Migrating data from single storage system to distributed storage
      Consolidating storage systems to a single lake or warehouse
    Same Storage, Different Environments
      Copying same data between Dev, QA, and Prod environments.
How do you easily validate the same data exists in distinct locations?

Shortcomings of Existing Validation Checks

    Low-level integrity checks like row counts and column counts may not be sufficient.
    No easy way to reconcile between across non-HDFS files and database.
    Chunk verification requires storage size, format, and metadata to be exactly equal.
    Different data types in two distinct databases (Oracle and Teradata) will not reconcile.
    Two different copies of the same files in HDFS, but with different per-file block sizes configured.
    Two different instances of HDFS with different block or chunk sizes configured.
    Copying across non-HDFS Hadoop-compatible file systems
    (HCFS) such as Cloud Storage.
Explicit end-to-end data integrity validation adds protection for cases that may go undetected by typical in-transit mechanisms.

Enter, OwlDQ Integrity Validation!

To ensure and protect against target systems getting out of sync or not matching the originating source, turn on -vs to validate that the source matches the target. Read More
Validate Source
Collibra DQ User Guide
Complete row, column, conformity, and value checks between any two distinct storage systems. Can be run against high-dimension or low-dimension datasets. Works between Files and/or Database storage, On-premise, or across Cloud environments.

Get Started Today

We don’t want you to get stuck writing a bunch of reconciliation checks we’ve already written! Focus on other stuff that actually moves your project forward.
For more information, please contact [email protected] or schedule a demo at www.owldq.com
Last modified 14d ago