Collibra DQ User Guide
2022.10

Connecting to Hadoop Distributed File System (HDFS)

We've moved! To improve customer experience, the Collibra Data Quality User Guide has moved to the Collibra Documentation Center as part of the Collibra Data Quality 2022.11 release. To ensure a seamless transition, dq-docs.collibra.com will remain accessible, but the DQ User Guide is now maintained exclusively in the Documentation Center.

Prerequisites

To configure the HDFS connector, you need:
  • Admin privileges in your Collibra Data Quality instance.
  • Access to an HDFS cluster.
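
Before you start, it can help to confirm that the HDFS cluster is reachable over the network. The following is a minimal sketch in Python, not part of Collibra Data Quality. The host name and port in the usage comment are placeholders, and it is an assumption that your NameNode accepts TCP connections on the port you pass in (8020 is a common default, but your cluster may differ).

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical usage; replace the host and port with your own values:
# print(is_reachable("namenode.example.com", 8020))
```

A successful TCP connection only shows that the port is open. It does not verify Kerberos credentials or HDFS permissions.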

Steps

  1. In the main menu, hover over the gear icon and click Connection.
     >> The Connections page opens.
  2. Scroll down to the HDFS card.
  3. Click the Add button to add a new HDFS connection.
     >> The New Remote File Connection (HDFS) modal opens.
  4. Enter the values for each property.
| Property | Description |
| --- | --- |
| Name | The unique name of your HDFS connector. |
| Connection URL | The HDFS URL used for your connection. |
| Target Agent | The agent to use for your connection. |
| Auth Type | The method used to authorize your connection. Note: If you select Unsecured as the Auth Type, no other authorization fields are required. Using an unsecured connection is not recommended. |
| Principal | The service principal that lets Collibra Data Quality access your connection. |
| Keytab | The keytab used to authorize your connection. Note: Only applicable when you select Keytab as the Auth Type. |
| TGT | The Ticket Granting Ticket used to authorize your connection. Note: Only applicable when you select TGT Cache as the Auth Type. |
| Driver Properties | The configurable driver properties for your connection. Note: This is an optional configuration. |
  5. Click Save to establish your connection.
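
The dependencies between Auth Type and the other authorization fields can be sketched as a small pre-check that you run on your values before entering them in the modal. This is an illustrative Python sketch, not a Collibra API. The function name `check_connection` and the dictionary layout are hypothetical. It is also an assumption that the Connection URL follows the usual `hdfs://host:port` form and that Principal is needed for both Kerberos-based Auth Types, because the property descriptions do not state either explicitly.

```python
from urllib.parse import urlparse

# Auth-specific fields, based on the notes in the property descriptions.
# Assumption: Principal is required for both Keytab and TGT Cache.
REQUIRED_AUTH_FIELDS = {
    "Unsecured": [],
    "Keytab": ["Principal", "Keytab"],
    "TGT Cache": ["Principal", "TGT"],
}


def check_connection(values):
    """Return a list of problems found in the given property values."""
    problems = []

    if not values.get("Name"):
        problems.append("Name is empty")

    # Assumption: the URL uses the hdfs scheme and names a host.
    url = urlparse(values.get("Connection URL", ""))
    if url.scheme != "hdfs" or not url.hostname:
        problems.append("Connection URL should look like hdfs://host:port")

    auth_type = values.get("Auth Type")
    if auth_type not in REQUIRED_AUTH_FIELDS:
        problems.append(f"Unknown Auth Type: {auth_type!r}")
    else:
        for field in REQUIRED_AUTH_FIELDS[auth_type]:
            if not values.get(field):
                problems.append(f"{field} is required for Auth Type {auth_type}")

    return problems


# Hypothetical example values; replace them with your own.
example = {
    "Name": "hdfs-prod",
    "Connection URL": "hdfs://namenode.example.com:8020",
    "Auth Type": "Keytab",
    "Principal": "dq-service@EXAMPLE.COM",
    "Keytab": "/path/to/dq-service.keytab",
}
print(check_connection(example))
```

An empty list means the values are internally consistent. It does not mean the connection is valid, which Collibra Data Quality confirms only when you click Save.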

What's next?

Once you save your HDFS connection:
  • A confirmation message tells you that your connection is saved and valid.
  • You can immediately access your HDFS connection from Explorer.