2021.10
DQ Job Kafka

Kafka Requires ZooKeeper

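Apache Kafka typically requires ZooKeeper. The start script lives in the bin directory of the Kafka installation (for example /kafka/bin); the command below uses paths relative to the installation root.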
# Start the ZooKeeper service
# Note: Soon, ZooKeeper will no longer be required by Apache Kafka.
$ bin/zookeeper-server-start.sh config/zookeeper.properties

Start a Kafka Server

Prerequisite step for Owl (you have likely already completed it if you use Kafka).
bin/kafka-server-start.sh config/server.properties

Start a Kafka Topic

Prerequisite step for Owl (you have likely already completed it if you use Kafka).
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test

# preferred cmd is below
# (note: --zookeeper was deprecated in Kafka 2.2 and removed in Kafka 3.0; on newer versions use the --bootstrap-server form above)
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Put a Message on the "test" Topic

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
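The console producer reads standard input and sends each line as one message, so test data can be piped in rather than typed. The following is a minimal sketch, assuming single-column CSV rows that line up with the `first_name` header passed to Owl in the next section. The function names `sample_rows` and `publish_sample_rows` and the sample values are hypothetical.

```shell
# Emit a few single-column CSV rows (hypothetical sample data).
sample_rows() {
  printf '%s\n' 'Joe' 'Mark'
}

# Pipe the sample rows into the console producer, one message per line.
# Assumes the working directory is the Kafka installation root and that
# the broker started in the earlier step is listening on localhost:9092.
publish_sample_rows() {
  sample_rows | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
}
```

Calling `publish_sample_rows` once the broker and topic exist should place two messages on the "test" topic.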

Kafka Consumer or Owl Consumer

Kafka is topic-based, so a single topic can have many consumers. Below is a basic command-line consumer, followed by Owl attached as a second consumer.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
/opt/owl/bin/owlcheck.sh \
  -kafkatopic test \
  -ds machine1 \
  -streamformat csv \
  -kafkaport 9092 \
  -kafkabroker localhost \
  -streaminterval 60 \
  -stream -kafka \
  -header first_name \
  -master local

Streams vs Sensors

Technically speaking, anything moving in real time is a stream of data, but Owl treats streams and IoT sensors slightly differently, for the following reasons:

Sensors

Sensor data is commonly a standard time series with three fields: Signal, Time, and Value.
| Signal      | Time                | Value |
|-------------|---------------------|-------|
| device1-CPU | 2019-02-11 13:40:55 | 4     |
| device1-CPU | 2019-02-11 14:33:20 | 2     |
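
Because a sensor reading is a single value per signal and timestamp, a check can focus on the value itself. The following is a minimal sketch, not Owl's own logic. It assumes readings arrive as comma-separated `Signal,Time,Value` lines, and the function name `flag_sensor_jumps` and its threshold argument are hypothetical. It flags any reading that moves more than a chosen amount from the previous reading of the same signal.

```shell
# Flag readings whose value moves more than a threshold (first argument,
# default 1) from the previous reading of the same signal.
# Input on stdin: Signal,Time,Value (one reading per line, no header).
flag_sensor_jumps() {
  awk -F',' -v max_delta="${1:-1}" '
    {
      signal = $1
      value = $3 + 0
      if (signal in last) {
        delta = value - last[signal]
        if (delta < 0) delta = -delta
        # Report signal, time of the new reading, and old->new value.
        if (delta > max_delta) print signal "," $2 "," last[signal] "->" value
      }
      last[signal] = value
    }
  '
}
```

Fed the two readings from the table with a threshold of 1, the function prints `device1-CPU,2019-02-11 14:33:20,4->2`. With a threshold of 5 it prints nothing.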

Streams

Streams commonly look like messages, JSON, Avro, or batch data, but constantly flowing. Another way to think of a stream is as multiple time series.
[
  {
    "trade": {
      "price": 23.75,
      "qty": 20,
      "symbol": "HDP"
    }
  },
  {
    "trade": {}
  }
]
| fname | age | networth | email |
|-------|-----|----------|-------|
| Joe   | 45  | $130,000 |       |
| Mark  | 33  | $125,000 |       |
In the examples above, the difference between a sensor and a stream is that with a sensor the user is primarily concerned with the actual "Value", such as a spike in temperature or a drop in CPUs. A stream of customer data has no single time "X" and value "Y"; there are many values "Y", and the user is interested in the overall quality of both the entire stream and the individual values. Relationship analysis and other correlative functions apply here. If you were to chart a stream, what would you chart: the row count, one of the columns, or the count of something? If you were to chart a sensor, you would know exactly what to chart: the "Value" over "Time".
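
To make the "what would you chart?" question concrete, the sketch below computes two candidate stream metrics for one batch of rows: the row count and the number of empty cells per column. It illustrates the idea only and is not how Owl profiles a stream. It assumes comma-separated rows with a header line and no embedded commas, and the function name `profile_stream` is hypothetical.

```shell
# Summarize a batch of comma-separated stream rows read from stdin.
# The first line is treated as the header. Prints the row count, then
# one line per column with the number of empty cells in that column.
profile_stream() {
  awk -F',' '
    NR == 1 {
      ncols = NF
      for (i = 1; i <= NF; i++) name[i] = $i
      next
    }
    {
      rows++
      for (i = 1; i <= ncols; i++) if ($i == "") empty[i]++
    }
    END {
      print "rows=" (rows + 0)
      for (i = 1; i <= ncols; i++) print name[i] " empty=" (empty[i] + 0)
    }
  '
}
```

For example, piping the header `fname,age,networth,email` followed by `Joe,45,130000,` and `Mark,33,125000,` into `profile_stream` reports `rows=2` and `email empty=2`, with zero empty cells in the other columns. Any of these numbers could be tracked per interval, which is why a stream has no single obvious chart.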
Fortunately, Owl has already worked through the many nuances required to understand, monitor, and predict accurately for all of these use cases. All that is required is to subscribe to the stream.