Steps for discovering and visualizing data in HDP on IBM Power Systems using QlikView
Qlik provides a business intelligence (BI) solution called QlikView. QlikView provides many
features beyond the typical BI reports and dashboards. Example capabilities include guided
analytics, security, customization, and scalability. QlikView supports accessing data in
Apache Hadoop environments. Validation testing was performed to verify QlikView’s
ability to integrate with and visualize data specifically with Hortonworks Data Platform (HDP)
on IBM® POWER8® processor-based servers. This article provides an overview of the
validation tests that were completed.
The key objectives for the validation testing of QlikView were to:
- Configure QlikView to connect to HDP 2.6 running on an IBM POWER8 processor-based server.
- Extract and visualize sample data from the Hadoop Distributed File System (HDFS) of HDP
running on a POWER8 processor-based server.
This section lists the high-level components of QlikView and HDP used in the test.
- QlikView Personal Edition 12 for Microsoft Windows
- Hortonworks ODBC Driver for Apache Hive v2.1.5
- A notebook running Windows 7
Hortonworks Data Platform
- HDP version 2.6
- Red Hat Enterprise Linux (RHEL) version 7.2
- Minimum resources: Eight virtual processors, 24 GB memory, 50 GB disk space
- IBM PowerKVM™
- IBM POWER8 processor-based server
The deployment architecture is simple. QlikView and the Hortonworks ODBC driver were
installed and run on a Windows 7 system. HDP was installed and run on a POWER8 server.
QlikView and the ODBC driver were configured to connect to HDP. Data in HDP was accessed and
visualized by QlikView. Tests were run in both a single-node and a multi-node HDP environment.
Installation and configuration
This section covers the installation and configuration of an HDP cluster and QlikView.
Installing and configuring the HDP cluster
Here are the high-level steps to install and configure the HDP cluster:
- Follow the installation guide for HDP on Power Systems (see Resources) to install and configure the HDP cluster.
- Log in to the Ambari server and ensure that all the services are running.
- Monitor and manage the HDP cluster, Hadoop, and related services through Ambari.
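The service check in the steps above can also be done from a command line through Ambari's REST API. The following is a minimal sketch, not the procedure used in the test: the helper names `ambari_services_url` and `ambari_service_states` are hypothetical, and the host, cluster name, and Ambari user are placeholders you must supply. It assumes Ambari listens on its default HTTP port, 8080.

```shell
#!/bin/sh
# Hypothetical helper: print the Ambari REST URL that lists the state of
# every service in a cluster.
# Arguments: Ambari host, cluster name, optional port (assumed default 8080).
ambari_services_url() {
    printf 'http://%s:%s/api/v1/clusters/%s/services?fields=ServiceInfo/state\n' \
        "$1" "${3:-8080}" "$2"
}

# Hypothetical helper: fetch the service states as JSON.
# Arguments: Ambari host, cluster name, Ambari user name.
# curl prompts for the password because only the user name is passed to -u.
# Running services report "state" : "STARTED" in the response.
ambari_service_states() {
    curl -s -u "$3" "$(ambari_services_url "$1" "$2")"
}
```

For example, `ambari_service_states ambari.example.com mycluster admin` prints one JSON entry per service; any service whose state is not STARTED needs attention in the Ambari web interface before you continue.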
Setting up test data and Hive tables
Download the MovieLens and driver test data, copy the data to HDFS, and create Hive tables.
- Download the MovieLens data set (see the
citation in Resources).
- Follow the referenced instructions to copy the
MovieLens data to HDFS and set up Hive external tables. Perform these steps as the
hive user.
- Download the driver behavior data files (drivers.csv and truck_event_text_partition.csv).
- Copy the driver data to HDFS.
# su - hive
# hadoop fs -mkdir -p /user/hive/dataset/drivers
# hadoop fs -copyFromLocal /home/np/u0014213/Data/truck_event_text_partition.csv /user/hive/dataset/drivers
# hadoop fs -copyFromLocal /home/np/u0014213/Data/drivers.csv /user/hive/dataset/drivers
# hadoop fs -ls /user/hive/dataset/drivers
Found 2 items
-rw-r--r-- 3 hive hdfs 2043 2017-05-21 06:30 /user/hive/dataset/drivers/drivers.csv
-rw-r--r-- 3 hive hdfs 2272077 2017-05-21 06:30 /user/hive/dataset/drivers/truck_event_text_partition.csv
- Create Hive tables for driver data.
# su - hive
# hive
hive> create database trucks;
hive> use trucks;
hive> create table drivers (driverId int, name string, ssn bigint, location string, certified string, wageplan string)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE
    > TBLPROPERTIES("skip.header.line.count"="1");
hive> create table truck_events (driverId int, truckId int, eventTime string, eventType string, longitude double, latitude double, eventKey string, correlationId bigint, driverName string, routeId int, routeName string)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE
    > TBLPROPERTIES("skip.header.line.count"="1");
hive> show tables;
OK
drivers
truck_events
- Load the data into the tables from the files in HDFS.
hive> LOAD DATA INPATH '/user/hive/dataset/drivers/truck_event_text_partition.csv' overwrite into table truck_events;
hive> LOAD DATA INPATH '/user/hive/dataset/drivers/drivers.csv' overwrite into table drivers;
- Cross-check the tables to ensure that the data is present by running queries on the tables.
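One way to cross-check the load is to compare the number of data rows in each local CSV file with the row count that Hive reports for the matching table. The sketch below is an illustration only: the helper names `csv_data_rows`, `count_query`, and `hive_row_count` are hypothetical, and it assumes the local copies of the CSV files are still available and that the `hive` command-line client is on the path of the hive user.

```shell
#!/bin/sh
# Hypothetical helper: number of data rows in a local CSV file, not counting
# the header line (the tables above skip it via skip.header.line.count).
csv_data_rows() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: print a row-count query for a table in the trucks database.
count_query() {
    printf 'SELECT COUNT(*) FROM trucks.%s;' "$1"
}

# Hypothetical helper: run the count in Hive.
# -S (silent mode) suppresses informational output; -e runs the given query.
hive_row_count() {
    hive -S -e "$(count_query "$1")"
}
```

For example, the output of `csv_data_rows /home/np/u0014213/Data/drivers.csv` should match the output of `hive_row_count drivers`, and likewise for truck_event_text_partition.csv and `truck_events`.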
Installing and configuring Hortonworks ODBC driver
Here are the steps to install and configure the ODBC driver:
- Download the Hortonworks ODBC driver on Windows 7 (see Resources for the download website).
- Install and configure the ODBC driver. Follow the instructions in the guide listed in
the Resources section.
Installing and configuring QlikView
Here are the steps to install and configure QlikView:
- Go to the QlikView download page (see Resources) to
download QlikView Personal Edition on Windows 7.
- Follow the instructions to install it on the Windows 7 system.
Connecting HDP to QlikView
QlikView can fetch data from the HiveServer2 (HIVE2) server running on HDP in either of the
following two ways. In the test, the first method (A) was used for ingesting data from Hive.
Method A: Data loaded to QlikView in-memory associative data store
Method B: QlikView hybrid solution – QlikView direct discovery on top of Hadoop
Here are the steps to configure the connection between HDP and QlikView.
- Launch the ODBC Administrator from Windows and add a data source for Hortonworks Hive as
shown in Figure 1.
Figure 1. Hortonworks Hive ODBC driver setup
- On the Windows 7 system, launch QlikView. From the Start menu, select
QlikView. Click File -> New and click
Cancel in the wizard that opens.
- Click File -> Edit Script, press Ctrl + E, or click Edit
Script on the tool bar. The Edit Script window opens as shown in Figure 2.
Figure 2. QlikView Edit Script window
- Connect to the HIVE2 server on the HDP 2.6 instance running on the IBM POWER8
processor-based server as shown in Figure 3. Select the ODBC data source added from the
ODBC Administrator in the previous step. Provide the Hive user name and password (use the
Hive DB password and not the Hive UNIX user password). The connection to HIVE2 must
succeed in order to continue.
Note: If you have already created a Hive DB and tables with a user name and password, use
the same user credentials here as well. In this test, we used hive as the
user name and Ibmpdp as the password.
Figure 3. Connecting to HDP
- From the Database drop-down list, select the Hive DB. Select the table columns for
analysis and visualization as shown in Figure 4, and click OK. Then,
click File -> Save Entire Document.
Figure 4. Connecting to HDP
- Click Reload to load the data from the Hive table into QlikView
application memory as shown in Figure 5 and Figure 6.
Figure 5. Loading the data into QlikView
Figure 6. Completion of data loading into QlikView
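Because the steps above depend on a working HIVE2 connection, it can help to confirm that HiveServer2 accepts the same credentials independently of QlikView and the ODBC driver. The following sketch uses Beeline from a node of the HDP cluster. It was not part of the validation test: the helper names `hive2_jdbc_url` and `check_hive2` are hypothetical, the host name is a placeholder, and the sketch assumes HiveServer2 runs in binary transport mode on its default port, 10000.

```shell
#!/bin/sh
# Hypothetical helper: build a HiveServer2 JDBC URL.
# Arguments: host, optional port (assumed default 10000),
# optional database (defaults to "default").
hive2_jdbc_url() {
    printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${2:-10000}" "${3:-default}"
}

# Hypothetical helper: connect with Beeline and list the tables.
# Arguments: host, port, database, user name, password.
# Note that a password passed with -p is visible in the process list.
check_hive2() {
    beeline -u "$(hive2_jdbc_url "$1" "$2" "$3")" -n "$4" -p "$5" -e 'SHOW TABLES;'
}
```

For example, `check_hive2 hdp.example.com 10000 trucks hive <password>` should list the drivers and truck_events tables. If it fails, the ODBC connection from QlikView with the same credentials is unlikely to succeed either.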
Visualization and analysis in QlikView
Here are the steps to visualize and analyze data using QlikView:
- Select the columns for visualization and analysis and add the columns in the next window
that appears. You can verify the data from the Hive table as shown in Figure 7.
Figure 7. Data in Hive table
- Analyze and visualize the data fetched from the Hive DB. Note that the
data is now in memory and analysis is done on the data in memory. Figures 8 – 11 show
example visualizations within the QlikView dashboard.
Figure 8. QlikView visualization example 1
Figure 9. QlikView visualization example 2
Figure 10. QlikView visualization example 3
Figure 11. QlikView visualization example 4