Getting out of dependency hell in QlikView

Keanu-1


TL; DR;


The article describes how Apache Airflow was implemented to manage reporting update jobs built on QlikView in a fairly large implementation.


Dependency hell (English dependency hell) is the antipattern of configuration management, the expansion of the graph of mutual dependencies of software products and libraries, which leads to the difficulty of installing new and uninstalling old products.

Wikipedia


, 2018 , BIA-Technologies.


QlikView " ".


Qlik QlikSense, QlikView, 12.4, .


, , .



QlikView "" (application) — , "QVW" ( ), , , .


QlikView

QV Model


QV Script


QV Interface


, .


-- , , - QV. . , , . , "", ETL- , .


QV QVW , : , , . .


, , QVD. QlikView , , .


, ETL- , , QVD, , .


Qlik , , , (), - . QlikView QVD- . , QlikView, , , , .
, Qlik — NPrinting, , , , , SSRS, , QlikView, .


, QlikView:


  • , , QlikView Server (QVS). .QVW .
  • QlikView Distribution Service (QDS). Publisher , Reload Engine, QlikView, : , QVS, " ", , , , . , , , QlikView, , .
  • , QlikView Management Service (QMS) — .
  • , , - —

Qlik

QV Architecture



, MS SQL, ETL QlikView .


( ) , .QVW , . , . , , , QVD.


:


  • , , .
  • , - , 8-9 , . , , - , - 4 , , .

QMS , , , . , ( Reload Engine, . Publisher , rusbaron , ).


QMS :


QMS


, , .


, QMS . (, ).


, , , .


, .
, .


  • , .


    , , , ( ), , , - , ..


    , .


    - , , , . , , .


    , , ETL-, , , . , , , ETL- .


  • .


    , — , , ,- — , .


    - , , .



  • , . - , -. .



  • , . , , , . , , , . , , — .


    , QDS . , - .



  • , . : , .


    ? . . ? .



, QlikView , , , .


, , , QlikView, . , , Hadoop ETL- .



QlikView , . , , .


QlikView : QMS QlikView Management Service API, . EDX (Event Driven Execution). , , QMS .


, " ", , , , , NPrinting.


, , — (Workflow management systems). , Apache Airflow.


Airflow — , . , , :


Airflow interface



— , -.


Airflow , , Airflow QMS, Airflow — , , . , , : -> Airflow — QlikView.


API QMS . , .


Airflow — , . , , , , .


Airflow . , (, , ), QlikView, , , .


Airflow . , , - . , , : , . , , , , , , .


Airflow (DAG — directed acyclic graph), , — .


, "Application 7" "ETL_4" "ETL_5". "ETL_3" "TimeSensor_6_30" — , 6:30.
DAG


Python.


tasksDict = {
    u'ETL_1.qvw': {},

    u'ETL_2.qvw': {
        'Pool': 'Heavy_ETL_pool',
    },

    u'ETL_3.qvw': {
        'StartTime': [6, 30]},

    u'ETL_4.qvw': {
        'Priority': -5,
        'Dep': [
            u'ETL_1.qvw',
            u'ETL_3.qvw', ]},

    u'ETL_5.qvw': {
        'Dep': [
            u'ETL_2.qvw', ]},

    u'Application_6.qvw': {
        'Dep': [
            u'ETL_1.qvw',
            u'ETL_5.qvw', ]},

    u'Application_7.qvw': {
        'Dep': [
            u'ETL_4.qvw',
            u'ETL_5.qvw',
        ]},
}

Airflow Python, , , . QlikView . , ( , , ) DAG. , , DAG .


: (5 ), (1 ) (100 ). . Airflow — — , , , , , , , , .



, DAG,
from datetime import datetime, timedelta
from airflow import DAG
from airflow.sensors.time_delta_sensor import TimeDeltaSensor
from airflow.contrib.operators.qds_operator import QDSReloadApplicationOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 19),
    'email': ['Airflow.Administrator@mycompany.ru'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'pool': 'default_pool',
}

dag = DAG('example_qds_operator',
          description='Reload applications at QlikView Distribution Service',
          catchup=False,
          schedule_interval='0 21 * * 5',
          default_args=default_args)

tasksDict = {
    u'ETL_1.qvw': {},

    u'ETL_2.qvw': {
        'Pool': 'Heavy_ETL_pool',
    },

    u'ETL_3.qvw': {
        'StartTime': [6, 30]},

    u'ETL_4.qvw': {
        'Priority': -5,
        'Dep': [
            u'ETL_1.qvw',
            u'ETL_3.qvw', ]},

    u'ETL_5.qvw': {
        'Dep': [
            u'ETL_2.qvw', ]},

    u'Application_6.qvw': {
        'Dep': [
            u'ETL_1.qvw',
            u'ETL_5.qvw', ]},

    u'Application_7.qvw': {
        'Dep': [
            u'ETL_4.qvw',
            u'ETL_5.qvw',
        ]},
}

airflowTasksDict = {}

for task in tasksDict.keys():
    task_id = task.replace(" ", "_").replace("'", "").replace("/", "_").replace("(", "_").replace(")", "_").replace(",", "_").replace(".qvw", "").replace("__",
                                                                                                                                                          "_")
    AirflowTask = QDSReloadApplicationOperator(document_name=task, task_id=task_id, qv_conn_id='qv_connection', dag=dag)
    airflowTasksDict[task] = AirflowTask

for task in tasksDict.keys():
    if 'Dep' in tasksDict[task]:
        for dep in tasksDict[task]['Dep']:
            airflowTasksDict[task].set_upstream(airflowTasksDict[dep])

    if 'Pool' in tasksDict[task]:
        airflowTasksDict[task].pool = tasksDict[task]['Pool']

    if 'Priority' in tasksDict[task]:
        airflowTasksDict[task].priority_weight = tasksDict[task]['Priority']

    if 'StartTime' in tasksDict[task]:
        hour = tasksDict[task]['StartTime'][0]
        minute = tasksDict[task]['StartTime'][1]
        sensorTime = timedelta(hours=hour, minutes=minute)
        sensorTaskID = u'TimeSensor_{}_{}'.format(hour, minute)

        if sensorTaskID not in airflowTasksDict:
            SensorTask = TimeDeltaSensor(delta=sensorTime, task_id=sensorTaskID, pool='Sensors', dag=dag)
            airflowTasksDict[sensorTaskID] = SensorTask
        airflowTasksDict[task].set_upstream(airflowTasksDict[sensorTaskID])

if __name__ == '__main__':
    dag.clear(reset_dag_runs=True)
    dag.run()

, Airflow QlikView:
DAG


Airflow .


.


- , - , .


, ETL- , , .


, Airflow . .
, , .


, — 1 . , -, .


. , , , . , .


.


Airflow builds up quite useful reporting on task performance statistics, and we can track the dynamics of job duration growth, as well as receive other reports, for example, a Gantt chart:


Gantt


Combat Airflow works with us in a virtual machine with two cores and 4 Gb of RAM, and this is enough for the smooth operation of the service.


In general, I believe that we have successfully resolved the issue of updating QlikView reporting, and Airflow helped us invaluable in this.


Keanu-2

Source: https://habr.com/ru/post/undefined/


All Articles