Introduction #
In this guide, we will walk you through the process of using xGen user interface by setting up source systems and then walk you through how to create syncs, data models, pipelines, jobs and schedules. Additionally, we will also cover how to set up data quality rules to check the data.
Getting Started #
Log in to xGen #
1. Go to http://xgendemo.uxli.com
2. Enter your username and password.
3. Click on Login
.
4. Select a subscription from the dropdown.
Create Space #
1. Setting up a new space in xGen is a quick and simple process! After logging in xGen, you’ll see the main navigation bar on the left side of your screen. Click on the Spaces tab to bring up the option to create space.
2. Click on + Space
followed by giving a unique name and description for the space. Click on + Create
.
3. On the main navigation bar on the left side of your screen, click on Spaces tab to bring up a list of all the created spaces. Here you select the desired space. Click on the star icon to make that space your default.
4. To add additional users to the Space, use the Manage Access option in the space settings.
Before creating a Space, you must ensure that Destination has been created (Data Lake and or Data Warehouse)
Setting up the Sources #
Create Postgres DB Source #
1. Click on the Sources tab on main navigation bar on the left side of your screen, to bring up a list of all available sources.
2. You can use the provided search bar or simply scroll down the list to find the source you want to replicate data from. Let’s use the PostgreSQL source.
3. The left half of the pop up contains a set of fields that you will have to fill out. Enter a name on the Source name field and a description to help you identify this instance of the connector.
4. Each connector in xGen will have its own set of authentication methods and configurable parameters.
5. For example, to set up PG_SALES_DB, provide the following settings:
Host | xgen-dev.ckvh7cnonk8s.us-east-1.rds.amazonaws.com |
Port | 5432 |
Database Name | f1race |
Schemas | sales |
Username | |
Password | |
SSL Modes | Disable |
SSH Tunnel Modes | No tunnel |
JDBC URL Parameters | |
Update Methods | Scan changes with user defined cursor |
6. Once all the required fields are filled out, click on the Validate
source button and xGen will run a check to verify the connection. After a successful validation, save the source by clicking on Create
.
Create File Source #
1. Click Sources on the main navigation bar on the left side of your screen and then click Upload File
.
2. On the File Upload screen, enter a name and description for the File Source.
3. Select the File Format (CSV, JSON, Excel, Parquet) from the dropdown.
4. Upload the file from your local system.
5. Click on Save
.
In preparation for the next steps, please use the below files and follow the above steps to create the 3 file sources:
- ‘Category_Forecast.xlsx’
- ‘Company_Data.csv’
- ‘Company Survey.xlsx’
Create Data Sync #
1. Click on the Spaces tab on the main navigation bar on the left side of your screen to bring up a list of all available spaces and select the desired space.
2. Once inside the space, click on the Sync menu option and click on + Add
.
3. From the Select Source
drop down choose the Source connector PG_SALES_DB and from the Select Destination
drop down choose the Destination connector UXLI_SALES DW. In the Sync name field, you can enter XDF_PG_SALES_DATA description as Sales Data.
4. Fill out all the required fields and select the entities that are to be synced from the source connector to the destination. Select the appropriate extraction mode for the entities, Full refresh | Overwrite, in this case.
5. Confirm or select primary key columns for the entities to be extracted. This action needs to performed only when primary key columns are not fetched from the source data.
6. Click on + Create
and xGen will create a Data Sync.
Similarly data sync for file sources need to be created. Please select the following file sources as Source Connectors and follow the above steps:
- ‘Category_Forecast.xlsx’ – XDF_CATEGORY_FORECAST (Key Columns – Forecast_Month and Category_Name)
- ‘Company_Data.csv’ – XDF_COMPANY_DATA (Key Columns – Customer ID and Survey_Date)
- ‘Company_Survey.xlsx’ – XDF_SURVEY_DATA (Key Column – CustomerID)
Create Data Model #
Create “DIM_PRODUCTS” Model #
1. Click on the Model menu option, on the main navigation bar on the left side of your screen. Click on + Add
.
2. Enter the model name DIM_PRODUCTS and a description.
3. Select the option Table and select Full mode in the Settings.
4. Click on Source Node
button, available on the models tools bar. Select PRODUCTS table from the drop down menu.
5. To add an additional table, click on Lookup Node
and select CATEGORIES table from the drop down menu.
6. Select the Join Node
to join the two tables. Select the appropriate columns from all the nodes. Join the PRODUCTS and CATEGORIES tables using a left outer join with the following join condition “s_products.CATEGORYID = l_categories.CATEGORYID”.
7. Connect the Join node to the ‘Target Node’ and select the ‘Key Columns’ to build the model.
8. Click on Show Node Data Preview
available on the Target node, to preview the data and Show Node SQL
to view the SQL code.
9. Click on + Create
to complete the build. Click on Run
to run the data load and Jobs
to view the job / run status of the data model.
Create “FCT_ORDERS” Model #
1. Click on the Model menu option, on the main navigation bar on the left side of your screen. Click on + Add
.
2. Enter the model name FCT_ORDERS and a description.
3. Select the option Table and select Full mode in the Settings.
4. Click on Source Node
button, available on the models tools bar. Select ORDER_DETAILS table from the drop down menu.
5. To add an additional table, click on Lookup Node
and select ORDERS table from the drop down menu.
6. Select the Join Node
to join the two tables. Select the appropriate columns from all the nodes. Join the ORDER_DETAILS and ORDERS tables using a left outer join with the following join condition “s_order_details.ORDERID = l_orders.ORDERID”.
7. On the join node, click on Calculations tab and add the following calculated field: “Revenue = s_order_details.Quantity * s_order_details.UnitPrice”.
8. Click on Show Node Data Preview
available on the Target node, to preview the data and Show Node SQL
to view the SQL code.
9. Click on + Create
to complete the build. Click on Run
to run the data load and Jobs
to view the job / run status of the data model.
Create “DIM_EMPLOYEE” Model #
1. Click on the Model menu option, on the main navigation bar on the left side of your screen. Click on + Add
.
2. Enter the model name DIM_EMPLOYEE and a description.
3. Select the option Table and select Full mode in the Settings.
4. Click on Source Node
button, available on the models tools bar. Select EMPLOYEES table from the drop down menu.
5. Click on Calculations
tab in the source node. Add following calculated field:
“Full_Name = concat( FirstName , ‘ ‘, LastName )”
6. Click on Show Node Data Preview
available on the Target node, to preview the data and Show Node SQL
to view the SQL code.
7. Click on + Create
to complete the build. Click on Run
to run the data load and Jobs
to view the job / run status of the data model.
Create “REP_SALES_DETAILS” Model #
1. Click on the Model menu option, on the main navigation bar on the left side of your screen. Click on + Add
.
2. Enter the model name REP_SALES_DETAILS and a description.
3. Choose to create a View. Click on Source Node
button. Select a FCT_ORDERS table from the drop down menu.
4. Click on Lookup Node
and select the tables DIM_PRODUCTS, DIM_EMPLOYEE, CUSTOMERS from the drop down menu.
5. Select the Star Node
to join all the tables. Select the appropriate columns from each of the nodes. Join the tables to Star Node with left outer join
“s_orders_data.CUSTOMERID = l_employee_dimension.CUSTOMERID” ,
“s_orders_data.PRODUCTID = l_employee_dimension.PRODUCTID”,
“s_orders_data.EMPLOYEEID = l_employee_dimension.EMPLOYEEID”
6. Join the node to the ‘Target Node’. Click on Show Node Data Preview
button to preview the node data and Show Node SQL
to view the node SQL.
7. Click on + Create
to complete the build. Click on ‘Run’ to run the data load into the model and ‘Jobs’ to view the job / run status of the data model.
Create “REP_SURVEY” Model #
1. Click on the Model menu option, on the main navigation bar on the left side of your screen. Click on + Add
.
2. Enter the model name REP_SURVEY and a description.
3. Choose to create a Table. Click on Source Node
button. Select a SURVEY table from the drop down menu.
4. To add an additional table, click on Lookup Node
and select COMPANY table from the drop down menu.
5. Select the Join Node
to join the two tables. Select the appropriate columns from all the nodes. Join the SURVEY and COMPANY tables using a left outer join with the following join condition “s_survey.CUSTOMERID = l_company.CUSTOMERID”.
6. Click on Show Node Data Preview
available on the Target node, to preview the data and Show Node SQL
to view the SQL code.
7. Click on + Create
to complete the build. Click on Run
to run the data load and Jobs
to view the job / run status of the data model.
Create Data Pipeline #
Create “DAILY_SALES_LOAD” Pipeline #
1. Click on the Pipeline menu option, on the main navigation bar on the left side of your screen. Click on + Add
.
2. Click on the Data Sync Connections
button from the toolbar and add XDF_PG_SALES_DATA to the pipeline from the list. Join it to the Start
node.
3. Click on Data Models
to add “DIM_PRODUCTS”, “FCT_ORDERS”, “DIM_EMPLOYEES” to the pipeline from the list.
4. Set the frequency of the Schedule and fill other required parameters like pipeline name as DAILY_SALES_LOAD, description and select Run with user ID
as Current User.
5. Click on + Create
to save the pipeline. Users can also trigger pipeline runs manually by clicking on Run
and also view the pipeline logs by clicking on Jobs
.
Govern #
Create “PROD-ID_UNIQUE_VALUE” Data Quality Rule #
1. Click on the Govern menu option, on the main navigation bar on the left side of your screen. Click on + Add
.
2. Select “DIM_PRODUCTS” from the Data Models node.
3. Navigate to Quality tab and click on +Add
.
4. Select ‘Column’ level data quality rule.
5. Select ProductID field.
6. Select “Completeness” as Rule Category.
7. Select “Verify Unique Values in Column”.
8. Select “String” as row condition.
9. Enter a name and a description and click on Apply
.
10. Click on Save
and Run
the Data Quality check.
Create “Employees” Data Quality Rule #
1. Select “Employees” from the Source Table.
2. Navigate to Quality tab and click on +Add
.
3. Select ‘Column’ level data quality rule.
4. Select PostalCode field.
5. Select “Accuracy” as Rule Category.
6. Select “Column Values are Not Null”.
7. Enter a name and a description and click on Apply
.
8. Click on Save
and Run
the Data Quality check.
Create “Order_Details” Data Quality Rule #
1. Select “Order_Deatils” from the Source Table.
2. Navigate to Quality tab and click on +Add.
3. Select ‘Table’ level data quality rule.
4. Select “Consistency” as Rule Category.
5. Select “Verify Row Count Range”.
6. Select “Min_value” as 1500, “Max_value” as 3000, “Group_by” as FieldLists, “Row_condition” as String.
7. Enter a name and a description and click on Apply
.
8. Click on Save
and Run
the Data Quality check.
Create “Orders” Data Quality Rule #
1. Select “Orders” from the Source Table.
2. Navigate to Quality tab and click on +Add
.
3. Select ‘Column’ level data quality rule.
4. Select OrderID field. Select “Completeness” as Rule Category.
5. Select “Verify Unique Values in Column”.
6. Select “String” as row condition.
7. Enter a name and a description and click on Apply
.
8. Click on Save
and Run
the Data Quality check.
Tags, Domain and Classification #
Create Tags, Domain and Classification #
1. Click on the Settings menu option, on the main navigation bar on the left side of your screen. Select Admin Settings
and navigate to Data Categories.
2. Type the tag name (e. ‘Source Data’) that needs to be created in Setup Tags dialog box and press ‘Enter‘ key. Click on Save
.
3. Similarly type the domain and classification name in the respective dialog box and press ‘Enter‘ key. Click on Save
.
Assign Tags, Domain and Classification #
1. Click on the Sync menu option, on the main navigation bar on the left side of your screen. Click on Show Details
button to open a sync of choice. Click on Edit
button. Click on the Tags
button and choose a tag from the drop down.
2. Similarly, tags can be assigned on Models. Navigate to Models, click on Show Details
button to open a model of choice. Click on Edit
button. Click on the Tags
button and choose a tag from the drop down.
3. To assign Domain, navigate to Govern and open a Data Model of choice. Click on Domain +
button and select a domain from the drop down.
4. To assign Classification, navigate to Govern and open a Data Model of choice. Click on Classification +
button and select a domain from the drop down.