Welcome to fusionSpan

Resources - The fusionSpan Blog

Building Pipelines with Tibco Cloud Integration
By Alan Luo | February 11, 2021

TIBCO Cloud Integration, formerly known as TIBCO Scribe Online, is a cloud-based data integration tool that helps businesses sync data between two existing applications. TIBCO Cloud Integration has a wide variety of connectors, allowing it to extract data from HubSpot, Google Analytics, Salesforce, Amazon Redshift, Amazon S3, Dropbox, databases, and much more.

While TIBCO Cloud Integration offers code-free customization, the fusionSpan team has been leveraging code to get the most out of the TIBCO Cloud Integration. Successful projects include creating pipelines using API calls, integrating pipeline notifications to Slack, automating FTP uploads, and writing complex SQL processes.

In one of our projects to send data from one system to another, we leverage SQL and the TIBCO Cloud Integration to ensure quick and efficient data transfer. The pipeline we built allows data to go through multiple stages of transformation before reaching its final destination.

So you might be wondering, how are these pipelines built with TIBCO Cloud Integration?

Creating the Staging Area

Before we start creating our ETL processes, we need to ensure that we have a staging area to hold the data. Since we were using SQL to transform our data, we chose Microsoft SQL Server as our staging area. Within our staging area, we create two databases, one to hold data in the process of being transformed and the other to store data ready to be uploaded.

Once the staging area is set up, we start building flows that transform the data. Within Tibco, data operations are controlled by blocks; you can set up a data flow by adding filters and quickly calculating measures through clicking and dragging blocks. To create more complex data transformations, embedding SQL queries in blocks seems the most efficient way so far.

In the screenshot below to the left, we have an example of an extraction flow. The flow queries (extracts) data from the starting application and stores the data in the staging table called “1.Promotions”.

After running the flow, we start transforming the data by utilizing native queries, as shown in the screenshot below to the right. We write the SQL code and embed it in the native query block to conduct operations such as filter, join, group by, unions, case when.

This flow will eventually transform the data within the staging table and store it in a new table called “2.Promotions”. Once that is complete, we copy the contents of that table to the “Final Upload Database.” Simply put, our data goes through these stages:

“Data Source” → “Staging Table 1” → “ Transformation via Native Query” → “Staging Table 2” → “Final Upload Table” → “Destination App”

If more transformations are needed, we rinse and repeat, adding more flows in Tibco and more tables in the staging area.

Net Change vs. Data Dumping

To ensure our data flows don’t take more than an hour or two to run, we utilize net change and data dumping to increase flow processing speed.

For net change, Tibco Cloud Integration gives users the option to progressively load and update data provided that a record change date or change data capture (CDC) exists. Every time Tibco runs a flow, it will bookmark the latest date that data was processed, and when new records come in, it will only process those new records.

By enabling net change, we avoid having Tibco compare millions of records to what already exists in our staging area. However, there are scenarios where a data source does not have a record change date; in that case, we would use data dumps instead.

Within Tibco, a “create” block allows users to process data in batch, by default, 2000 records per batch. This increases pipeline efficiency significantly. Before dumping, we utilize a native query to truncate the tables within the staging areas first. Once all data is cleared, we run the create flow to allow records to repopulate those tables with new data quickly. With Tibco, we can dump millions of records into the staging area in just a few minutes.

Documentation and Notifications

Careful documentation of what happens in the flow is considered best practice. Tibco offers blank fields where users can fill out the information about the flow process and explain what is happening during the ETL process. However, in addition to adding explanations in Tibco flows, we also recommend creating shared documentation that contains a list of all flows, their functionalities, and their runtimes. This allows you to see all ETL processes in one list and pinpoint necessary changes quickly.

Regarding notifications, Tibco users will be automatically notified of any flow failures. Take the time to go over any flow failures since data sync errors can cause many problems once an application fully relies on it.

Summary

Tibco Cloud Integration is a great tool that offers users options to connect to different data sources and rapidly build pipelines. As a Platform-as-a-Service (iPaaS), it empowers more people within the organization to utilize data pipeline processes, giving organizations an efficient option to sync data across all boards. fusionSpan, on the other hand, has been using Tibco Cloud Integration in conjunction with our scripts and code to help clients with more advanced data integration solutions. Whether it’s combining data sync with API calls, or creating a complex pipeline with SQL, or automating an FTP upload process, fusionSpan has been able to achieve all this with Tibco Cloud Integration. We look forward to seeing more Tibco features and improvements offered in the future.

New call-to-action

+ posts

Alan has been working in the field of Ecommerce for 4 years and has extensive knowledge in Online Marketing and Data Analytics. He has practical experience scraping and cleaning data and has tackled a few data science projects in his career. During his free time, he would listen to 80s pop and J-rock, or would try a few riffs on his guitar.

Follow Us on Twitter!

Latest News