Tableau Data Extracts are quick and easy to manage, but if your organization has a large data set that needs to be shared, a Tableau Published Data Source is an excellent solution.
Tableau Desktop allows the user to connect to and analyze many types of data sources, including Excel files, JSON, CSV files, SQL databases, Salesforce, and Mitto.
When a user establishes a connection to a data source, they face a couple of choices. This article covers two options for managing this data: Tableau Extracts* and Tableau Published Data Sources. Need help managing these data sources and extracts? We are Tableau masters!
Tableau Data Extracts
Tableau data extracts are a compressed “snapshot” of data that is stored and loaded into memory.
The best way to understand a Tableau data extract is to look at an example scenario. First, User 1 will connect to the Superstore PostgreSQL Table. Since this is an extensive database, a data extract will be created. Multiple members of the organization can use this table.
Some sheets/dashboards are then created using this data extract:
User 1 then publishes the dashboard to Tableau Server and sets up an extract refresh schedule that refreshes the extract from the database every hour:
The next day, User 2 comes along and makes the same connection to the same database table, with the same extract refresh schedule, but creates a slightly different dashboard:
So now there are two dashboards with two data extracts from the same database table. Each refresh cycle also queries the database twice, when it really only needs to query it once.
If dashboards continue to be published in this manner, the following situation arises.
In addition to this, on day 3, User 1 goes into her dashboard, creates the following calculation for Sales Commission, and uses this figure on a dashboard:
Unaware of this, and not to be outdone, User 2 goes in and creates the same calculation, but with a rate of 15%, and uses this figure on a dashboard:
User 1 and User 2 have the same boss, who looks at both users' dashboards. The boss is confused. Not only are there inefficiencies from querying the same data twice, but now there are inconsistencies in the actual figures.
This is where a Tableau Published Data Source can help.
The boss sits User 1 and User 2 down and gets them to agree on how commission should be calculated. They agree that it's actually 12.5%. But User 1 and User 2 now have to keep their data extracts in sync manually.
With a Tableau Published Data Source, however, once they agree on the structure of the data, they can both access the same information efficiently and consistently.
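To make the difference concrete, here is a minimal Python sketch of the two approaches. It is only a model: Tableau calculated fields are not written in Python, and the function names and the sample sales figure are hypothetical. Only the 15% and 12.5% rates come from the scenario above.

```python
# Illustrative model only; names and the sample figure are assumptions.
AGREED_COMMISSION_RATE = 0.125  # the 12.5% rate agreed in the scenario


def shared_commission(sales: float) -> float:
    """Commission as defined once, in the shared (published) data source."""
    return sales * AGREED_COMMISSION_RATE


def user2_local_commission(sales: float) -> float:
    """User 2's workbook-local calculation, which used 15%."""
    return sales * 0.15


sales = 1000.0  # hypothetical sales figure

# Every workbook connected to the shared source uses the same definition,
# so the two dashboards cannot drift apart.
user1_dashboard = shared_commission(sales)
user2_dashboard = shared_commission(sales)
print(user1_dashboard == user2_dashboard)

# A local copy of the calculation can silently disagree with the shared one.
print(user2_local_commission(sales) - shared_commission(sales))
```

The point of the sketch is the single definition: when the rate changes, it changes in one place rather than in every workbook.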
A Tableau Published Data Source does start with a data extract, but once all the checks have been made, it needs to be published to Tableau Server or Tableau Online:
Name the Data Source appropriately and put it in the appropriate project.
The data source now becomes available on Tableau Server. If the user has permission, they can even create a workbook using web edit with that data source.
If you're using Tableau Desktop to build your dashboards, you can connect to the published data source from there as well.
So now User 1 and User 2 can connect to this new verified data source and use it as an established single source of truth for the data.
The data pipeline has also been made more efficient, as the data source is refreshed once from PostgreSQL rather than once for every workbook connected to it. The pipeline from PostgreSQL to dashboard now looks like this:
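The saving can be estimated with some quick arithmetic. The sketch below assumes the hourly schedule from the scenario and counts one database query per extract refresh; the function name is hypothetical, and real refresh jobs may issue more than one query.

```python
def extract_refreshes_per_day(refreshes_per_day: int, extract_count: int) -> int:
    """Total extract refresh jobs hitting the database each day."""
    return refreshes_per_day * extract_count


HOURLY = 24  # one refresh per hour, as in the scenario

# Embedded extracts: each published workbook refreshes its own extract.
workbooks = 2  # User 1's and User 2's dashboards
embedded = extract_refreshes_per_day(HOURLY, extract_count=workbooks)

# Published data source: one extract, however many workbooks connect to it.
published = extract_refreshes_per_day(HOURLY, extract_count=1)

print(embedded, published)  # 48 24
```

With embedded extracts the load grows with every new workbook, while the published data source stays at one refresh per scheduled run.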
Certify the Published Data Source
As an additional step, you need to verify that the data source is indeed the source of truth. User 1, User 2, or even their boss can, if permissions allow, certify this Published Data Source by clicking the details icon on the Tableau Server data source page:
Certifying the data source puts a small green tick on its icon:
Other users can then be confident about using this data source.
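The steps above use the Tableau Server interface. For teams that prefer to script it, certification can also be set through the Tableau REST API's Update Data Source method. The sketch below only builds the request and does not send it. The server URL, API version, IDs, and note are placeholders, and the function name is hypothetical, so check the endpoint and attributes against the REST API reference for your server version.

```python
import xml.etree.ElementTree as ET


def build_certify_request(server_url, api_version, site_id, datasource_id, note):
    """Return (method, url, body) for an Update Data Source call that certifies it."""
    url = f"{server_url}/api/{api_version}/sites/{site_id}/datasources/{datasource_id}"
    ts_request = ET.Element("tsRequest")
    # isCertified / certificationNote are attributes of the datasource element.
    ET.SubElement(ts_request, "datasource", isCertified="true", certificationNote=note)
    body = ET.tostring(ts_request, encoding="unicode")
    return "PUT", url, body


# Placeholder values, for illustration only.
method, url, body = build_certify_request(
    "https://tableau.example.com", "3.19", "site-id", "datasource-id",
    "Commission rate agreed at 12.5%",
)
print(method, url)
print(body)
```

Sending the request additionally requires a prior sign-in and the resulting token in the `X-Tableau-Auth` header, which is left out here to keep the sketch self-contained.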
Data Quality Warnings
Users with appropriate permissions can also place warnings on Published Data Sources. This can be useful during periods when the data source might be going through changes:
If a Published Data Source has a warning on it, any attempt to connect to it displays that warning:
Best Practices for Maintaining Tableau Published Data Sources
It does take some extra management to maintain a Tableau Published Data Source. At Zuar, the recommended practice is to keep the workbook from which the data source is published separate from the workbooks used for analysis.
- User 1 creates dashboards but is also the Data Custodian over the data source created above.
- User 1 creates dashboards in Sales Analysis.twb. This workbook has a connection to the Tableau Published data source.
- User 1 should maintain a separate workbook (e.g., Master Verified Data Source - super_store_orders.twb) that contains the original Tableau Extract for the verified data source. Publishing to the Published Data Source on Tableau Server should be performed from this workbook, and only this workbook.
Common Pitfalls of Using Tableau Published Data Sources
- Calculated fields that belong to the published data source cannot be edited in a connected workbook. The user will be prompted to "Edit copy..." instead.
- Calculated fields can be added to a given workbook, but they will be local to that workbook only and will not be available to others using the published data source. If a calculation needs to be shared, add it in the Master workbook described in point 3 above.
- Be careful with calculations involving row-level security. It is best to leave these out of the Published Data Source and handle them workbook by workbook.
- Dimension Aliases also need to be handled at the Master Data Source layer.
Tableau Workbooks and Sheets
Tableau workbooks are where you store your views of the data. A workbook contains sheets, which can be worksheets, dashboards, or stories. A worksheet holds a single view built from a data source. A dashboard brings together views from multiple worksheets. A “story” is an organized sequence of worksheets or dashboards that relate to each other. Tableau workbooks are a great way to keep your data sources and extracts organized for maximum efficiency.
Tableau Data Extracts vs. Tableau Data Sources
There is a place for both Tableau extracts and Tableau Published Data Sources. It just depends on the use case. Data extracts are essentially a snapshot that is loaded into memory and can be recalled quickly for visualization, which gives much faster access to your workbooks. Tableau Published Data Sources are a great way to centralize data, establish a single source of truth, and allow users to have confidence in the data they are analyzing. When a data source uses a live connection rather than an extract, it offers real-time updates, but because the information is pulled straight from the database instead of from memory, performance usually isn’t as fast as with an extract.
Still confused about Tableau extracts and data sources? Get in touch with Zuar today!
*Generally a Tableau Extract is used for large data sources. Live connections can be used with small (or optimized) data sources.