Tableau Data Extracts are quick and easy to create. However, if your organization has a large dataset that many people could share, a Published Datasource could be a good solution.
Tableau Desktop allows the user to connect to and analyse many types of data sources: Excel files, JSON, CSV files, SQL databases, Salesforce, Mitto, and more.
When a user connects to a datasource, they face a couple of choices about how that data is managed. This article covers two options: Tableau Extracts* and Tableau Published Datasources.
Related Article: Tableau - Extracting Data
Tableau Data Extracts
Let's go through an example. User 1 connects to the Superstore table in PostgreSQL. This is a large database table, so an extract is created. The table is also useful to multiple users across the organization.
Some sheets/dashboards are then created using this extract:
User 1 then publishes the dashboard to Tableau Server and sets up an extract refresh schedule that refreshes the dashboard's data from the database every hour:
The next day, User 2 comes along and makes the same connection to the same database table, with the same extract schedule, but creates a slightly different dashboard:
So now there are two dashboards with two data extracts from the same database table. The database is also being queried twice at every refresh, when it really only needs to be queried once.
If dashboards continue to be published in this manner, every new dashboard adds another extract and another round of queries against the same table.
In addition to this, on day 3, User 1 goes into her dashboard and creates the following calculation for Sales Commission, and uses this figure on a dashboard:
Unaware of this, and not to be outdone, User 2 goes in and creates the same calculation but with a rate of 15%, and uses this figure on a dashboard:
User 1 and User 2 have the same boss, who looks at both users' dashboards. The boss is confused. Not only are there inefficiencies from querying the same data repeatedly, but there are now inconsistencies in the actual figures.
This is where a Tableau Published Datasource can help.
Tableau Published Datasources - Connect Multiple Workbooks to a Single Datasource
The boss sits User 1 and User 2 down and gets them to agree on how commission should be calculated. They agree that it's actually 12.5%. But User 1 and User 2 now have to keep their Data Extracts in sync manually.
With a Tableau Published Datasource, however, once they agree on the structure of the data, they can both access the same information efficiently and correctly.
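The idea can be sketched in a few lines of Python. This is not Tableau's calculation syntax, only an illustration: the agreed 12.5% rate is defined once, and both dashboards read from that one definition. The function name `sales_commission` and the sample sales figure are hypothetical.

```python
# Agreed commission rate from the example above (12.5%), defined in one place.
COMMISSION_RATE = 0.125


def sales_commission(sales: float) -> float:
    """Return the commission for a sales amount using the shared rate."""
    return sales * COMMISSION_RATE


# Hypothetical sales figure, used only to illustrate the point.
sample_sales = 2000.0

# Both dashboards call the same definition, so they cannot disagree.
user1_figure = sales_commission(sample_sales)
user2_figure = sales_commission(sample_sales)
print(user1_figure == user2_figure)
```

If the rate ever changes again, it changes in one place rather than in every workbook.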
A Tableau Published Datasource does start with an extract, but once all the checks have been made, it needs to be published to Tableau Server/Online:
Name the Data Source appropriately and put it in the appropriate project.
The Datasource now becomes available on Tableau Server. If the user has permission, they can even create a workbook with that Datasource using web edit.
However, if you're using Tableau Desktop to build your dashboards, you can connect to the published Datasource on Tableau Server from Desktop.
So now User 1 and User 2 can connect to this new Verified Datasource and use it as an established single source of truth for the data.
The data pipeline has also been made more efficient, as the Datasource is refreshed once from PostgreSQL rather than by every workbook connected to it. The pipeline now runs from PostgreSQL to the Published Datasource, and from there to each dashboard.
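As a rough back-of-the-envelope sketch of the saving, the Python below counts extract refreshes per day. It assumes the hourly schedule from the example also applies to the Published Datasource, and it treats one refresh as one hit on the database, which is a simplification. The function name and its parameters are hypothetical.

```python
# Hourly refresh schedule, as in the example (assumed to apply to both setups).
REFRESHES_PER_DAY = 24


def daily_refreshes(workbooks: int, shared_datasource: bool) -> int:
    """Count extract refreshes hitting the database per day.

    Simplification: one extract refresh counts as one hit on the database.
    """
    # With a Published Datasource there is one extract, however many
    # workbooks connect to it; otherwise each workbook carries its own.
    extracts = 1 if shared_datasource else workbooks
    return extracts * REFRESHES_PER_DAY


for n in (2, 10):
    print(n, daily_refreshes(n, shared_datasource=False),
          daily_refreshes(n, shared_datasource=True))
```

With the two workbooks from the example, this is 48 refreshes a day against 24, and the gap widens with every workbook added.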
Certify the Published Datasource
As an additional step, once it is verified that the data is indeed the source of truth, User 1, User 2, or even their boss can, if permissions allow, certify this Published Datasource by clicking the details icon on the Tableau Server Datasource page:
Certifying the datasource puts a little green tick on the Datasource icon:
Other users can then be confident about using this Datasource.
Data Quality Warnings
Users with appropriate permissions can also place warnings on Published Datasources. This can be useful during periods when the data might be going through changes:
If a Published Datasource has a warning on it, any attempt to connect to it displays that warning:
Best Practices for Maintaining Tableau Published Datasources
It does take some extra management to maintain a Tableau Published Datasource. At Zuar, the recommended practice is to keep the workbook that publishes the Datasource separate from the workbooks used for analysis. For example:
- User 1 creates dashboards but is also the Data Custodian over the datasource created above.
- User 1 creates dashboards in Sales Analysis.twb. This workbook has a connection to the Tableau Published datasource.
- User 1 should maintain a separate workbook (e.g. Master Verified Datasource - super_store_orders.twb) that contains the original Tableau Extract for the verified datasource. Publishing to the Published Datasource on Tableau Server should be performed from this workbook, and only this workbook.
Common Pitfalls of using Tableau Published Datasources
- Calculated fields cannot be edited - the user will be prompted to "Edit copy..."
- Calculated fields can be added to a given workbook, but they will be local to that workbook only and will not be available to others using that published datasource. If a calculation needs to be shared, add it in the Master workbook described in the third best-practice point above.
- Be careful with calculations involving row-level security - it is best to leave these out of the Published Datasource and handle them at a workbook-by-workbook level.
- Dimension Aliases also need to be handled at the Master Data Source layer.
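The scoping rules in the first two pitfalls can be pictured with a toy Python model. This only illustrates the behaviour described above, not how Tableau actually stores calculations. The `Workbook` class, the field names, and the formula strings are all hypothetical.

```python
from types import MappingProxyType

# Calculations published from the master workbook. The read-only view
# stands in for "cannot be edited" from a connected workbook.
published_calcs = MappingProxyType({"Sales Commission": "[Sales] * 0.125"})


class Workbook:
    """Toy model of a workbook connected to a published datasource."""

    def __init__(self, name: str, published) -> None:
        self.name = name
        self.published = published  # shared, read-only
        self.local = {}  # calculations visible in this workbook only

    def add_calc(self, field: str, formula: str) -> str:
        """Add a calculation locally and return the name it was stored under."""
        # Editing a published field is modelled as creating a local copy.
        if field in self.published:
            field = f"{field} (copy)"
        self.local[field] = formula
        return field

    def fields(self) -> dict:
        """All calculations this workbook can see."""
        merged = dict(self.published)
        merged.update(self.local)
        return merged


sales_analysis = Workbook("Sales Analysis", published_calcs)
other_workbook = Workbook("Another Workbook", published_calcs)

sales_analysis.add_calc("Profit Ratio", "SUM([Profit]) / SUM([Sales])")
print(sorted(sales_analysis.fields()))
print(sorted(other_workbook.fields()))
```

In this model, the locally added field shows up only in the workbook that created it, while the published calculation stays identical for everyone.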
There is a place for both Tableau Extracts and Tableau Published Datasources. It just depends on the use case. Tableau Published Datasources are a great way to centralize data, establish a single source of truth and allow users to have confidence in the data they are analysing!
If you'd like help with your data, contact us at Zuar!
*Generally a Tableau Extract is used for large data sources. Live connections can be used with small (or optimized) data sources.