Tableau Data Extracts are quick and easy to manage, but if your organization has a large data set that needs to be shared, a Tableau Published Data Source is an excellent solution.

Tableau Desktop allows the user to connect to and analyze many types of data sources, including:

  • Excel files
  • JSON
  • CSV files
  • SQL Databases
  • Salesforce
  • Mitto
  • And More...  

When a user establishes a connection to a data source, they face a couple of choices about how that data is managed. This article covers two of those options: Tableau Extracts* and Tableau Published Data Sources.

Need help managing these data sources and extracts? We are Tableau masters!

Related Article: Tableau - Extracting  Data

What are Tableau Data Extracts?

A Tableau data extract is a “snapshot” of data that is compressed, stored, and loaded into memory.

Understanding a Tableau Data Extract

The best way to understand a Tableau data extract is to look at an example scenario:

1. User 1 connects to the Superstore PostgreSQL table. Since this is an extensive database, a data extract is created. Multiple members of the organization can use this table.

Connection to PostgreSQL

2. Some sheets/dashboards are then created using this data extract:

Tableau Workbook

3. User 1 then publishes the dashboard to Tableau Server and sets up an extract refresh schedule that refreshes the data from the database every hour:

Tableau Data Extract

4. The next day, User 2 comes along and makes the same connection to the same database table, with the same extract refresh schedule, but creates a slightly different dashboard:

Tableau Project Folder

5. So now there are two dashboards with two data extracts from the same database table. The database is also being queried twice on every refresh, when it really only needs to be queried once.

Disadvantages of a Tableau Data Extract

Based on this example, we can address a couple of disadvantages of Tableau data extracts.

Redundant Multiple Data Source Queries

If dashboards continue to be published in this manner, the following situation arises.

Tableau Data Extract Flow Diagram
Figure I: Tableau Data Extracts. Note that each extract in 1, 2, 3, or 4 could contain exactly the same data. However, the data cannot be shared between workbooks.
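To make the redundancy concrete, here is a minimal sketch in Python that counts how many times the database is queried per day when every workbook owns its own extract, compared with a single shared extract (the alternative introduced later in this article). The hourly schedule comes from the example above. The function names and the workbook count of four, chosen to match the figure, are illustrative assumptions and not anything Tableau exposes.

```python
# Illustrative model only: it counts scheduled extract refreshes,
# it does not talk to Tableau or to a database.

def daily_queries_with_embedded_extracts(workbooks: int, refreshes_per_day: int) -> int:
    """Each workbook owns its own extract, so each one queries the database on every refresh."""
    return workbooks * refreshes_per_day


def daily_queries_with_shared_extract(workbooks: int, refreshes_per_day: int) -> int:
    """One shared extract is refreshed once per cycle, however many workbooks use it."""
    return refreshes_per_day if workbooks > 0 else 0


HOURLY = 24  # the example schedule refreshes every hour

embedded = daily_queries_with_embedded_extracts(workbooks=4, refreshes_per_day=HOURLY)
shared = daily_queries_with_shared_extract(workbooks=4, refreshes_per_day=HOURLY)
print(embedded, shared)  # 96 24
```

The gap grows linearly with the number of workbooks, which is why the pattern becomes expensive as more dashboards are published this way.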

Querying Inefficiencies and Inconsistencies

In addition to this, on day 3, User 1 goes into her dashboard and creates the following calculation for Sales Commission, and uses this figure on a dashboard:

Sales Commission
Sales Commission Calculation for User 1

Unaware of User 1's work, and not to be outdone, User 2 goes in and creates the same calculation but with a rate of 15%, and uses this figure on a dashboard:

Sales Commission

User 1 and User 2 have the same boss, who looks at both users' dashboards. The boss is confused. Not only do we have inefficiencies from querying the same data repeatedly, but now we also have inconsistencies in the reported figures.
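The inconsistency is easy to reproduce outside Tableau. The sketch below applies two workbook-local commission formulas to the same sales figure. User 2's 15% rate comes from the example. User 1's rate is not stated in the text, so the 10% used here is purely a placeholder assumption, as is the sales amount.

```python
from decimal import Decimal

SALES = Decimal("1000.00")  # hypothetical sales amount

# Each workbook carries its own private copy of the calculation.
USER_1_RATE = Decimal("0.10")  # assumption: User 1's actual rate is not given in the text
USER_2_RATE = Decimal("0.15")  # from the example above


def sales_commission(sales: Decimal, rate: Decimal) -> Decimal:
    """Commission as a flat percentage of sales."""
    return sales * rate


user_1_value = sales_commission(SALES, USER_1_RATE)
user_2_value = sales_commission(SALES, USER_2_RATE)

# Same data, same field name, two different answers on two dashboards.
print(user_1_value, user_2_value)
```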

This is where a Tableau Published Data Source can help.

What is a Tableau Published Data Source?

A Tableau published data source is a centralized source that allows users to share data connections that they have defined. It establishes a single source of truth and allows users to have confidence in the data they are analyzing.

Understanding a Tableau Published Data Source

The best way to understand a Tableau Published Data Source is to look at an example scenario:

  1. The Boss sits User 1 and User 2 down and gets them to agree on how commission should be calculated.  
  2. They agree that it's actually 12.5%.  

Two Options for Data Extracts Management

In this situation, there are two options to manage and maintain consistent data extracts:

  1. User 1 and User 2 now have to keep their Data Extracts in sync manually.
  2. Use a Tableau Published Data Source. Once they agree to the structure of the data, they can both access the same information with efficiency and correctness.  
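Option 2 amounts to defining the calculation once and having every workbook read it from the same place. The following is a rough Python analogy rather than a model of Tableau internals; the class names and methods are hypothetical, and the 12.5% rate is the one agreed in the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PublishedDataSource:
    """Stand-in for a shared data source that owns the agreed calculations."""
    name: str
    commission_rate: Decimal = Decimal("0.125")  # the 12.5% agreed in the example

    def sales_commission(self, sales: Decimal) -> Decimal:
        return sales * self.commission_rate


@dataclass
class Workbook:
    """Stand-in for a workbook that connects to the shared source instead of copying it."""
    owner: str
    source: PublishedDataSource

    def commission_for(self, sales: Decimal) -> Decimal:
        # The workbook has no rate of its own; it defers to the shared definition.
        return self.source.sales_commission(sales)


shared_source = PublishedDataSource(name="super_store_orders")
user_1_book = Workbook(owner="User 1", source=shared_source)
user_2_book = Workbook(owner="User 2", source=shared_source)

sample_sales = Decimal("1000")  # hypothetical sales amount
print(user_1_book.commission_for(sample_sales) == user_2_book.commission_for(sample_sales))  # True
```

With option 1, by contrast, the rate would live in two places and any future change would have to be made twice.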

Related Article: Publish Data Sources and Workbooks

How a Tableau Published Data Source Works

A Tableau Published Data Source does start with a data extract, but once all the checks have been made, it needs to be published to Tableau Server or Tableau Online:

Publishing a data source
Publish to Server...
  1. Name the Data Source appropriately and put it in the appropriate project.
Configuring a published data source
Name the data source appropriately

2.  The data source now becomes available on Tableau Server.  If the user has permission they can even create a Workbook using web edit with that data source.

Tableau Server Ask Data UI

However, if you're using Tableau Desktop to build your dashboards, you can connect to the published data source from there.

Connecting to the data source
Connect to Tableau Data Source in Tableau Server
Selecting a data source from Tableau Server or Tableau Online
Select the verified data source

3.  So now, User 1 and User 2 can connect to this new Verified Data Source and use it as an established single source of truth of the data.  

Connected to a published data source in Tableau Desktop
User 1 and User 2 can now connect to the Verified Data Source

The data pipeline has also been made more efficient, as the data source is refreshed once from PostgreSQL rather than once by every workbook that is connected to it. The pipeline from PostgreSQL to dashboard now looks like this:

Data source workflow
Tableau Published data source only queries the data source once and becomes the Single Source of Truth for all connected workbooks
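Publishing can also be scripted rather than done through the Tableau Desktop menu. The following is a minimal sketch, assuming the `tableauserverclient` Python package is installed and that you authenticate with a personal access token. The helper names `find_project_id` and `publish_extract` are invented for this example, and only the first page of projects is searched.

```python
def find_project_id(projects, project_name):
    """Return the id of the first project whose name matches, or None if absent."""
    for project in projects:
        if project.name == project_name:
            return project.id
    return None


def publish_extract(server_url, site, token_name, token_value, project_name, file_path):
    """Publish a local data source file, overwriting any data source
    with the same name in the target project."""
    import tableauserverclient as TSC  # third-party package, imported lazily

    auth = TSC.PersonalAccessTokenAuth(token_name, token_value, site_id=site)
    server = TSC.Server(server_url, use_server_version=True)
    with server.auth.sign_in(auth):
        projects, _ = server.projects.get()  # first page of projects only
        project_id = find_project_id(projects, project_name)
        if project_id is None:
            raise ValueError(f"Project {project_name!r} not found")
        item = TSC.DatasourceItem(project_id)
        return server.datasources.publish(item, file_path, TSC.Server.PublishMode.Overwrite)
```

A call might look like `publish_extract("https://tableau.example.com", "mysite", "my-token", "<secret>", "Sales", "super_store_orders.hyper")`, where every argument is a placeholder for your own environment.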

Certify the Published Data Source

As an additional step, you need to verify that the data source is indeed the source of truth. User 1 or User 2, or even their Boss, if permissions allow, can certify this Published Data Source by:

  1. Clicking on the details icon in the Tableau Server Data Source Page:
Data source details
Users with appropriate permissions can Certify the data source

2. Certifying the data source puts a little green tick on the icon:

Certified Data Source

Other users can then be confident about using this Data source.
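Certification can also be set from a script. The sketch below assumes the `tableauserverclient` package and an already signed-in `server` object (for example, a `TSC.Server` after `server.auth.sign_in(...)`), used by an account with permission to certify. `certify_datasource` is a hypothetical helper, and it only searches the first page of published data sources.

```python
def certify_datasource(server, datasource_name, note):
    """Mark the named published data source as certified and attach a note.

    `server` is expected to be a signed-in tableauserverclient Server; only its
    `datasources.get()` and `datasources.update()` endpoints are used.
    """
    datasources, _ = server.datasources.get()  # first page only
    for item in datasources:
        if item.name == datasource_name:
            item.certified = True
            item.certification_note = note
            return server.datasources.update(item)
    raise LookupError(f"No published data source named {datasource_name!r}")
```

Because the function only depends on two endpoints of the `server` object, it can be exercised against a small stand-in object before being pointed at a real Tableau Server.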

Data Quality Warnings

Users with appropriate permissions can also place warnings on Published Data Sources. This can be useful during periods when the data source might be going through changes:

Data source quality warning
Data Quality Warnings can notify your users about the state of the data

If a Published Data Source has a warning on it, any attempt to connect to it displays a warning:

Best Practices for Maintaining Tableau Published Data Sources

It does take some extra management to maintain a Tableau Published Data Source.  At Zuar, the recommended practice is to separate the Published Data Source workbook from the workbook that is intended to be worked upon.

For example:

  1. User 1 creates dashboards but is also the Data Custodian over the data source created above.
  2. User 1 creates dashboards in Sales Analysis.twb. This workbook has a connection to the Tableau Published data source.
  3. User 1 should maintain a separate workbook (e.g., Master Verified Data Source - super_store_orders.twb) that contains the original Tableau Extract for the verified data source. Publishing to the Published Data Source on Tableau Server should be performed from this workbook, and only from this workbook.

Common Pitfalls of using Tableau Published Data Sources

There are several common pitfalls of using Tableau Published Data Sources that you should be aware of before adopting them into your system:

Editing a data source
A user cannot edit calculated fields from a Tableau Data Source
  • Calculated fields can be added to a given workbook, but they will be local to that workbook only and will not be available to others using that published data source. If a calculation needs to be shared, it can be added in the Master workbook discussed in point 3 above.
  • Be careful with calculations involving row-level security. It is best to leave these out of the Published Data Source and handle them on a workbook-by-workbook basis.
  • Dimension Aliases also need to be handled at the Master Data Source layer.  
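The scoping rule in the first bullet can be pictured as two layers of field definitions: a published layer that everyone sees, and a workbook-local layer that is visible only inside that workbook. The sketch below models this with plain dictionaries. The field names and formulas are illustrative assumptions, not Tableau's internal representation.

```python
from collections import ChainMap

# Fields defined in the Master workbook and published for everyone.
published_fields = {"Sales Commission": "[Sales] * 0.125"}

# Each workbook layers its own local calculations on top of the published ones.
user_1_fields = ChainMap({"Profit Ratio": "SUM([Profit]) / SUM([Sales])"}, published_fields)
user_2_fields = ChainMap({}, published_fields)

print("Profit Ratio" in user_1_fields)      # True: local to User 1's workbook
print("Profit Ratio" in user_2_fields)      # False: never reached the published source
print("Sales Commission" in user_2_fields)  # True: inherited from the published source
```

To make "Profit Ratio" available to User 2, it would have to be added to the published layer, which in this article's workflow means adding it in the Master workbook and republishing.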

Tableau Workbooks and Sheets

Tableau workbooks are the files that hold your sheets. A worksheet contains a single view of the data, and a dashboard is where you can see a collection of views from multiple worksheets.

A “story” is an organized sequence of worksheets or dashboards that work together to convey information. Tableau workbooks are a great way to keep all of your views, data sources, and extracts organized for maximum efficiency.

Tableau Data Extracts vs. Tableau Data Sources

There is a place for both Tableau extracts and Tableau Published Data Sources. It just depends on the use case.

Data Extracts

Data extracts are essentially a snapshot of the data that is stored locally and loaded into memory, so it can be recalled quickly for visualization. This offers much faster access to your workbooks.

Tableau Published Data Sources

Tableau Published Data Sources are a great way to centralize data, establish a single source of truth, and allow users to have confidence in the data they are analyzing.

When a Tableau Data Source uses a live connection, it offers real-time updates for your data. However, because the information is pulled straight from the database instead of from local memory, performance usually isn’t as fast as with Tableau extracts.

| Criteria | Tableau Extracts | Tableau Data Sources |
| --- | --- | --- |
| Design | Snippets of data | Centralized data |
| Performance | Faster because data is saved in memory | Slower because data is pulled from the database |
| Speed | Faster access to workbooks | Real-time updates for data |
| Trusted Sources | Snapshot of some data | Single source of truth for all data |
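The trade-off between a snapshot and real-time data in the comparison above can be demonstrated with any database. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the source database. The table and values are made up, and the "extract" here is simply a copy of the rows held in memory, which is only an analogy for Tableau's extract format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sales REAL)")
conn.executemany("INSERT INTO orders (sales) VALUES (?)", [(100.0,), (250.0,)])


def live_total(connection):
    """Live connection: every read goes back to the database."""
    return connection.execute("SELECT SUM(sales) FROM orders").fetchone()[0]


def take_extract(connection):
    """Extract: copy the rows once and answer later reads from the copy."""
    return connection.execute("SELECT id, sales FROM orders").fetchall()


extract = take_extract(conn)

# A new order arrives after the extract was taken.
conn.execute("INSERT INTO orders (sales) VALUES (?)", (50.0,))

extract_total = sum(sales for _, sales in extract)
print(extract_total)     # 350.0: the snapshot does not see the new row
print(live_total(conn))  # 400.0: the live query does

extract = take_extract(conn)  # a scheduled refresh brings the snapshot up to date
```

Reads from the in-memory copy never touch the database, which is where the speed advantage comes from. The cost is that the copy is only as fresh as its last refresh.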

Still confused about Tableau extracts and data sources? Get in touch with Zuar today!

Related: How to Embed Tableau Into Salesforce

*Generally a Tableau Extract is used for large data sources.  Live connections can be used with small (or optimized) data sources.