Data Extraction in Tableau
Do you know how to refresh our extracted data with newly updated data?
No worries…!
In this article, we are going to know the answers to all the above questions. So, please read till the end you will surely get a few tricks about data extraction as well as some basic concepts full understanding in simple and understandable words.
Data extraction is one of the best features in Tableau. Extract data is a subset of data that can be used to increase the performance of reports. Extract filters applied to increase performance.
However, the data extraction is the most useful feature to store the data locally on the drive and can be accessed by Tableau.
Creating Extract
Extraction of data can be done in the data pane just by Right-click on the data source connection. You will get the window with many options such as edit, refresh, view, etc. Click on the Extract Data option from the list.
If you want to extract all the data instantly then select the Extract option from the list.
Now it will populate with a window as below:
Select All rows to extract all data in a data source and press OK.
After data extraction completes, we can see there are two cylinders with an arrow checked on the data source name. It indicates the data is extracted and we are currently connected to the extracted data set.
Now if you want to save the data source locally, then you can save it just by clicking on the save menu or save as menu in the toolbar.
If anyone wants to replace the published data source, then select Replace Data Source by right-clicking on the data source.
Now before that we need to verify that the published data source will be replaced by the local data source, and then click OK.
Now, Right-click the published data source, and then click Close.
Select File > Save As.
Here, From the Save as a drop-down menu, select Tableau Packaged Workbook (*.twbx).
After the extract of the local copy is created and the packaged workbook saved, you can send your workbook to your colleague.
Apply Extract Filters
To restrict the data while pulling into the Tableau, we need to apply an extract filter on the data source. To apply the Extract filter, follow the below steps:
1. Right-click on the published data source.
2. Click on the Extract Data menu command.
3. On the pop-up window, click on the Add button.
4. Select the field on which you want to apply the filter and then choose the fields from the existing list to filter out.
Now, in data, we can see there is only filtered data is present. Unwanted data is removed.
An extract filter can be applied only if the data is extracted.
Extract filters can be applied directly from the Data Source page which is at the start of the tableau where there is a button to select extract connection and add the button to apply extract filter there itself.
Remove extract Filter
Removing extract filters is very simple and easy. To remove extract filters, follow the below steps:
1. Go to the data source page where we have two options to select Live and Extract.
2. Click on the Edit button to edit the filter. The window of Extract Data will pop-up.
3. Select the file which we want to remove from the filter and then click on the remove button from the window.
4. Yeah! We removed the Extract filter applied.
It’s very easy to apply extract filters on measures.
While you are adding a field into the filter, simply select the measures field into the filter.
Just change is that we will get a filter window to select the minimum and maximum range of values of the data.
Adding New Data to Extract
If we want to add more data manually to the extracted data source, then you can select the option Right click on Data -> Extract -> Append Data from File. After this step, just browse the file containing the data and click OK.
Now, the values and data type of columns in the data source file should be in sync with the existing data.
Extract History
After all the process as we have discussed above, if you want to see the history of the data extraction, then it is easy to check now.
Just click on Data->Extract History.
Here we can check the exact date and time when we had extracted the data and what action we have done on that extract and how many rows are present in the extract data file.
So, Tableau Data Extract is very much important to make dashboards or reports performance faster.
After creating the data extract, we can refresh the existing data with updated data by options Full Refresh and Incremental Refresh. While refreshing your data extract files, we need to aware of the file type which we are updating.
Suppose if you are updating the .tde file in Tableau version 2020.3, then it will get updated to the .hyper file automatically. The .hyper file is very beneficial and if you want to go to reverse back to your earlier version of the file then it will not possible.
Now, here we will go over some of the ways to refresh the data files and get updated data into reports or dashboards.
Refresh the data extracts in two ways
1. Full Refresh:
When we are refreshing our data, by default it is refreshing fully. That means every row in the data is getting updated as per the recent data present in the database.
The full refresh may take more hours to refresh the data as compared to an incremental refresh. Depending on the size of the extract it defines the period to complete the refresh process. If you are not defining anything on your dataset, what refresh you want to proceed, then usually it will be full refresh.
How we can set up this refresh?
First, we need to publish the data file on Tableau Server just by clicking on Publish data to Tableau Server.
Now, you can specify the Refresh Schedule (Full Extract) to whatever you want. And click on Publish button. After successful completion it will prompts as Publish complete.
2. Incremental Refresh:
As the databases are filling day by day, month by month, most of the time we prefer incremental refresh rather than a full refresh. Incremental refresh will refresh the newly added data. For example, if you have the e-commerce data and the sales data is getting added every day into the database, you want the fully updated data at the weekend, then we will schedule an incremental refresh every weekend. So that we will get all the newly added sales data with existing data on the weekend. Also, if the data structure or anything changes in the existing data, then we should schedule the full refresh before the incremental refresh.
You can follow the below steps to schedule incremental refresh on data:
1. Select Data -> Extract Data.
2. Now in Extract Data, Select All Rows as the number of rows to be selected to refresh. All rows should be refreshed when you are refreshing data incrementally.
3. Select check box of Incremental Refresh and then specify the column in the database by which we can recognize the newly added data in the database.
4. For instance, if you add there Order ID column, then what refresh will do here is that it will refresh the data which is coming after the Order ID from the previous one till which we have refreshed earlier.
5. After all these steps, click on Extract.
6. In all the above steps we are creating new extracts, defining new extract. If you want to ensure that the data is updating successfully, then we should check the extract history always and ensure that the data is updating with correct data and on time.
The extracted data is stored in TDE file that is Tableau Data Extract.
Advantages of data extract
Performance faster: Data extract helps in building dashboards with less time consuming and performance faster. Also, you do not have to worry about the database location as the data will be present locally on your machine.
Snapshot of Data: You can take snapshot of your data and can connect to live data whenever you want updated data.
Easy to handle larger data sources: We can create huge data extract file in less time.
Internet connectivity: We can access the data offline and no internet connectivity required to connect to local data. For example, when you are travelling and not able to access any internet connectivity then this extract data file will help you to analyze the existing data.
Helps optimizing extracts: When you create data extracts, if we want to create it faster then we need to optimize the extracts. What optimize extract will do is it will check your data and any calculation done within it and do that calculation at the first and avoid the fields to re calculate every time. So, this will be performance faster.
Conclusion:
Finally! We have almost covered all the things related to tableau extract files and processes. So, now we are all set to create a smooth environment and run our dashboards efficiently.
If you have any question in your mind, then please reach out to us. We would love to help you.
Happy Learning!
Thank you Very much!