Automating Your Extract Refresh Process
I’ve spoken with quite a few people who run into this problem so hopefully this will be helpful to you.
Setting Up the Refresh Job
The key to solving the issue is to set up a tableauserverclient
extract refresh job as the final step in your source refresh process.
How It Works
Step 1: SQL Server uses xpcmd
to execute a .bat
file.
Step 2: The .bat
file contains the command to run a Python script.
Step 3: The Python script kicks off the extract refresh job using tableauserverclient
, ensuring that your Tableau extracts are always up-to-date.
Running the Refresh Job with a Batch File
To automate the execution of your Python script, create a batch file (.bat) with the following content. This file, when executed, will run the Python script that initiates the extract refresh:
@echo off
REM Run the Python script to kick off the Tableau extract refresh job
python "C:\path\to\your_extract_refresh_script.py"
Python Script to Kick Off the Extract Refresh Job
Below is an example Python script that searches for a datasource by a name, retrieves its ID, and then triggers the extract refresh job. This script uses environment variables and the tableauserverclient
library:
import os
import tableauserverclient as TSC
from dotenv import load_dotenv
load_dotenv() # Load environment variables from .env file
SERVER_URL = os.getenv('TABLEAU_SERVER_URL')
PAT_NAME = os.getenv('TABLEAU_PAT_NAME')
PAT_SECRET = os.getenv('TABLEAU_PAT_SECRET')
SITE_ID = os.getenv('TABLEAU_SITE_ID')
DATASOURCE_NAME = 'rocket_launch' # Partial name of the datasource to search for
tableau_auth = TSC.PersonalAccessTokenAuth(PAT_NAME, PAT_SECRET, SITE_ID)
server = TSC.Server(SERVER_URL, use_server_version=True)
with server.auth.sign_in(tableau_auth):
# Create a RequestOptions object to filter datasources by partial name
options = TSC.RequestOptions()
options.filter.add(TSC.Filter(TSC.RequestOptions.Field.Name, TSC.RequestOptions.Operator.Equals, DATASOURCE_NAME))
all_datasources, pagination_item = server.datasources.get(req_options=options)
if all_datasources:
# Select the first matching datasource
datasource = all_datasources[0]
# Kick off the extract refresh job for the selected datasource
job = server.datasources.refresh(datasource)
print(f"Started extract refresh job for datasource: {datasource.name}")
else:
print("No datasource found with the provided partial name.")
The Benefits
By integrating this automated step, you eliminate mismatches between the source refresh times and the static extract refresh times—resulting in more reliable and consistent data updates.
Conclusion
Incorporating an extract refresh job at the end of your source refresh process not only streamlines your workflow but also boosts data accuracy. This simple yet effective setup can save time and reduce manual oversight, making your Tableau environment more efficient.