Snowflake

Connecting Snowflake to Tallyfy Analytics

Snowflake can read your Tallyfy Analytics data directly from S3 - no data duplication or ETL pipelines needed. If you already have Tallyfy Analytics enabled, you have everything you need to get started.

Your analytics data is stored as Apache Parquet¹ files in Amazon S3. Snowflake reads these files natively through external stages, giving you full SQL access to your process and member data.

What you will need

When Tallyfy provisions analytics for your organization, you receive:

  • Access Key ID - Your AWS access key (token-based authentication)
  • Secret Access Key - Your AWS secret key
  • S3 Location - The path to your organization’s analytics data
  • Region - us-west-2 (all Tallyfy Analytics data is stored here)

These are the same Access Key and Secret Access Key you may already use for Athena, Power BI, or Tableau connections - no new credentials are needed for Snowflake.

Setting up Snowflake

Step 1: Create a storage integration (IAM role setups only)

The storage integration tells Snowflake where your data lives and how to authenticate through an IAM role. If you plan to use the key-based credentials Tallyfy provides - the common case - skip ahead to Step 2. Otherwise, run this in your Snowflake worksheet, replacing YOUR_IAM_ROLE_ARN with the ARN of an IAM role that can read the bucket:

CREATE OR REPLACE STORAGE INTEGRATION tallyfy_analytics_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'YOUR_IAM_ROLE_ARN'
STORAGE_ALLOWED_LOCATIONS = ('s3://YOUR_S3_LOCATION/');
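If you do take the IAM-role path, Snowflake generates an IAM user ARN and an external ID that the role's trust policy on the AWS side must reference. You can retrieve both with:

DESC STORAGE INTEGRATION tallyfy_analytics_integration;

Look for the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID rows in the output.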

Step 2: Create an external stage

The external stage points Snowflake to your specific analytics data folder. This is where you use the S3 location and AWS credentials provided by Tallyfy.

With key-based authentication (recommended for most setups):

CREATE OR REPLACE STAGE tallyfy_analytics_stage
URL = 's3://YOUR_S3_LOCATION/'
CREDENTIALS = (
AWS_KEY_ID = 'YOUR_ACCESS_KEY_ID'
AWS_SECRET_KEY = 'YOUR_SECRET_ACCESS_KEY'
)
FILE_FORMAT = (TYPE = PARQUET);

With storage integration (if using IAM roles):

CREATE OR REPLACE STAGE tallyfy_analytics_stage
STORAGE_INTEGRATION = tallyfy_analytics_integration
URL = 's3://YOUR_S3_LOCATION/'
FILE_FORMAT = (TYPE = PARQUET);

Replace YOUR_S3_LOCATION, YOUR_ACCESS_KEY_ID, and YOUR_SECRET_ACCESS_KEY with the values from your Tallyfy Analytics setup.

Step 3: Verify the connection

Check that Snowflake can see your files:

LIST @tallyfy_analytics_stage;

You should see a list of Parquet files - your process analytics data and member list.
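Beyond LIST, you can confirm the files actually parse by counting rows per file. The stage's inline Parquet file format is applied automatically, and the METADATA$FILENAME pseudocolumn identifies which file each row came from:

SELECT
METADATA$FILENAME AS file_name,
COUNT(*) AS row_count
FROM @tallyfy_analytics_stage
GROUP BY file_name
ORDER BY file_name;

This scans every file under the stage, so both the analytics and member files appear in the result.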

Querying your data

Process and task data

Each Parquet file in the analytics folder contains data about a specific process run. The queries below reference a named file format, tallyfy_parquet - create it first (see "Create a file format for cleaner queries" below). To query across all your process data:

SELECT
$1:process_name::STRING AS process_name,
$1:process_status::STRING AS process_status,
$1:process_task_name::STRING AS task_name,
$1:process_task_status::STRING AS task_status,
$1:user_assigned::STRING AS assigned_to,
$1:completed_on::STRING AS completed_date,
$1:due_by::STRING AS due_date
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet');

Member data

Your organization’s member list is available as a separate Parquet file:

SELECT
$1:email::STRING AS email,
$1:full_name::STRING AS full_name,
$1:job_title::STRING AS job_title,
$1:is_active::STRING AS is_active,
$1:last_login_at::STRING AS last_login,
$1:team::STRING AS team
FROM @tallyfy_analytics_stage/members/
(FILE_FORMAT => 'tallyfy_parquet');

Create a file format for cleaner queries

The queries in this guide reference a named file format, tallyfy_parquet. Create it once so you never have to repeat inline format options:

CREATE OR REPLACE FILE FORMAT tallyfy_parquet
TYPE = PARQUET
SNAPPY_COMPRESSION = TRUE;
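If you query the data often, you can go a step further and expose it as an external table so it behaves like a regular table. A minimal sketch with just a few columns (extend the column list as needed; note that in external table definitions the row is referenced as VALUE rather than $1):

CREATE OR REPLACE EXTERNAL TABLE tallyfy_analytics_ext (
process_name STRING AS (VALUE:process_name::STRING),
process_status STRING AS (VALUE:process_status::STRING),
process_task_name STRING AS (VALUE:process_task_name::STRING)
)
LOCATION = @tallyfy_analytics_stage/analytics/
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = FALSE;

With AUTO_REFRESH disabled, run ALTER EXTERNAL TABLE tallyfy_analytics_ext REFRESH; whenever you want Snowflake to pick up newly written files.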

Common analytics queries

Process completion summary

SELECT
$1:blueprint_name::STRING AS template_name,
COUNT(DISTINCT $1:process_id::STRING) AS total_runs,
COUNT(CASE WHEN $1:process_status::STRING = 'complete' THEN 1 END) AS completed,
COUNT(CASE WHEN $1:process_status::STRING = 'active' THEN 1 END) AS in_progress
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet')
GROUP BY template_name
ORDER BY total_runs DESC;

Overdue tasks

SELECT
$1:process_name::STRING AS process_name,
$1:process_task_name::STRING AS task_name,
$1:user_assigned::STRING AS assigned_to,
$1:due_by::STRING AS due_date,
$1:process_task_status::STRING AS status
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:process_task_status::STRING != 'complete'
AND $1:due_by::STRING < CURRENT_DATE()::STRING
AND $1:due_by::STRING != ''
AND $1:due_by::STRING != 'nan';
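The string comparison above works because the dates are ISO 8601, but casting with TRY_TO_TIMESTAMP - which returns NULL for '' and 'nan', so the sentinel filters become unnecessary - also lets you compute how overdue each task is. A sketch:

SELECT
$1:process_name::STRING AS process_name,
$1:process_task_name::STRING AS task_name,
$1:user_assigned::STRING AS assigned_to,
DATEDIFF('day', TRY_TO_TIMESTAMP($1:due_by::STRING), CURRENT_TIMESTAMP()) AS days_overdue
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:process_task_status::STRING != 'complete'
AND TRY_TO_TIMESTAMP($1:due_by::STRING) < CURRENT_TIMESTAMP()
ORDER BY days_overdue DESC;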

Task completion by assignee

SELECT
$1:user_assigned::STRING AS assignee,
COUNT(*) AS tasks_completed,
COUNT(DISTINCT $1:process_id::STRING) AS processes_involved
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:process_task_status::STRING = 'complete'
AND $1:user_assigned::STRING != ''
AND $1:user_assigned::STRING != 'nan'
GROUP BY assignee
ORDER BY tasks_completed DESC;
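A variation on this query measures average cycle time per assignee, assuming assigned_or_shown_on and completed_on are both populated ISO 8601 strings:

SELECT
$1:user_assigned::STRING AS assignee,
AVG(DATEDIFF('hour',
TRY_TO_TIMESTAMP($1:assigned_or_shown_on::STRING),
TRY_TO_TIMESTAMP($1:completed_on::STRING))) AS avg_hours_to_complete
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:process_task_status::STRING = 'complete'
AND TRY_TO_TIMESTAMP($1:assigned_or_shown_on::STRING) IS NOT NULL
AND TRY_TO_TIMESTAMP($1:completed_on::STRING) IS NOT NULL
GROUP BY assignee
ORDER BY avg_hours_to_complete;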

Form field responses

SELECT
$1:process_name::STRING AS process_name,
$1:question_in_form_field::STRING AS question,
$1:answer_in_form_field::STRING AS answer,
$1:completed_on::STRING AS answered_on
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:form_field_type::STRING != ''
AND $1:form_field_type::STRING != 'nan';

Data fields reference

Process and task fields

All fields are stored as STRING type in the Parquet files. Cast them as needed in your queries.

| Field | Description |
| --- | --- |
| blueprint_id | Template ID |
| blueprint_name | Template name |
| blueprint_version | Template version number |
| process_id | Unique process run ID |
| process_name | Name of the running process |
| process_tag | Process tag or label |
| process_owner | Process owner name |
| process_status | Current status (active, complete, archived) |
| process_completed_at | Timestamp when the process was completed |
| process_task_id | Task ID within the process |
| process_task_name | Task name |
| process_task_status | Task status (active, complete) |
| type | Task type |
| user_assigned | Assigned user name |
| assigned_user_id | Assigned user ID |
| guest_assigned | Assigned guest email |
| total_assignees | Total number of assignees |
| assigned_or_shown_on | Date the task was assigned |
| due_by | Task deadline |
| completed_on | Date the task was completed |
| completed_by | Who completed the task |
| has_form_fields | Whether the task has form fields |
| form_field_type | Type of form field |
| question_in_form_field | Form field label/question |
| answer_in_form_field | Form field response |
| issue_reported | Whether an issue was reported |
| no_of_issue_reported | Number of issues reported |
| total_comments | Number of comments |
| comment | Comment text |
| milestone_name | Milestone name, if applicable |

Member fields

| Field | Description |
| --- | --- |
| id | Member ID |
| email | Email address |
| full_name | Full name |
| first_name | First name |
| last_name | Last name |
| job_title | Job title |
| team | Team name |
| is_active | Active status |
| is_suspended | Suspended status |
| created_at | Account creation date |
| last_login_at | Last login timestamp |
| timezone | User timezone |
| country_id | Country identifier |
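The two files can be joined directly at the stage level. As a sketch, this counts open tasks per member with their team, on the assumption that assigned_user_id in the analytics files matches id in the member file:

SELECT
m.$1:full_name::STRING AS member_name,
m.$1:team::STRING AS team,
COUNT(*) AS open_tasks
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet') a
JOIN @tallyfy_analytics_stage/members/
(FILE_FORMAT => 'tallyfy_parquet') m
ON a.$1:assigned_user_id::STRING = m.$1:id::STRING
WHERE a.$1:process_task_status::STRING != 'complete'
GROUP BY member_name, team
ORDER BY open_tasks DESC;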

Important notes

Data format - All fields in the Parquet files are stored as STRING. Cast dates and numbers to the appropriate types in your Snowflake queries.

NULL handling - Empty values may appear as blank strings or the literal string nan (a side effect of the Pandas conversion). Filter both in your queries:

WHERE $1:field_name::STRING != '' AND $1:field_name::STRING != 'nan'
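An equivalent approach converts both sentinels to real NULLs with nested NULLIF, which also works inside SELECT lists and aggregations (field_name is a placeholder here):

SELECT
NULLIF(NULLIF($1:field_name::STRING, ''), 'nan') AS field_name
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet')
WHERE NULLIF(NULLIF($1:field_name::STRING, ''), 'nan') IS NOT NULL;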

Timestamps - Date fields use ISO 8601 format without timezone (e.g., 2025-03-15T14:30:00). These represent UTC times.
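To treat these strings as proper UTC timestamps and display them in a local timezone, a sketch (America/Chicago is only an example target zone):

SELECT
$1:completed_on::STRING AS completed_utc_raw,
CONVERT_TIMEZONE('UTC', 'America/Chicago',
TRY_TO_TIMESTAMP($1:completed_on::STRING)) AS completed_local
FROM @tallyfy_analytics_stage/analytics/
(FILE_FORMAT => 'tallyfy_parquet');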

Data freshness - Analytics data is updated when process events occur (task completions, status changes, form submissions). There is no fixed refresh schedule - data appears as events happen.

Existing Athena access - Snowflake and Athena can both read the same S3 data simultaneously. Adding Snowflake does not affect your existing Athena, Power BI, Tableau, or Looker connections.

Region requirement - Your Snowflake account should ideally be in us-west-2 (Oregon) for best performance, since the analytics data is stored in that region. Cross-region access works but adds latency.


Footnotes

  1. Apache Parquet is a columnar storage format optimized for analytics queries, using Snappy compression for efficient storage.