Snowflake
Snowflake can read your Tallyfy Analytics data directly from S3 - no data duplication or ETL pipelines needed. If you already have Tallyfy Analytics enabled, you have everything you need to get started.
Your analytics data is stored as Apache Parquet files in Amazon S3. Snowflake reads these files natively through external stages, giving you full SQL access to your process and member data.
When Tallyfy provisions analytics for your organization, you receive:
- Access Key ID - Your AWS access key (token-based authentication)
- Secret Access Key - Your AWS secret key
- S3 Location - The path to your organization’s analytics data
- Region - us-west-2 (all Tallyfy Analytics data is stored here)
These are the same credentials you may already use for Athena, Power BI, or Tableau connections - Snowflake accepts the same Access Key ID and Secret Access Key.
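Before touching Snowflake, you can sanity-check these credentials against the S3 location directly. The sketch below is a minimal example, not part of Tallyfy's tooling: `split_s3_url` and `list_analytics_files` are hypothetical helper names, and the listing function assumes you have boto3 installed and have substituted your real credential values.

```python
from urllib.parse import urlparse

def split_s3_url(s3_url: str) -> tuple[str, str]:
    """Split an s3://bucket/prefix URL into (bucket, prefix)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {s3_url}")
    return parsed.netloc, parsed.path.lstrip("/")

def list_analytics_files(s3_url: str, access_key_id: str, secret_access_key: str):
    """List the Parquet files Tallyfy has written (requires boto3)."""
    import boto3  # imported here so the pure helper above has no dependency
    bucket, prefix = split_s3_url(s3_url)
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name="us-west-2",  # all Tallyfy Analytics data lives here
    )
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

If this listing succeeds with your values, the same Access Key ID and Secret Access Key will work in the `CREATE STAGE` statement below.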
The storage integration tells Snowflake where your data lives and how to authenticate. It is only required if you authenticate with an IAM role; with key-based credentials you can skip straight to creating the stage. Run this in your Snowflake worksheet:
```sql
CREATE OR REPLACE STORAGE INTEGRATION tallyfy_analytics_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = ''
  STORAGE_ALLOWED_LOCATIONS = ('s3://YOUR_S3_LOCATION/')
  STORAGE_AWS_EXTERNAL_ID = '';
```

The external stage points Snowflake to your specific analytics data folder. This is where you use the S3 location and AWS credentials provided by Tallyfy.
With key-based authentication (recommended for most setups):
```sql
CREATE OR REPLACE STAGE tallyfy_analytics_stage
  URL = 's3://YOUR_S3_LOCATION/'
  CREDENTIALS = (
    AWS_KEY_ID = 'YOUR_ACCESS_KEY_ID'
    AWS_SECRET_KEY = 'YOUR_SECRET_ACCESS_KEY'
  )
  FILE_FORMAT = (TYPE = PARQUET);
```

With storage integration (if using IAM roles):
```sql
CREATE OR REPLACE STAGE tallyfy_analytics_stage
  STORAGE_INTEGRATION = tallyfy_analytics_integration
  URL = 's3://YOUR_S3_LOCATION/'
  FILE_FORMAT = (TYPE = PARQUET);
```

Replace YOUR_S3_LOCATION, YOUR_ACCESS_KEY_ID, and YOUR_SECRET_ACCESS_KEY with the values from your Tallyfy Analytics setup.
Check that Snowflake can see your files:
```sql
LIST @tallyfy_analytics_stage;
```

You should see a list of Parquet files - your process analytics data and member list.
Each Parquet file in the analytics folder contains data about a specific process run. To query across all your process data:
```sql
SELECT
  $1:process_name::STRING AS process_name,
  $1:process_status::STRING AS process_status,
  $1:process_task_name::STRING AS task_name,
  $1:process_task_status::STRING AS task_status,
  $1:user_assigned::STRING AS assigned_to,
  $1:completed_on::STRING AS completed_date,
  $1:due_by::STRING AS due_date
FROM @tallyfy_analytics_stage/analytics/
  (FILE_FORMAT => 'tallyfy_parquet');
```

Your organization's member list is available as a separate Parquet file:
```sql
SELECT
  $1:email::STRING AS email,
  $1:full_name::STRING AS full_name,
  $1:job_title::STRING AS job_title,
  $1:is_active::STRING AS is_active,
  $1:last_login_at::STRING AS last_login,
  $1:team::STRING AS team
FROM @tallyfy_analytics_stage/members/
  (FILE_FORMAT => 'tallyfy_parquet');
```

To avoid repeating format options, create a named file format. The queries above reference this tallyfy_parquet format, so create it before running them:

```sql
CREATE OR REPLACE FILE FORMAT tallyfy_parquet
  TYPE = PARQUET
  SNAPPY_COMPRESSION = TRUE;
```

Process runs per template:

```sql
SELECT
  $1:blueprint_name::STRING AS template_name,
  COUNT(DISTINCT $1:process_id::STRING) AS total_runs,
  COUNT(CASE WHEN $1:process_status::STRING = 'complete' THEN 1 END) AS completed,
  COUNT(CASE WHEN $1:process_status::STRING = 'active' THEN 1 END) AS in_progress
FROM @tallyfy_analytics_stage/analytics/
  (FILE_FORMAT => 'tallyfy_parquet')
GROUP BY template_name
ORDER BY total_runs DESC;
```

Overdue tasks:

```sql
SELECT
  $1:process_name::STRING AS process_name,
  $1:process_task_name::STRING AS task_name,
  $1:user_assigned::STRING AS assigned_to,
  $1:due_by::STRING AS due_date,
  $1:process_task_status::STRING AS status
FROM @tallyfy_analytics_stage/analytics/
  (FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:process_task_status::STRING != 'complete'
  AND $1:due_by::STRING < CURRENT_DATE()::STRING
  AND $1:due_by::STRING != ''
  AND $1:due_by::STRING != 'nan';
```

Completed tasks per assignee:

```sql
SELECT
  $1:user_assigned::STRING AS assignee,
  COUNT(*) AS tasks_completed,
  COUNT(DISTINCT $1:process_id::STRING) AS processes_involved
FROM @tallyfy_analytics_stage/analytics/
  (FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:process_task_status::STRING = 'complete'
  AND $1:user_assigned::STRING != ''
  AND $1:user_assigned::STRING != 'nan'
GROUP BY assignee
ORDER BY tasks_completed DESC;
```

Form field responses:

```sql
SELECT
  $1:process_name::STRING AS process_name,
  $1:question_in_form_field::STRING AS question,
  $1:answer_in_form_field::STRING AS answer,
  $1:completed_on::STRING AS answered_on
FROM @tallyfy_analytics_stage/analytics/
  (FILE_FORMAT => 'tallyfy_parquet')
WHERE $1:form_field_type::STRING != ''
  AND $1:form_field_type::STRING != 'nan';
```

All fields are stored as STRING type in the Parquet files. Cast them as needed in your queries.
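The per-template summary query aggregates the same way you might in application code. As a minimal Python sketch (the sample rows below are hypothetical, but each value is a string, matching how Tallyfy stores the fields):

```python
from collections import defaultdict

# Hypothetical task rows shaped like the analytics Parquet data.
rows = [
    {"blueprint_name": "Onboarding", "process_id": "p1", "process_status": "complete"},
    {"blueprint_name": "Onboarding", "process_id": "p2", "process_status": "active"},
    {"blueprint_name": "Invoice Review", "process_id": "p3", "process_status": "complete"},
]

# COUNT(DISTINCT process_id) becomes a set; the CASE counts become tallies.
summary = defaultdict(lambda: {"runs": set(), "completed": 0, "in_progress": 0})
for row in rows:
    entry = summary[row["blueprint_name"]]
    entry["runs"].add(row["process_id"])
    if row["process_status"] == "complete":
        entry["completed"] += 1
    elif row["process_status"] == "active":
        entry["in_progress"] += 1

# ORDER BY total_runs DESC
for template, entry in sorted(summary.items(), key=lambda kv: -len(kv[1]["runs"])):
    print(template, len(entry["runs"]), entry["completed"], entry["in_progress"])
```

Note that, as in the SQL, the completed/in-progress tallies count task rows while the run count is distinct process IDs.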
Process analytics file fields:

| Field | Description |
|---|---|
| blueprint_id | Template ID |
| blueprint_name | Template name |
| blueprint_version | Template version number |
| process_id | Unique process run ID |
| process_name | Name of the running process |
| process_tag | Process tag or label |
| process_owner | Process owner name |
| process_status | Current status (active, complete, archived) |
| process_completed_at | Timestamp when process was completed |
| process_task_id | Task ID within the process |
| process_task_name | Task name |
| process_task_status | Task status (active, complete) |
| type | Task type |
| user_assigned | Assigned user name |
| assigned_user_id | Assigned user ID |
| guest_assigned | Assigned guest email |
| total_assignees | Total number of assignees |
| assigned_or_shown_on | Date task was assigned |
| due_by | Task deadline |
| completed_on | Date task was completed |
| completed_by | Who completed the task |
| has_form_fields | Whether task has form fields |
| form_field_type | Type of form field |
| question_in_form_field | Form field label/question |
| answer_in_form_field | Form field response |
| issue_reported | Whether an issue was reported |
| no_of_issue_reported | Number of issues reported |
| total_comments | Number of comments |
| comment | Comment text |
| milestone_name | Milestone name if applicable |
Member list file fields:

| Field | Description |
|---|---|
| id | Member ID |
| email | Email address |
| full_name | Full name |
| first_name | First name |
| last_name | Last name |
| job_title | Job title |
| team | Team name |
| is_active | Active status |
| is_suspended | Suspended status |
| created_at | Account creation date |
| last_login_at | Last login timestamp |
| timezone | User timezone |
| country_id | Country identifier |
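Since the analytics file stores the assignee as a name (user_assigned) and the members file carries the richer profile, you may want to combine the two. A minimal Python sketch, with hypothetical sample rows and assuming user_assigned matches the member's full_name (verify this against your own data before relying on it):

```python
# Hypothetical member and task rows; all values are strings, as stored.
members = [
    {"full_name": "Ada Park", "email": "ada@example.com", "team": "Ops"},
]
tasks = [
    {"process_task_name": "Approve invoice", "user_assigned": "Ada Park"},
    {"process_task_name": "File contract", "user_assigned": ""},  # unassigned
]

# Index members by full name, then left-join each task to its assignee.
by_name = {m["full_name"]: m for m in members}
enriched = [
    {**t, "assignee_email": by_name.get(t["user_assigned"], {}).get("email")}
    for t in tasks
]
```

Tasks with an empty or unmatched user_assigned simply get a None email, mirroring a SQL LEFT JOIN.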
Data format - All fields in the Parquet files are cast to STRING. Cast dates and numbers in your Snowflake queries as needed.
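One consequence worth noting: zero-padded ISO 8601 dates sort correctly as plain strings, which is why the overdue-tasks example can compare due_by against CURRENT_DATE()::STRING without casting to DATE. A minimal Python sketch of the same string-level check (the helper name is hypothetical):

```python
def is_overdue(due_by: str, today: str) -> bool:
    """String-level overdue check that skips Tallyfy's '' / 'nan' sentinels."""
    return due_by not in ("", "nan") and due_by < today

# Lexicographic comparison of zero-padded ISO dates matches chronological order.
print(is_overdue("2025-03-01", "2025-03-15"))  # earlier due date -> overdue
print(is_overdue("nan", "2025-03-15"))         # sentinel -> not overdue
```

The same property does not hold for non-padded or non-ISO formats, so cast to DATE if you ever change the representation.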
NULL handling - Empty values may appear as blank strings or the literal string nan (a side effect of the Pandas conversion). Filter both in your queries:
```sql
WHERE $1:field_name::STRING != '' AND $1:field_name::STRING != 'nan'
```

Timestamps - Date fields use ISO 8601 format without timezone (e.g., 2025-03-15T14:30:00). These represent UTC times.
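If you pull rows into Python, the same sentinel and timestamp handling looks like this (a minimal sketch; the helper names are hypothetical):

```python
from datetime import datetime, timezone

def clean(value: str):
    """Map Tallyfy's empty-value sentinels ('' and 'nan') to None."""
    return None if value in ("", "nan") else value

def parse_utc(value: str):
    """Parse an ISO 8601 timestamp with no offset as a UTC-aware datetime."""
    value = clean(value)
    if value is None:
        return None
    # The stored timestamps carry no timezone but represent UTC, so attach it.
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
```

This keeps sentinel filtering and the UTC assumption in one place instead of scattering it through downstream code.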
Data freshness - Analytics data is updated when process events occur (task completions, status changes, form submissions). There is no fixed refresh schedule - data appears as events happen.
Existing Athena access - Snowflake and Athena can both read the same S3 data simultaneously. Adding Snowflake does not affect your existing Athena, Power BI, Tableau, or Looker connections.
Region requirement - Your Snowflake account should ideally be in us-west-2 (Oregon) for best performance, since the analytics data is stored in that region. Cross-region access works but adds latency.
Note: Apache Parquet is a columnar storage format optimized for analytics queries, using Snappy compression for efficient storage.