Photo by Mathieu Stern on Unsplash
Automating Data Processing with AWS: From S3 Upload to QuickSight Dashboard
In today's fast-paced financial environment, having timely and accurate data is crucial for making informed decisions. Automating data processing workflows can significantly enhance efficiency and ensure that stakeholders always have access to the latest insights. In this blog post, we'll explore how to set up an automated pipeline using AWS services to process daily financial data uploads, update Athena tables, and refresh QuickSight dashboards.
Overview
Our workflow involves several key AWS services:
Amazon S3: Storage for daily financial data files.
Amazon EventBridge: Monitors S3 for new file uploads and triggers the workflow.
AWS Step Functions: Orchestrates the entire process.
Amazon Athena: Queries the uploaded data and updates tables.
Amazon QuickSight: Visualizes the latest data in dashboards.
AWS Lambda: Executes specific tasks, including checking and updating the dashboard.
Amazon SNS: Sends notifications about the dashboard updates.
Use Case
Every day, a new financial data JSON file is uploaded to an S3 bucket. This upload triggers an EventBridge rule, which in turn starts an AWS Step Functions workflow. The workflow first drops and then recreates an Athena table and view to ensure the latest data is loaded. Once the data is updated in Athena, the workflow checks if a QuickSight dashboard exists. If it does, the dashboard is updated; if not, a new dashboard is created. Notifications are sent via SNS to inform stakeholders of the update or creation of the dashboard, ensuring they always have access to the latest financial insights
The Automated Workflow
Step 1: Upload Data to S3
Every day, a new JSON file containing financial data is uploaded to an S3 bucket (e.g., s3://fictitious-financial-data-reports/
). This file acts as the trigger for our workflow.
Step 2: EventBridge Rule
An EventBridge rule is configured to listen for PutObject
events in the S3 bucket, specifically for files with a .json
suffix. When a new file is detected, the rule triggers a Step Functions workflow.
{
"source": ["aws.s3"],
"detail-type": ["AWS API Call via CloudTrail"],
"detail": {
"eventSource": ["s3.amazonaws.com"],
"eventName": ["PutObject"],
"requestParameters": {
"bucketName": ["fictitious-financial-data-reports"],
"key": [{
"suffix": ".json"
}]
}
}
}
Step 3: Step Functions Workflow
The Step Functions workflow orchestrates the data processing:
Athena Create Database: Creates the Athena database if it does not already exist.
Athena Drop Table: Drops the existing Athena table to ensure it is clean before loading new data.
Athena Create Table: Creates a new Athena table with the updated schema.
Athena Create View: Creates or replaces a view in Athena to facilitate easier querying.
CheckDashboardExists: A Lambda function checks if a Quick Sight dashboard already exists.
Create or Update Dashboard: Another Lambda function creates a new dashboard or updates the existing one. If the dashboard is created for the first time, permissions are added so users can view the dashboard.
Send Notifications: Sends SNS notifications to inform stakeholders of the update or creation of the dashboard or errors.
Step 4: Athena Queries
Athena is used to manage and query the data. The following queries are executed within the Step Functions workflow:
Create Database:
CREATE DATABASE IF NOT EXISTS json_financials;
Drop Table:
DROP TABLE IF EXISTS json_financials.financials_raw;
Create Table:
CREATE EXTERNAL TABLE json_financials.financials_raw ( symbol string, financials ARRAY<struct< reportdate: string, open: decimal(13, 3), high: decimal(13, 3), low: decimal(13, 3), lastclose: decimal(13, 3), volume: bigint>> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://fictitious-financial-data-reports/';
Create View:
CREATE OR REPLACE VIEW json_financials.financial_reports_view AS SELECT symbol, CAST(report.reportdate AS date) reportdate, report.volume, report.lastclose FROM json_financials.financials_raw CROSS JOIN UNNEST(financials) AS t(report) ORDER BY 2 DESC, 1 ASC;
Step 5: QuickSight Dashboard
The workflow includes two Lambda functions to handle the QuickSight dashboard:
CheckDashboardExists: This Lambda function checks if a QuickSight dashboard exists.
CreateOrUpdateDashboard: This Lambda function either creates a new dashboard or updates an existing one. If the dashboard is created for the first time, the function also sets permissions so users can view the dashboard.
CreateOrUpdateDashboard Lambda Function:
Step 6: Notifications
Finally, SNS notifications are sent to inform stakeholders about the status of the dashboard update. This ensures that everyone is aware of the latest data availability. Notifications are sent using SNS to inform stakeholders about the successful creation or update of the dashboard or any errors that occur during the process.
Step Function Architecture Diagram
Conclusion
By leveraging AWS services such as S3, EventBridge, Step Functions, Lambda, Athena, QuickSight, and SNS, you can automate the entire process of data ingestion, processing, and visualization. This not only saves time but also ensures that your team always has access to up-to-date financial insights, enabling better and faster decision-making.
Implementing this workflow can greatly enhance the efficiency of your financial data processing pipeline, allowing your organization to stay ahead in the competitive financial landscape.
References
https://aws.amazon.com/step-functions/