Automating Data Processing with AWS: From S3 Upload to QuickSight Dashboard

In today's fast-paced financial environment, having timely and accurate data is crucial for making informed decisions. Automating data processing workflows can significantly enhance efficiency and ensure that stakeholders always have access to the latest insights. In this blog post, we'll explore how to set up an automated pipeline using AWS services to process daily financial data uploads, update Athena tables, and refresh QuickSight dashboards.

Overview

Our workflow involves several key AWS services:

  1. Amazon S3: Storage for daily financial data files.

  2. Amazon EventBridge: Monitors S3 for new file uploads and triggers the workflow.

  3. AWS Step Functions: Orchestrates the entire process.

  4. Amazon Athena: Queries the uploaded data and updates tables.

  5. Amazon QuickSight: Visualizes the latest data in dashboards.

  6. AWS Lambda: Executes specific tasks, including checking and updating the dashboard.

  7. Amazon SNS: Sends notifications about the dashboard updates.

Use Case

Every day, a new financial data JSON file is uploaded to an S3 bucket. This upload triggers an EventBridge rule, which in turn starts an AWS Step Functions workflow. The workflow first drops and then recreates an Athena table and view to ensure the latest data is loaded. Once the data is updated in Athena, the workflow checks if a QuickSight dashboard exists. If it does, the dashboard is updated; if not, a new dashboard is created. Notifications are sent via SNS to inform stakeholders of the update or creation of the dashboard, ensuring they always have access to the latest financial insights

The Automated Workflow

Step 1: Upload Data to S3

Every day, a new JSON file containing financial data is uploaded to an S3 bucket (e.g., s3://fictitious-financial-data-reports/). This file acts as the trigger for our workflow.

Step 2: EventBridge Rule

An EventBridge rule is configured to listen for PutObject events in the S3 bucket, specifically for files with a .json suffix. When a new file is detected, the rule triggers a Step Functions workflow.

{
  "source": ["aws.s3"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventName": ["PutObject"],
    "requestParameters": {
      "bucketName": ["fictitious-financial-data-reports"],
      "key": [{
        "suffix": ".json"
      }]
    }
  }
}

Step 3: Step Functions Workflow

The Step Functions workflow orchestrates the data processing:

  1. Athena Create Database: Creates the Athena database if it does not already exist.

  2. Athena Drop Table: Drops the existing Athena table to ensure it is clean before loading new data.

  3. Athena Create Table: Creates a new Athena table with the updated schema.

  4. Athena Create View: Creates or replaces a view in Athena to facilitate easier querying.

  5. CheckDashboardExists: A Lambda function checks if a Quick Sight dashboard already exists.

  6. Create or Update Dashboard: Another Lambda function creates a new dashboard or updates the existing one. If the dashboard is created for the first time, permissions are added so users can view the dashboard.

  7. Send Notifications: Sends SNS notifications to inform stakeholders of the update or creation of the dashboard or errors.

Step 4: Athena Queries

Athena is used to manage and query the data. The following queries are executed within the Step Functions workflow:

  • Create Database:

      CREATE DATABASE IF NOT EXISTS json_financials;
    
  • Drop Table:

      DROP TABLE IF EXISTS json_financials.financials_raw;
    
  • Create Table:

      CREATE EXTERNAL TABLE json_financials.financials_raw (
        symbol string,
        financials ARRAY<struct<
          reportdate: string,
          open: decimal(13, 3),
          high: decimal(13, 3),
          low: decimal(13, 3),
          lastclose: decimal(13, 3),
          volume: bigint>>
      )
      ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
      LOCATION 's3://fictitious-financial-data-reports/';
    
  • Create View:

      CREATE OR REPLACE VIEW json_financials.financial_reports_view AS
      SELECT symbol, CAST(report.reportdate AS date) reportdate, report.volume, report.lastclose
      FROM json_financials.financials_raw
      CROSS JOIN UNNEST(financials) AS t(report)
      ORDER BY 2 DESC, 1 ASC;
    

Step 5: QuickSight Dashboard

The workflow includes two Lambda functions to handle the QuickSight dashboard:

  1. CheckDashboardExists: This Lambda function checks if a QuickSight dashboard exists.

  2. CreateOrUpdateDashboard: This Lambda function either creates a new dashboard or updates an existing one. If the dashboard is created for the first time, the function also sets permissions so users can view the dashboard.

CreateOrUpdateDashboard Lambda Function:

Step 6: Notifications

Finally, SNS notifications are sent to inform stakeholders about the status of the dashboard update. This ensures that everyone is aware of the latest data availability. Notifications are sent using SNS to inform stakeholders about the successful creation or update of the dashboard or any errors that occur during the process.

Step Function Architecture Diagram

Conclusion

By leveraging AWS services such as S3, EventBridge, Step Functions, Lambda, Athena, QuickSight, and SNS, you can automate the entire process of data ingestion, processing, and visualization. This not only saves time but also ensures that your team always has access to up-to-date financial insights, enabling better and faster decision-making.

Implementing this workflow can greatly enhance the efficiency of your financial data processing pipeline, allowing your organization to stay ahead in the competitive financial landscape.

References

https://aws.amazon.com/blogs/big-data/analyze-and-visualize-nested-json-data-with-amazon-athena-and-amazon-quicksight/

https://aws.amazon.com/s3/

https://aws.amazon.com/step-functions/

https://aws.amazon.com/pm/lambda/

https://aws.amazon.com/sns/

https://aws.amazon.com/athena/