
Snowflake/S3 Pipeline: ETL Architecture Questions

0 votes
1 answer
320 views
I am trying to build a pipeline that sends data from Snowflake to S3 and then from S3 back into Snowflake, after running it through a production ML model on SageMaker. I am new to data engineering, so I would love to hear from the community what the recommended path is. The pipeline requirements are the following:

1. I want to schedule a monthly job. Do I specify that in AWS or on the Snowflake side? Each monthly pull should cover the last full month (since this is a monthly pipeline).
2. Each monthly pull should be stored in its own S3 subfolder, like query_01012020, query_01022020, query_01032020, etc.
3. The load from S3 (query_01012020, query_01022020, query_01032020) back into a specified Snowflake table should be triggered after the ML model has successfully scored the data in SageMaker.
4. I want to monitor the performance of the ML model in production over time, to catch whether its accuracy is degrading (some calibration-like graph, perhaps).
5. I want real-time error notifications when issues occur in the pipeline.

To make my questions concrete, I have added rough sketches below of what I was imagining for each step; please correct me if any of it is off. I hope you are able to guide me on which components the pipeline should include. Any relevant documentation/tutorials for this effort are truly appreciated. Thank you very much.
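For items 1 and 2, I was imagining a scheduled job (a cron-triggered script, or possibly a Snowflake scheduled Task running COPY INTO directly) that computes the last full month and unloads the query result into a dated S3 subfolder via an external stage. Here is a minimal sketch using snowflake-connector-python; the stage name MY_S3_STAGE, the table SOURCE_TABLE, the event_date column, and the connection parameters are all placeholders I made up:

```python
# Monthly unload sketch: compute the last full month and COPY INTO a
# dated S3 subfolder. Stage/table names and credentials are placeholders.
import datetime
import snowflake.connector

def last_full_month(today=None):
    """Return (first_day, last_day) of the previous calendar month."""
    today = today or datetime.date.today()
    first_of_this_month = today.replace(day=1)
    last_day = first_of_this_month - datetime.timedelta(days=1)
    first_day = last_day.replace(day=1)
    return first_day, last_day

def run_monthly_unload(conn):
    first_day, last_day = last_full_month()
    # Subfolder named like query_01012020 (DDMMYYYY of the month start).
    subfolder = "query_" + first_day.strftime("%d%m%Y")
    sql = f"""
        COPY INTO @MY_S3_STAGE/{subfolder}/
        FROM (
            SELECT * FROM SOURCE_TABLE
            WHERE event_date BETWEEN '{first_day}' AND '{last_day}'
        )
        FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
        OVERWRITE = TRUE
    """
    conn.cursor().execute(sql)

if __name__ == "__main__":
    conn = snowflake.connector.connect(
        account="my_account",   # placeholder
        user="my_user",         # placeholder
        password="...",         # placeholder
        warehouse="MY_WH",
        database="MY_DB",
        schema="PUBLIC",
    )
    run_monthly_unload(conn)
```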
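For item 3, one pattern I have read about is having the SageMaker scoring job write its output under a scored/ prefix in the bucket, and letting the resulting s3:ObjectCreated event trigger the load, either through Snowpipe with auto-ingest on an external stage, or through a small Lambda that runs a COPY INTO. Below is a rough Lambda sketch; the bucket layout, the stage MY_SCORED_STAGE, the target table SCORED_RESULTS, and the environment-variable credential handling are all assumptions on my part:

```python
# Lambda sketch: fires on s3:ObjectCreated under the scored/ prefix and
# loads that subfolder into a Snowflake table. All names are placeholders.
import os
import snowflake.connector

def handler(event, context):
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]   # e.g. scored/query_01012020/part-000.csv.gz
        subfolder = key.split("/")[1]         # e.g. query_01012020
        conn = snowflake.connector.connect(
            account=os.environ["SF_ACCOUNT"],  # placeholder env vars
            user=os.environ["SF_USER"],
            password=os.environ["SF_PASSWORD"],
            warehouse="MY_WH",
            database="MY_DB",
            schema="PUBLIC",
        )
        try:
            conn.cursor().execute(f"""
                COPY INTO SCORED_RESULTS
                FROM @MY_SCORED_STAGE/{subfolder}/
                FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
            """)
        finally:
            conn.close()
```

If Snowpipe auto-ingest fits, I understand it would avoid the Lambda entirely, but I am not sure which is preferred here.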
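For item 4, once ground-truth labels eventually arrive, I was thinking I could join them to the scored data and draw a calibration curve per month to see drift. A sketch using scikit-learn, where the month, y_true, and y_prob column names are my own assumptions:

```python
# Monitoring sketch: monthly calibration curves from scored data joined
# with eventual ground truth. Column names are assumptions.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

def plot_monthly_calibration(df: pd.DataFrame):
    """df has columns: month, y_true (0/1 label), y_prob (model score)."""
    for month, grp in df.groupby("month"):
        frac_pos, mean_pred = calibration_curve(
            grp["y_true"], grp["y_prob"], n_bins=10
        )
        plt.plot(mean_pred, frac_pos, marker="o", label=str(month))
    plt.plot([0, 1], [0, 1], linestyle="--", color="gray")  # perfect calibration
    plt.xlabel("Mean predicted probability")
    plt.ylabel("Fraction of positives")
    plt.legend()
    plt.show()
```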
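For item 5, I was thinking each pipeline step could publish failures to an SNS topic that emails or pages me. A minimal wrapper sketch with boto3, where the topic ARN is a placeholder:

```python
# Failure-notification sketch: wrap a pipeline step and publish any
# exception to an SNS topic. The topic ARN is a placeholder.
import traceback
import boto3

SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"  # placeholder

def notify_on_failure(step_name, fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except Exception:
        boto3.client("sns").publish(
            TopicArn=SNS_TOPIC_ARN,
            Subject=f"Pipeline failure: {step_name}",
            Message=traceback.format_exc(),
        )
        raise
```

Is this roughly the right shape, or is there a more standard monitoring/alerting component I should be using instead?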
Asked by cocoo84hh (101 rep)
Jun 14, 2020, 06:54 PM
Last activity: Mar 13, 2025, 06:02 AM