Snowflake/S3 Pipeline: ETL Architecture Questions
0
votes
1
answer
320
views
I am trying to build a pipeline that sends data from Snowflake to S3 and then loads it from S3 back into Snowflake (after running it through a production ML model on SageMaker). I am new to data engineering, so I would love to hear from the community what the recommended path is. The pipeline requirements are the following:
1. I want to schedule a monthly job. Should I set the schedule up in AWS or on the Snowflake side? Each monthly pull should cover the last full calendar month.
2. Each monthly data pull should be stored in its own S3 subfolder, named like query_01012020, query_01022020, query_01032020, etc. (the first sketch after this list shows what I have in mind).
3. The load from those S3 subfolders back into a specified Snowflake table should be triggered only after the ML model has successfully scored the data in SageMaker (second sketch below).
4. I want to monitor the performance of the ML model in production over time, to catch it if the model's accuracy degrades (some calibration-like graph, perhaps).
5. I want real-time error notifications whenever issues occur anywhere in the pipeline (third sketch below).
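To make requirements 1 and 2 concrete, here is a rough sketch of the monthly unload I have in mind, using the Snowflake Python connector. Every name in it (the credentials, MY_DB.MY_SCHEMA.SOURCE_TABLE, the my-ml-pipeline bucket, the event_date column, the my_s3_integration storage integration) is a placeholder I made up, not an existing setup:

```python
# Sketch: unload the last full month from Snowflake into a month-stamped S3 folder.
from datetime import date, timedelta

import snowflake.connector  # pip install snowflake-connector-python


def unload_last_full_month() -> None:
    today = date.today()
    first_of_this_month = today.replace(day=1)
    last_month_start = (first_of_this_month - timedelta(days=1)).replace(day=1)
    # Folder name in the query_DDMMYYYY style from requirement 2.
    folder = f"query_{last_month_start.strftime('%d%m%Y')}"

    conn = snowflake.connector.connect(
        account="my_account",   # placeholder
        user="my_user",         # placeholder
        password="...",         # pull from a secrets manager in practice
        warehouse="MY_WH",
        database="MY_DB",
        schema="MY_SCHEMA",
    )
    try:
        conn.cursor().execute(
            f"""
            COPY INTO 's3://my-ml-pipeline/{folder}/'
            FROM (
                SELECT *
                FROM MY_DB.MY_SCHEMA.SOURCE_TABLE
                WHERE event_date >= '{last_month_start:%Y-%m-%d}'
                  AND event_date <  '{first_of_this_month:%Y-%m-%d}'
            )
            STORAGE_INTEGRATION = my_s3_integration
            FILE_FORMAT = (TYPE = CSV)
            HEADER = TRUE
            OVERWRITE = TRUE
            """
        )
    finally:
        conn.close()


if __name__ == "__main__":
    unload_last_full_month()
```

I could presumably run this from a scheduler on either side (an EventBridge/cron rule in AWS, or a Snowflake task), which is exactly what I am unsure about in requirement 1.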
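For requirement 3, I imagine something like an AWS Lambda function subscribed to s3:ObjectCreated events on the prefix where SageMaker writes its scored output, which then issues a COPY INTO back into Snowflake. Again, all names here (the scored/ prefix, the SCORED_RESULTS table, the environment variables) are placeholders:

```python
# Sketch: Lambda triggered by S3 events loads scored files back into Snowflake.
import os
from urllib.parse import unquote_plus

import snowflake.connector  # bundled in the Lambda deployment package


def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys arrive URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        folder = key.rsplit("/", 1)[0]  # e.g. scored/query_01012020

        conn = snowflake.connector.connect(
            account=os.environ["SF_ACCOUNT"],
            user=os.environ["SF_USER"],
            password=os.environ["SF_PASSWORD"],
            warehouse="MY_WH",
            database="MY_DB",
            schema="MY_SCHEMA",
        )
        try:
            conn.cursor().execute(
                f"""
                COPY INTO MY_DB.MY_SCHEMA.SCORED_RESULTS
                FROM 's3://{bucket}/{folder}/'
                STORAGE_INTEGRATION = my_s3_integration
                FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
                """
            )
        finally:
            conn.close()
```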
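For requirement 5, my rough idea is to wrap each pipeline step so that any failure publishes to an SNS topic, whose email/SMS subscribers then get alerted in near real time. The topic ARN below is made up:

```python
# Sketch: publish pipeline failures to an SNS topic for real-time alerting.
import traceback

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"  # placeholder


def notify_on_failure(step_name, fn, *args, **kwargs):
    """Run one pipeline step; publish the traceback to SNS if it fails."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject=f"Pipeline step failed: {step_name}",
            Message=traceback.format_exc(),
        )
        raise


# Example usage: notify_on_failure("monthly_unload", unload_last_full_month)
```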
I hope you can guide me on which components the pipeline should include. Any relevant documentation or tutorials for this effort would be truly appreciated.
Thank you very much.
Asked by cocoo84hh
(101 rep)
Jun 14, 2020, 06:54 PM
Last activity: Mar 13, 2025, 06:02 AM