Project detail
Project information
- Category: Data Warehousing and Cloud
- Tools: AWS, Python
- Project URL: GitHub
Designed and deployed a cloud data warehouse on Amazon Redshift for storing and analyzing Goodreads book data. Set up an Amazon S3 bucket for the initial upload and storage of the data. Used PySpark on an Amazon EMR cluster to run the ETL job that cleans and transforms the data and loads it into the final Redshift data warehouse.
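The clean, transform, and load steps can be illustrated with a minimal sketch. This is not the project's code: it uses plain Python instead of PySpark so that it runs anywhere, and the field names (`book_id`, `title`, `average_rating`), the table name, the bucket path, and the IAM role are all hypothetical. It also assumes the cleaned data is staged in S3 as Parquet and loaded with a Redshift `COPY` statement, which is one common approach and may differ from what the project actually does.

```python
from typing import Optional


def _text(value) -> str:
    """Convert a raw field to a stripped string; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def clean_book_record(raw: dict) -> Optional[dict]:
    """Clean one raw record (hypothetical schema).

    Returns None when a required field is missing, so the caller can drop the row.
    """
    book_id = _text(raw.get("book_id"))
    title = _text(raw.get("title"))
    if not book_id or not title:
        return None

    # Ratings that cannot be parsed as numbers are stored as NULL (None).
    try:
        rating = float(raw.get("average_rating"))
    except (TypeError, ValueError):
        rating = None

    return {"book_id": book_id, "title": title, "average_rating": rating}


def build_copy_statement(table: str, s3_path: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for Parquet files staged in S3.

    All three arguments are placeholders supplied by the caller.
    """
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    rows = [
        {"book_id": " 42 ", "title": " Dune ", "average_rating": "4.25"},
        {"book_id": None, "title": "Untitled"},  # dropped: no id
    ]
    cleaned = [r for r in map(clean_book_record, rows) if r is not None]
    print(cleaned)
    print(
        build_copy_statement(
            "books",
            "s3://example-bucket/clean/books/",
            "arn:aws:iam::123456789012:role/ExampleRole",
        )
    )
```

In a Spark job the same per-record rules would typically be expressed as DataFrame operations, and the generated statement would be executed against the Redshift cluster once the cleaned files are in S3.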