Project detail

Project information

  • Category: Data Warehousing and Cloud
  • Tools: AWS, Python
  • Project URL: GitHub

Designed and deployed a cloud data warehouse on Amazon Redshift for storing and analyzing Goodreads book data. Set up an Amazon S3 bucket for the initial upload and storage of the raw data. Used PySpark on an Amazon EMR cluster to run the ETL job that cleans and transforms the data and loads it into the final Redshift data warehouse.
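
The clean-then-load flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It uses plain Python in place of PySpark so it runs without a cluster. The bucket name, IAM role ARN, table name, and record fields (book_id, title, average_rating) are all hypothetical placeholders, and loading Parquet output through Redshift's COPY command is an assumption about how the final step might work.

```python
from typing import Optional

# Hypothetical names for illustration only; the project's real bucket,
# IAM role, and schema are not stated in the description.
STAGING_BUCKET = "example-goodreads-staging"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/example-redshift-copy"


def _text(value) -> str:
    """Return a stripped string, treating None as empty."""
    return "" if value is None else str(value).strip()


def clean_book(raw: dict) -> Optional[dict]:
    """Normalize one raw book record; return None if id or title is missing."""
    book_id = _text(raw.get("book_id"))
    title = _text(raw.get("title"))
    if not book_id or not title:
        return None
    try:
        rating = float(raw.get("average_rating"))
    except (TypeError, ValueError):
        rating = None  # keep the row, but mark an unparseable rating as missing
    return {"book_id": book_id, "title": title, "average_rating": rating}


def build_copy_statement(table: str, prefix: str) -> str:
    """Build a Redshift COPY statement that loads Parquet files from an S3 prefix."""
    return (
        f"COPY {table} "
        f"FROM 's3://{STAGING_BUCKET}/{prefix}' "
        f"IAM_ROLE '{IAM_ROLE_ARN}' "
        "FORMAT AS PARQUET;"
    )
```

In a Spark job the per-record logic would typically be expressed as DataFrame operations rather than a Python function, and the generated COPY statement would be executed against the Redshift cluster once the cleaned files are written back to S3.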