This repository is a curated collection of my Data Science projects, showcasing my analytical abilities, technical skills, and domain expertise. Each project is a self-contained study demonstrating methodologies for data processing, analysis, and model development. This portfolio is crafted under the guidance of my mentor, Miguel Fierro, a Principal Data Scientist at Microsoft, whose insights and experience have been invaluable.
- Objective: Develop a model capable of detecting hate speech and offensive language in textual data.
- Algorithm: DeBERTa (Decoding-enhanced BERT with Disentangled Attention).
- Tools Used: Python, PyTorch, Transformers, Pandas, NumPy.
- Folder Structure:
  - data: Contains the datasets used for model training and evaluation.
  - notebooks: Jupyter notebooks with detailed code, comments, and analysis.
  - utils: Helper functions used across the project for tasks such as data preprocessing and model evaluation.
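As a rough illustration of the DeBERTa approach listed above, the following is a minimal sketch of inference with the Transformers library. It is not the project's actual code: the checkpoint name `microsoft/deberta-base`, the three-class label set, and the helper names `pick_label` and `classify` are all assumptions made for this example.

```python
from typing import List, Sequence

# Assumption: three classes, as in common hate-speech datasets.
# The project's actual label set may differ.
LABELS = ["hate_speech", "offensive_language", "neither"]

# Assumption: a public DeBERTa checkpoint; the project may use another one.
CHECKPOINT = "microsoft/deberta-base"


def pick_label(logits: Sequence[float], labels: Sequence[str] = LABELS) -> str:
    """Return the label whose logit is largest (argmax)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


def classify(texts: List[str], checkpoint: str = CHECKPOINT) -> List[str]:
    """Tokenize texts, run the classifier, and map logits to label names.

    The classification head of a base checkpoint is randomly initialized,
    so predictions are meaningless until the model has been fine-tuned.
    """
    # Imported lazily so pick_label works without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(LABELS)
    )
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (len(texts), len(LABELS))
    return [pick_label(row.tolist()) for row in logits]
```

Calling `classify(["some text"])` downloads the checkpoint on first use. In practice the model would be fine-tuned on the labeled data in `data` before its predictions carry any meaning.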
- Objective: Develop a forecasting model that predicts the trend of COVID-19 confirmed cases, deaths, and recoveries. The model aims to provide insight into the trajectory of the pandemic and to support public health planning and resource allocation.
- Algorithm: Prophet, an open-source forecasting tool developed by Facebook. Prophet is robust to missing data and shifts in trend, and typically handles outliers well.
- Tools Used: Python, Pandas, NumPy, Matplotlib/Seaborn, Prophet.
- Folder Structure:
  - data/: Datasets used in the projects.
  - notebooks/: Jupyter notebooks containing project code and documentation.
  - utils/: Utility scripts and modules to support data analysis and model operations.
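To make the Prophet workflow above more concrete, here is a minimal sketch of fitting a model and forecasting ahead. It is not the project's actual code: the helper names `to_prophet_rows` and `forecast_cases`, the input format (counts keyed by ISO date), and the 30-day horizon are assumptions made for this example. The `ds` and `y` column names are the ones Prophet requires.

```python
from typing import Dict, List


def to_prophet_rows(counts: Dict[str, int]) -> List[Dict[str, object]]:
    """Turn counts keyed by ISO date into chronologically sorted rows.

    Each row uses the `ds` (date) and `y` (value) keys that Prophet expects.
    """
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted as strings.
    return [{"ds": day, "y": counts[day]} for day in sorted(counts)]


def forecast_cases(rows: List[Dict[str, object]], horizon_days: int = 30):
    """Fit Prophet on the history and forecast `horizon_days` past its end."""
    # Imported lazily so to_prophet_rows runs without pandas or Prophet installed.
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame(rows)
    history["ds"] = pd.to_datetime(history["ds"])

    model = Prophet()
    model.fit(history)

    # Extends the historical dates by `horizon_days` daily steps.
    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)

    # yhat is the point forecast; yhat_lower and yhat_upper bound its
    # uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

The same two functions could be applied separately to confirmed cases, deaths, and recoveries, since Prophet fits one series at a time.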
Each project is encapsulated in its own directory with a dedicated README. To run a project:
- Navigate to the project's notebook directory.
- Follow the instructions in the project's README for setting up the environment.
- Execute the Jupyter notebooks step by step.
While this portfolio is a personal showcase, contributions in the form of feedback, bug reports, and even feature enhancements are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
- My mentor, Miguel Fierro, for his guidance and support throughout the learning process.
- The data science community for providing a platform to share and grow.