Jordan-Panda-AI is a professional-grade Data Engineering and Computer Vision project designed to automate the detection of Nike Jordan 1 "Panda" sneakers in secondary markets. This project serves as a comprehensive proof-of-concept for high-scale data acquisition and deep learning inference.
The system automates the entire machine learning lifecycle:
- Data Ingestion: Automated scraping of Nike (Official) and Wallapop (Market) using anti-bot bypass techniques.
- Preprocessing: Image augmentation to diversify training samples and prevent early-stage overfitting.
- Model Training: A custom Convolutional Neural Network (CNN) built with PyTorch, optimized through iterative evaluation.
- Inference: A production-ready script to classify local images with confidence scoring.
- Automation: Selenium & undetected-chromedriver for advanced web scraping.
- AI/ML: PyTorch, Torchvision (Transforms), and PIL.
- Data Handling: Pandas for metadata management and Pathlib for robust file system navigation.
- Environment: Python 3.10+.
- Clone the repository:
  git clone https://github.com/YOUR_USERNAME/Jordan-Panda-AI.git
  cd Jordan-Panda-AI
- Create a virtual environment:
  python -m venv venv
  # Windows: venv\Scripts\activate
  # Linux/Mac: source venv/bin/activate
- Install dependencies:
pip install torch torchvision pillow selenium undetected-chromedriver webdriver-manager pandas requests
This project was developed as part of a Data Engineering specialization, focusing on mastering data pipelines and real-world AI challenges.
- Final Accuracy: 80.26% on the validation set after 10 epochs.
- Optimization: Identified a significant improvement (from 57% to 80%) by balancing real-world market data with high-quality catalog images.
- Lessons Learned: Successfully navigated the overfitting phase, identifying that an extremely low training loss (0.0091) requires a more diverse negative-sample dataset to maintain high real-world precision.
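The overfitting signal described above (training loss collapsing far below validation loss) can be captured with a simple heuristic check. The gap-ratio threshold here is illustrative, not a tuned project value:

```python
# Heuristic overfitting detector: flag runs where training loss collapses
# far below validation loss. The 10x ratio threshold is an assumption.
def is_overfitting(train_loss: float, val_loss: float, gap_ratio: float = 10.0) -> bool:
    """Return True when validation loss exceeds training loss by gap_ratio."""
    if train_loss <= 0:
        return True  # degenerate training loss is itself a red flag
    return val_loss / train_loss > gap_ratio
```

For example, a training loss of 0.0091 against a validation loss of 0.45 gives a ratio near 50, well past the threshold, which is exactly the pattern that prompted diversifying the negative samples.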
To transition this proof of concept toward a commercial-grade tool approaching 99% accuracy, the following enhancements are required:
- Dataset Expansion: Scaling from hundreds to thousands of unique samples.
- Transfer Learning: Implementing pre-trained architectures like ResNet50 or EfficientNet.
- Hardware Acceleration: Training on high-performance GPU clusters to support deeper architectures.
Author: raess1593 – Data & AI Engineering Student