Nik-ui/Comprehensive-Exploratory-Analysis-and-Insights-from-E-Commerce-Transaction-Data


E-Commerce Transaction Data Analysis

A comprehensive data analysis project exploring transactional patterns, customer behaviour, and product performance using real-world e-commerce data. This project demonstrates advanced data cleaning, feature engineering, and exploratory data analysis (EDA) to extract actionable business insights.

Dataset Overview

  • Filename: purchase_data.csv
  • Rows: 541,909 transactions
  • Columns: 8 (InvoiceNo, InvoiceDate, Quantity, UnitPrice, StockCode, Description, CustomerID, Country)
  • Timeframe: Approximately 2 years
  • Geography: 38 countries
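A minimal sketch of loading the file and checking it against the overview above, assuming the CSV header uses exactly the eight column names listed. The helper names `load_transactions` and `overview` are hypothetical and are not taken from the project notebook.

```python
import pandas as pd

# Column names as listed in the dataset overview.
EXPECTED_COLUMNS = [
    "InvoiceNo", "InvoiceDate", "Quantity", "UnitPrice",
    "StockCode", "Description", "CustomerID", "Country",
]


def load_transactions(source):
    """Read the CSV, keeping identifier-like columns as strings."""
    df = pd.read_csv(
        source,
        dtype={"InvoiceNo": str, "StockCode": str, "CustomerID": str},
    )
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


def overview(df):
    """Return the headline counts quoted in the overview."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "countries": df["Country"].nunique(),
    }
```

Calling `overview(load_transactions("purchase_data.csv"))` on the full file should reproduce the row, column, and country counts listed above if those figures describe the raw data.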

Key Features

  • Invoice-level granularity
  • Customer-specific insights
  • Product descriptions and unit prices
  • Date and time stamps for time-series analysis

Data Cleaning Highlights

  • Missing CustomerID replaced with "Anonymous"
  • Description cleaned and title-cased
  • Rows with zero or negative Quantity or UnitPrice removed
  • InvoiceDate parsed and standardized using the European (day-first) format
  • Removed cancelled transactions (InvoiceNo starting with "C")
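The cleaning steps listed above could be sketched in pandas roughly as follows. This is an illustration, not the notebook's actual code: the function name `clean_transactions` is hypothetical, and the order of the steps is an assumption.

```python
import pandas as pd


def clean_transactions(df):
    """Apply the cleaning rules listed above to a raw transactions frame."""
    # Drop cancelled transactions (InvoiceNo starting with "C").
    not_cancelled = ~df["InvoiceNo"].astype(str).str.startswith("C")
    # Keep only rows with strictly positive quantity and unit price.
    positive = (df["Quantity"] > 0) & (df["UnitPrice"] > 0)
    df = df[not_cancelled & positive].copy()

    # Label missing customer IDs instead of dropping the rows.
    df["CustomerID"] = df["CustomerID"].astype("string").fillna("Anonymous")
    # Trim whitespace and title-case the product descriptions.
    df["Description"] = df["Description"].str.strip().str.title()
    # Parse dates as day-first (European) timestamps.
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], dayfirst=True)
    return df
```

Keeping anonymous rows, rather than dropping them, matters later: they carry a sizeable share of revenue.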

Feature Engineering

  • Revenue: Quantity * UnitPrice
  • Temporal Columns: Month, Quarter, Season
  • Product Categories: Generated via clustering of Description
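The Revenue and temporal columns above can be derived in a few lines. The sketch below assumes InvoiceDate has already been parsed to datetimes. The month-to-season mapping (meteorological seasons, Northern Hemisphere) is an assumption, since the README does not say how Season was defined. Product categories are not covered here.

```python
import pandas as pd

# Assumed mapping: meteorological seasons, Northern Hemisphere.
SEASONS = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def add_features(df):
    """Add Revenue, Month, Quarter and Season columns to a cleaned frame."""
    df = df.copy()
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]
    df["Month"] = df["InvoiceDate"].dt.month
    df["Quarter"] = df["InvoiceDate"].dt.quarter
    df["Season"] = df["Month"].map(SEASONS)
    return df
```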

Exploratory Data Analysis (EDA)

1. Monthly Fluctuations

  • Detected revenue spikes in July and October 2011
  • Transaction counts stayed stable; the revenue increase came from higher-value purchases
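One way to separate "more transactions" from "bigger transactions" is to compare monthly revenue with the monthly count of distinct invoices. The sketch below assumes a `Revenue` column and a parsed `InvoiceDate`; the function name `monthly_summary` is hypothetical.

```python
import pandas as pd


def monthly_summary(df):
    """Monthly revenue, distinct invoice count, and revenue per invoice."""
    month = df["InvoiceDate"].dt.to_period("M")
    summary = df.groupby(month).agg(
        Revenue=("Revenue", "sum"),
        Transactions=("InvoiceNo", "nunique"),
    )
    # A rising ratio with a flat invoice count points to higher-value orders.
    summary["RevenuePerTransaction"] = summary["Revenue"] / summary["Transactions"]
    return summary
```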

2. Top Product Categories

  • "Gifts & Decorations" leads revenue across all months
  • "Bags & Accessories" and "Household Essentials" show consistent growth
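A month-by-category revenue table makes both the leading category and the growth of the others visible. This is a sketch that assumes a `Category` column holding the cluster-derived category name. That column name is hypothetical, as is the function name.

```python
import pandas as pd


def category_revenue_by_month(df):
    """Rows are months, columns are product categories, values are revenue."""
    month = df["InvoiceDate"].dt.to_period("M").rename("Month")
    return (
        df.groupby([month, "Category"])["Revenue"]
        .sum()
        .unstack(fill_value=0)
    )
```

Calling `.idxmax(axis=1)` on the result gives the top-revenue category for each month.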

3. Seasonality

  • September peaks tied to seasonal demand for gifts
  • Stable product lines identified for year-round promotions

4. Customer Behaviour

  • High-value outliers and loyal customer clusters identified
  • Cross-category spending drives higher customer value
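A per-customer summary is one way to surface high-value outliers and cross-category spenders. The sketch below flags outliers with the 1.5 × IQR rule, which is an assumption; the README does not state which outlier criterion was used. It also assumes hypothetical `Revenue` and `Category` columns and excludes rows labelled "Anonymous".

```python
import pandas as pd


def customer_summary(df):
    """Total revenue, order count and category breadth per known customer."""
    known = df[df["CustomerID"] != "Anonymous"]
    summary = known.groupby("CustomerID").agg(
        TotalRevenue=("Revenue", "sum"),
        Orders=("InvoiceNo", "nunique"),
        Categories=("Category", "nunique"),
    )
    # Assumed outlier rule: above the third quartile plus 1.5 times the IQR.
    q1, q3 = summary["TotalRevenue"].quantile([0.25, 0.75])
    summary["HighValue"] = summary["TotalRevenue"] > q3 + 1.5 * (q3 - q1)
    return summary
```

Comparing `TotalRevenue` across values of `Categories` is a quick check on whether cross-category buyers are worth more.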

5. Anonymous Customers

  • Represent significant revenue (e.g., £640,000+ in "Gifts & Decorations")
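The anonymous contribution per category can be computed as shown in this sketch, which assumes missing CustomerID values were labelled "Anonymous" during cleaning and that hypothetical `Category` and `Revenue` columns exist.

```python
import pandas as pd


def anonymous_revenue_by_category(df):
    """Anonymous revenue and its share of total revenue, per category."""
    total = df.groupby("Category")["Revenue"].sum()
    anon = (
        df[df["CustomerID"] == "Anonymous"]
        .groupby("Category")["Revenue"]
        .sum()
        # Categories with no anonymous sales get zero instead of going missing.
        .reindex(total.index, fill_value=0)
    )
    return pd.DataFrame({"AnonymousRevenue": anon, "Share": anon / total})
```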

Limitations

  • Lack of demographic data (age, income, etc.)
  • Outliers still skew the distributions, even after outlier removal
  • Country-level analysis only; no finer-grained geography
  • Product grouping relies on unsupervised text clustering

Recommendations

  • Add demographic fields to enable deeper segmentation
  • Use supervised models (e.g. Random Forest, XGBoost) for predictive insights
  • Integrate external data (holiday calendars, marketing spend)
  • Build interactive dashboards for real-time decision-making

Tools & Libraries

  • Pandas, NumPy – data wrangling
  • Matplotlib, Seaborn – visualisation
  • Scikit-learn – clustering and potential model integration
  • Jupyter Notebook – analysis environment

License

This project is released for academic and non-commercial use. Attribution is appreciated.


Author: Fatimo Adenike Adeniya

Exploratory Data Analyst | E-Commerce Insights | Python Enthusiast

About

E-commerce has revolutionized business operations by enabling seamless transactions and a global reach. Leveraging data insights is essential for optimizing operations in a competitive market. E-commerce data analysis lets businesses evaluate product performance, identify sales trends, and understand consumer behaviour.
