mehulimukherjee/README.md

Hi, I’m Mehuli Mukherjee 👋

Senior Software Engineer | Fintech Innovation Engineer | Creator of ExtractPDF4J

I build production-grade fintech and document intelligence systems with a strong foundation in Java, Spring Boot, distributed backend services, and enterprise workflow design.

I currently work in the Innovation team at Bank of New Zealand (BNZ), where I help design and deliver solutions that improve internal banking journeys, financial data usability, and digital customer support workflows.

Alongside my enterprise work, I’m the creator of ExtractPDF4J — a Java-first PDF table extraction library built to handle the messy reality of real-world documents, including complex layouts, inconsistent tables, and scanned PDFs.

My core focus is simple: solve difficult real-world problems with robust engineering.

  • 🔭 Currently building: innovation-led banking and document-processing solutions
  • 🌱 Exploring: AI-assisted product engineering, document intelligence, and fintech innovation
  • 📝 Writing on: technical engineering, PDF extraction, enterprise architecture, and practical innovation

What Defines My Work

I specialize in systems where business complexity and technical complexity meet:

  • fintech workflows
  • open banking and transaction-oriented systems
  • document extraction and structured data transformation
  • enterprise APIs and orchestration
  • innovation-led engineering
  • AI-assisted product thinking

I enjoy working on problems that are often underestimated in design conversations but critical in production — especially where reliability, accuracy, and maintainability matter.


Featured Project: ExtractPDF4J

ExtractPDF4J is a Java-first PDF table extraction library built for real enterprise use cases.

It was created to address a practical challenge: most PDF extraction approaches work only until they face real production documents. In banking and enterprise systems, documents are often inconsistent, partially scanned, poorly aligned, or structurally irregular. ExtractPDF4J is built with that reality in mind.

Why it matters

  • Handles text-based and scanned PDFs
  • Supports stream, lattice, OCR-assisted, and hybrid extraction modes
  • Designed for enterprise reliability and extensibility
  • Built from experience solving real document-processing problems in regulated environments

What it represents

This project reflects how I like to build:

  • practical over theoretical
  • production-oriented over demo-oriented
  • reusable, maintainable, and engineering-led

Fintech Innovation Work

At BNZ, my work focuses on designing and supporting innovation initiatives that improve how financial data and digital workflows are used in practice.

This includes work across:

  • internal banker-facing journeys
  • structured financial data handling
  • service orchestration
  • digital process improvement
  • platform-style thinking for scalable solutions

My banking experience has shaped how I engineer systems:

  • reliability matters
  • traceability matters
  • usability matters
  • architecture must survive real operational conditions

Professional Snapshot

  • 💼 12+ years in software engineering
  • 🏦 Strong experience in banking and enterprise technology
  • ⚙️ Deep background in Java backend engineering
  • 📄 Creator and maintainer of ExtractPDF4J
  • 🚀 Interested in practical AI, workflow automation, and product innovation
  • 🤝 Active in mentoring, technical writing, and knowledge sharing

Tech Stack

Core Engineering
Java, Spring Boot, REST APIs, Microservices

Data & Integration
SQL, Kafka, API Integration, Event-Driven Patterns

Cloud & Platforms
AWS

Document Intelligence
PDF Parsing, OCR-Assisted Extraction, Table Detection, Structured Data Normalization

Full-Stack
React, JavaScript, End-to-End Application Development


Featured Writing & Talks

I actively share ideas and lessons from real engineering work through writing, speaking, and community contributions.

Writing

Talks & Community

  • SQL Saturday Wellington — speaker / contributor
  • Wellington Code Camp — speaker / contributor
  • Technical mentoring and knowledge sharing through engineering communities

This part of my work reflects something I value deeply: sharing practical lessons that help other engineers build better systems.


Selected Highlights

  • Built a Java-first open-source library to solve complex PDF extraction challenges
  • Worked on innovation-driven engineering in a major banking environment
  • Designed solutions around financial workflows, structured data, and system integration
  • Passionate about turning difficult operational problems into maintainable products

CV & Professional Links


Beyond the Job Title

I care about:

  • building tools that are genuinely useful
  • solving hard production problems well
  • creating reusable engineering assets
  • contributing through writing, mentoring, and open source
  • combining strong fundamentals with innovation

Explore My Work

On this profile, you’ll find:

  • open-source engineering projects
  • backend and platform-oriented work
  • practical problem-solving
  • ideas shaped by real enterprise challenges

Thanks for stopping by.

Pinned

  1. ExtractPDF4J/ExtractPDF4J (Public)

    Java PDF table extraction & OCR library. Extract structured tables from text-based and scanned PDFs using stream, lattice (OpenCV-style grid detection), and hybrid parsing.