AshinSMathew/ANUVAADHYA

ANUVAADHYA - AI-Powered Multilingual Subtitle Generator

ANUVAADHYA is a web application that generates multilingual subtitles for audio and video content. It combines several AI APIs (Sarvam AI, Whisper, and Google Gemini) with a Next.js frontend and a Python backend to support content localization and accessibility.

Project Overview

ANUVAADHYA addresses the growing need for content accessibility and localization by providing:

  • Automatic speech-to-text transcription
  • Multilingual translation capabilities
  • Real-time subtitle generation
  • User-friendly dashboard for project management
  • Secure authentication system
  • Audio/video file upload and processing

Technology Stack

Frontend

  • Next.js 14 with App Router
  • TypeScript for type safety
  • Tailwind CSS for styling
  • React Context for state management
  • Custom Hooks for reusable logic

Backend

  • Python with audio processing capabilities
  • Firebase for authentication and data storage
  • Multiple AI APIs for comprehensive subtitle generation

AI Integration

  • Sarvam AI API - Indian language speech recognition and translation
  • Whisper - International language speech recognition and translation
  • Google Gemini API - Advanced AI processing and text refinement
  • Firebase - Backend services and authentication
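
The split between Sarvam AI (Indian languages) and Whisper (international languages) suggests a routing step based on the source language. The following is a minimal sketch of that idea, not the project's actual code: the function name `pick_speech_engine`, the engine labels, and the set of language codes are all assumptions made for illustration.

```python
# Hypothetical routing helper. The language codes and engine labels are
# illustrative assumptions, not values taken from the ANUVAADHYA backend.
INDIAN_LANGUAGE_CODES = {"hi", "bn", "ta", "te", "ml", "kn", "mr", "gu", "pa", "or"}


def pick_speech_engine(language_code: str) -> str:
    """Return "sarvam" for Indian languages and "whisper" for everything else."""
    # Normalize values such as "hi-IN" or " ML " down to a bare two-letter code.
    code = language_code.strip().lower().split("-")[0]
    return "sarvam" if code in INDIAN_LANGUAGE_CODES else "whisper"
```

For example, `pick_speech_engine("hi-IN")` selects the Sarvam path, while `pick_speech_engine("fr")` falls through to Whisper.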

Project Structure

ANUVAADHYA/
├── app/                         # Next.js frontend application
│   ├── dashboard/               # User dashboard page
│   ├── forgery-detection/       # MAIN SUBTITLE GENERATION INTERFACE
│   ├── login/                   # Authentication page
│   ├── player/                  # Media player with subtitle support
│   ├── signup/                  # User registration
│   ├── upload/                  # File upload interface
│   ├── components/              # Reusable React components
│   ├── contexts/                # React context providers
│   │   └── AuthContext/         # User authentication
│   ├── hooks/                   # Custom React hooks
│   │   └── useSession/
│   ├── lib/                     # Utility libraries
│   │   ├── auth-utils.ts          
│   │   ├── fiebase.ts      
│   │   └── utils.ts          
│   ├── public/                  # Static assets
│   ├── globals.css              # Global styles
│   ├── layout.tsx               # Root layout
│   └── page.tsx                 # Home page
├── backend/                     # Python backend services
│   ├── finger.py               # Audio processing & subtitle generation
│   ├── test.py                 # Testing utilities
│   ├── requirements.txt        # Python dependencies
│   └── .env                    # Environment variables
└── configuration files         # Project configuration

Installation & Setup

Prerequisites

  • Node.js 18+
  • Python 3.8+
  • npm

API keys:

  • Sarvam AI
  • Google Gemini
  • Firebase

Frontend Setup

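Navigate to the frontend directory (the Project Structure above lists the frontend under app/, so adjust the path if your checkout has no frontend folder)

cd frontend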

Install dependencies

npm install

Environment Configuration

Create .env.local file with:

# Firebase Configuration
NEXT_PUBLIC_FIREBASE_API_KEY=your_firebase_api_key
NEXT_PUBLIC_FIREBASE_AUTH_DOMAIN=your_project.firebaseapp.com
NEXT_PUBLIC_FIREBASE_PROJECT_ID=your_project_id
NEXT_PUBLIC_FIREBASE_STORAGE_BUCKET=your_project.appspot.com
NEXT_PUBLIC_FIREBASE_MESSAGING_SENDER_ID=your_sender_id
NEXT_PUBLIC_FIREBASE_APP_ID=your_app_id

Run the development server

npm run dev

Backend Setup

Navigate to the backend directory

cd backend

Create virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Python dependencies

pip install -r requirements.txt

Environment Configuration

Create .env file with:

# API Keys
SARVAM_API_KEY=your_sarvam_api_key
GEMINI_API_KEY=your_gemini_api_key
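
Before starting the backend, it can help to confirm that both keys are actually set. The sketch below assumes the .env values have already been loaded into the process environment (for example by a loader such as python-dotenv); the helper name `missing_api_keys` is hypothetical and not part of the repository.

```python
import os
from typing import List, Mapping

# Key names taken from the .env example above.
REQUIRED_KEYS = ("SARVAM_API_KEY", "GEMINI_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        raise SystemExit("Missing API keys: " + ", ".join(missing))
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.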

Run the backend server

python test.py

Key Features

  1. Multilingual Subtitle Generation
  • Automatic speech recognition for multiple languages
  • Real-time translation between supported languages
  • Accurate timestamp synchronization
  • Support for Indian regional languages via Sarvam AI
  2. Supported Languages
  • Indian regional languages via Sarvam AI; international languages via Whisper
  3. Media Processing
  • Audio file support: WAV, MP3, M4A, FLAC, AAC
  • Video file support: MP4, AVI, MOV, MKV
  • Batch processing for multiple files
  • Progress tracking during generation
  4. Subtitle Management
  • Real-time subtitle editor
  • Timeline synchronization
  • Multiple export formats (SRT, VTT, TXT)
  • Translation memory for consistency
  5. User Dashboard
  • Project history and management
  • Processing statistics
  • Quick access to recent files
  • Export management
Usage Guide

  1. User Registration & Login
  • Create account with email/password
  • Secure authentication via Firebase
  • Profile management
  2. File Upload
  • Navigate to upload section
  • Drag and drop or select media files
  • Supported formats: audio/video files up to 100 MB
  3. Subtitle Generation Process
  • Select source language of the media
  • Choose target languages for translation
  • Configure generation settings
  • Monitor real-time progress
  • Review and edit generated subtitles
  4. Subtitle Editing
  • Timeline adjustment for perfect sync
  • Text editing for accuracy
  • Multi-language preview
  • Real-time saving
  5. Export Options
  • SRT - Standard subtitle format
  • VTT - Web video text tracks
