ArabicGo

An Arabic text shaping library for Go, designed for Quranic text rendering with full RTL (right-to-left) support. It handles the complete shaping pipeline (contextual forms, ligatures, tashkeel, numeral conversion, and BiDi reordering) for applications that need proper Arabic typography.

Key Features

  • Complete Quranic Text Support - Render the Holy Quran with all diacritical marks perfectly preserved
  • Advanced Tashkeel Engine - Full support for all Arabic diacritical marks:
    • Harakat: Fatha (َ), Damma (ُ), Kasra (ِ), Sukun (ْ)
    • Shadda: Gemination mark (ّ) with automatic vowel ligature combining
    • Tanween: Fathatan (ً), Dammatan (ٌ), Kasratan (ٍ)
    • Quranic Marks: Superscript Alef (ٰ), Maddah (ٓ), Hamza Above/Below (ٔ ٕ), Subscript Alef (ٖ), Inverted Damma (ٗ), Noon Ghunna (٘)
  • Intelligent Character Joining - Automatic joining of Arabic letters in their correct contextual forms (isolated, initial, medial, final)
  • Ligature Rendering - Proper Lam-Alef (لا) and Allah (ﷲ) ligature formation
  • Arabic-Indic Numerals - Automatic conversion from Western (0-9) to Eastern Arabic-Indic (٠-٩) numerals
  • Bidirectional (BiDi) Text - Correct handling of mixed Arabic + English text — English stays LTR within the RTL flow
  • RTL Text Processing - Correct right-to-left text ordering for PDF and image generation

Screenshots

Surah Al-Fatiha - Complete Quranic Rendering

[Image: Surah Al-Fatiha]

The opening chapter of the Holy Quran rendered with complete tashkeel, demonstrating perfect Arabic typography

Arabic Letters with Numerals

[Image: Arabic Numerals]

Demonstration of Arabic-Indic numeral conversion and mixed Arabic-number text

Installation

go get github.com/AmmrFX/arabicgo

Quick Start

package main

import (
    "fmt"
    "github.com/AmmrFX/arabicgo"
)

func main() {
    // Render Quranic text with full tashkeel
    bismillah := arabicgo.ToArabic("بِسْمِ اللهِ الرَّحْمَنِ الرَّحِيمِ")
    fmt.Println(bismillah)

    // Shape any Arabic text
    greeting := arabicgo.Shape("السَّلامُ عَلَيْكُمْ")
    fmt.Println(greeting)

    // Mixed Arabic + English — BiDi keeps English LTR within RTL flow
    mixed := arabicgo.ToArabic("مرحبا Hello عالم")
    fmt.Println(mixed) // Output: عالم Hello مرحبا
}

PDF Generation Example

package main

import (
    "log"
    "github.com/AmmrFX/arabicgo"
    "github.com/signintech/gopdf"
)

func main() {
    pdf := gopdf.GoPdf{}
    pdf.Start(gopdf.Config{PageSize: *gopdf.PageSizeA4})
    pdf.AddPage()

    err := pdf.AddTTFFont("Arabic", "path/to/arabic-font.ttf")
    if err != nil {
        log.Fatal(err)
    }
    pdf.SetFont("Arabic", "", 24)

    // Render Surah Al-Fatiha
    pdf.SetXY(50, 50)
    pdf.Cell(nil, arabicgo.ToArabic("بِسْمِ اللهِ الرَّحْمَنِ الرَّحِيمِ"))

    pdf.SetXY(50, 90)
    pdf.Cell(nil, arabicgo.ToArabic("الْحَمْدُ لله رَبِّ الْعَالَمِينَ"))

    // Numbers automatically convert to Arabic-Indic
    pdf.SetXY(50, 130)
    pdf.Cell(nil, arabicgo.ToArabic("سورة الفاتحة - 7 آيات"))

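    // WritePdf returns an error; report it instead of silently dropping the file
    if err := pdf.WritePdf("quran.pdf"); err != nil {
        log.Fatal(err)
    }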
}

API Reference

Core Functions

ToArabic(text string) string

The main text processing function. Transforms Arabic text for proper visual rendering by:

  • Applying contextual character shaping
  • Forming required ligatures (Lam-Alef, Allah)
  • Preserving and positioning tashkeel marks
  • Converting Western numerals to Arabic-Indic
  • Reordering mixed-direction text (BiDi) for correct RTL display
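
The ligature step in the list above can be illustrated with a minimal sketch. It is not ArabicGo's implementation: `foldLamAlef`, `lamAlef`, `rightJoining`, and `joinsForward` are hypothetical names, the joining test follows Arabic-language usage for the basic block only, and marks sitting between Lam and Alef are ignored. It replaces Lam followed by an Alef variant with the matching Unicode presentation-form ligature, choosing the final form when the preceding letter connects forward.

```go
package main

import "fmt"

// lamAlef maps an Alef variant to its Lam-Alef ligature: {isolated, final}.
var lamAlef = map[rune][2]rune{
    0x0622: {0xFEF5, 0xFEF6}, // Alef with Madda above
    0x0623: {0xFEF7, 0xFEF8}, // Alef with Hamza above
    0x0625: {0xFEF9, 0xFEFA}, // Alef with Hamza below
    0x0627: {0xFEFB, 0xFEFC}, // plain Alef
}

// rightJoining lists letters that never connect to the following letter
// (assumption: Arabic-language usage, basic block only).
var rightJoining = map[rune]bool{
    0x0627: true, 0x0629: true, 0x062F: true, 0x0630: true,
    0x0631: true, 0x0632: true, 0x0648: true, 0x0649: true,
}

// joinsForward reports whether r connects to the letter after it.
func joinsForward(r rune) bool {
    return r >= 0x0626 && r <= 0x064A && !rightJoining[r]
}

// foldLamAlef replaces each Lam + Alef pair with a single ligature rune.
func foldLamAlef(text string) string {
    in := []rune(text)
    out := make([]rune, 0, len(in))
    for i := 0; i < len(in); i++ {
        if in[i] == 0x0644 && i+1 < len(in) {
            if lig, ok := lamAlef[in[i+1]]; ok {
                if i > 0 && joinsForward(in[i-1]) {
                    out = append(out, lig[1]) // joined to the previous letter
                } else {
                    out = append(out, lig[0]) // stands alone
                }
                i++ // the Alef has been consumed
                continue
            }
        }
        out = append(out, in[i])
    }
    return string(out)
}

func main() {
    // Seen, Lam, Alef, Meem: the Lam-Alef pair follows a joining letter.
    fmt.Printf("%U\n", []rune(foldLamAlef("\u0633\u0644\u0627\u0645")))
}
```

Folding the pair before contextual shaping keeps the later steps simple, because the ligature then behaves like one right-joining letter.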

Shape(text string) string

Alias for ToArabic() - use whichever name you prefer.

Utility Functions

IsTashkeel(r rune) bool

Check if a rune is an Arabic diacritical mark.
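
A check like this is usually a plain range test. The sketch below assumes the mark set is exactly the one listed in this README (U+064B to U+0658 plus U+0670); `isTashkeel` and `stripTashkeel` are illustrative names, and the library's own `IsTashkeel` may cover a different set.

```go
package main

import "fmt"

// isTashkeel is a hypothetical stand-in for the library's IsTashkeel.
// Assumption: the marks are U+064B..U+0658 plus superscript Alef U+0670.
func isTashkeel(r rune) bool {
    return (r >= 0x064B && r <= 0x0658) || r == 0x0670
}

// stripTashkeel drops every rune that isTashkeel recognizes.
func stripTashkeel(s string) string {
    out := make([]rune, 0, len(s))
    for _, r := range s {
        if !isTashkeel(r) {
            out = append(out, r)
        }
    }
    return string(out)
}

func main() {
    fmt.Println(isTashkeel('\u064E')) // Fatha
    fmt.Println(isTashkeel('\u0628')) // Beh, a base letter
    // Beh+Kasra, Seen+Sukun, Meem+Kasra with the marks removed
    fmt.Println(stripTashkeel("\u0628\u0650\u0633\u0652\u0645\u0650"))
}
```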

IsWesternDigit(r rune) bool

Check if a rune is a Western Arabic digit (0-9).

ToEasternDigit(r rune) rune

Convert a single Western digit to its Eastern Arabic-Indic equivalent.
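
The conversion amounts to an offset between two code point ranges. The following is a sketch under assumptions, not the library's source: `toEasternDigit` mirrors the documented signature but its body is a guess, returning non-digit runes unchanged is an assumption, and `convertDigits` is a hypothetical helper.

```go
package main

import "fmt"

// toEasternDigit maps '0'..'9' onto U+0660..U+0669.
// Assumption: any other rune is returned unchanged.
func toEasternDigit(r rune) rune {
    if r >= '0' && r <= '9' {
        return r - '0' + 0x0660
    }
    return r
}

// convertDigits applies toEasternDigit to every rune of s.
func convertDigits(s string) string {
    out := []rune(s)
    for i, r := range out {
        out[i] = toEasternDigit(r)
    }
    return string(out)
}

func main() {
    fmt.Println(convertDigits("2026"))
}
```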

GetShaddaLigature(vowel rune) rune

Get the combined Shadda+Vowel ligature character for a vowel.

Supported Characters

Arabic Alphabet

All 28 Arabic letters with full contextual forms:

ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي
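
To make "contextual forms" concrete, here is a minimal sketch of form selection for four letters. The table, `forms`, `joinsForward`, and `shapeWord` are hypothetical and not taken from ArabicGo; the code points are the standard Arabic Presentation Forms-B values. It also assumes the input contains no tashkeel, which a real shaper has to skip over when looking at neighbors.

```go
package main

import "fmt"

// forms holds one letter's presentation forms in the order
// isolated, final, initial, medial. Right-joining letters such as
// Alef have no initial or medial form, marked with 0.
type forms [4]rune

var table = map[rune]forms{
    0x0627: {0xFE8D, 0xFE8E, 0, 0},           // Alef
    0x0628: {0xFE8F, 0xFE90, 0xFE91, 0xFE92}, // Beh
    0x0633: {0xFEB1, 0xFEB2, 0xFEB3, 0xFEB4}, // Seen
    0x0645: {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}, // Meem
}

// joinsForward reports whether r can connect to the letter after it.
func joinsForward(r rune) bool {
    f, ok := table[r]
    return ok && f[2] != 0
}

// shapeWord picks a form for each letter from its two neighbors.
func shapeWord(word string) string {
    in := []rune(word)
    out := make([]rune, len(in))
    for i, r := range in {
        f, ok := table[r]
        if !ok {
            out[i] = r // unknown rune: pass through
            continue
        }
        prev := i > 0 && joinsForward(in[i-1])
        next := false
        if i+1 < len(in) {
            _, nextKnown := table[in[i+1]]
            next = nextKnown && joinsForward(r)
        }
        switch {
        case prev && next:
            out[i] = f[3] // medial
        case prev:
            out[i] = f[1] // final
        case next:
            out[i] = f[2] // initial
        default:
            out[i] = f[0] // isolated
        }
    }
    return string(out)
}

func main() {
    // Beh, Seen, Meem: initial, medial, final
    fmt.Printf("%U\n", []rune(shapeWord("\u0628\u0633\u0645")))
    // Beh, Alef, Beh: Alef does not join forward, so the last Beh is isolated
    fmt.Printf("%U\n", []rune(shapeWord("\u0628\u0627\u0628")))
}
```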

Special Characters

ة (Teh Marbuta)    ى (Alef Maksura)    ء (Hamza)
أ (Alef + Hamza)   إ (Alef + Hamza Below)   آ (Alef + Maddah)
ؤ (Waw + Hamza)    ئ (Yeh + Hamza)

Persian/Urdu Extensions

پ (Peh)   چ (Tcheh)   ژ (Jeh)   گ (Gaf)   ک (Keheh)

Complete Tashkeel Set

| Mark | Name | Unicode |
|------|------|---------|
| َ | Fatha | U+064E |
| ُ | Damma | U+064F |
| ِ | Kasra | U+0650 |
| ْ | Sukun | U+0652 |
| ّ | Shadda | U+0651 |
| ً | Tanween Fath | U+064B |
| ٌ | Tanween Damm | U+064C |
| ٍ | Tanween Kasr | U+064D |
| ٰ | Superscript Alef | U+0670 |
| ٓ | Maddah Above | U+0653 |
| ٔ | Hamza Above | U+0654 |
| ٕ | Hamza Below | U+0655 |
| ٖ | Subscript Alef | U+0656 |
| ٗ | Inverted Damma | U+0657 |
| ٘ | Noon Ghunna | U+0658 |

Shadda + Vowel Ligatures

The library automatically combines Shadda with vowels into single ligature characters:

| Combination | Ligature | Code point |
|-------------|----------|------------|
| Shadda + Fatha | ﱠ | U+FC60 |
| Shadda + Damma | ﱡ | U+FC61 |
| Shadda + Kasra | ﱢ | U+FC62 |
| Shadda + Dammatan | ﱞ | U+FC5E |
| Shadda + Kasratan | ﱟ | U+FC5F |
| Shadda + Superscript Alef | ﱣ | U+FC63 |
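
The combinations listed above translate directly into a lookup table. The sketch below is an assumption about how such combining could work, not the library's code: `shaddaLigatures` and `combineShadda` are hypothetical names, and accepting the vowel on either side of the Shadda is a design choice made here because both orders occur in real text.

```go
package main

import "fmt"

const shadda = 0x0651

// shaddaLigatures maps a vowel mark to its Shadda+vowel ligature,
// using the six combinations listed in this README.
var shaddaLigatures = map[rune]rune{
    0x064E: 0xFC60, // Fatha
    0x064F: 0xFC61, // Damma
    0x0650: 0xFC62, // Kasra
    0x064C: 0xFC5E, // Dammatan
    0x064D: 0xFC5F, // Kasratan
    0x0670: 0xFC63, // Superscript Alef
}

// combineShadda replaces Shadda+vowel (in either order) with one ligature rune.
func combineShadda(text string) string {
    in := []rune(text)
    out := make([]rune, 0, len(in))
    for i := 0; i < len(in); i++ {
        if i+1 < len(in) {
            a, b := in[i], in[i+1]
            if a == shadda {
                if lig, ok := shaddaLigatures[b]; ok {
                    out = append(out, lig)
                    i++ // skip the vowel
                    continue
                }
            } else if b == shadda {
                if lig, ok := shaddaLigatures[a]; ok {
                    out = append(out, lig)
                    i++ // skip the shadda
                    continue
                }
            }
        }
        out = append(out, in[i])
    }
    return string(out)
}

func main() {
    // Reh + Shadda + Fatha becomes Reh + U+FC60
    fmt.Printf("%U\n", []rune(combineShadda("\u0631\u0651\u064E")))
}
```

A Shadda with no listed partner, such as Shadda followed by Fathatan, is left as two separate marks in this sketch.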

Bidirectional (BiDi) Text Support

ArabicGo includes a simplified Unicode BiDi algorithm that correctly handles mixed Arabic + English text. English words and numbers stay left-to-right within the right-to-left Arabic flow — no manual intervention needed.

// Mixed directions — English stays LTR within RTL flow
arabicgo.ToArabic("go الان تدعم العربية!")
// Output: !الان تدعم العربية go    (English "go" stays LTR ✓)

// Numbers stay in correct order
arabicgo.ToArabic("2026 ستكون أفضل")
// Output: ٢٠٢٦ ستكون أفضل    (digits stay LTR ✓)

// Punctuation and multi-word English
arabicgo.ToArabic("لنقل: for example:")
// Output:  for example: :لنقل

How It Works

The BiDi engine classifies each character by direction (RTL for Arabic, LTR for English/numbers, Neutral for spaces/punctuation), groups them into directional runs, then reorders the runs for correct RTL paragraph display:

  1. Classify — each rune gets a direction via fast Unicode range checks (~2.8 ns per rune)
  2. Resolve neutrals — spaces and punctuation inherit direction from their neighbors
  3. Split into runs — consecutive same-direction characters are grouped
  4. Reorder — reverse run order (RTL base), then reverse only Arabic runs internally
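
The four steps can be sketched in a few dozen lines. This is an illustration under assumptions rather than ArabicGo's engine: all names (`classify`, `resolve`, `visualOrder`) are hypothetical, the Arabic ranges are a simplified subset, and the neutral rule used here (take the neighbors' direction when both sides agree, otherwise fall back to the RTL base) is one plausible reading of "inherit direction from their neighbors". Tashkeel is not treated specially.

```go
package main

import (
    "fmt"
    "unicode"
)

type dir int

const (
    neutral dir = iota
    ltr
    rtl
)

// Step 1: classify a rune with simple range checks.
func classify(r rune) dir {
    switch {
    case r >= 0x0660 && r <= 0x0669:
        return ltr // Arabic-Indic digits keep left-to-right order
    case (r >= 0x0600 && r <= 0x06FF) || (r >= 0xFB50 && r <= 0xFDFF) || (r >= 0xFE70 && r <= 0xFEFF):
        return rtl
    case unicode.IsLetter(r) || unicode.IsDigit(r):
        return ltr
    }
    return neutral
}

// Step 2: give each stretch of neutrals the direction of its neighbors,
// or the RTL base direction when the two sides disagree.
func resolve(runes []rune) []dir {
    dirs := make([]dir, len(runes))
    for i, r := range runes {
        dirs[i] = classify(r)
    }
    for i := 0; i < len(dirs); i++ {
        if dirs[i] != neutral {
            continue
        }
        j := i
        for j < len(dirs) && dirs[j] == neutral {
            j++
        }
        before, after := rtl, rtl // text edges count as base direction
        if i > 0 {
            before = dirs[i-1]
        }
        if j < len(dirs) {
            after = dirs[j]
        }
        fill := rtl
        if before == after {
            fill = before
        }
        for k := i; k < j; k++ {
            dirs[k] = fill
        }
        i = j - 1
    }
    return dirs
}

// Steps 3 and 4: split into runs, emit runs in reverse order,
// and reverse the contents of RTL runs only.
func visualOrder(s string) string {
    runes := []rune(s)
    dirs := resolve(runes)
    type run struct {
        start, end int
        d          dir
    }
    var runs []run
    for i := 0; i < len(runes); {
        j := i
        for j < len(runes) && dirs[j] == dirs[i] {
            j++
        }
        runs = append(runs, run{i, j, dirs[i]})
        i = j
    }
    out := make([]rune, 0, len(runes))
    for k := len(runs) - 1; k >= 0; k-- {
        r := runs[k]
        if r.d == rtl {
            for i := r.end - 1; i >= r.start; i-- {
                out = append(out, runes[i])
            }
        } else {
            out = append(out, runes[r.start:r.end]...)
        }
    }
    return string(out)
}

func main() {
    // Meem, Noon, space, "go": the English run moves to the left edge intact.
    fmt.Printf("%q\n", visualOrder("\u0645\u0646 go"))
}
```

The returned string is in visual order, meant for a renderer that places glyphs left to right exactly as given.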

Why ArabicGo?

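Many PDF libraries and image generators place glyphs left to right in the order they receive them and do no Arabic shaping of their own. Correct output therefore depends on handling:

  1. Contextual Shaping - Most Arabic letters take up to four forms (isolated, initial, medial, final) depending on their position in the word
  2. Right-to-Left Flow - Text must be reordered into visual order before it reaches a left-to-right renderer
  3. Bidirectional Text - In mixed Arabic + English text, Arabic runs are reversed while English runs and numbers keep their left-to-right order
  4. Tashkeel Complexity - Diacritical marks must stay attached to their base letters through shaping and reordering
  5. Ligature Requirements - Certain letter combinations, such as Lam-Alef, must be replaced by a single ligature glyph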
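
Points 2 and 4 interact: reversing text rune by rune would put each mark in front of its base letter. A minimal sketch of a mark-aware reversal follows. `reverseKeepingMarks` is a hypothetical helper, not part of ArabicGo's API, and it relies on the standard library's `unicode.Mn` category to detect combining marks.

```go
package main

import (
    "fmt"
    "unicode"
)

// reverseKeepingMarks reverses the order of base letters while keeping
// every combining mark immediately after the letter it belongs to.
func reverseKeepingMarks(s string) string {
    runes := []rune(s)
    out := make([]rune, 0, len(runes))
    end := len(runes)
    for i := len(runes) - 1; i >= 0; i-- {
        if unicode.Is(unicode.Mn, runes[i]) && i > 0 {
            continue // still inside a cluster; wait for its base letter
        }
        out = append(out, runes[i:end]...) // base letter plus its marks
        end = i
    }
    return string(out)
}

func main() {
    // Beh+Kasra, Seen+Sukun becomes Seen+Sukun, Beh+Kasra
    fmt.Printf("%U\n", []rune(reverseKeepingMarks("\u0628\u0650\u0633\u0652")))
}
```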

ArabicGo handles all of this automatically, allowing you to focus on your application logic while producing beautiful, correctly-rendered Arabic text - including the Holy Quran with complete tashkeel and mixed-language content.

Arabic Shaping Ecosystem

Several libraries exist across languages for Arabic text shaping. Here's how the landscape looks:

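| Library | Language | Tashkeel | Shadda+Vowel Ligatures | Allah Ligature | Quranic Marks | Numerals | BiDi | Dependencies |
|---------|----------|----------|------------------------|----------------|---------------|----------|------|--------------|
| ArabicGo | Go | Full (15 marks) | Yes (6 combinations) | Yes | Yes | Yes | Yes | None |
| goarabic | Go | Strip/Partial | No | No | No | No | No | None |
| garabic | Go | Strip only | No | No | No | Yes | No | External |
| python-arabic-reshaper | Python | Keep or Delete | No | No | Partial | No | No | None |
| HarfBuzz | C/C++ | Font-dependent | Font-dependent | Font-dependent | Font-dependent | No | External | Large |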

ArabicGo was built with a specific focus on Quranic text rendering — full diacritical mark preservation, Shadda+Vowel ligature combining, and Allah ligature detection, all with zero dependencies.

Benchmarks

Run on Intel Core Ultra 9 285H, Go 1.25, Windows/amd64:

go test -bench=. -benchmem

| Benchmark | Input | Time/op | Allocs/op | Bytes/op |
|-----------|-------|---------|-----------|----------|
| ShortWord | مرحبا (5 chars) | ~6.4 µs | 9 | 648 B |
| WithTashkeel | بِسْمِ اللهِ الرَّحْمَنِ الرَّحِيمِ | ~21 µs | 22 | 3,224 B |
| FullVerse | 3 Quran verses with full tashkeel | ~58 µs | 38 | 7,472 B |
| WithNumbers | Arabic text + numeral conversion | ~20 µs | 17 | 2,328 B |
| MixedBiDi | go الان تدعم العربية (mixed directions) | ~12.7 µs | 16 | 1,600 B |
| LongText | Surah Al-Fatiha (full, 7 verses) | ~203 µs | 106 | 30,248 B |
| ClassifyRune | Single rune classification | ~2.8 ns | 0 | 0 B |
| IsTashkeel | Single rune check | ~30 ns | 0 | 0 B |
| ToEasternDigit | Single digit conversion | ~11 ns | 0 | 0 B |

Utility functions (IsTashkeel, ToEasternDigit, ClassifyRune) are zero-allocation. Mixed Arabic + English text with BiDi reordering processes in ~13 microseconds.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Author

AmmrFX - GitHub
