A simple NLP package for text preprocessing in Python. It includes:
- Sentence Splitting
- Word Splitting
- Stopword Removal
- Special Character & Punctuation Removal
- N-Grams
- Word Counting
- Bag of Words Model
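To give a rough picture of what these steps do, here is a minimal sketch of such a pipeline using only the standard library (illustrative only; the package's own functions are shown below):

```python
import re
from collections import Counter

text = "I love coding! NLP is awesome."

# Sentence splitting on '!' and '.'
sentences = [s.strip() for s in re.split(r"[!.]", text) if s.strip()]

# Word splitting
words = [w for s in sentences for w in s.split()]

# Stopword removal
stopwords = {"I", "is"}
words = [w for w in words if w not in stopwords]

# Bigrams (n-grams with n=2) and word counts
bigrams = list(zip(words, words[1:]))
counts = Counter(words)

print(words)         # ['love', 'coding', 'NLP', 'awesome']
print(bigrams)       # [('love', 'coding'), ('coding', 'NLP'), ('NLP', 'awesome')]
print(dict(counts))  # {'love': 1, 'coding': 1, 'NLP': 1, 'awesome': 1}
```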
## Installation

Clone the repository:

```bash
git clone https://github.com/StrangeVlad/NLP_Package_Python.git
cd my_nlp_package
```

Then install it using:

```bash
pip install -e .
```

## Usage

Import the package:
```python
from my_nlp.preprocessing import splitSentence, removeStopWords

text = "I love coding! NLP is awesome."
sentences = splitSentence(text, ['!', '.'])
print(sentences)  # ['I love coding', 'NLP is awesome']
```
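Splitting on several delimiter characters at once can be sketched with the standard library's `re` module. This is an illustration of the behavior, not the package's actual implementation:

```python
import re

def split_sentence(text, delimiters):
    """Split text on any of the given delimiter characters,
    dropping empty fragments and surrounding whitespace."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

print(split_sentence("I love coding! NLP is awesome.", ["!", "."]))
# ['I love coding', 'NLP is awesome']
```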
```python
words = ["I", "love", "coding"]
filtered_words = removeStopWords(words, ["I"])
print(filtered_words)  # ['love', 'coding']
```

## Function Examples

### splitSentence

```python
splitSentence("Hello world. Python is fun!", ["."])
# Output: ['Hello world', 'Python is fun!']
```

### splitWords

```python
splitWords(["Hello world"])
# Output: ['Hello', 'world']
```

### removeStopWords

```python
removeStopWords(["I", "love", "coding"], ["I"])
# Output: ['love', 'coding']
```

### removeSpecialCharactersPunctuation

```python
removeSpecialCharactersPunctuation(["lo$$ve", "code!", "play????ing"], ["$", "!", "?"])
# Output: ['love', 'code', 'playing']
```

### nGrams

```python
nGrams(["love", "coding", "NLP"], 2)
# Output: [('love', 'coding'), ('coding', 'NLP')]
```

### wordCounting

```python
wordCounting(["love", "coding", "love"])
# Output: {'love': 2, 'coding': 1}
```

### bagOfWords

```python
bagOfWords([["I", "love", "coding"], ["coding", "is", "fun"]])
# Output:
# Vocabulary: ['I', 'coding', 'fun', 'is', 'love']
# Vectors:
#   [1, 1, 0, 0, 1]
#   [0, 1, 1, 1, 0]
```
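The bag-of-words model builds a sorted vocabulary from all documents, then represents each document as a vector of word counts over that vocabulary. A minimal sketch of equivalent behavior (not the package's actual implementation):

```python
def bag_of_words(documents):
    """Build a sorted vocabulary and one count vector per document."""
    vocabulary = sorted({word for doc in documents for word in doc})
    vectors = [[doc.count(word) for word in vocabulary] for doc in documents]
    return vocabulary, vectors

vocab, vecs = bag_of_words([["I", "love", "coding"], ["coding", "is", "fun"]])
print(vocab)  # ['I', 'coding', 'fun', 'is', 'love']
print(vecs)   # [[1, 1, 0, 0, 1], [0, 1, 1, 1, 0]]
```

Note that `'I'` sorts before the lowercase words because Python compares strings by character code, which matches the vocabulary ordering shown above.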
## License

This project is licensed under the MIT License.

## Contributing

Feel free to contribute by submitting pull requests or opening issues.