NLP with Python and NLTK

Natural Language Processing (NLP) is a field of Artificial Intelligence that enables computers to understand, interpret, and respond to human language.


Installing Required Libraries

pip install nltk

Step 1: Importing NLTK

import nltk
nltk.download('all')  # Downloads every NLTK dataset (large); for this tutorial, 'punkt', 'stopwords', 'wordnet', 'averaged_perceptron_tagger', 'maxent_ne_chunker', and 'words' are enough

Step 2: Tokenization (Splitting Text into Words & Sentences)

from nltk.tokenize import word_tokenize, sent_tokenize

text = "Natural Language Processing is amazing! It helps computers understand human language."

# Tokenizing sentences
sentences = sent_tokenize(text)
print("Sentence Tokenization:", sentences)

# Tokenizing words
words = word_tokenize(text)
print("Word Tokenization:", words)

Step 3: Removing Stopwords (Common Unimportant Words)

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

filtered_words = [word for word in words if word.lower() not in stop_words]
print("Filtered Words:", filtered_words)

Step 4: Stemming (Reducing Words to Root Form)

from nltk.stem import PorterStemmer

ps = PorterStemmer()
stemmed_words = [ps.stem(word) for word in words]
print("Stemmed Words:", stemmed_words)
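The Porter stemmer is a rule-based suffix stripper, so its output is often a truncated stem rather than a dictionary word:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Stems come from suffix-stripping rules and need not be real words
for word in ("running", "studies", "computers", "amazing"):
    print(word, "->", ps.stem(word))
# running -> run, studies -> studi, computers -> comput, amazing -> amaz
```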

Step 5: Lemmatization (Better Root Word Reduction with Meaning Retained)

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
print("Lemmatized Words:", lemmatized_words)

Step 6: POS (Parts of Speech) Tagging

from nltk import pos_tag

pos_tags = pos_tag(words)
print("POS Tags:", pos_tags)

Step 7: Named Entity Recognition (NER)

from nltk import ne_chunk

ner_tree = ne_chunk(pos_tags)
print("Named Entity Recognition:", ner_tree)

Step 8: Text Classification Using NLTK

from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy

# Preparing dataset
train_data = [
    ("I love this product!", "Positive"),
    ("This is an amazing experience", "Positive"),
    ("I hate this!", "Negative"),
    ("Worst experience ever", "Negative"),
]

# Feature extractor
def extract_features(text):
    words = word_tokenize(text.lower())
    return {word: True for word in words}

# Training
train_set = [(extract_features(text), label) for text, label in train_data]
classifier = NaiveBayesClassifier.train(train_set)

# Testing
test_text = "I love this amazing experience"
print("Prediction:", classifier.classify(extract_features(test_text)))
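The `accuracy` helper imported above scores the classifier against a labelled held-out set, and `show_most_informative_features` reveals which words drive each label. A sketch with a hypothetical two-example test set (the feature extractor here uses a plain `split()` so the snippet needs no downloaded data):

```python
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy

def extract_features(text):
    # Bag-of-words features over a simple whitespace split
    return {word: True for word in text.lower().split()}

train_data = [
    ("I love this product!", "Positive"),
    ("This is an amazing experience", "Positive"),
    ("I hate this!", "Negative"),
    ("Worst experience ever", "Negative"),
]
train_set = [(extract_features(text), label) for text, label in train_data]
classifier = NaiveBayesClassifier.train(train_set)

# Hypothetical held-out examples for scoring
test_data = [
    ("Amazing product!", "Positive"),
    ("Worst experience", "Negative"),
]
test_set = [(extract_features(text), label) for text, label in test_data]
print("Accuracy:", accuracy(classifier, test_set))

# Which single words are most predictive of each label
classifier.show_most_informative_features(5)
```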

Applications of NLP

Chatbots & Virtual Assistants – Siri, Alexa, Google Assistant.
Sentiment Analysis – Understanding customer reviews.
Machine Translation – Google Translate.
Speech Recognition – Voice commands.
