Part of Speech Tagging and Chunking with NLTK

Chunking with NLTK: POS Tagging and Phrase Extraction

What is Part of Speech (POS) Tagging?

Introduction to Parts of Speech

Part of Speech (POS) tagging is a fundamental concept in linguistics and Natural Language Processing (NLP) that classifies words based on their grammatical roles in a sentence. In the English language, words are categorized into eight main parts of speech, each serving a specific function in sentence structure. Understanding these classifications is crucial for various NLP applications such as text analysis, machine translation, speech recognition, and information retrieval.

The eight parts of speech in English are:
1️⃣ Nouns
2️⃣ Pronouns
3️⃣ Verbs
4️⃣ Adverbs
5️⃣ Adjectives
6️⃣ Prepositions
7️⃣ Conjunctions
8️⃣ Interjections

Each of these categories plays a unique role in sentence formation. Let’s explore them in detail with definitions and examples.

1. Nouns – Naming Words

Nouns are words that identify people, places, things, or ideas. They serve as the subject or object of a sentence.

Examples of Nouns:

  • Person: Mike, Jennifer, scientist
  • Place: Tokyo, beach, university
  • Thing: Laptop, elephant, vehicle
  • Idea: Happiness, freedom, democracy

Example Sentences:
✅ Tokyo is a beautiful city.
✅ The elephant is the largest land animal.
✅ Freedom is important for personal growth.

2. Pronouns – Replacing Nouns

Pronouns are used to replace nouns to avoid repetition and enhance fluency in sentences.

Types of Pronouns:

  • Personal Pronouns: I, you, he, she, it, we, they
  • Possessive Pronouns: His, hers, yours, ours, theirs
  • Demonstrative Pronouns: This, that, these, those
  • Interrogative Pronouns: Who, what, which, whom
  • Reflexive Pronouns: Myself, yourself, himself, herself

Example Sentences:
✅ He is my best friend. (Replaces “Mike”)
✅ They went to the beach. (Replaces “John and Lisa”)
✅ Is this yours? (Refers to an object belonging to someone)

3. Verbs – Action Words

Verbs describe actions, occurrences, or states of being.

Types of Verbs:

  • Action Verbs: Run, eat, write, sleep, drive
  • Linking Verbs: Is, am, are, was, were, seem, become
  • Helping (Auxiliary) Verbs: Can, could, will, would, should, do, have

Example Sentences:
✅ She writes every day. (Action verb: “writes”)
✅ He is a doctor. (Linking verb: “is”)
✅ They have finished the project. (Helping verb: “have”)

4. Adverbs – Describing Verbs, Adjectives, or Other Adverbs

Adverbs modify verbs, adjectives, or other adverbs, providing additional information about how, when, where, or to what extent an action is performed.

Types of Adverbs:

  • Manner (How?): Quickly, boldly, carefully
  • Time (When?): Yesterday, often, yearly
  • Place (Where?): Here, there, everywhere
  • Degree (To what extent?): Very, quite, too

Example Sentences:
✅ She speaks quickly. (Modifies the verb “speaks”)
✅ He is very tall. (Modifies the adjective “tall”)
✅ They arrived early. (Modifies the verb “arrived”)

5. Adjectives – Describing Nouns and Pronouns

Adjectives describe or modify nouns and pronouns, adding details about quality, size, color, quantity, and more.

Examples of Adjectives:

  • Quality: Cheerful, intelligent, beautiful
  • Size: Small, huge, enormous
  • Color: Red, yellow, blue
  • Quantity: Few, many, several

Example Sentences:
✅ She has a cheerful personality. (Describes “personality”)
✅ The blue car is parked outside. (Describes “car”)
✅ We need many volunteers. (Describes “volunteers”)

6. Prepositions – Connecting Words

Prepositions show the relationship between a noun (or pronoun) and other words in a sentence. They indicate position, direction, time, cause, manner, or possession.

Common Prepositions:

  • Place: In, on, under, above, between
  • Time: At, before, after, during, since
  • Direction: To, from, toward, into, out of

Example Sentences:
✅ The book is on the table. (Position – “on”)
✅ She arrived before noon. (Time – “before”)
✅ He walked toward the park. (Direction – “toward”)

7. Conjunctions – Joining Words and Phrases

Conjunctions connect words, phrases, or clauses in a sentence, making it more structured and fluid.

Types of Conjunctions:

  • Coordinating Conjunctions: And, but, or, yet, so
  • Subordinating Conjunctions: Because, although, since, while
  • Correlative Conjunctions: Either…or, neither…nor, not only…but also

Example Sentences:
✅ She likes coffee and tea. (Joins two nouns)
✅ He stayed home because he was sick. (Joins two clauses)
✅ You can either go or stay. (Correlative conjunctions)

8. Interjections – Expressing Emotions

Interjections are words used to express strong emotions, feelings, or reactions. They are often followed by an exclamation mark.

Examples of Interjections:

  • Excitement: Wow!, Hurrah!, Yay!
  • Surprise: Oh!, Really?, What!?
  • Sorrow: Alas!, Oh no!, Oops!

Example Sentences:
✅ Wow! That’s amazing. (Expresses excitement)
✅ Oh no! I forgot my keys. (Expresses worry)
✅ Hurrah! We won the match. (Expresses happiness)

Importance of POS Tagging in NLP

Part of Speech (POS) tagging is widely used in Natural Language Processing (NLP) to identify grammatical categories in a text. It is crucial for:
🔹 Speech Recognition: Understanding sentence structure for accurate transcriptions.
🔹 Machine Translation: Ensuring correct word usage across languages.
🔹 Chatbots & AI Assistants: Improving sentence interpretation.
🔹 Text Mining & Sentiment Analysis: Analyzing patterns and extracting meaning from texts.

Example of POS Tagging in Python using NLTK:

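import nltk
nltk.download('punkt')  # tokenizer models used by word_tokenize
nltk.download('averaged_perceptron_tagger')  # model used by pos_tag
# Newer NLTK releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead.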

text = "The quick brown fox jumps over the lazy dog"
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)

print(pos_tags)

Output:

[('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'),
 ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]

Now that you know what each part of speech is, let’s discuss Part of Speech tagging.

Part of Speech (POS) Tagging 

POS tagging, in simple terms, means assigning every word in a sentence to a part of speech. NLTK has a method called pos_tag that performs POS tagging on a tokenized sentence. The method applies a supervised learning approach that uses features such as context, the capitalization of words, punctuation, and so on to determine the part of speech.

POS tagging is a critical procedure to understand the meaning of a sentence and know the relationship between words. 

NLTK’s pos_tag method uses the Penn Treebank tagset by default. The main word-level tags are shown in the table below.

Part of Speech | Tag | Example Words
Coordinating Conjunction | CC | And, But, Or
Determiner | DT | A, An, The, This
Cardinal Number | CD | One, Two, Three, Forty
Existential There | EX | There
Foreign Word | FW | En masse, bona fide, et cetera, et al
Subordinating Conjunction or Preposition | IN | Over, Behind, Into
Adjective | JJ | Beautiful, Slow, New
Adjective, Comparative | JJR | Greater, Better, Older
Adjective, Superlative | JJS | Greatest, Best, Oldest
List Marker | LS | i, ii, iii, iv, …
Modal | MD | Can, Shall, Could
Noun, Singular | NN | School, Table, Pen
Noun, Plural | NNS | Schools, Tables, Pens
Proper Noun, Singular | NNP | Monday, Chicago, Mark
Proper Noun, Plural | NNPS | Koreans, Americans
Predeterminer | PDT | Both, All, Half
Possessive Ending | POS | ’s (as in David’s, Dan’s)
Personal Pronoun | PRP | I, They, She
Possessive Pronoun | PRP$ | His, Her, Their
Adverb | RB | Later, Very, Already
Adverb, Comparative | RBR | Better, More, Worse
Adverb, Superlative | RBS | Best, Most, Worst
Particle | RP | At, Across, About
To | TO | To
Verb, Base Form | VB | Jump, Eat, Play
Verb, Past Tense | VBD | Jumped, Ate, Played
Verb, Present Participle | VBG | Jumping, Eating, Playing
Verb, Past Participle | VBN | Taken, Given, Gone
Verb, Present Tense but not Third Person Singular | VBP | End, Go, Endure
Verb, Present Tense, Third Person Singular | VBZ | Jumps, Eats, Plays
Wh-Determiner | WDT | Which, What, Whichever
Wh-Pronoun | WP | Who, Whom, What
Possessive Wh-Pronoun | WP$ | Whose
Wh-Adverb | WRB | Where, Why, When
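
To connect these tags back to the eight parts of speech described earlier, the sketch below collapses Penn Treebank tags into coarse categories by prefix. The helper name coarse_pos and the prefix table are assumptions made for illustration and are not part of NLTK. The mapping is only approximate: IN, for example, covers subordinating conjunctions as well as prepositions.

```python
# Hypothetical helper: collapse Penn Treebank tags into coarse parts of speech.
# The prefix table is an illustrative assumption, not an NLTK API.
PREFIX_TO_POS = [
    ("NN", "noun"),
    ("PRP", "pronoun"),
    ("WP", "pronoun"),
    ("VB", "verb"),
    ("MD", "verb"),
    ("RB", "adverb"),
    ("WRB", "adverb"),
    ("JJ", "adjective"),
    ("IN", "preposition"),   # IN also covers subordinating conjunctions
    ("TO", "preposition"),
    ("CC", "conjunction"),
    ("UH", "interjection"),  # Penn Treebank tag for interjections
]

def coarse_pos(tag):
    """Return a coarse part of speech for a Penn Treebank tag."""
    for prefix, name in PREFIX_TO_POS:
        if tag.startswith(prefix):
            return name
    return "other"  # determiners, numbers, punctuation, and so on

# Tagged tokens copied from the earlier "quick brown fox" output.
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN")]

coarse = [(word, coarse_pos(tag)) for word, tag in tagged]
print(coarse)
```

Tags that fall outside the eight traditional categories, such as DT and CD, are reported as "other" here rather than forced into a category.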

Now that you know what the POS tags are, let’s take a code example to demonstrate the steps involved in POS tagging:

# import the nltk library
import nltk
# define a text
sentence = "The man was excited after he was informed about his promotion at work"
# tokenize the text
tokens = nltk.word_tokenize(sentence)

# perform POS tagging (a notebook displays the result; in a script, wrap the call in print())
nltk.pos_tag(tokens)

Output:

[('The', 'DT'),
 ('man', 'NN'),
 ('was', 'VBD'),
 ('excited', 'VBN'),
 ('after', 'IN'),
 ('he', 'PRP'),
 ('was', 'VBD'),
 ('informed', 'VBN'),
 ('about', 'IN'),
 ('his', 'PRP$'),
 ('promotion', 'NN'),
 ('at', 'IN'),
 ('work', 'NN')]
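
Once a sentence is tagged, the result is an ordinary list of (word, tag) pairs, so it can be summarized with plain Python. The sketch below copies the tagged output above by hand (so no tagger model is needed) and counts the tags and filters words by tag prefix. The variable names are illustrative.

```python
from collections import Counter

# Tagged tokens copied from the output above.
tagged = [("The", "DT"), ("man", "NN"), ("was", "VBD"), ("excited", "VBN"),
          ("after", "IN"), ("he", "PRP"), ("was", "VBD"), ("informed", "VBN"),
          ("about", "IN"), ("his", "PRP$"), ("promotion", "NN"), ("at", "IN"),
          ("work", "NN")]

# How often each tag occurs in the sentence.
tag_counts = Counter(tag for _, tag in tagged)

# Words whose tag starts with NN (any noun) or VB (any verb form).
nouns = [word for word, tag in tagged if tag.startswith("NN")]
verbs = [word for word, tag in tagged if tag.startswith("VB")]

print(tag_counts)
print(nouns)  # ['man', 'promotion', 'work']
print(verbs)  # ['was', 'excited', 'was', 'informed']
```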

You can also look up more information about a tag using the help.upenn_tagset() method. Say you have forgotten what JJ means. You can find out with this line of code (NLTK will prompt you to download the tagset documentation resource if it is missing):

nltk.help.upenn_tagset("JJ")

Output:

JJ: adjective or numeral, ordinal
    third ill-mannered pre-war regrettable oiled calamitous first separable
    ectoplasmic battery-powered participatory fourth still-to-be-named
    multilingual multi-disciplinary ...

The output tells us that JJ means an adjective (or an ordinal numeral) and then lists some examples.

Chunking

Chunking can be defined as the process of extracting phrases, or chunks, from unstructured text. There are situations where a single word cannot encapsulate the complete meaning of a text. In such cases, chunks can be used to extract meaningful insights. In other words, chunking allows more flexibility in the extraction process.

Chunking works on top of POS tags such that it takes input from the POS tags and outputs the chunks. A common group of chunk tags is the noun phrase chunk (NP chunk). To create a noun phrase chunk, a chunk grammar is first defined using POS tags. This chunk grammar contains the rule with which the chunks would be created. 

The rule is created using regular expressions and the following syntax

? means match 0 or 1 repetitions 
* means match 0 or more repetitions
+ means match 1 or more
. means any character but not a new line

The POS tags and regular expressions are placed inside the < > placeholders. <RB.?>, for instance, matches RB followed by at most one more character, that is, any adverb tag (RB, RBR, or RBS). Let’s take a coding example to drive home our point.
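
A quick way to check which tags a pattern covers is to try it with Python's re module. This is only an approximation of how NLTK treats the text inside < >, and matching_tags is a hypothetical helper written for this illustration, but it shows the effect of ? and . on tag names.

```python
import re

def matching_tags(pattern, tags):
    """Return the tags that a tag pattern such as 'RB.?' matches in full."""
    return [tag for tag in tags if re.fullmatch(pattern, tag)]

tags = ["RB", "RBR", "RBS", "VB", "VBD", "VBG", "NNS", "PRP", "PRP$"]

print(matching_tags("RB.?", tags))   # ['RB', 'RBR', 'RBS']
print(matching_tags("VBD?", tags))   # ['VB', 'VBD']
print(matching_tags("PRP.?", tags))  # ['PRP', 'PRP$']
```

Note that in VBD? the ? applies only to the D, so the pattern accepts VB as well as VBD but not VBG.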

#import the library
import nltk
#define the text
sentence = "I told the children I was going to tell them a story. They were excited"
#tokenize the text
tokens = nltk.word_tokenize(sentence)
#perform POS tagging
tags = nltk.pos_tag(tokens)
#define a chunk grammar named mychunk
chunk_grammar = """ mychunk: {<NNS.?>*<PRP.?>*<VBD?>}"""
#parse the grammar with regular expression parser
parser = nltk.RegexpParser(chunk_grammar)
#assign the chunk
tree = parser.parse(tags)
print(tree)

After getting the POS tags, the chunk grammar selects zero or more tags matching NNS.? (plural nouns), followed by zero or more tags matching PRP.? (personal or possessive pronouns), followed by exactly one tag matching VBD?, anywhere in the text. Because the ? makes the D optional, VBD? matches both past tense verbs (VBD) and base form verbs (VB). A RegexpParser was built from the chunk grammar, and the tagged tokens were passed to its parse() method to produce the chunk tree. See the output.

Output:

(S
  (mychunk I/PRP told/VBD)
  the/DT
  (mychunk children/NNS I/PRP was/VBD)
  going/VBG
  to/TO
  (mychunk tell/VB)
  them/PRP
  a/DT
  story/NN
  ./.
  (mychunk They/PRP were/VBD)
  excited/VBN)

As seen, “I told”, “children I was”, “tell”, and “They were” were the selected chunks. To visualize the results better, you can use the draw() method:

tree.draw()

draw() does not print anything. It opens a separate window that shows the same tree graphically, with the four mychunk subtrees and the remaining tokens hanging from the root S.
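
The chunks can also be pulled out of the tree programmatically instead of being read off the printout. The sketch below is a minimal example under one assumption: the tagged tokens are copied by hand from the output above, so it runs without a tokenizer or tagger model. It uses the same grammar and collects the text of every mychunk subtree.

```python
import nltk

# Tagged tokens copied from the chunking output above (no tagger model needed).
tagged = [("I", "PRP"), ("told", "VBD"), ("the", "DT"), ("children", "NNS"),
          ("I", "PRP"), ("was", "VBD"), ("going", "VBG"), ("to", "TO"),
          ("tell", "VB"), ("them", "PRP"), ("a", "DT"), ("story", "NN"),
          (".", "."), ("They", "PRP"), ("were", "VBD"), ("excited", "VBN")]

parser = nltk.RegexpParser("mychunk: {<NNS.?>*<PRP.?>*<VBD?>}")
tree = parser.parse(tagged)

# Keep only the subtrees labelled mychunk and join their words.
chunks = [" ".join(word for word, tag in subtree.leaves())
          for subtree in tree.subtrees(filter=lambda t: t.label() == "mychunk")]

print(chunks)  # ['I told', 'children I was', 'tell', 'They were']
```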

Why is Chunking Important in Natural Language Processing (NLP)?

Chunking is a crucial technique in Natural Language Processing (NLP) that allows for structured information extraction from text data. While Part of Speech (POS) tagging classifies words into categories such as nouns, verbs, and adjectives, chunking goes a step further by grouping these tagged words into meaningful phrases (also called chunks).

This process is particularly useful in entity detection, information retrieval, and text analysis, as it helps extract specific patterns from text without requiring a full syntactic parse of every sentence. Let’s explore the importance of chunking, how it works, and why it is widely used in NLP applications.

🔹 The Role of Chunking in NLP

1. Extracting Meaningful Phrases from Text

Chunking helps identify groups of words that form coherent phrases, such as:

  • Noun Phrases (NP): “The big brown fox”
  • Verb Phrases (VP): “is running quickly”
  • Prepositional Phrases (PP): “on the hill”

Instead of analyzing individual words, chunking allows us to extract information at the phrase level, which provides better context and meaning.

For example, consider the sentence:
“The quick brown fox jumps over the lazy dog.”

POS Tagging Output:

[('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
 ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]

With chunking, we can group these words into phrases:

[('The quick brown fox', 'NP'), ('jumps', 'VP'), ('over the lazy dog', 'PP')]

This makes text analysis more structured and meaningful.
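
One way to arrive at a phrase grouping like this with NLTK is a grammar with several rules. The sketch below is a minimal example, not the only possible grammar: the three rules are assumptions chosen to fit this one sentence, and the tagged tokens are copied from the POS tagging output above so no tagger model is needed. Rules run in order, which lets the PP rule refer to the NP chunks built before it.

```python
import nltk

# Tagged tokens copied from the POS tagging output above.
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN")]

# Illustrative grammar: NP first, then PP (preposition + NP), then VP (any verb).
grammar = r"""
  NP: {<DT>?<JJ>*<NN>}
  PP: {<IN><NP>}
  VP: {<VB.*>}
"""
tree = nltk.RegexpParser(grammar).parse(tagged)

# Flatten each top-level chunk into a (phrase, label) pair.
phrases = [(" ".join(word for word, tag in child.leaves()), child.label())
           for child in tree if isinstance(child, nltk.Tree)]

print(phrases)
```

In the resulting tree the noun phrase "the lazy dog" is nested inside the prepositional phrase, which is why only three top-level chunks are listed.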

2. Chunking for Entity Detection

Chunking is particularly useful for Named Entity Recognition (NER), where we need to extract specific entities such as names, dates, locations, or product details from large text datasets.

For example, if you have a large set of customer transactions and you only need to extract:
  • Customer Name
  • Item Purchased
  • Price
  • Date of Purchase

You can define a chunk grammar to detect patterns and extract this information without having to analyze the entire text.

Example Use Case: Extracting Purchase Information

Imagine we have the sentence:
“John Doe bought a Samsung Galaxy S21 for $999 on March 5, 2023.”

With suitably designed chunking rules, the goal is to extract key details such as:

[('John Doe', 'CUSTOMER_NAME'), ('Samsung Galaxy S21', 'PRODUCT_NAME'),
 ('$999', 'PRICE'), ('March 5, 2023', 'DATE')]

This structured format makes it easier to process transactions, analyze consumer behavior, and automate data extraction tasks in business applications.
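
As a rough sketch of what such rules could look like in NLTK, the example below chunks a hand-tagged copy of the sentence. Everything specific here is an assumption: the tags were assigned by hand (a real tagger may differ), and the labels PRICE, DATE, and NAME and their patterns are illustrative. Telling a customer name apart from a product name would need extra rules or context, so both come out as NAME.

```python
import nltk

# Hand-assigned tags (an assumption; a real tagger may produce different ones).
tagged = [("John", "NNP"), ("Doe", "NNP"), ("bought", "VBD"), ("a", "DT"),
          ("Samsung", "NNP"), ("Galaxy", "NNP"), ("S21", "NNP"), ("for", "IN"),
          ("$", "$"), ("999", "CD"), ("on", "IN"), ("March", "NNP"),
          ("5", "CD"), (",", ","), ("2023", "CD"), (".", ".")]

# Illustrative rules, applied in order. DATE runs before NAME so that
# "March" is claimed by the date and not treated as a name.
grammar = r"""
  PRICE: {<\$><CD>}
  DATE: {<NNP><CD><,><CD>}
  NAME: {<NNP>+}
"""
tree = nltk.RegexpParser(grammar).parse(tagged)

# Collect each top-level chunk as a (text, label) pair.
fields = [(" ".join(word for word, tag in child.leaves()), child.label())
          for child in tree if isinstance(child, nltk.Tree)]

print(fields)
```

The chunk text is rebuilt by joining tokens with spaces, so the price appears as "$ 999" and the date as "March 5 , 2023".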

3. Faster Information Extraction

One of the major benefits of chunking is its ability to rapidly filter and extract words based on defined grammar rules.

For instance, when processing a large volume of customer reviews, news articles, or business documents, chunking allows us to:

  • Extract key insights without scanning the entire text
  • Group related words together for better context
  • Perform filtering and summarization more efficiently

Example: If we have thousands of product reviews, chunking can help extract:
  • Customer names
  • Product attributes
  • Sentiments (positive/negative opinions)

This significantly improves data processing speed in NLP applications.

4. Chunking vs. POS Tagging: Why Both Are Needed

While POS tagging helps identify the grammatical role of individual words, it does not provide structured phrase-level understanding.

Comparison of POS Tagging and Chunking:

Feature | POS Tagging | Chunking
Focus | Individual words | Groups of words (phrases)
Purpose | Identifies grammatical category | Identifies meaningful phrases
Example Output | ('fox', 'NN'), ('jumps', 'VBZ') | ('The quick brown fox', 'NP')
Use Cases | Syntax analysis, spell check | Entity recognition, information extraction

Thus, using both POS tagging and chunking together ensures better language understanding.

🔹 Practical Applications of Chunking

Chunking plays a key role in many NLP applications, including:

📌 1. Named Entity Recognition (NER)

  • Extracts names, locations, organizations, dates, and product names from text.
  • Used in customer service chatbots, sentiment analysis, and search engines.

📌 2. Information Retrieval

  • Helps filter relevant content in news analysis, financial reports, and legal documents.

📌 3. Question Answering Systems

  • Helps AI assistants such as Siri, Alexa, and ChatGPT understand user queries better.

📌 4. Resume Screening in HR

  • Automates extraction of candidate details such as skills, education, and experience.

📌 5. Customer Sentiment Analysis

  • Identifies positive and negative sentiments in product reviews and social media posts.

🔹 Example of Chunking in Python using NLTK

Let’s see how chunking works using the Natural Language Toolkit (NLTK) in Python.

import nltk

# Sample sentence
sentence = "John Doe bought a Samsung Galaxy S21 for $999 on March 5, 2023."

# Tokenizing words and assigning POS tags
words = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(words)

# Defining a simple chunk grammar for extracting noun phrases
# <NN.*> matches NN, NNS, NNP, and NNPS, so proper nouns such as "John" are included
chunk_grammar = r"NP: {<DT>?<JJ>*<NN.*>+}"  # NP = Noun Phrase

# Creating a chunk parser
chunk_parser = nltk.RegexpParser(chunk_grammar)
chunked = chunk_parser.parse(pos_tags)

# Displaying chunked structure
print(chunked)
chunked.draw()  # Visualize the chunks (opens a GUI window)

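Because the grammar defines only an NP rule, print(chunked) shows a single tree rooted at S in which each run of tokens matching the rule is wrapped in an NP subtree, while all other tokens (verbs, prepositions, numbers, punctuation) stay directly under S. No VP or PP subtrees appear unless rules for them are added to the grammar. Which tokens end up inside an NP depends on the tags the tagger assigns: proper nouns such as John are usually tagged NNP, which <NN.*> matches but a plain <NN> does not.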
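
How much the noun pattern matters can be checked without running a tagger. The sketch below compares two NP grammars on a hand-tagged copy of the sample sentence. The tags are assumptions written by hand (a real tagger may differ), and np_chunks is a hypothetical helper, not an NLTK function.

```python
import nltk

# Hand-assigned tags (an assumption; a real tagger may produce different ones).
tagged = [("John", "NNP"), ("Doe", "NNP"), ("bought", "VBD"), ("a", "DT"),
          ("Samsung", "NNP"), ("Galaxy", "NNP"), ("S21", "NNP"), ("for", "IN"),
          ("$", "$"), ("999", "CD"), ("on", "IN"), ("March", "NNP"),
          ("5", "CD"), (",", ","), ("2023", "CD"), (".", ".")]

def np_chunks(grammar):
    """Hypothetical helper: return the text of every NP chunk a grammar finds."""
    tree = nltk.RegexpParser(grammar).parse(tagged)
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]

# <NN> matches only the exact tag NN, so proper nouns (NNP) are skipped.
narrow = np_chunks(r"NP: {<DT>?<JJ>*<NN>+}")
# <NN.*> matches every tag that starts with NN, including NNP.
wide = np_chunks(r"NP: {<DT>?<JJ>*<NN.*>+}")

print(narrow)  # []
print(wide)    # ['John Doe', 'a Samsung Galaxy S21', 'March']
```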

Why Chunking is Essential

Chunking is a powerful NLP technique that enhances POS tagging by grouping words into meaningful phrases.

✅ It is essential for entity detection, information extraction, and text analysis.
✅ It extracts phrase-level patterns quickly, without requiring a full syntactic parse.
✅ It provides better phrase-level understanding than POS tagging alone.
✅ It is widely used in search engines, AI assistants, financial analysis, and e-commerce applications.

By combining POS tagging with chunking, NLP models can extract structured data from text more effectively, leading to better automation, search relevance, and business intelligence. 🚀
