
Text mining on a COVID-19 article from CBC News: summarize the given text automatically using spaCy and Python.

0xsuid/text-summerizer

Text Summarizer

Text mining on a CBC News article with Natural Language Processing (NLP): automatically summarize the given text using spaCy and Python.

Requirements

  1. spaCy
  2. spaCy English model (en_core_web_sm):
    • python -m spacy download en_core_web_sm

Overview

  1. Split the input text into a list of sentences, then count the number of sentences in the text.
  2. Calculate the frequency of words in each sentence:
    • The output is a dictionary where each key is a sentence and the value is a dictionary mapping each word to the number of times it appears in that sentence.
  3. Calculate the term frequency (TF) of each word in a sentence:
     TF(word) = (number of times "word" appears in the sentence) / (total number of terms in the sentence)
  4. Build the termFrequency matrix:
    • The termFrequency matrix is a dictionary where each key is a sentence and the value is a dictionary mapping each word to its TF score.
  5. For each word, count how many sentences contain that word.
  6. Calculate the inverse document frequency (IDF) of each word, treating each sentence as a document:
     IDF(word) = log_e(total number of sentences / number of sentences that contain "word")
  7. Compute the TF-IDF score of each word in each sentence: TF-IDF(word) = TF(word) × IDF(word).
  8. Use the TF-IDF scores computed in step 7 to assign a weight to each sentence.
  9. Threshold: compute the average sentence weight.
  10. Generate the summary: select a sentence for the summary if its weight exceeds the threshold.
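Steps 1 and 2 can be sketched in Python as below. This is a minimal sketch, not the repository's code: the function names `split_sentences` and `frequency_matrix` are hypothetical, and the preprocessing (lowercasing, and dropping stop words, punctuation and whitespace tokens) is an assumption, since the steps above do not say how each sentence is tokenized. spaCy is imported inside the function so that `frequency_matrix` can be used without it.

```python
from collections import Counter


def split_sentences(text, model="en_core_web_sm"):
    """Step 1: split text into sentences with spaCy and tokenize each one.

    Returns a list of (sentence_text, tokens) pairs. Assumption: tokens are
    lowercased, and stop words, punctuation and whitespace are dropped.
    """
    import spacy  # imported lazily so frequency_matrix works without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    pairs = []
    for sent in doc.sents:  # sentence boundaries come from the model's parser
        tokens = [
            tok.lower_
            for tok in sent
            if not (tok.is_stop or tok.is_punct or tok.is_space)
        ]
        pairs.append((sent.text.strip(), tokens))
    return pairs


def frequency_matrix(pairs):
    """Step 2: map each sentence to a {word: count} dictionary."""
    return {sentence: dict(Counter(tokens)) for sentence, tokens in pairs}
```

For example, `pairs = split_sentences(article_text)` (where `article_text` stands for the article string) gives the sentence count of step 1 as `len(pairs)`, and `frequency_matrix(pairs)` returns the dictionary described in step 2. Keying on the sentence text follows the description above; two identical sentences would share one key.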
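Steps 3 to 6 might look like the following sketch. It assumes the output of step 2 is a plain dict mapping each sentence to a `{word: count}` dict; the function names are hypothetical. The logarithm is the natural log (`math.log`), matching the log_e in the IDF formula.

```python
import math
from collections import Counter


def tf_matrix(freq_matrix):
    """Steps 3-4: TF(word) = (times word appears in sentence) / (terms in sentence)."""
    tf = {}
    for sentence, counts in freq_matrix.items():
        total = sum(counts.values())  # total number of (kept) terms in the sentence
        tf[sentence] = {word: n / total for word, n in counts.items()}
    return tf


def sentences_per_word(freq_matrix):
    """Step 5: number of sentences containing each word (counted once per sentence)."""
    return Counter(word for counts in freq_matrix.values() for word in counts)


def idf_matrix(freq_matrix):
    """Step 6: IDF(word) = ln(total sentences / sentences containing word)."""
    n_sentences = len(freq_matrix)
    doc_freq = sentences_per_word(freq_matrix)
    return {
        sentence: {word: math.log(n_sentences / doc_freq[word]) for word in counts}
        for sentence, counts in freq_matrix.items()
    }
```

A word that appears in every sentence gets an IDF of ln(1) = 0, so it contributes nothing to any sentence's score in the later steps.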
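Steps 7 to 10 can be sketched as follows, taking the TF and IDF matrices as `{sentence: {word: score}}` dicts. The steps above do not say how the TF-IDF scores become a sentence weight in step 8; this sketch assumes the weight is the mean TF-IDF score of the words in the sentence. The function names are hypothetical.

```python
def tf_idf_matrix(tf, idf):
    """Step 7: TF-IDF(word) = TF(word) * IDF(word), for every word in every sentence."""
    return {
        sentence: {word: tf[sentence][word] * idf[sentence][word] for word in tf[sentence]}
        for sentence in tf
    }


def sentence_weights(tfidf):
    """Step 8 (assumed rule): weight = mean TF-IDF of the sentence's words."""
    return {
        sentence: (sum(scores.values()) / len(scores)) if scores else 0.0
        for sentence, scores in tfidf.items()
    }


def summarize(tfidf):
    """Steps 9-10: keep sentences whose weight exceeds the average sentence weight."""
    weights = sentence_weights(tfidf)
    if not weights:
        return ""
    threshold = sum(weights.values()) / len(weights)  # step 9: average weight
    # Dicts preserve insertion order, so kept sentences stay in their original order.
    return " ".join(s for s, w in weights.items() if w > threshold)
```

Averaging rather than summing keeps a long sentence from scoring high just because it has more words; switching to a sum is a one-line change in `sentence_weights`. The strict `>` comparison follows step 10.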
