Stop words are a set of commonly used words in a language. Examples of stop words in English are “a”, “the”, “is”, and “are”. These words carry little meaning on their own.
They can often be ignored without changing the meaning of a sentence. For some search engines, they are the most common short function words, such as is, because, which, and on.
List of Stop Words
The following is a list of stop words commonly used in natural language processing (NLP):
a | ourselves |
about | out |
above | over |
after | own |
again | same |
against | shan’t |
all | she |
am | she’d |
an | she’ll |
and | she’s |
any | should |
are | shouldn’t |
aren’t | so |
as | some |
at | such |
be | than |
because | that |
been | that’s |
before | the |
being | their |
below | theirs |
between | them |
both | themselves |
but | then |
by | there |
can’t | there’s |
cannot | these |
could | they |
couldn’t | they’d |
did | they’ll |
didn’t | they’re |
do | they’ve |
does | this |
doesn’t | those |
doing | through |
don’t | to |
down | too |
during | under |
each | until |
few | up |
for | very |
from | was |
further | wasn’t |
had | we |
hadn’t | we’d |
has | we’ll |
hasn’t | we’re |
have | we’ve |
haven’t | were |
having | weren’t |
he | what |
he’d | what’s |
he’ll | when |
he’s | when’s |
her | where |
here | where’s |
here’s | which |
hers | while |
herself | who |
him | who’s |
himself | whom |
his | why |
how | why’s |
how’s | with |
i | won’t |
i’d | would |
i’ll | wouldn’t |
i’m | you |
i’ve | you’d |
if | you’ll |
in | you’re |
into | you’ve |
is | your |
isn’t | yours |
it | yourself |
it’s | yourselves |
its | nor |
itself | not |
let’s | of |
me | off |
more | on |
most | once |
mustn’t | only |
my | or |
myself | other |
no | ought |
ours | our |
Remove these tokens only if they add no information relevant to your problem. Text classification usually does not need stop words, because the general topic of a text can still be recovered after they are removed. Sentiment analysis is a common counterexample: dropping “not” turns “not good” into “good”.
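As a rough illustration of the filtering step, here is a minimal Python sketch. The `STOP_WORDS` set is only a small subset of the list above, and both the `remove_stop_words` helper and the regex tokenizer are assumptions made for this example, not part of any particular library.

```python
import re

# A small illustrative subset of the stop word list above (assumption:
# a real application would load the full list, including contractions).
STOP_WORDS = {"a", "an", "and", "are", "is", "in", "on", "the", "of", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Hypothetical tokenizer: runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]


print(remove_stop_words("The cat is on the mat"))  # ['cat', 'mat']
```

Passing a different `stop_words` set lets you keep words that matter for your task, for example leaving “not” out of the set when polarity is important.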
Removing stop words helps reduce both index size and query size. Fewer terms are always a win when it comes to performance. And since stop words carry little semantic content, relevance scores are largely unaffected.
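The effect on index size can be sketched with a toy inverted index in Python. The `build_index` function, the three sample documents, and the short `STOP_WORDS` set are all hypothetical and chosen only to make the difference visible; a real search engine's index is far more elaborate.

```python
from collections import defaultdict

# Illustrative subset of the stop word list (assumption for this sketch).
STOP_WORDS = {"a", "the", "is", "on", "and", "in"}


def build_index(docs, stop_words=frozenset()):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            if term not in stop_words:
                index[term].add(doc_id)
    return index


docs = [
    "the cat is on the mat",
    "a dog is in the garden",
    "the cat and the dog",
]

full = build_index(docs)
trimmed = build_index(docs, STOP_WORDS)

# Number of distinct terms with and without stop words.
print(len(full), len(trimmed))  # 10 4
```

In this toy corpus more than half of the distinct terms are stop words, and the remaining entries still point to the same documents for content words such as “cat” and “dog”.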