Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Text Classification 1
Chris Manning, Pandu Nayak and Prabhakar Raghavan
Prep work
This lecture presumes that you’ve seen the CS124 Coursera lecture on Naïve Bayes, or an equivalent
Will refer to NB without describing it
Ch. 13
Standing queries
The path from IR to text classification:
You have an information need to monitor, say:
Unrest in the Niger delta region
You want to rerun an appropriate query periodically to find new news items on this topic
You will be sent new documents that are found
I.e., it’s not ranking but classification (relevant vs. not relevant)
Such queries are called standing queries
Long used by “information professionals”
A modern mass instantiation is Google Alerts
Standing queries are (hand-written) text classifiers
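To make the "standing query as hand-written classifier" idea concrete, here is a minimal sketch. The term set, the tokenizer, and the function names are illustrative assumptions, not part of the lecture; a real standing query would normally use a richer Boolean expression.

```python
import re

# Hypothetical standing query: every one of these terms must occur in a document.
STANDING_QUERY_TERMS = {"niger", "delta", "unrest"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches_standing_query(text, required_terms=STANDING_QUERY_TERMS):
    """Binary decision: True (relevant) if all required terms occur in the text."""
    return required_terms <= set(tokenize(text))


def run_standing_query(new_documents):
    """Rerun the query over newly arrived documents and keep the relevant ones."""
    return [doc for doc in new_documents if matches_standing_query(doc)]
```

Note that the output is an unranked set of matching documents: each document is either sent to the user or not, which is exactly the relevant vs. not relevant decision described above.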
Spam filtering: Another text classification task
From: "" <takworlld@>
Subject: real estate is the only way... gem oalvgkay
Anyone can buy real estate with no money down
Stop paying rent TODAY !
There is no need to spend hundreds or even thousands for similar courses
I am 22 years old and I have already purchased 6 properties using the
methods outlined in this truly INCREDIBLE ebook.
Change your life NOW !
=================================================
Click Below to order:
es/
=================================================
Categorization/Classification
Given:
A representation of a document d
Issue: how to represent text documents.
Usually some type of high-dimensional space – bag of words
A fixed set of classes:
C = {c_1, c_2, …, c_J}
Determine:
The category of d: γ(d) ∈ C, where γ(d) is a classification function
We want to build classification functions (“classifiers”).
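The setup above can be sketched in a few lines: a bag-of-words representation of d, a fixed class set C, and a classification function γ. The class labels, cue words, and threshold below are hypothetical choices for illustration only; this is a toy hand-written rule, not a learned classifier such as Naïve Bayes.

```python
import re
from collections import Counter

# The fixed set of classes C (labels chosen for illustration).
CLASSES = ["spam", "not_spam"]

# Hypothetical hand-picked cue words; a learned classifier would estimate
# term weights from training data instead.
SPAM_CUES = {"buy", "order", "click", "money", "incredible"}


def bag_of_words(text):
    """Represent a document as term counts, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def gamma(doc_vector, threshold=2):
    """Classification function: map a document representation to a class in C."""
    score = sum(count for term, count in doc_vector.items() if term in SPAM_CUES)
    return "spam" if score >= threshold else "not_spam"
```

For example, `gamma(bag_of_words("Click below to order: buy real estate with no money down"))` counts four cue-word occurrences and therefore returns `"spam"` under the assumed threshold of 2.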
Sec. 1