Information Retrieval

Title: Information Retrieval
Organization: UNI DUISBURG
Pages: 204


Contents

Introduction
What is Information Retrieval?
IR concepts
Data, information, knowledge
Conceptual model for IR systems
Evaluation
Efficiency and effectiveness
Relevance
Distributions
Viewpoints and evaluation measures
User viewpoints
User vs. system viewpoints
Measures for Boolean retrieval
Recall, precision, and fallout
Recall estimation
Query-wise comparisons
Averaging
Rankings
Linear order
Weak order
Interpretation of recall-precision graphs
Stopping criterion: number of relevant documents (NR)
Stopping criterion: number of documents
Averaging and significance tests for rankings
Utility measure
Evaluation initiatives
Evaluation measures
TREC
CLEF
NTCIR
INEX
Evaluation of interactive retrieval
Knowledge representation for texts
Problem statement
Free-text search
Computer science approach
Computational linguistics approach
Documentation languages
General properties
Classifications
Thesauri
RDF (Resource Description Framework)
Documentation languages vs. free text
Assessment of methods for representing text content
Relationship between models and representations
Text representation for IR models
Simple statistical models
Non-probabilistic IR models
Notation
Overview of the models
Boolean retrieval
Expressive power of the Boolean query language
Drawbacks of Boolean retrieval
Fuzzy retrieval
Assessment of fuzzy retrieval
The vector space model
Coordination Level Match
Relevance Feedback
Document indexing
Assessment of the vector space model
Document clustering
Cluster retrieval
Similarity search for documents
Probabilistic clustering
Cluster browsing
Scatter/Gather browsing
Probabilistic Models in Information Retrieval
Introduction
Basic concepts of relevance models
The binary independence retrieval model
A conceptual model for IR
Parameter learning in IR
Event space
The Probability Ranking Principle
Some relevance models
A description-oriented approach for retrieval functions
The binary independence indexing model
A description-oriented indexing approach
The 2-Poisson model
Retrieval with probabilistic indexing
IR as uncertain inference
Parameter estimation
Parameter estimation and IR models
Standard methods of parameter estimation
Optimum parameter estimation
Models based on propositional logic
A Probabilistic Inference Model
Classical IR models
Disjoint basic concepts
Nondisjoint basic concepts
Models based on predicate logic
Introduction
Terminological logic
Thesauri
Elements of terminological logic
Semantics of MIRTL
Retrieval with terminological logic
Datalog
Introduction
Hypertext structure
Aggregation
Object hierarchy
Retrieval with terminological knowledge
Probabilistic Datalog
Introduction
Informal description of DatalogP
Syntax
Semantics of DatalogP
Evaluation of DatalogP programs
DatalogP with independence assumptions
Further application examples
Probabilistic rules
IR systems
Layered architecture
Conceptual level
Levels of system involvement
Types of search activities
Combining system involvement and search activities
Semantic level
The FERMI multimedia retrieval model
POOL
FMM and POOL
Implementation of IR systems
Hardware aspects
Storage media
Input/output devices
Communication networks
Structure of IR systems
Functional view
File structure
Dialog functions of conventional IR systems
Document architectures
ODA
Markup languages
Access paths
Scanning
String similarity
Inverted lists
Signatures
PAT trees
Fact Retrieval
A Probabilistic Approach to Fact Retrieval
Introduction
Foundations of the probabilistic model
Indexing for missing or imprecise data
Retrieval functions
Integration of Text and Fact Retrieval
Introduction
Extending text retrieval methods for coping with facts
An application example

Preview

Information Retrieval: Lecture notes for the course in summer semester 2004

Norbert Fuhr, June 11, 2004

Table of Contents

1 Introduction
1.1 What is Information Retrieval?

2 IR concepts
2.1 Data, information, knowledge
2.2 Conceptual model for IR systems

3 Evaluation
3.1 Efficiency and effectiveness
3.2 Relevance
3.3 Distributions
3.4 Viewpoints and evaluation measures
3.4.1 User viewpoints
3.4.2 User vs. system viewpoints
3.5 Measures for Boolean retrieval
3.5.1 Recall, precision, and fallout
3.5.2 Recall estimation
3.5.3 Query-wise comparisons
3.5.4 Averaging
3.6 Rankings
3.6.1 Linear order
3.6.2 Weak order
3.7 Interpretation of recall-precision graphs
3.7.1 Stopping criterion: number of relevant documents (NR)
3.7.2 Stopping criterion: number of documents
3.8 Averaging and significance tests for rankings
3.9 Utility measure
3.10 Evaluation initiatives: TREC, CLEF, NTCIR, and INEX
3.10.1 Evaluation measures
3.10.2 TREC: Text REtrieval Conference
3.10.3 CLEF: Cross-Language Evaluation Forum
3.10.4 NTCIR: NACSIS Test Collection Project
3.10.5 INEX: Initiative for the Evaluation of XML Retrieval
3.11 Evaluation of interactive retrieval

4 Knowledge representation for texts
4.1 Problem statement
4.2 Free-text search
4.2.1 Computer science approach
4.2.2 Computational linguistics approach
4.3 Documentation languages
4.3.1 General properties
4.3.2 Classifications
4.3.3 Thesauri
4.3.4 RDF (Resource Description Framework)
4.3.5 Documentation languages vs. free text
4.4 Assessment of methods for representing text content
4.5 Relationship between models and representations
4.5.1 Text representation for IR models
4.5.2 Simple statistical models

5 Non-probabilistic IR models
5.1 Notation
5.2 Overview of the models
5.3 Boolean retrieval
5.3.1 Expressive power of the Boolean query language
5.3.2 Drawbacks of Boolean retrieval
5.4 Fuzzy retrieval
5.4.1 Assessment of fuzzy retrieval
5.5 The vector space model
5.5.1 Coordination Level Match
5.5.2 Relevance Feedback
5.5.3 Document indexing
5.5.4 Assessment of the vector space model
5.6 Document clustering
5.6.1 Cluster retrieval
5.6.2 Similarity search for documents
5.6.3 Probabilistic clustering
5.6.4 Cluster browsing
5.6.5 Scatter/Gather browsing

6 Probabilistic Models in Information Retrieval
6.1 Introduction
6.2 Basic concepts of relevance models
6.2.1 The binary independence retrieval model
6.2.2 A conceptual model for IR
6.2.3 Parameter learning in IR
6.2.4 Event space
6.2.5 The Probability Ranking Principle
6.3 Some relevance models
6.3.1 A description-oriented approach for retrieval functions
6.3.2 The binary independence indexing model
6.3.3 A description-oriented indexing approach
6.3.4 The 2-Poisson model
6.3.5 Retrieval with probabilistic indexing
6.4 IR as uncertain inference
6.5 Parameter estimation
6.5.1 Parameter estimation and IR models
6.5.2 Standard methods of parameter estimation
6.5.3 Optimum parameter estimation

7 Models based on propositional logic
7.1 A Probabilistic Inference Model
7.2 Classical IR models
7.2.1 Disjoint basic concepts
7.2.2 Nondisjoint basic concepts