By Gerard Salton
Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.
Extra resources for A Theory of Indexing
Column 4 contains the output for "thesaurus plus PT phrases", where pairs and triples are derived from high-frequency nondiscriminators only.
The recall-precision results shown in Table 13 for the three test collections show that in general better average performance is obtained when the low-valued terms are deleted than with the full vocabulary. The best performance result is emphasized in Table 13 by a vertical bar. The last two columns of the table contain statistical significance output. For each pair of processes listed, t-test and Wilcoxon signed rank test probabilities are given. It is seen that all term deletion results are significantly better than the standard term frequency word stem weighting, with the exception of the DISC CUT run used with the CRAN collection.
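The paired comparison described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names and the per-query scores are invented for the example and are not taken from Table 13. It computes the two test statistics only; converting them into the probabilities reported in the table would also require the t distribution and the signed-rank null distribution, which are not implemented here.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic for two runs scored on the same queries."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n))


def wilcoxon_signed_rank(a, b):
    """Return (W+, W-): rank sums of the positive and negative differences.

    Zero differences are dropped; tied absolute differences share
    the average of the ranks they occupy.
    """
    diffs = sorted((x - y for x, y in zip(a, b) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical per-query scores for two indexing runs (made-up numbers).
run_with_deletion = [5, 7, 6, 9]
run_full_vocabulary = [4, 4, 8, 5]

t_value = paired_t_statistic(run_with_deletion, run_full_vocabulary)
w_plus, w_minus = wilcoxon_signed_rank(run_with_deletion, run_full_vocabulary)
```

Both tests work on the per-query differences between two runs. The t-test uses the size of those differences, while the signed rank test uses only their ranks, so it is less sensitive to a few queries with very large differences.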
The average S/N terms exhibit a medium document frequency and a total collection frequency which is about fifty percent higher than the document frequency. Their frequency distributions are characterized by an occurrence frequency of 1 in a very large proportion of the documents to which they are assigned. This last feature is accentuated even more in the poor S/N terms—these terms occur exclusively with very low term frequencies, and the distribution is very flat. The characterization of the S/N terms contained in the upper half of Table 5 makes it appear that the S/N classification is one based on specificity alone, and that it is not well correlated with the frequency characteristics.
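The frequency quantities in this passage can be made concrete with a small sketch. It assumes that the S/N value is the Shannon-style signal-noise measure, with noise N_k = sum_i (f_ik / F_k) log2(F_k / f_ik) and signal S_k = log2(F_k) - N_k, where f_ik is the frequency of term k in document i and F_k is its total collection frequency. The excerpt does not define the measure, so this definition, the function name, and the sample frequencies are assumptions made for illustration.

```python
import math


def term_statistics(term_freqs):
    """Summarize one term from its per-document frequencies f_ik.

    term_freqs holds one count per document; zeros mark documents
    to which the term is not assigned.
    """
    nonzero = [f for f in term_freqs if f > 0]
    if not nonzero:
        raise ValueError("term does not occur in the collection")
    doc_freq = len(nonzero)    # number of documents containing the term
    total_freq = sum(nonzero)  # total collection frequency F_k
    # Noise is largest when occurrences are spread evenly over documents.
    noise = sum((f / total_freq) * math.log2(total_freq / f) for f in nonzero)
    signal = math.log2(total_freq) - noise
    return {
        "doc_freq": doc_freq,
        "total_freq": total_freq,
        "noise": noise,
        "signal": signal,
    }


# Made-up example terms in a six-document collection.
flat_term = term_statistics([1, 1, 1, 1, 0, 0])          # frequency 1 everywhere
concentrated_term = term_statistics([4, 0, 0, 0, 0, 0])  # all in one document
```

Under this assumed definition, the flat term has a noise of 2 and a signal of 0, while the concentrated term has a noise of 0 and a signal of 2. That agrees with the description of poor S/N terms as having very flat distributions with very low term frequencies.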