1. |
[ACGPR01]
Searching the
Web,
Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and
Sriram Raghavan.
ACM Transactions on Internet Technology, 1, p. 2-43, 2001.
| All
|
2. |
[BP98]
The Anatomy
of a Large-Scale Hypertextual Web Search Engine.
Sergey Brin and Lawrence Page.
Proceedings of the 7th International WWW conference, 1998.
| All
|
3. |
[NH01]
High-Performance Web Crawling.
Marc Najork and Allan Heydon.
Compaq SRC Research Report 173
| All
|
4. |
[HN99]
Mercator:
A Scalable, Extensible Web Crawler.
Allan Heydon and Marc Najork.
In World Wide Web, December 1999, pages 219-229.
| All
|
5. |
[NW01]
Breadth-First
Search Crawling Yields High-Quality Pages.
Marc Najork and Janet L. Wiener.
In Proceedings of the Tenth Internal World Wide Web Conference, pages 114-118, May, 2001
| All
|
6. |
[RM02]
Search engines and Web dynamics.
Knut Magne Risvik and Rolf Michelsen.
Computer Networks, Volume 39, Issue 3, 21 June 2002, Pages 289-302.
| All
|
7. |
[CG00]
The Evolution of the Web and Implications for an Incremental Crawler.
Junghoo Cho and Hector Garcia-Molina.
Proceedings of 26th International Conference on Very
Large Data Bases (VLDB), 10-14 September 2000, Cairo, Egypt,
Pages 200-209.
| All
|
8. |
[MPSR01]
Evaluating Topic-Driven Web Crawlers.
Filippo Menczer, Gautam Pant, Padmini Srinivasan, and Miguel E. Ruiz.
Proceedings SIGIR 01, September 9-12, 2001, New Orleans, Louisiana,
USA, Pages 241-249.
| All
|
9. |
[CBD99]
Focused
crawling: a new approach to topic-specific Web resource discovery.
Soumen Chakrabarti, Martin van den Berg, and Byron Dom.
Eighth World Wide Web conference, Toronto, 1999.
| All
|
10. |
[RG00]
Crawling the
Hidden Web. Sriram Raghavan and Hector Garcia-Molina.
Stanford Technical Report 2000-36. Full version of paper appearing in
proceedings of 27th International Conference on Very Large Data Bases,
Rome, Italy, September 11-14, 2001.
| All
|
11. |
[MRYG00]
Building a Distributed Full-Text Index for the Web.
Sergey Melnik, Sriram Raghavan, Beverly Yang, and Hector Garcia-Molina.
Stanford Technical Report 2000-55. Short version of paper appeared in
the proceedings of the 10th International WWW conference, May 2-5, 2001, Hong Kong.
| All
|
12. |
[WMB99]
Managing Gigabytes: Compressing and Indexing Documents and Images.
Ian H. Witten, Alistair Moffat, and Timothy C. Bell.
Morgan Kaufmann Publishing, San Francisco, 1999. ISBN 1-55860-570-3.
| All
|
13. |
[BS97]
Fast
Algorithms for Sorting and Searching Strings.
Jon Bentley and Robert Sedgewick.
Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms.
New Orleans, January, 1997. Pages 360- 369.
| All
|
14. |
[MM90]
Suffix arrays: a new method for on-line string searches.
Udi Manber and Gene Myers.
Proceedings of the first annual ACM-SIAM symposium on Discrete
algorithms, 1990.
Pages 319 - 327.
| All
|
15. |
[U95]
On-Line Construction of Suffix Trees.
Esko Ukkonen.
Algorithmica, 14, pages 249-260, 1995.
| All
|
16. |
[H99]
Efficient
Computation of PageRank.
Taher H. Haveliwala.
Stanford Technical Report 2000-36.
| All
|
17. |
[K99]
Authoritative Sources in a Hyperlinked Environment.
Jon M. Kleinberg.
Journal of the ACM, 46(5), 604-632, 1999.
| All
|
18. |
[CDRRGK98]
Automatic Resource
Compilation by Analyzing Hyperlink Structure and Associated Text.
Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar
Rajagopalan, David Gibson and Jon Kleinberg.
Proceedings of the Seventh International World Wide Web
Conference (WWW7). Brisbane, Australia. April 1998. Also appeared in:
Computer Networks and ISDN Systems 30, 1998, pp.65-74
| All
|
19. |
[CDGKRRT98]
Experiments in topic distillation.
S. Chakrabarti, B. E. Dom, D. Gibson, R. Kumar, P. Raghavan,
S. Rajagopalan, and A. Tomkins.
In Proceedings of the ACM SIGIR Workshop on Hypertext Information
Retrieval on the Web, Melbourne, Australia, 1998. ACM.
| All
|
20. |
[BH98]
Improved
Algorithms for Topic Distillationa in Hyperlinked Environments.
Krishna Bharat and Monika R. Henzinger.
Proceedings of the 21st
International ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR), 1998, pp. 104-111.
| All
|
21. |
[BB99]
Understanding
Search Engines: Mathematical Modeling and Text Retrieval.
M.W. Berry and M. Browne.
SIAM Book Series: Software, Environments, and Tools, (June 1999), ISBN:
0-89871-437-0.
| Chapter 3
|
22. |
[BR99]
Modern Information
Retrieval.
Ricardo Baeza-Yates, Berthier Ribiero-Neto.
Addison Wesley Higher Education, 1999.
| Section 2.5.1-3, 3.2, 4.2-3, 8.4
|
23. |
[GT98]
Data Structures and
Algorithms in Java.
M. Goodrich and R. Tamassia, Wiley, First edition, 1998.
| Pages 660-663
|
24. |
[B97]
On the resemblance and containment of documents.
Andrei Broder.
In
Compression and Complexity of Sequences (SEQUENCES'97), pages
21-29. IEEE Computer Society, 1998.
| All
|
25. |
[BGNZ97]
Syntactic clustering of the Web.
Andrei Broder, Steve Glassman, Mark Manasse, and Geoffrey Zweig.
In Proceedings of the 6th
International World Wide Web Conference, pages 391-404, April 1997,
also appeared as SRC Technical Note 1997-015 (HTML).
| All
|
26. |
[B93]
Some applications of Rabin's fingerprinting method.
Andrei Z. Broder.
In R. Capocelli, A. De Santis, U. Vaccaro (eds), Sequences
II: Methods in Communications, Security, and Computer Science,
Springer-Verlag, 1993.
| All
|
27. |
[HGI00]
Scalable Techniques for
Clustering the Web.
Taher Haveliwala, Aristides Gionis, Piotr Indyk.
Third International Workshop on the Web and Databases, 2000.
| All
|
28. |
[DFKVV99]
Clustering in large graphs and matrices.
P. Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay.
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, 1999.
| All
|
29. |
[BDB94]
Using Linear Algebra for Intelligent Information Retrieval.
Michael W. Berry and Susan T. Dumais, and Gavin W. O'Brien.
December 1994. Published in SIAM Review 37:4 (1995), pp. 573-595.
| All
|
30. |
[AFKMS01]
Spectral analysis of data.
Yossi Azar, Amos Fiat, Anna R. Karlin, Frank McSherry, and Jared Saia.
Proceedings of the 33rd annual ACM symposium on Theory of
computing, pp 619-626, 2001.
| All
|
31. |
[BBDH00]
A
Comparison of Techniques to Find Mirrored Hosts on the WWW.
Krishna Bharat, Andrei Broder, Jeffrey Dean, Monika R. Henzinger.
Journal of the American Society for Information Science October 2000
Volume 51 Issue 12. Also in 1999 ACM Digital Library Workshop on
Organizing Web Space (WOWS).
| All
|
32. |
[GL02]
The
Web Graph: an Overview.
Jean-Loup Guillaume, Matthieu Latapy.
In Proc. AlgoTel 2002.
| All
|
33. |
[BKMRRSTW00]
Graph structure in the web.
Andrei Broder, Ravi Kumar, Farzin Maghoul,
Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins,
Janet Wiener.
In Proc. Ninth International World Wide Web Conference (WWW9), 2000.
| All
|
34. |
[KRRSTU00]
Stochastic models for the Web graph.
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal.
Proceedings of the 41th IEEE Symp. on Foundations of Computer
Science. November 2000, pp. 57-65.
| All
|
35. |
[L98]
Naive
(Bayes) at Forty: The Independence Assumption in Information Retrieval.
D. D. Lewis.
Proceedings of the 10th European Conference on
Machine Learning (ECML-98), LNAI, Vol. 1398, pp. 4-18, Springer, April
21-23 1998.
| All
|
36. |
[LM01]
SALSA: The Stochastic Approach for
Link-Structure Analysis.
R. Lempel and S. Moran.
ACM Transactions on Information Systems
19(2), pp. 131-160, April 2001. Also in 9th International World Wide
Web conference, 2000
| All
|
37. |
[P01]
Algorithms, games, and the internet.
Christos Papadimitriou.. Proc. 33rd Annual ACM Symposium on Theory of Computing,
2001, pp. 749-753, 2001.
| All
|