Search results
From Cohen Courses
Create the page "Documents" on this wiki! See also the search results found.
Page title matches
- Inderjit S. Dhillon. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. KDD. ...odeling the document collection]] as a [[Method::bipartite graph]] between documents and words, using which the simultaneous clustering problem can be posed as
  1 KB (164 words) - 00:57, 28 March 2011
- Rosen-Zvi et al, The Author-Topic Model for Authors and Documents * Build a [[UsesMethod:: Topic Model]] which could model the documents generation process by assigning each author with a separate topic mixture c
  3 KB (504 words) - 23:13, 31 March 2011
- ...Taylor and C. Lee Giles. 2010. Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization. In Procee ...essesProblem::Cross Document Coreference (CDC)]] for web-scale corpora of documents, by using document-level categories, sub-document level context and extract
  5 KB (658 words) - 14:58, 7 December 2010
- ...CT [[Huang et al, Coling 2010: Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization]]
  158 bytes (21 words) - 00:44, 1 December 2010
Page text matches
- ...s a collection of documents that appeared on Reuters newswire in 1987. The documents were assembled and indexed with categories.
  218 bytes (29 words) - 01:18, 27 September 2012
- ...sent documents in the collection serving as the search space and index the documents accordingly.
  265 bytes (33 words) - 02:07, 6 November 2012
- ...about identifying authoritative documents in a given domain. Authoritative documents are ones which exhibit novel and relevant information relative to a documen Identifying such documents would be helpful in summarizing the information present in the collection w
  543 bytes (71 words) - 18:41, 3 October 2012
- ...the term frequency multiplied by the inverse document frequency (number of documents the term appears in within the corpus).
  275 bytes (34 words) - 10:14, 3 October 2012
- ...the search scope by overcoming vocabulary mismatch between user query and documents in collection.
  391 bytes (51 words) - 02:04, 6 November 2012
- ...egression]] with weight vector eta, and a measure of similarity of the two documents, using Hadamard product of the topic weights.
  1 KB (197 words) - 17:09, 1 February 2011
- ...t|dataset]] is used for text categorization classification, and consists of documents that appeared on the Reuters Newswire in 1987. ...The first 21 files contain 1000 documents each, and the 22nd contains 578 documents. The formatting of the data is in SGML format.
  1 KB (143 words) - 23:02, 25 September 2011
- ...ically construct object data and induce object models from complicated Web documents, such as the technical descriptions of personal computers and digital camer
  2 KB (226 words) - 20:09, 1 October 2012
- ...ically construct object data and induce object models from complicated Web documents, such as the technical descriptions of personal computers and digital camer
  2 KB (226 words) - 20:59, 1 October 2012
- By definition, online reference refers to the inference on newly arrived documents after the batch training process
  115 bytes (17 words) - 23:02, 4 April 2011
- This refers to any [[Category::dataset]] comprised of random documents that are available in the World Wide Web and can be accessed through a web
  154 bytes (26 words) - 02:58, 30 September 2011
- ...of estimating the underlying model using which the document or the set of documents were generated.
  124 bytes (20 words) - 20:08, 3 October 2012
- A [[category::Dataset]] consisting of blog documents drawn from blogs that resemble personal journals.
  210 bytes (22 words) - 10:26, 3 October 2012
- ...refers to the [[category::problem]] of identifying approximately duplicate documents or strings.
  221 bytes (23 words) - 14:29, 28 September 2011
- A [[category::Dataset]] consisting of blog documents drawn from blogs that resemble newspaper articles, rather than personal blo
  245 bytes (27 words) - 10:25, 3 October 2012
- ...nding the cosine similarity between the vectors corresponding to these two documents. Each element of vector A and vector B is generally taken to be tf-idf weig Widely used for calculating the similarity of documents using the bag-of-words and vector space models
  1 KB (210 words) - 23:49, 6 February 2011
- This corpus contains news articles and other text documents manually annotated for opinions and other private states.
  329 bytes (36 words) - 20:25, 26 September 2012
- ...nformation retrieval tasks, such as: query expansion, semantic indexing of documents and search results organization.
  326 bytes (37 words) - 14:30, 25 September 2011
- ...n entity of interest in a time window ''c'' is compared with the counts of documents containing the entity in the leading ''k'' windows. The entity is said to b
  926 bytes (138 words) - 07:52, 2 November 2011