Search results

From Cohen Courses

Page text matches

  • ...s a collection of documents that appeared on Reuters newswire in 1987. The documents were assembled and indexed with categories.
    218 bytes (29 words) - 02:18, 27 September 2012
  • ...sent documents in the collection serving as the search space and index the documents accordingly.
    265 bytes (33 words) - 03:07, 6 November 2012
  • ...about identifying authoritative documents in a given domain. Authoritative documents are ones which exhibit novel and relevant information relative to a documen Identifying such documents would be helpful in summarizing the information present in the collection w
    543 bytes (71 words) - 19:41, 3 October 2012
  • ...the term frequency multiplied by the inverse document frequency (number of documents the term appears in within the corpus).
    275 bytes (34 words) - 11:14, 3 October 2012
  • ...the search scope by overcoming vocabulary mismatch between user query and documents in collection.
    391 bytes (51 words) - 03:04, 6 November 2012
  • ...egression]] with weight vector eta, and a measure of similarity of the two documents, using the Hadamard product of the topic weights.
    1 KB (197 words) - 18:09, 1 February 2011
  • ...t|dataset]] is used for text categorization classification, and consists of documents that appeared on the Reuters Newswire in 1987. ...The first 21 files contain 1000 documents each, and the 22nd contains 578 documents. The data is in SGML format.
    1 KB (143 words) - 00:02, 26 September 2011
  • ...ically construct object data and induce object models from complicated Web documents, such as the technical descriptions of personal computers and digital camer
    2 KB (226 words) - 21:09, 1 October 2012
  • ...ically construct object data and induce object models from complicated Web documents, such as the technical descriptions of personal computers and digital camer
    2 KB (226 words) - 21:59, 1 October 2012
  • By definition, online reference refers to the inference on newly arrived documents after the batch training process
    115 bytes (17 words) - 00:02, 5 April 2011
  • This refers to any [[Category::dataset]] comprised of random documents that are available in the World Wide Web and can be accessed through a web
    154 bytes (26 words) - 03:58, 30 September 2011
  • Inderjit S. Dhillon. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. KDD. ...odeling the document collection]] as a [[Method::bipartite graph]] between documents and words, using which the simultaneous clustering problem can be posed as
    1 KB (164 words) - 01:57, 28 March 2011
  • ...of estimating the underlying model using which the document or the set of documents were generated.
    124 bytes (20 words) - 21:08, 3 October 2012
  • A [[category::Dataset]] consisting of blog documents drawn from blogs that resemble personal journals.
    210 bytes (22 words) - 11:26, 3 October 2012
  • ...refers to the [[category::problem]] of identifying approximately duplicate documents or strings.
    221 bytes (23 words) - 15:29, 28 September 2011
  • A [[category::Dataset]] consisting of blog documents drawn from blogs that resemble newspaper articles, rather than personal blo
    245 bytes (27 words) - 11:25, 3 October 2012
  • ...nding the cosine similarity between the vectors corresponding to these two documents. Each element of vector A and vector B is generally taken to be tf-idf weig Widely used for calculating the similarity of documents using the bag-of-words and vector space models
    1 KB (210 words) - 00:49, 7 February 2011
  • This corpus contains news articles and other text documents manually annotated for opinions and other private states.
    329 bytes (36 words) - 21:25, 26 September 2012
  • ...nformation retrieval tasks, such as: query expansion, semantic indexing of documents and search results organization.
    326 bytes (37 words) - 15:30, 25 September 2011
  • ...n entity of interest in a time window ''c'' is compared with the counts of documents containing the entity in the leading ''k'' windows. The entity is said to b
    926 bytes (138 words) - 08:52, 2 November 2011
  • Networks of references between documents such as papers, patents, or court cases.
    276 bytes (36 words) - 23:50, 6 February 2011
  • ...' aims to automatically find professional specialists from a collection of documents. An example is that we can discover experts in individual areas from scient
    414 bytes (60 words) - 15:39, 29 September 2012
  • ...' aims to automatically find professional specialists from a collection of documents. An example is that we can discover experts in individual areas from scient
    414 bytes (60 words) - 20:32, 3 October 2012
  • ...with about 1 million documents per day. In total it consists of 90 million documents (blog posts and news articles) from 1.65 million different sites obtained t 30% of the total number of documents in our dataset.
    2 KB (281 words) - 18:23, 22 April 2011
  • * The CiteSeer dataset contains 1,504 machine learning documents with 2,892 author references to 1,165 author entities.
    391 bytes (45 words) - 00:51, 1 April 2011
  • ...o sentences in the selected documents that are relevant to the topics. The documents that are annotated are separately distributed in a sentence-segmented forma
    1 KB (145 words) - 21:38, 26 September 2012
  • Documents related to the issue of animal cloning are contains 25 documents. All documents in the same set are
    4 KB (534 words) - 18:44, 26 October 2012
  • ...or model) is an algebraic [[Category::Method|model]] for representing text documents (and any objects, in general) as vectors of identifiers, such as, for examp
    439 bytes (65 words) - 20:35, 30 September 2012
  • ...ir frequency. This paper seeks to present a better model for understanding documents with associated tag data, using unlabeled data to uncover latent structure ...categories are latent variables, whereas the content and social annotation documents are visible.
    5 KB (800 words) - 10:28, 3 October 2012
  • Documents are ranked based on their scores. <br> ** TF-IDF between Q and all documents cited D
    4 KB (572 words) - 23:08, 2 April 2011
  • The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It was origina
    485 bytes (65 words) - 02:19, 27 September 2012
  • ...ions of "progress after hospital stay" of Clinical Data Architecture (CDA) documents, which came from Seoul National University Hospital. The data is not public The evaluation was performed on 200 documents for training and 100 documents for test with 3 fold validation. The performance of the system is not high,
    2 KB (313 words) - 16:06, 21 October 2010
  • ...of an extensive World Wide Web of facts can be achieved by mining the Web documents. This step has been described in [[RelatedPaper::Pasca et al, AAAI 2006]]. There are some differences in mining queries vs documents. These are:
    3 KB (486 words) - 04:20, 22 November 2010
  • ...from a stream of time-stamped information. Approaches usually aim to group documents belonging to the same event into a single cluster.
    657 bytes (94 words) - 19:42, 30 September 2012
  • ...hors_and_Documents Rosen-Zvi et al, The Author-Topic Model for Authors and Documents] ...in that they have a common '''big idea''' of being able to cluster similar documents, using more than just the terms in the document. Both the papers use m
    2 KB (334 words) - 17:42, 5 November 2012
  • graphs of citations between documents. Using the network of citations between opinions handed down by the
    754 bytes (108 words) - 01:22, 7 February 2011
  • ...ection has 353 pairs of words, and the other collection has 1,225 pairs of documents. Both have human judgments as gold standards.
    2 KB (291 words) - 22:30, 30 November 2010
  • ...content evolution of the topics, where novel contents are introduced by documents which adopt the topic. Unlike an explicit user behavior (e.g., buying a DVD ...r task as a joint inference problem, taking into consideration textual documents, social influences, and topic evolution in a unified way. Specifically,
    5 KB (702 words) - 22:42, 5 November 2012
  • ...that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative i
    688 bytes (101 words) - 08:06, 4 October 2012
  • Rosen-Zvi et al, The Author-Topic Model for Authors and Documents * Build a [[UsesMethod:: Topic Model]] which could model the documents generation process by assigning each author with a separate topic mixture c
    3 KB (504 words) - 00:13, 1 April 2011
  • We examine the problem of predicting local sentiment flow in documents, and its
    674 bytes (100 words) - 22:16, 5 November 2012
  • ...l derived models, this one is not completely generative due to hyperlinked documents being fixed. ...sets of 1,124 (doesn't explicitly state what happened to the duplicated 68 documents - which could be a potential source of overfitting). The model needs a bipa
    5 KB (740 words) - 22:21, 1 December 2012
  • * Identifying topics and common subjects covered by documents. * Identifying authoritative documents on a given topic.
    4 KB (610 words) - 17:08, 5 November 2012
  • ...ontain attributes as the positive sample. The rest of the sentences in the documents are used as negative samples.
    2 KB (318 words) - 17:18, 5 October 2010
  • ...phrases in clinical narrative texts. I am going to use clinical narrative documents written by Korean doctors. The high level concept information which will be ...s such clinical texts automatically in Korea. Semantic tagging on clinical documents will be able to help develop applications which can be useful for doctor
    4 KB (637 words) - 04:48, 9 October 2010
  • ...ontain attributes as the positive sample. The rest of the sentences in the documents are used as negative samples.
    2 KB (330 words) - 14:21, 26 September 2010
  • ...the larger seed set; new models can then be trained on the newly labelled documents. ...ery high-precision indicator. Using these seeds, labels can be assigned to documents containing those seeds. If the seeds are balanced across classes, this will
    4 KB (667 words) - 02:13, 30 November 2011
  • The Author-Topic Model for Authors and Documents. Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth. In Proc ...atalab.uci.edu/author-topic/398.pdf The Author-Topic Model for Authors and Documents]
    2 KB (353 words) - 23:22, 26 September 2012
  • ...eference (CDC) is the task of extracting all the noun phrases from all the documents in a corpus, and clustering them according to the real-world entity that th ..., an additional layer of complexity is introduced: clusters from different documents must also be resolved as describing the same real-world entity or not.
    4 KB (521 words) - 02:11, 28 September 2010
  • ...ich could jointly model the documents along with the citations between the documents. Both the words and citations in a document are dependent on the topic prop
    3 KB (380 words) - 21:01, 28 March 2011
