Search results

Page title matches

Co-clustering documents and words using bipartite spectral graph partitioning
Inderjit S. Dhillon. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. KDD. ...odeling the document collection]] as a [[Method::bipartite graph]] between documents and words, using which the simultaneous clustering problem can be posed as

1 KB (164 words) - 01:57, 28 March 2011
Rosen-Zvi et al, The Author-Topic Model for Authors and Documents
Rosen-Zvi et al, The Author-Topic Model for Authors and Documents * Build a [[UsesMethod:: Topic Model]] which could model the documents generation process by assigning each author with a separate topic mixture c

3 KB (504 words) - 00:13, 1 April 2011
Huang et al, Coling 2010: Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization
...Taylor and C. Lee Giles. 2010. Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization. In Procee ...essesProblem::Cross Document Coreference (CDC)]] for web-scale corpora of documents, by using document-level categories, sub-document level context and extract

5 KB (658 words) - 15:58, 7 December 2010
Huang et al, ACL 2010: Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization
...CT [[Huang et al, Coling 2010: Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization]]

158 bytes (21 words) - 01:44, 1 December 2010

Page text matches

Reuters-21578
...s a collection of documents that appeared on Reuters newswire in 1987. The documents were assembled and indexed with categories.

218 bytes (29 words) - 02:18, 27 September 2012
Document representation
...sent documents in the collection serving as the search space and index the documents accordingly.

265 bytes (33 words) - 03:07, 6 November 2012
Authority Identification
...about identifying authoritative documents in a given domain. Authoritative documents are ones which exhibit novel and relevant information relative to a documen Identifying such documents would be helpful in summarizing the information present in the collection w

543 bytes (71 words) - 19:41, 3 October 2012
Tf * idf
...the term frequency multiplied by the inverse document frequency (number of documents the term appears in within the corpus).

275 bytes (34 words) - 11:14, 3 October 2012
Query expansion
...the search scope by overcoming vocabulary mismatch between user query and documents in collection.

391 bytes (51 words) - 03:04, 6 November 2012
Relational topic model
...egression]] with weight vector eta, and a measure of similarity of the two documents, using the Hadamard product of the topic weights.

1 KB (197 words) - 18:09, 1 February 2011
Reuters 21578
...t|dataset]] is used for text categorization, and consists of documents that appeared on the Reuters Newswire in 1987. ...The first 21 files contain 1000 documents each, and the 22nd contains 578 documents. The data is in SGML format.

1 KB (143 words) - 00:02, 26 September 2011
Ye and Chua (2006)
...ically construct object data and induce object models from complicated Web documents, such as the technical descriptions of personal computers and digital camer

2 KB (226 words) - 21:09, 1 October 2012
Ye and Chua 2006
...ically construct object data and induce object models from complicated Web documents, such as the technical descriptions of personal computers and digital camer

2 KB (226 words) - 21:59, 1 October 2012
Online inference model for LDA
By definition, online inference refers to inference on newly arrived documents after the batch training process

115 bytes (17 words) - 00:02, 5 April 2011
Web pages
This refers to any [[Category::dataset]] comprised of random documents that are available on the World Wide Web and can be accessed through a web

154 bytes (26 words) - 03:58, 30 September 2011
Document modeling
...of estimating the underlying model using which the document or the set of documents were generated.

124 bytes (20 words) - 21:08, 3 October 2012
20060501.xml dataset
A [[category::Dataset]] consisting of blog documents drawn from blogs that resemble personal journals.

210 bytes (22 words) - 11:26, 3 October 2012
Duplicate Document Detection
...refers to the [[category::problem]] of identifying approximately duplicate documents or strings.

221 bytes (23 words) - 15:29, 28 September 2011
Cyberjournalist.net dataset
A [[category::Dataset]] consisting of blog documents drawn from blogs that resemble newspaper articles, rather than personal blo

245 bytes (27 words) - 11:25, 3 October 2012
Cosine similarity
...nding the cosine similarity between the vectors corresponding to these two documents. Each element of vector A and vector B is generally taken to be tf-idf weig Widely used for calculating the similarity of documents using the bag-of-words and vector space models

1 KB (210 words) - 00:49, 7 February 2011
MPQA Multi-Perspective Question Answering
This corpus contains news articles and other text documents manually annotated for opinions and other private states.

329 bytes (36 words) - 21:25, 26 September 2012
Ontology refinement
...nformation retrieval tasks, such as: query expansion, semantic indexing of documents and search results organization.

326 bytes (37 words) - 15:30, 25 September 2011
Peak Detection
...n entity of interest in a time window ''c'' is compared with the counts of documents containing the entity in the leading ''k'' windows. The entity is said to b

926 bytes (138 words) - 08:52, 2 November 2011
The structure of scientific collaboration
Networks of references between documents such as papers, patents, or court cases.

276 bytes (36 words) - 23:50, 6 February 2011
Expert search
...' aims to automatically find professional specialists from a collection of documents. An example is that we can discover experts in individual areas from scient

414 bytes (60 words) - 15:39, 29 September 2012
Expert Search
...' aims to automatically find professional specialists from a collection of documents. An example is that we can discover experts in individual areas from scient

414 bytes (60 words) - 20:32, 3 October 2012
Leskovec, Backstrom and Kleinberg KDD 09 News and Blog dataset
...with about 1 million documents per day. In total it consists of 90 million documents (blog posts and news articles) from 1.65 million different sites obtained t 30% of the total number of documents in our dataset.

2 KB (281 words) - 18:23, 22 April 2011
CiteSeer
* The CiteSeer dataset contains 1,504 machine learning documents with 2,892 author references to 1,165 author entities.

391 bytes (45 words) - 00:51, 1 April 2011
NTCIR-6 Opinion
...o sentences in the selected documents that are relevant to the topics. The documents that are annotated are separately distributed in a sentence-segmented forma

1 KB (145 words) - 21:38, 26 September 2012
L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006
The set of documents related to the issue of animal cloning contains 25 documents. All documents in the same set are

4 KB (534 words) - 18:44, 26 October 2012
Vector space models
...or model) is an algebraic [[Category::Method|model]] for representing text documents (and any objects, in general) as vectors of identifiers, such as, for examp

439 bytes (65 words) - 20:35, 30 September 2012
Kashoob, Caverlee and Ding ICWSM 2009
...ir frequency. This paper seeks to present a better model for understanding documents with associated tag data, using unlabeled data to uncover latent structure ...categories are latent variables, whereas the content and social annotation documents are visible.

5 KB (800 words) - 10:28, 3 October 2012
Bethard cikm2010
Documents are ranked based on their scores. <br> ** TF-IDF between Q and all documents cited D

4 KB (572 words) - 23:08, 2 April 2011
20 Newsgroups
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It was origina

485 bytes (65 words) - 02:19, 27 September 2012
Hyeju Jang et al IRI 2006
...ions of "progress after hospital stay" of Clinical Data Architecture (CDA) documents, which came from Seoul National University Hospital. The data is not public The evaluation was performed on 200 documents for training and 100 documents for test with 3 fold validation. The performance of the system is not high,

2 KB (313 words) - 16:06, 21 October 2010
Pasca, WWW 2007
...of an extensive World Wide Web of facts can be achieved by mining the Web documents. This step has been described in [[RelatedPaper::Pasca et al, AAAI 2006]]. There are some differences in mining queries vs documents. These are:

3 KB (486 words) - 04:20, 22 November 2010
Event detection
...from a stream of time-stamped information. Approaches usually aim to group documents belonging to the same event into a single cluster.

657 bytes (94 words) - 19:42, 30 September 2012
Comparison Rosen-Zvi el al and cohn et al
...hors_and_Documents Rosen-Zvi et al, The Author-Topic Model for Authors and Documents] ...in that they have a common '''big idea''' of being able to cluster similar documents using more than just the terms in the document. Both the papers use m

2 KB (334 words) - 17:42, 5 November 2012
This paper demonstrates how each of these methods can divide the structure of large-scale network.
graphs of citations between documents. Using the network of citations between opinions handed down by the

754 bytes (108 words) - 01:22, 7 February 2011
Gabrilovich and Markovitch IJCAI 2007
...ection has 353 pairs of words, and the other collection has 1,225 pairs of documents. Both have human judgments as gold standards.

2 KB (291 words) - 22:30, 30 November 2010
Inferring the Diffusion and Evolution of Topics in Social Communities
...content evolution of the topics, where novel content is introduced by documents which adopt the topic. Unlike an explicit user behavior (e.g., buying a DVD ...r task as a joint inference problem, taking into consideration textual documents, social influences, and topic evolution in a unified way. Specifically,

5 KB (702 words) - 22:42, 5 November 2012
Pagerank
...that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative i

688 bytes (101 words) - 08:06, 4 October 2012
Mao and G. Lebanon, 2006
We examine the problem of predicting local sentiment flow in documents, and its

674 bytes (100 words) - 22:16, 5 November 2012
Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs
...l derived models, this one is not completely generative due to hyperlinked documents being fixed. ...sets of 1,124 (doesn't explicitly state what happened to the duplicated 68 documents - which could be a potential source of overfitting). The model needs a bipa

5 KB (740 words) - 22:21, 1 December 2012
The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
* Identifying topics and common subjects covered by documents. * Identifying authoritative documents on a given topic.

4 KB (610 words) - 17:08, 5 November 2012
Wu and Weld CIKM 2007
...ontain attributes as the positive sample. The rest of the sentences in the documents are used as negative samples.

2 KB (318 words) - 17:18, 5 October 2010
Hyejuj project abstract
...phrases in clinical narrative texts. I am going to use clinical narrative documents written by Korean doctors. The high level concept information which will be ...s such clinical texts automatically in Korea. Semantic tagging on clinical documents will be able to help developing applications which can be useful for doctor

4 KB (637 words) - 04:48, 9 October 2010
"Wu and Weld CIKM 2007"
...ontain attributes as the positive sample. The rest of the sentences in the documents are used as negative samples.

2 KB (330 words) - 14:21, 26 September 2010
Bootstrapping
...the larger seed set; new models can then be trained on the newly labelled documents. ...ery high-precision indicator. Using these seeds, labels can be assigned to documents containing those seeds. If the seeds are balanced across classes, this will

4 KB (667 words) - 02:13, 30 November 2011
Rosen-Zvi et al UAI 2004
The Author-Topic Model for Authors and Documents. Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth. In Proc ...atalab.uci.edu/author-topic/398.pdf The Author-Topic Model for Authors and Documents]

2 KB (353 words) - 23:22, 26 September 2012
Cross Document Coreference (CDC)
...eference (CDC) is the task of extracting all the noun phrases from all the documents in a corpus, and clustering them according to the real-world entity that th ..., an additional layer of complexity is introduced: clusters from different documents must also be resolved as describing the same real-world entity or not.

4 KB (521 words) - 02:11, 28 September 2010
Cohn et al, Advances in Neural Information Processing Systems 2001
...ich could jointly model the documents along with the citations between the documents. Both the words and citations in a document are dependent on the topic prop

3 KB (380 words) - 21:01, 28 March 2011
Blei et al, 2002
- N words of documents are shown by <math> w=\{w_1,w_2,...,w_N\}</math> ...ers are estimated using maximum likelihood estimation on a set of training documents. For inference, one approach is to approximate parameter <math> \phi </math

4 KB (616 words) - 16:55, 24 November 2010
Segmented Topic Model
...ew form of topic model which can take into account the inner structures in documents.

733 bytes (112 words) - 15:54, 29 September 2012
Dong et al WWW 2010
...e queries pose a particular problem for search engines because very recent documents may not even be indexed yet, and even if they are indexed, there may be a r #Twitter is likely to contain URLs of uncrawled documents likely to be relevant to recency sensitive queries.

6 KB (944 words) - 10:22, 29 March 2011
Barzilay and Elhadad, 2003
This paper studies the problem of aligning documents at the sentence level when they are on the same topic or are describing the ...tiple components, first clustering paragraphs within-corpus, then aligning documents at the paragraph level (essentially marking candidate sentence-sentence pai

5 KB (807 words) - 08:10, 30 September 2011
Tag Predicting For Stackoverflow
...ce that they are labeled correctly. Use these high-confidence, freshly labeled documents as the input and build the feature graph again. This step can be done itera

3 KB (408 words) - 00:25, 16 October 2012
Clustering
e.g clustering of similar documents, summarization etc.

1 KB (142 words) - 00:42, 7 February 2011
Github Repo Recommendation:Topic Model meets Code
...n topics from a subset of the documents? If yes, how can we collect sample documents that are representative of the original distribution? ...ccurately model the corpus by modeling it as a collection of collections of documents?

4 KB (592 words) - 10:14, 16 October 2012
Topic Model Approach to Authority Identification
...of [[AddressesProblem::Authority_Identification|identifying authoritative documents]] in a given domain using textual content and report their best performing Authoritative documents are ones which exhibit novel and relevant information relative to a documen

6 KB (961 words) - 08:16, 4 October 2012
Project dong, 10-802 spring 2010
* Diversify search results (return documents written in different perspectives about topics of interest) * Personalize search results (return documents in viewpoint of user)

3 KB (397 words) - 17:01, 1 February 2011
Latent semantic indexing
...s of the <math>m</math> unique terms within a collection of <math>n</math> documents. In a term-document matrix, each term is represented by a row, and each do ...scribes the relative frequency of the term within the entire collection of documents.

5 KB (774 words) - 00:36, 1 December 2010
Comparison: A Latent Variable Model for Geographic Lexical Variation and A probabilistic approach to spatiotemporal theme pattern mining on weblogs
.... Mei et al. aim at finding subtopics in different times and locations from documents that have the same topics. ..., the data sets they used are very different. Jacob et al. use Twitter-type documents, which are very short. Q. Mei uses weblogs, which are relatively long.

3 KB (516 words) - 11:12, 6 November 2012
Document representation and query expansion models for blog recommendation
...ral ways: (1) the unit of output (the blog) is composed of a collection of documents (the blog posts) rather than a single document, (2) the query represents an ...tain a lot of noise in the form of reader comments and spam, unlike traditional documents

9 KB (1,328 words) - 03:49, 6 November 2012
What VS What? Detect Controversial Topics in Online Community
...iven series of documents d and the number of comments associated with each document, denoted <math>N(d)</math> ...ment. Specifically, given a topic <math>t_{i}</math>, we hope to find those documents that hold a positive sentiment toward this topic, defined as <math>D_{t_{i}+}</m

4 KB (744 words) - 01:48, 16 October 2012
Das Sarma et. al., Dynamic Relationship and Event Discovery, WSDM 2011
...es are co-bursting if they appear close together in a large number of news documents in the given time period. ...nts in which both entities appear divided by the product of the numbers of documents each entity appears in (i.e. the [[UsesMethod::Pointwise mutual information

11 KB (1,678 words) - 22:58, 2 November 2011
Chang and Blei, AOAS2010
For Network data, such as social networks of friends, citation networks of documents or hyperlinked networks of web pages, people want to point social network m 2. For each pair of documents <math>d</math>,<math>d'</math>:

3 KB (442 words) - 15:40, 31 March 2011
Nallapati Cohen Link PLSA LDA
...ents. Unlike in Link-LDA and Link-PLSA, which only use citations of other documents with respect to topic k in determining the influence of document d', their

3 KB (521 words) - 14:43, 2 October 2012
Mann 2005 Multi-Field Information Extraction and Cross-Document Fusion
...troduces and evaluates methods for fusing the extracted information across documents to return a consensus answer. It could be applied together with cross-docum ...proach to combine the attribute values extracted for one person across the documents. Two alternatives are considered, one is to pick the most probable value, t

3 KB (514 words) - 01:09, 1 December 2010
Compare Yano et al NAACL 2009 Link PLSA LDA
The biggest difference is that this models the text of the cited documents as well. It is worth noting that the same priors <math>\Omega</math> and <m ...f links off of the words expressed in the original document and the linked documents (either comment on a blog post, or linked blog) can help in this task.

5 KB (895 words) - 22:20, 1 December 2012
Chambers and Jurafsky, Unsupervised Learning of Narrative Event Chains, ACL 2008
...The purpose of this paper is to learn such "scripts" from a collection of documents automatically. The experiment is conducted on documents from the [[UsesDataset::Gigaword corpus]]. The temporal classifier is train

8 KB (1,180 words) - 01:38, 29 November 2011
BinLu et al. ACL2011
...Current translation algorithms can barely give meaningful translation for documents, and parallel corpus on document level is also rare. * Paper:Text classification from labeled and unlabeled documents using EM.:[http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&c

5 KB (716 words) - 22:30, 26 September 2012
Connections between the Lines: Augmenting Social Networks with Text
...enes, proteins, and diseases that have been manually labelled as entities. Documents are individual abstracts, and co-occurrences of entities in an abstract cre

4 KB (606 words) - 10:25, 27 September 2012
Class Meeting for 10-710 12-01-2011
....com/papers/emcat-mlj99.pdf Text Classification from Labeled and Unlabeled Documents using EM], K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, ML 2000

2 KB (255 words) - 15:20, 1 December 2011
Bollen 2011 vs Mishne 2006
...sis, which is to assign a sentiment to each document. In this problem, the documents in the corpora are gathered and the mood is determined over each aggregate ...odels to see how well they predict a given mood for a given time series of documents. It could be said that perhaps Bollen provides a better summary overview an

4 KB (607 words) - 03:17, 6 November 2012
Leskovec et al KDD 09
...llion documents per day, amounting to over 90 million articles as a whole. Documents come from both major news websites and blogs, and the total size of

4 KB (623 words) - 14:08, 1 October 2012
Daly et al Social Lense: Personalization Around User Defined Collections for Filtering Enterprise Message Streams ICWSM 2011
...ne similarity]] of the [[UsesMethod::vector space models|TF-IDF weighted]] documents representing the people. ...very specific and information-rich environment, where links between users, documents, and communities are explicit and there are no concerns about identifying t

4 KB (633 words) - 01:13, 2 October 2012
E. Minkov et al.
...iple document repetition (MDR): mark repeated tokens appearing in multiple documents as a name.

1 KB (216 words) - 16:52, 8 October 2010
Buza et al Scalable Event-based Clustering of Social Media via Record Linkage Techniques ICWSM 2011
...a sliding window of size <math>n</math> on a temporally ordered set of the documents to generate candidate pairs. ...er is also very useful. However, while the paper mentions that some of the documents are missing fields, there are no exact statistics. Also, there is no discuss

4 KB (632 words) - 05:03, 4 October 2012
Borkar, SIGMOD 2001
...earners which makes them reach their maximum accuracy with a small number of documents.

2 KB (246 words) - 13:13, 22 September 2010
Andreevskaia et al., ICWSM 2007
...significant difference in classifying sentiment for the two genres of blog documents, but the ternary task is more difficult than the binary task. ** Work dealing with extracting sentiment from web documents where valence shifting terms are taken into account.

4 KB (540 words) - 11:30, 3 October 2012
Blei et al Latent Dirichlet Allocation
...otably the topic node is sampled repeatedly within a document. This allows documents to be associated with multiple topics rather than just one. ...rical Bayes method for parameter estimation is provided. Given a corpus of documents D, we wish to find parameters <math>\alpha</math> and <math>\beta</math> t

6 KB (962 words) - 20:57, 3 October 2012
Topics over Time
...rts of the document are discussing different time periods. However, common documents typically have only one time stamp per document. Therefore, an alternative ...(1) first fitting a time-unaware topic model on data and then ordering the documents in time, or (2) dividing data into discrete time slices and fitting a separate

5 KB (738 words) - 00:08, 28 November 2011
E.A. Leicht, Structure of Time Evo citation networks 2007
Suppose there are '''''n''''' vertices representing documents in a network that can be divided into '''''c''''' groups. Then a log-likelih ...n the Scientific Literature: A New Measure of the Relationship Between Two Documents. Small, Henry. s.l.]] [http://onlinelibrary.wiley.com/doi/10.1002/asi.463024

4 KB (674 words) - 01:59, 7 February 2011
Class meeting for 10-605 Rocchio and Hadoop Workflows
* The TFIDF representation for documents.

3 KB (350 words) - 16:16, 14 October 2015
Singh et al., ACL 2011
...e Singh was a Google intern - we're talking about ''really'' large sets of documents).

4 KB (706 words) - 00:51, 30 November 2011
Weld et al SIGMOD 2009
...ages using heuristics. First a heuristic document classifier will classify documents into classes, then sentence classifier ([[UsesMethod::Maximum Entropy model

2 KB (294 words) - 12:45, 29 September 2011
Borkar et al, SIGMOD 2001
...earners which makes them reach their maximum accuracy with a small number of documents.

2 KB (295 words) - 14:09, 22 October 2010
E. Minkov et al. HLT/EMNLP 2005
...iple document repetition (MDR): mark repeated tokens appearing in multiple documents as a name.

2 KB (276 words) - 15:48, 23 October 2010
Haghighi and Klein, ACL 2006: Prototype-Driven Learning for Sequence Models
...like conventional semi-supervised learning where a portion of the training documents are fully labeled, in prototype-driven learning, a list of "prototype words

5 KB (694 words) - 16:00, 18 September 2011
Attribute Extraction
...te and relative ordering of where the attribute values typically appear in documents.

2 KB (299 words) - 20:29, 30 November 2010
Maximization of the benefit function known as "modularity"
network to be a sign of connection between documents,

3 KB (414 words) - 02:04, 7 February 2011
Class meeting for 10-605 Workflows For Hadoop
* The TFIDF representation for documents.

3 KB (434 words) - 12:37, 19 September 2017
J. Artiles et al. EMNLP 2009
a classification problem such that each pair of documents will be classified as coreferent

2 KB (344 words) - 05:47, 23 November 2010
Leskovec, J., L. Backstrom, and J. Kleinberg. 2009. Meme-tracking and the Dynamics of the News Cycle. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 497–506.
...with about 1 million documents per day. In total it consists of 90 million documents (blog posts and news articles) from 1.65 million different sites obtained t

6 KB (923 words) - 18:21, 22 April 2011
Turney, 2002
...ting polarity prediction as a document-classification problem; classifying documents based on likely-to-be-informative phrases; and using unsupervised or semi-s

2 KB (317 words) - 12:47, 27 October 2010
Cohen et al IJCAI 2005
...ting polarity prediction as a document-classification problem; classifying documents based on likely-to-be-informative phrases; and using unsupervised or semi-s

2 KB (317 words) - 16:39, 29 September 2010
Borkar et al SIGMOD 2001
...ting polarity prediction as a document-classification problem; classifying documents based on likely-to-be-informative phrases; and using unsupervised or semi-s

2 KB (317 words) - 16:39, 29 September 2010
Adar, E. et al, WSDM 2009
...languages and they are updated very fast, which means not all the parallel documents are likely to be well updated. The system uses additive [[UsesMethod::Logis Given a set of parallel, multilingual documents and a document to be modified, a set of potential infobox classes is guesse

5 KB (787 words) - 13:14, 30 September 2011
Jansche and Abney ACL 2002
...ting polarity prediction as a document-classification problem; classifying documents based on likely-to-be-informative phrases; and using unsupervised or semi-s

2 KB (323 words) - 19:51, 29 September 2010
Class Meeting for 10-802 02/24/2011
* Efron, M. 2004. Cultural orientation: Classifying subjective documents by cocitation analysis. In AAAI Fall Symposium on Style and Meaning in Langu

2 KB (326 words) - 22:21, 31 March 2011
Reisinger et al 2010: Spherical Topic Models
...esProblem::Topic model]]ing. The highlight of this paper is that it models documents as data points in high-dimensional spherical manifold. Like cosine similari ...enario where topic proportion <math>\theta = [1/3,1/3,1/3]</math>, the two documents are equivalent. In contrast, vMF would compute different cosine distances.

10 KB (1,516 words) - 18:11, 29 November 2011
Adaptive Real-time Filtering in Twitter
...s. Also available from a previous project is a web crawl of 1 million HTML documents that were linked from tweets.

2 KB (384 words) - 14:53, 15 October 2012
The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email
* [[RelatedPaper::Rosen-Zvi et al, The Author-Topic Model for Authors and Documents]] proposes the Author-Topic model, which this paper expands upon.

3 KB (449 words) - 00:01, 6 November 2012
Cohen and Carvalho, 2005
...hnique on additional segmentation tasks such as classifying lines from FAQ documents, video segmentation, etc.

3 KB (410 words) - 18:33, 1 February 2011
Visualization of Social Net
...e. Thus the interface should be able to provide evidence (the links to the documents) and serve as an entity-based document browser.

3 KB (484 words) - 23:54, 23 October 2010
Tackstrom and McDonald, ECIR 2011. Discovering fine-grained sentiment with latent variable structured prediction models
Based on the observations about positive and negative reviews in documents, the authors model sentence level classifications as: ...ibution of sentence labels per category and distributions of labels in the documents respectively.

7 KB (1,050 words) - 01:12, 29 November 2011
Yang et al, SIGIR 98
...d reweighting similarity scores according to the temporal proximity of two documents.

3 KB (482 words) - 00:01, 1 October 2012
Chklovski and Pantel (2004) Verbocean:Mining the web for fine-grained semantic verb relations
...ions. <math> Hits(S) </math> for string <math> S </math> denotes the number of documents returned from Google when <math> S </math> is queried. <math> C_v </math> i

3 KB (474 words) - 06:45, 6 November 2012
Final-project-review
...ant that readers not be distracted by sloppy writing or confusing English. Documents should be spell checked and carefully proofed before being submitted. (1=ne

3 KB (508 words) - 14:27, 26 April 2013
Yang et al 2007 Fusion approach to finding opinions in blogosphere
...ion score generating modules in use, each of which produces a ranked list of documents. The four opinion detection modules are:

3 KB (533 words) - 05:02, 4 October 2012
David M. Blei and Pedro J. Moreno, Topic Segmentation with an Aspect Hidden Markov Model, SIGIR 2001
...the paper lies in its addition of an aspect model to the HMM for segmenting documents. It removes the HMM's naive assumption that words are generated independently giv ...of this paper is its addition of an aspect model to the HMM for segmenting documents. Other earlier works for topic segmentation include:

8 KB (1,332 words) - 00:14, 29 March 2011
Class Meeting for 10-802 10/23/2012
* Efron, M. 2004. Cultural orientation: Classifying subjective documents by cocitation analysis. In AAAI Fall Symposium on Style and Meaning in Langu

3 KB (459 words) - 12:38, 25 October 2012
Popular Event Tracking
...ath> is a stream of document collections. <math>D_k</math> is the set of documents published between time <math>t_{k-1}</math> and <math>t_k</math>. <math>D_k

4 KB (687 words) - 15:15, 4 February 2011
Structured Models for Fine-to-Coarse Sentiment Analysis
*The paper uses product reviews dataset which tends to have small documents. It would be helpful to see model performance on large text corpora.

4 KB (515 words) - 11:06, 6 November 2012
Turney, ACL 2002
...ting polarity prediction as a document-classification problem; classifying documents based on likely-to-be-informative phrases; and using unsupervised or semi-s

4 KB (577 words) - 17:22, 30 January 2014
Automated Template Extraction
...generally represent important information to pull from a subset of all the documents. The intuition we're following is that, generally, the information we're se

4 KB (707 words) - 22:45, 6 October 2011
Mining of Political Issues
...s - [[Politics.com dataset]] is one, but it's not an easy dataset, and the documents are not really comments.

4 KB (566 words) - 14:53, 10 October 2012
Pattern Matching over Annotations
...h can easily be queried. More often, however, it is stored in unstructured documents which can be decorated by external NLP tools. These decorations are then st

4 KB (645 words) - 08:37, 30 November 2011
Rahman and Ng, ACL 2011
...et. These two datasets are annotated differently, so they used only those documents that were common to both, so they could evaluate with both sets of annotati

4 KB (684 words) - 23:48, 29 September 2011
Compare Hassan et al, ICWSM 2009 and Document representation and query expansion models for blog recommendation
Hassan et al. were trying to target the problem of ranking documents in a set based on their similarity to identify the representative blogs in

4 KB (674 words) - 06:12, 6 November 2012
Generalized Expectation Criteria
* Flexible supervision -- things like "I prefer 70% or more of the documents containing the word 'ice' would be about 'icehockey' instead of 'baseball'"

5 KB (794 words) - 16:50, 2 November 2011
Online Inference of Topics with Latent Dirichlet Allocation
...(2005) proposed a new algorithm to have online inference on newly arrived documents. First, they apply batch Gibbs sampler on part of the full dataset, then sa

4 KB (736 words) - 02:40, 3 November 2011
Lin et al KDD 2011
...ath> is a stream of document collections. <math>D_k</math> is the set of documents published between time <math>t_{k-1}</math> and <math>t_k</math>. <math>D_k

5 KB (794 words) - 23:01, 3 February 2011
Project Abstract - Bo, Kevin, Rushin
...s classifier and cluster all the chains that we have gathered from all the documents in the corpus.

4 KB (675 words) - 18:19, 1 February 2011
Determining term subjectivity and term orientation for opinion mining.
...his is believed to be of key importance for identifying the orientation of documents, i.e. determining whether a document expresses a positive or negative opini

6 KB (807 words) - 22:56, 3 November 2012
Mixed membership models of scientiﬁc publication
...ixed membership models, this paper specifically focuses on the modeling of documents. In this scenario, mixed membership basically means soft classification for

5 KB (754 words) - 01:58, 6 November 2012
Hall et, EMNLP2008
...he Dynamic Topic Model (Blei and Lafferty, 2006), representing each year's documents as generated from a normal distribution centroid over topics, with the foll

5 KB (707 words) - 02:47, 5 February 2011
Xufei Wang, ICDM, 2010
The author uses Co-Clustering method in [[RelatedPaper::Co-clustering documents and words using bipartite spectral graph partitioning]] as a comparison to

5 KB (726 words) - 00:28, 4 April 2011
Detection of Ad Hominem attacks in blog and review data
* Collection of 30,771 blog documents from blogs discussing evolution and anti-evolution. (Unlabeled)

5 KB (700 words) - 16:39, 3 November 2012
Project Abstract - Rushin, Kevin, Bo
...s classifier and cluster all the chains that we have gathered from all the documents in the corpus.

5 KB (739 words) - 18:19, 1 February 2011
Krishnan 2006 an effective two stage model for exploiting non local dependencies in named entity recognition
...r reduction over the baseline. Incorporating non-local dependencies across documents (at corpus level) as well achieved 13.3% relative error reduction. Also, de

5 KB (813 words) - 10:28, 29 September 2011
Weng et al WSDM 10
...re distilled from the twitter text. The topics are extracted from the user documents, where a user document is considered as the list of all the tweets by a use

6 KB (895 words) - 09:09, 4 October 2012
Detecting Topic Evolution in Scientiﬁc Literature: How Can Citations Help?
...that topic evolution has purely modeled based on bag-of-word assumption of documents at different timestamps, but an important factor for evolution analysis: th

5 KB (780 words) - 10:54, 6 November 2012
Ling and He Joint Sentiment Topic Model for Sentiment Analysis
...topic-document distribution <math>\theta</math> which accounts for all the documents in the corpus.

6 KB (903 words) - 23:57, 3 October 2012
Davidov et al COLING 10
...han the tweets to validate the usage of the semantic labels in other text documents.

6 KB (816 words) - 09:54, 4 October 2012
Huang et al, ACL 2009: Profile Based Cross-Document Coreference Using Kernelized Fuzzy Relational Clustering
...on names from the US Census data, respectively. For each name, the top 100 documents retrieved from the Yahoo! Search API are used.

5 KB (765 words) - 01:45, 1 December 2010
Bikel et al MLJ 1999
...[[UsesDataset::MUC|MUC-6]] dataset, a collection of 30 Wall Street Journal documents. The authors compared the performance of their model in comparison with the

6 KB (898 words) - 18:59, 13 October 2011
Grenager et al, ACL 2005: Unsupervised Learning of Field Segmentation Models for Information Extraction
...sults. This is because of the existence of multiple levels of structure in documents: the desired field structure, as well as lower-level POS structure. Unconst

6 KB (798 words) - 01:15, 18 September 2011
Proposal 2nd Draft Nitin Yandong Ming Yanbo
...STEYVERS, M. and SMITH, P. (2004). The author-topic model for authors and documents. In AUAI’04: Proceedings of the 20th Conference on Uncertainty in Artific

6 KB (943 words) - 23:22, 15 February 2011
Miller et al ICWSM 2011
Each of the web documents has been treated as a bag-of-word model. [http://www.wjh.harvard.edu/~inqui

6 KB (1,000 words) - 08:49, 27 September 2012
Blog summarization: CIKM 2007
...elin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. of SIGIR ’00, pages 41–48, Athens, Greece, 2000.

7 KB (1,193 words) - 15:27, 31 March 2011
Analyzing User Tweets around foursquare checkins
...s we considered a larger time interval of around 3 hours. Hence there were documents which contained tweets surrounding a “checkin” for each top level fours

7 KB (1,097 words) - 23:02, 11 January 2013
Project Midterm Status Report - Rushin, Kevin, Bo
..., they are not only faster to calculate but also more robust to ill-formed documents. We therefore chose to implement a subsequence-based kernel for relation ex

7 KB (1,032 words) - 12:04, 11 November 2010
ToWikify
...red_Consumer_Reviews]] || [[Learning object models from semistructured Web documents]] [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1583583&url=http%3

12 KB (1,642 words) - 17:02, 30 November 2012
Role of informatics in promoting patient-centered care
...use paper-based health records and thus are prone to lost files, illegible documents as well as other mishaps. Some tools for mediating the risk of error includ

8 KB (1,266 words) - 17:19, 3 October 2012
Miray Dongyang Niting project proposal
...ing automated extraction and grouping of citations for academic/scientific documents. While previous citation extraction was a manual process, citation measures ...n the Scientific Literature: A New Measure of the Relationship Between Two Documents. Small, Henry. s.l. : Journal of the American Society for Information Scien

15 KB (2,315 words) - 00:18, 15 February 2011
Dave et. al., WWW 2003
The system uses various approaches to obtain features from the given documents and scoring the features. They also experiment with training various machin

8 KB (1,205 words) - 02:09, 4 October 2012
Syllabus for Analysis of Social Media 10-802 in Spring 2011
...- [[User:nitina | Nitin Agarwal]] - The Author-Topic Model for Authors and Documents

9 KB (1,053 words) - 11:00, 19 April 2011
The viability of web-derived Polarity Lexicons
...df Building lexicon for sentiment analysis from massive collection of HTML documents]. In Proceedings of the Joint Conference on Empirical Methods in Natural La

8 KB (1,211 words) - 10:00, 4 October 2012
Projects for Machine Learning with Large Datasets 10-605 in Spring 2012
...se the structure of a hierarchy of labels to improve the classification of documents (or anything else) into that hierarchy? There are many approaches to this

9 KB (1,458 words) - 18:09, 19 April 2012
Extracting Opinion Expressions with semi-Markov Conditional Random Fields
...11,114 sentences with 55.89% sentences with DSEs and 57.93% with ESEs. 135 documents are used for training and 400 are used for testing.

9 KB (1,307 words) - 20:21, 3 October 2012
Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews
Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described b

10 KB (1,514 words) - 20:22, 3 October 2012
Talukdar 2006 a context pattern induction method for named entity extraction
Authors used 18 billion tokens (31 million documents) of news data as the source of unlabeled data. They experimented with 500 a

10 KB (1,656 words) - 19:21, 30 November 2011
Stylistic Structure in Historic Legal Text
...c information. Thus, we propose a latent variable model for modeling legal documents: * Rosen-Zvi et al., "The author-topic model for authors and documents", UAI 2001.

19 KB (3,063 words) - 19:54, 5 December 2011
Machine Learning 10-601 in Fall 2013
...spotting high-risk medical patients, recognizing speech, classifying text documents, detecting credit card fraud, or driving autonomous robots.

9 KB (1,409 words) - 17:24, 6 January 2016
Controversial events detection
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as w

11 KB (1,726 words) - 02:12, 16 October 2012
Machine Learning 10-601 in Spring 2016
...spotting high-risk medical patients, recognizing speech, classifying text documents, detecting credit card fraud, or driving autonomous robots.

11 KB (1,783 words) - 21:13, 5 September 2016
Machine Learning 10-601 in Fall 2014
...spotting high-risk medical patients, recognizing speech, classifying text documents, detecting credit card fraud, or driving autonomous robots.

11 KB (1,700 words) - 20:45, 18 November 2014
Project Anuj Dani Somanchi
...broadly similar to [1]. An efficient method to perform random-walk between documents based on tf-idf similarity is presented in [13].

15 KB (2,240 words) - 23:45, 14 February 2011
Guinea Pig
File "/Users/wcohen/Documents/code/GuineaPig/tutorial/guineapig.py", line 69, in getArgvParams

65 KB (10,376 words) - 13:01, 14 September 2017
