Search results

From Cohen Courses
  • ...The authors' goals are to create a technique scalable to large volumes of data generated by social media outlets and to create clusters that are meaningfu ...by TF-IDF weighted vector space models. When features were missing (e.g. not all images include a description), the metric is set to 0.
    4 KB (632 words) - 04:03, 4 October 2012
  • ..., S. Olivier, S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417:399–403, 2002]] *A social network dataset found [http://ai.stanford.edu/~gal/data.html here]
    3 KB (541 words) - 12:34, 5 November 2012
  • ...to handle non-projective dependencies (about 8% of the sentences in their data - the Penn treebank - had at least one non-projective link), by adding labe ...sing a fraction of the time. The Stanford dependency parser was noticeably missing from their presentation, however, and it wasn't clear to me why that was. T
    3 KB (550 words) - 13:19, 29 November 2011
  • ...-_A_Probabilistic_Model_of_Document_Content_and_Hypertext_Connectivity The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity ...h using more than just the terms in the document. Both the papers use meta-data for their topic models. In the first case, it is hyperlinks and in the seco
    2 KB (334 words) - 16:42, 5 November 2012
  • Topic discovery in text data is an important machine learning problem. Several methods have been used p == Data Used ==
    3 KB (521 words) - 13:43, 2 October 2012
  • Kerstin Denecke and Jochen Bernauer. 2007. Extracting Specific Medical Data Using Semantic Structures. Artificial Intelligence in Medicine, LNCS 2007 V .... The main sources of error were unknown words, incorrect paragraph detection, missing trigger words, and cases where processing needs additional knowledge.
    2 KB (321 words) - 15:06, 21 October 2010
  • This problem is great, but I've had previous students work with YouTube data and regret it intensely. It's very noisy and hard to distill any real sign ...one nice example. I'd look over this line of work, and see if some of her data is available. --[[User:Wcohen|Wcohen]] 14:38, 10 October 2012 (UTC)
    4 KB (594 words) - 09:39, 10 October 2012
  • === Training Data Creation === The labeled training data are used to generate extractors for as many attributes as possible. In this
    4 KB (569 words) - 16:38, 30 September 2011
  • ...infoboxes are manually created and maintained, many articles have missing or outdated information (updated only in the plain text). We also * Large scale data
    4 KB (604 words) - 02:59, 24 October 2011
  • ...the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (KDD’05), 177--187. ...due to (i) possible sampling problems, (ii) disconnected components, (iii) missing past effects (the node's current links in the graph refer to but not include
    3 KB (388 words) - 09:56, 7 February 2011
  • ...that is either directly provided by human expert or estimated from labeled data. ...the predicted distribution and the target distribution over the unlabeled data <math>\tilde\mathcal{X}</math>
    5 KB (794 words) - 15:50, 2 November 2011
  • ..., Proceedings of the Second ACM International Conference on Web Search and Data Mining, February 09-12, 2009, Barcelona, Spain. ...pancies across the parallel pages written in other languages, and fills in missing information. This way of extracting new information is particularly useful
    5 KB (787 words) - 12:14, 30 September 2011
  • Anderson et al. used [[UsesDataset::Stack Overflow|Stack Overflow Data]] for the study. Liu et al. used [[UsesDataset::Click_Dataset_Google_Y ...the first paper, and then go through the original first paper to fill in the missing points. The original paper being a case study of Stack Overflow was quite l
    4 KB (698 words) - 07:51, 6 November 2012
  • ...model. Hence, this process doesn't require any manual labeling of training data. They also used interesting "pattern-based" features that are languag ...ls in summary. However, a good discussion of the pros and cons of the approach was missing.
    4 KB (625 words) - 09:45, 6 November 2012
  • ...he text. This new interpretation allows one to process large heterogeneous data sources (different writing styles and/or topics) and does not require to be ...consistent (which may not apply if the constraints are derived from data other than the training data, or are externally imposed).
    7 KB (1,216 words) - 13:08, 27 September 2011
  • title = {The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity S. Dzeroski and N. Lavrac, editors, Relational Data Mining. Springer-Verlag, 2001.
    4 KB (610 words) - 16:08, 5 November 2012
  • ...decomposition, the authors are able, by reconstructing the tensor, to fill in its missing entries, thus performing review rating prediction. ...pabo/movie-review-data/ http://www.cs.cornell.edu/people/pabo/movie-review-data/]
    5 KB (740 words) - 04:34, 27 September 2012
  • ...ocial structures from even a small amount of spatio-temporal co-occurrence data. Given the observed distribution in Flickr data of the probability of friendship over the number of co-occurrences ''k'', cell
    10 KB (1,588 words) - 21:49, 26 March 2011
  • ..." means citations with a poor author-title boundary (e.g. with the punctuation missing between the author's last name and the title's first word).
    8 KB (1,246 words) - 05:37, 7 December 2011
  • ..._WWW2009]] || [[Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007]] [http://www.springerlink.com/content/n1404m0668452854/] || ...sen-Zvi_et_al,_The_Author-Topic_Model_for_Authors_and_Documents]] || [[The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
    12 KB (1,642 words) - 16:02, 30 November 2012
