Difference between revisions of "Guinea Pig"

Revision as of 15:16, 9 May 2014

Quick Start

Running wordcount.py

Set up a directory that contains the file gp.py and a second script called wordcount.py which contains this code:

# always start like this
from gp import *
import sys

# supporting routines can go here
def tokens(line):
    for tok in line.split():
        yield tok.lower()

#always subclass Planner
class WordCount(Planner):

    wc = ReadLines('corpus.txt') | FlattenBy(by=tokens) | Group(by=lambda x:x, reducingWith=ReduceToCount())

# always end like this
if __name__ == "__main__":
    WordCount().main(sys.argv)

Then type the command:

% python tutorial/wordcount.py --store wc

After a couple of seconds it will return, and you can see the wordcounts with

% head wc.gp

Understanding the wordcount example

There's also a less concise but easier-to-explain wordcount file, longer-wordcount.py, with this view definition:

class WordCount(Planner):
    lines = ReadLines('corpus.txt')
    words = Flatten(lines,by=tokens)
    wordGroups = Group(words, by=lambda x:x)
    wordCount = Group(words, by=lambda x:x, reducingTo=ReduceToCount())

If you type

% python longer-wordcount.py

you'll get a brief usage message:

usage: --[store|pprint|plan|cat] view [--echo] [--target hadoop] [--reuse foo.gp bar.gp ...] [other options]
       --list

Typing

% python longer-wordcount.py --list

will list the views that are defined in the file: lines, words, wordGroups, and wordCount If you pprint one of these, say wordCount, you can see what it essentially is: basically, a Python data structure, with several named subparts (like words)

wordCount = Group(words,by=<function <lambda> at 0x10497aa28>,reducingTo=<guineapig.ReduceToCount object at 0x104979190>)
| words = Flatten(lines, by=<function tokens at 0x1048965f0>).opts(cached=True)
| | lines = ReadLines("corpus.txt")

GuineaPig can convert one of these view structures into a plan for storing the view. To see a plan, you can type something like

% python longer-wordcount.py --plan wordCount

If you typed

% python longer-wordcount.py --plan wordCount | sh

this would equivalent to python longer-wordcount.py --store wordCount, modulo some details about how errors are reported.

There's also a less concise but easier-to-explain wordcount file, longer-wordcount.py

class WordCount(Planner):
    lines = ReadLines('corpus.txt')
    words = Flatten(lines,by=tokens)
    wordGroups = Group(words, by=lambda x:x)
    wordCount = Group(words, by=lambda x:x, reducingTo=ReduceToCount())

If you type

% python longer-wordcount.py

you'll get a brief usage message:

usage: --[store|pprint|plan|cat] view [--echo] [--target hadoop] [--reuse foo.gp bar.gp ...] [other options]
       --list

Typing

% python longer-wordcount.py --list

will list the views that are defined in the file: lines, words, wordGroups, and wordCount If you pprint one of these, say wordCount you can see what it essentially is: basically, a Python data structure, with several named subparts (like words)

wordCount = Group(words,by=<function <lambda> at 0x10497aa28>,reducingTo=<guineapig.ReduceToCount object at 0x104979190>)
| words = Flatten(lines, by=<function tokens at 0x1048965f0>).opts(cached=True)
| | lines = ReadLines("corpus.txt")

GuineaPig can convert one of these view structures into a plan for storing the view. To see a plan, you can type something like

% python longer-wordcount.py --plan wordCount

If you typed

% python longer-wordcount.py --plan wordCount | sh

this would equivalent to python longer-wordcount.py --store wordCount, modulo some details about how errors are reported.

@@ Line 68: / Line 68: @@
 file: <code>lines</code>, <code>words</code>, <code>wordGroups</code>,
 and <code>wordCount</code> If you <code>pprint</code> one of these,
-say <code>wordCount</code> you can see what it essentially is:
+say <code>wordCount</code>, you can see what it essentially is:
 basically, a Python data structure, with several named subparts
 (like <code>words</code>)

Difference between revisions of "Guinea Pig"

Revision as of 15:16, 9 May 2014

Quick Start

Running wordcount.py

Understanding the wordcount example

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools