Blasha

A blog...by masha.

Mechanize-d

The Mechanize library is built on top of the Nokogiri library and adds functionality for interacting with a website from your own code. Our team used Mechanize to scrape the number of badges a student had earned on Team Treehouse for our project. Mechanize is useful in cases where you need to interact with forms and links.

Using Mechanize

require 'mechanize'

Instantiate a Mechanize object:

agent = Mechanize.new

Fetch a page:

page = agent.get("http://www.teamtreehouse.com/login")

Mechanize returns all of the page data, including links and forms:

#<Mechanize::Page
 {url #<URI::HTTPS:0x007fec31b944a0 URL:https://teamtreehouse.com/login>}
 {meta_refresh}
 {title "Sign In"}
 {iframes}
 {frames}
 {links
  #<Mechanize::Page::Link "\n          \n" "https://teamtreehouse.com/">
  #<Mechanize::Page::Link
   "\n        Library\n"
   "https://teamtreehouse.com/library">
  #<Mechanize::Page::Link "\n        Forum\n" "/forum">
  #<Mechanize::Page::Link
   "\n        Plans & Pricing\n"
   "https://teamtreehouse.com/subscribe/plans">
  #<Mechanize::Page::Link
   "\n          Help"
   "https://teamtreehouse.com/contact">
  #<Mechanize::Page::Link "Search" nil>
  #<Mechanize::Page::Link "I forgot my password" "/password_resets">
  #<Mechanize::Page::Link "Our Company →" "/about">
  #<Mechanize::Page::Link "About" "/about">
  #<Mechanize::Page::Link "Forum" "/forum">
  #<Mechanize::Page::Link "Stories" "/stories">
  #<Mechanize::Page::Link "Blog" "http://blog.teamtreehouse.com">
  #<Mechanize::Page::Link
   "Affiliate Program"
   "http://blog.teamtreehouse.com/treehouse-affiliate-program">
  #<Mechanize::Page::Link "Treehouse Shop" "http://shop.teamtreehouse.com">
  #<Mechanize::Page::Link "Jobs" "/jobs">
  #<Mechanize::Page::Link "Privacy Policy" "/privacy">
  #<Mechanize::Page::Link "Terms & Conditions" "/terms">
  #<Mechanize::Page::Link "Browse full library →" "/library">
  #<Mechanize::Page::Link "Websites" "/library/websites">
  #<Mechanize::Page::Link "Programming" "/library/programming">
  #<Mechanize::Page::Link "Business" "/library/business">
  #<Mechanize::Page::Link "iOS Development" "/library/ios-development">
  #<Mechanize::Page::Link "Android Development" "/library/android-development">
  #<Mechanize::Page::Link
   "WordPress"
   "/library/websites/how-to-make-a-wordpress-blog">
  #<Mechanize::Page::Link "Learning Adventures" "/learning-adventures">
  #<Mechanize::Page::Link "Bonus Content" "/library/bonus-content">
  #<Mechanize::Page::Link
   "Learning Adventures →"
   "/library#learning-adventures">
  #<Mechanize::Page::Link
   "Become a Web Designer"
   "/learning-adventures/become-a-web-designer">
  #<Mechanize::Page::Link
   "Become a Web Developer"
   "/learning-adventures/become-a-web-developer">
  #<Mechanize::Page::Link
   "Learn HTML and CSS"
   "/learning-adventures/learn-html-and-css">
  #<Mechanize::Page::Link
   "Start a Business"
   "/learning-adventures/start-a-business">
  #<Mechanize::Page::Link
   "Learn Ruby on Rails"
   "/learning-adventures/learn-ruby-on-rails">
  #<Mechanize::Page::Link
   "Learn to Build iPhone Apps"
   "/learning-adventures/learn-to-build-iphone-apps">
  #<Mechanize::Page::Link
   "Learn to Build Android Apps"
   "/learning-adventures/learn-to-build-android-apps">
  #<Mechanize::Page::Link
   "Become a Mobile Developer"
   "/learning-adventures/become-a-mobile-developer">
  #<Mechanize::Page::Link "Email us" "/contact">
  #<Mechanize::Page::Link
   "\n            \n            Twitter\n          "
   "http://twitter.com/treehouse">
  #<Mechanize::Page::Link
   "\n            \n            Youtube\n          "
   "http://youtube.com/user/gotreehouse">
  #<Mechanize::Page::Link
   "\n            \n            Twitter\n          "
   "http://facebook.com/teamtreehouse">
  #<Mechanize::Page::Link
   "\n            \n            Google Plus\n          "
   "http://plus.google.com/110278003536476194286/posts">
  #<Mechanize::Page::Link
   "\n            \n            Linked\n          "
   "http://linkedin.com/company/treehouse-island-inc-">
  #<Mechanize::Page::Link "" "#">
  #<Mechanize::Page::Link
   "help@teamtreehouse.com"
   "mailto:help@teamtreehouse.com">
  #<Mechanize::Page::Link "Forum" "/forum">}
 {forms
  #<Mechanize::Form
   {name nil}
   {method "GET"}
   {action "/library/search"}
   {fields [text:0x3ff6194a1394 type: text name: q value: ]}
   {radiobuttons}
   {checkboxes}
   {file_uploads}
   {buttons [button:0x3ff6194a2708 type:  name: search value: ]}>
  #<Mechanize::Form
   {name nil}
   {method "POST"}
   {action "https://teamtreehouse.com/person_session"}
   {fields
    [hidden:0x3ff6194a4e40 type: hidden name: utf8 value: ✓]
    [hidden:0x3ff6194a4990 type: hidden name: authenticity_token value: jn2fqPxlHgEpz2UDt70qONf0cKp67zQ6huGugPsfKHA=]
    [field:0x3ff6194a438c type: email name: user_session[email] value: ]
    [field:0x3ff6194a7ce4 type: password name: user_session[password] value: ]}
   {radiobuttons}
   {checkboxes}
   {file_uploads}
   {buttons [button:0x3ff6194a9f80 type: submit name:  value: ]}>
  #<Mechanize::Form
   {name nil}
   {method "POST"}
   {action "/contact"}
   {fields
    [hidden:0x3ff6194abe98 type: hidden name: utf8 value: ✓]
    [hidden:0x3ff6194ab678 type: hidden name: authenticity_token value: hrJlRQyZuuW423dPn4VYW+IgKhZrWqdbFvTC43WmG2U=]
    [text:0x3ff6194ab3bc type: text name: contact_form[name] value: ]
    [text:0x3ff6194ab100 type: text name: contact_form[email] value: ]
    [text:0x3ff6194aaebc type: text name: contact_form[email_confirmation] value: ]
    [text:0x3ff6194aac00 type: text name: contact_form[phone_number] value: ]
    [textarea:0x3ff6194ae10c type:  name: contact_form[message] value: ]}
   {radiobuttons}
   {checkboxes}
   {file_uploads}
   {buttons}>}>

Let's take a look at the sign-in form:

form = agent.page.forms[1]

This returns the second form object:

#<Mechanize::Form
 {name nil}
 {method "POST"}
 {action "https://teamtreehouse.com/person_session"}
 {fields
  [hidden:0x3ff6194a4e40 type: hidden name: utf8 value: ✓]
  [hidden:0x3ff6194a4990 type: hidden name: authenticity_token value: jn2fqPxlHgEpz2UDt70qONf0cKp67zQ6huGugPsfKHA=]
  [field:0x3ff6194a438c type: email name: user_session[email] value: ]
  [field:0x3ff6194a7ce4 type: password name: user_session[password] value: ]}
 {radiobuttons}
 {checkboxes}
 {file_uploads}
 {buttons [button:0x3ff6194a9f80 type: submit name:  value: ]}>

Mechanize lets you fill in the form and press the submit button to log in:

form.fields[2].value = "login info"
form.fields[3].value = "password"
form.submit

Now we’re in (assuming you supplied your own account’s login and password) and can begin scraping the site using Nokogiri!

If you wanted to, say, click on a link from the homepage, you could search for the link text:

agent.page.link_with(:text => "About").click

If there were more than one link with the text “About”, you could use the plural form, links_with, to get a list of matching links and then choose the one you want:

agent.page.links_with(:text => "About")[0].click

There are plenty of other ways to use Mechanize to interact with websites. The documentation is pretty robust and clear, so check it out!