Ruby Hacking Guide
Chapter 8 : Ruby Language Details
I’ll talk about the details of Ruby’s syntax and evaluation, which haven’t been covered yet. I didn’t intend a complete exposition, so I left out everything which doesn’t come up in this book. That’s why you won’t be able to write Ruby programs just by reading this. A complete exposition can be found in the \footnote{Ruby reference manual: `archives/ruby-refm.tar.gz` in the attached CD-ROM}
Readers who know Ruby can skip over this chapter.
Literals
The expressiveness of Ruby’s literals is extremely high. In my opinion, what makes Ruby a script language is firstly the existence of the toplevel, secondly it’s the expressiveness of its literals. Thirdly it might be the richness of its standard library.
A single literal already has enormous power, but even more when multiple literals are combined. Especially the ability of creating complex literals that hash and array literals are combined is the biggest advantage of Ruby’s literal. One can write, for instance, a hash of arrays of regular expressions by constructing straightforwardly.
What kind of expressions are valid? Let’s look at them one by one.
Strings
Strings and regular expressions can’t be missing in a scripting language. The expressiveness of Ruby’s string is very various even more than the other Ruby’s literals.
Single Quoted Strings
'string' # 「string」 '\\begin{document}' # 「\begin{document}」 '\n' # 「\n」backslash and an n, not a newline '\1' # 「\1」backslash and 1 '\'' # 「'」
This is the simplest form. In C, what enclosed in single quotes becomes a character, but in Ruby, it becomes a string. Let’s call this a `‘`-string. The backslash escape is in effect only for `\` itself and `’`. If one puts a backslash in front of another character the backslash remains as in the fourth example.
And Ruby’s strings aren’t divided by newline characters. If we write a string over several lines the newlines are contained in the string.
'multi line string'
And if the `-K` option is given to the `ruby` command, multibyte strings will be accepted. At present the three encodings EUC-JP (`-Ke`), Shift JIS (`-Ks`), and UTF8 (`-Ku`) can be specified.
'「漢字が通る」と「マルチバイト文字が通る」はちょっと違う' # 'There's a little difference between "Kanji are accepted" and "Multibyte characters are accepted".'
Double Quoted Strings
"string" # 「string」 "\n" # newline "\x0f" # a byte given in hexadecimal form "page#{n}.html" # embedding a command
With double quotes we can use command expansion and backslash notation. The backslash notation is something classical that is also supported in C, for instance, `\n` is a newline, `\b` is a backspace. In Ruby, `Ctrl-C` and ESC can also be expressed, that’s convenient. However, merely listing the whole notation is not fun, regarding its implementation, it just means a large number of cases to be handled and there’s nothing especially interesting. Therefore, they are entirely left out here.
On the other hand, expression expansion is even more fantastic. We can write an arbitrary Ruby expression inside `#{ }` and it will be evaluated at runtime and embedded into the string. There are no limitations like only one variable or only one method. Getting this far, it is not a mere literal anymore but the entire thing can be considered as an expression to express a string.
"embedded #{lvar} expression" "embedded #{@ivar} expression" "embedded #{1 + 1} expression" "embedded #{method_call(arg)} expression" "embedded #{"string in string"} expression"
Strings with `%`
%q(string) # same as 'string' %Q(string) # same as "string" %(string) # same as %Q(string) or "string"
If a lot of separator characters appear in a string, escaping all of them becomes a burden. In that case the separator characters can be changed by using `%`. In the following example, the same string is written as a `"`-string and `%`-string.
"<a href=\"http://i.loveruby.net#{path}\">" %Q(<a href="http://i.loveruby.net#{path}">)
The both expressions has the same length, but the `%`-one is a lot nicer to look at. When we have more characters to escape in it, `%`-string would also have advantage in length.
Here we have used parentheses as delimiters, but something else is fine, too. Like brackets or braces or `#`. Almost every symbol is fine, even `%`.
%q#this is string# %q[this is string] %q%this is string%
Here Documents
Here document is a syntax which can express strings spanning multiple lines. A normal string starts right after the delimiter `“` and everything until the ending `”` would be the content. When using here document, the lines between the line which contains the starting `<<EOS` and the line which contains the ending `EOS` would be the content.
"the characters between the starting symbol and the ending symbol will become a string." <<EOS All lines between the starting and the ending line are in this here document EOS
Here we used `EOS` as identifier but any word is fine. Precisely speaking, all the character matching `[a-zA-Z_0-9]` and multi-byte characters can be used.
The characteristic of here document is that the delimiters are “the lines containing the starting identifier or the ending identifier”. The line which contains the start symbol is the starting delimiter. Therefore, the position of the start identifier in the line is not important. Taking advantage of this, it doesn’t matter that, for instance, it is written in the middle of an expression:
printf(<<EOS, count_n(str)) count=%d EOS
In this case the string `“count=%d\n”` goes in the place of `<<EOS`. So it’s the same as the following.
printf("count=%d\n", count_n(str))
The position of the starting identifier is really not restricted, but on the contrary, there are strict rules for the ending symbol: It must be at the beginning of the line and there must not be another letter in that line. However if we write the start symbol with a minus like this `<<-EOS` we can indent the line with the end symbol.
<<-EOS It would be convenient if one could indent the content of a here document. But that's not possible. If you want that, writing a method to delete indents is usually a way to go. But beware of tabs. EOS
Furthermore, the start symbol can be enclosed in single or double quotes. Then the properties of the whole here document change. When we change `<<EOS` to `<<“EOS”` we can use embedded expressions and backslash notation.
<<"EOS" One day is #{24 * 60 * 60} seconds. Incredible. EOS
But `<<‘EOS’` is not the same as a single quoted string. It starts the complete literal mode. Everything even backslashes go into the string as they are typed. This is useful for a string which contains many backslashes.
In Part 2, I’ll explain how to parse a here document. But I’d like you to try to guess it before.
Characters
Ruby strings are byte sequences, there are no character objects. Instead there are the following expressions which return the integers which correspond a certain character in ASCII code.
?a # the integer which corresponds to "a" ?. # the integer which corresponds to "." ?\n # LF ?\C-a # Ctrl-a
Regular Expressions
/regexp/ /^Content-Length:/i /正規表現/ /\/\*.*?\*\//m # An expression which matches C comments /reg#{1 + 1}exp/ # the same as /reg2exp/
What is contained between slashes is a regular expression. Regular expressions are a language to designate string patterns. For example
/abc/
This regular expression matches a string where there’s an `a` followed by a `b` followed by a `c`. It matches “abc” or “fffffffabc” or “abcxxxxx”.
One can designate more special patterns.
/^From:/
This matches a string where there’s a `From` followed by a `:` at the beginning of a line. There are several more expressions of this kind, such that one can create quite complex patterns.
The uses are infinite: Changing the matched part to another string, deleting the matched part, determining if there’s one match and so on…
A more concrete use case would be, for instance, extracting the `From:` header from a mail, or changing the `\n` to an `\r`, or checking if a string looks like a mail address.
Since the regular expression itself is an independent language, it has its own parser and evaluator which are different from `ruby`. They can be found in `regex.c`. Hence, it’s enough for `ruby` to be able to cut out the regular expression part from a Ruby program and feed it. As a consequence, they are treated almost the same as strings from the grammatical point of view. Almost all of the features which strings have like escapes, backslash notations and embedded expressions can be used in the same way in regular expressions.
However, we can say they are treated as the same as strings only when we are in the viewpoint of “Ruby’s syntax”. As mentioned before, since regular expression itself is a language, naturally we have to follow its language constraints. To describe regular expression in detail, it’s so large that one more can be written, so I’d like you to read another book for this subject. I recommend “Mastering Regular Expression” by Jeffrey E.F. Friedl.
Regular Expressions with `%`
Also as with strings, regular expressions also have a syntax for changing delimiters. In this case it is `%r`. To understand this, looking at some examples are enough to understand.
%r(regexp) %r[/\*.*?\*/] # matches a C comment %r("(?:[^"\\]+|\\.)*") # matches a string in C %r{reg#{1 + 1}exp} # embedding a Ruby expression
Arrays
A comma-separated list enclosed in brackets `[]` is an array literal.
[1, 2, 3] ['This', 'is', 'an', 'array', 'of', 'string'] [/regexp/, {'hash'=>3}, 4, 'string', ?\C-a] lvar = $gvar = @ivar = @@cvar = nil [lvar, $gvar, @ivar, @@cvar] [Object.new(), Object.new(), Object.new()]
Ruby’s array (`Array`) is a list of arbitrary objects. From a syntactical standpoint, it’s characteristic is that arbitrary expressions can be elements. As mentioned earlier, an array of hashes of regular expressions can easily be made. Not just literals but also expressions which variables or method calls combined together can also be written straightforwardly.
Note that this is “an expression which generates an array object” as with the other literals.
i = 0 while i < 5 p([1,2,3].id) # Each time another object id is shown. i += 1 end
Word Arrays
When writing scripts one uses arrays of strings a lot, hence there is a special notation only for arrays of strings. That is `%w`. With an example it’s immediately obvious.
%w( alpha beta gamma delta ) # ['alpha','beta','gamma','delta'] %w( 月 火 水 木 金 土 日 ) %w( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )
There’s also `%W` where expressions can be embedded. It’s a feature implemented fairly recently.
n = 5 %w( list0 list#{n} ) # ['list0', 'list#{n}'] %W( list0 list#{n} ) # ['list0', 'list5']
The author hasn’t come up with a good use of `%W` yet.
Hashes
Hash tables are data structure which store a one-to-one relation between arbitrary objects. By writing as follows, they will be expressions to generate tables.
{ 'key' => 'value', 'key2' => 'value2' } { 3 => 0, 'string' => 5, ['array'] => 9 } { Object.new() => 3, Object.new() => 'string' } # Of course we can put it in several lines. { 0 => 0, 1 => 3, 2 => 6 }
We explained hashes in detail in the third chapter “Names and Nametables”. They are fast lookup tables which allocate memory slots depending on the hash values. In Ruby grammar, both keys and values can be arbitrary expressions.
Furthermore, when used as an argument of a method call, the `{…}` can be omitted under a certain condition.
some_method(arg, key => value, key2 => value2) # some_method(arg, {key => value, key2 => value2}) # same as above
With this we can imitate named (keyword) arguments.
button.set_geometry('x' => 80, 'y' => '240')
Of course in this case `set_geometry` must accept a hash as input. Though real keyword arguments will be transformed into parameter variables, it’s not the case for this because this is just a “imitation”.
Ranges
Range literals are oddballs which don’t appear in most other languages. Here are some expressions which generate Range objects.
0..5 # from 0 to 5 containing 5 0...5 # from 0 to 5 not containing 5 1+2 .. 9+0 # from 3 to 9 containing 9 'a'..'z' # strings from 'a' to 'z' containing 'z'
If there are two dots the last element is included. If there are three dots it is not included. Not only integers but also floats and strings can be made into ranges, even a range between arbitrary objects can be created if you’d attempt. However, this is a specification of `Range` class, which is the class of range objects, (it means a library), this is not a matter of grammar. From the parser’s standpoint, it just enables to concatenate arbitrary expressions with `..`. If a range cannot be generated with the objects as the evaluated results, it would be a runtime error.
By the way, because the precedence of `..` and `…` is quite low, sometimes it is interpreted in a surprising way.
1..5.to_a() # 1..(5.to_a())
I think my personality is relatively bent for Ruby grammar, but somehow I don’t like only this specification.
Symbols
In Part 1, we talked about symbols at length. It’s something corresponds one-to-one to an arbitrary string. In Ruby symbols are expressed with a `:` in front.
:identifier :abcde
These examples are pretty normal. Actually, besides them, all variable names and method names can become symbols with a `:` in front. Like this:
:$gvar :@ivar :@@cvar :CONST
Moreover, though we haven’t talked this yet, `[]` or `attr=` can be used as method names, so naturally they can also be used as symbols.
:[] :attr=
When one uses these symbols as values in an array, it’ll look quite complicated.
Numerical Values
This is the least interesting. One possible thing I can introduce here is that, when writing a million,
1_000_000
as written above, we can use underscore delimiters in the middle. But even this isn’t particularly interesting. From here on in this book, we’ll completely forget about numerical values.
Methods
Let’s talk about the definition and calling of methods.
Definition and Calls
def some_method( arg ) .... end class C def some_method( arg ) .... end end
Methods are defined with `def`. If they are defined at toplevel they become function style methods, inside a class they become methods of this class. To call a method which was defined in a class, one usually has to create an instance with `new` as shown below.
C.new().some_method(0)
The Return Value of Methods
The return value of a method is, if a `return` is executed in the middle, its value. Otherwise, it’s the value of the statement which was executed last.
def one() # 1 is returned return 1 999 end def two() # 2 is returned 999 2 end def three() # 3 is returned if true then 3 else 999 end end
If the method body is empty, it would automatically be `nil`, and an expression without a value cannot put at the end. Hence every method has a return value.
Optional Arguments
Optional arguments can also be defined. If the number of arguments doesn’t suffice, the parameters are automatically assigned to default values.
def some_method( arg = 9 ) # default value is 9 p arg end some_method(0) # 0 is shown. some_method() # The default value 9 is shown.
There can also be several optional arguments. But in that case they must all come at the end of the argument list. If elements in the middle of the list were optional, how the correspondences of the arguments would be very unclear.
def right_decl( arg1, arg2, darg1 = nil, darg2 = nil ) .... end # This is not possible def wrong_decl( arg, default = nil, arg2 ) # A middle argument cannot be optional .... end
Omitting argument parentheses
In fact, the parentheses of a method call can be omitted.
puts 'Hello, World!' # puts("Hello, World") obj = Object.new # obj = Object.new()
In Python we can get the method object by leaving out parentheses, but there is no such thing in Ruby.
If you’d like to, you can omit more parentheses.
puts(File.basename fname) # puts(File.basename(fname)) same as the above
If we like we can even leave out more
puts File.basename fname # puts(File.basename(fname)) same as the above
However, recently this kind of “nested omissions” became a cause of warnings. It’s likely that this will not pass anymore in Ruby 2.0.
Actually even the parentheses of the parameters definition can also be omitted.
def some_method param1, param2, param3 end def other_method # without arguments ... we see this a lot end
Parentheses are often left out in method calls, but leaving out parentheses in the definition is not very popular. However if there are no arguments, the parentheses are frequently omitted.
Arguments and Lists
Because Arguments form a list of objects, there’s nothing odd if we can do something converse: extracting a list (an array) as arguments, as the following example.
def delegate(a, b, c) p(a, b, c) end list = [1, 2, 3] delegate(*list) # identical to delegate(1, 2, 3)
In this way we can distribute an array into arguments. Let’s call this device a `*`argument now. Here we used a local variable for demonstration, but of course there is no limitation. We can also directly put a literal or a method call instead.
m(*[1,2,3]) # We could have written the expanded form in the first place... m(*mcall())
The *
argument can be used together with ordinary arguments,
but the *
argument must come last.
Otherwise, the correspondences to parameter variables cannot be determined in a
single way.
In the definition on the other hand we can handle the arguments in bulk when we put a `*` in front of the parameter variable.
def some_method( *args ) p args end some_method() # prints [] some_method(0) # prints [0] some_method(0, 1) # prints [0,1]
The surplus arguments are gathered in an array. Only one `*`parameter can be declared. It must also come after the default arguments.
def some_method0( arg, *rest ) end def some_method1( arg, darg = nil, *rest ) end
If we combine list expansion and bulk reception together, the arguments of one method can be passed as a whole to another method. This might be the most practical use of the `*`parameter.
# a method which passes its arguments to other_method def delegate(*args) other_method(*args) end def other_method(a, b, c) return a + b + c end delegate(0, 1, 2) # same as other_method(0, 1, 2) delegate(10, 20, 30) # same as other_method(10, 20, 30)
Various Method Call Expressions
Being just a single feature as ‘method call’ does not mean its representation is also single. Here is about so-called syntactic sugar. In Ruby there is a ton of it, and they are really attractive for a person who has a fetish for parsers. For instance the examples below are all method calls.
1 + 2 # 1.+(2) a == b # a.==(b) ~/regexp/ # /regexp/.~ obj.attr = val # obj.attr=(val) obj[i] # obj.[](i) obj[k] = v # obj.[]=(k,v) `cvs diff abstract.rd` # Kernel.`('cvs diff abstract.rd')
It’s hard to believe until you get used to it, but `attr=`, `[]=`, `\`` are (indeed) all method names. They can appear as names in a method definition and can also be used as symbols.
class C def []( index ) end def +( another ) end end p(:attr=) p(:[]=) p(:`)
As there are people who don’t like sweets, there are also many people who dislike syntactic sugar. Maybe they feel unfair when the things which are essentially the same appear in faked looks. (Why’s everyone so serious?)
Let’s see some more details.
Symbol Appendices
obj.name? obj.name!
First a small thing. It’s just appending a `?` or a `!`. Call and Definition do not differ, so it’s not too painful. There are convention for what to use these method names, but there is no enforcement on language level. It’s just a convention at human level. This is probably influenced from Lisp in which a great variety of characters can be used in procedure names.
Binary Operators
1 + 2 # 1.+(2)
Binary Operators will be converted to a method call to the object on the left hand side. Here the method `` from the object `1` is called. As listed below there are many of them. There are the general operators `` and `-`, also the equivalence operator `==` and the spaceship operator `<=>’ as in Perl, all sorts. They are listed in order of their precedence.
** * / % + - << >> & | ^ > >= < <= <=> == === =~
The symbols `&` and `|` are methods, but the double symbols `&&` and `||` are built-in operators. Remember how it is in C.
Unary Operators
+2 -1.0 ~/regexp/
These are the unary operators. There are only three of them: `+ – `. `+` and `-` work as they look like (by default). The operator `` matches a string or a regular expression with the variable `$_`. With an integer it stands for bit conversion.
To distinguish the unary `` from the binary `` the method names
for the unary operators are `` and `-
` respectively.
Of course they can be called by just writing `n` or `-n`.
((errata: + or – as the prefix of a numeric literal is actually scanned as a part of the literal. This is a kind of optimizations.))
Attribute Assignment
obj.attr = val # obj.attr=(val)
This is an attribute assignment fashion. The above will be translated into the method call `attr=`. When using this together with method calls whose parentheses are omitted, we can write code which looks like attribute access.
class C def i() @i end # We can write the definition in one line def i=(n) @i = n end end c = C.new c.i = 99 p c.i # prints 99
However it will turn out both are method calls. They are similar to get/set property in Delphi or slot accessors in CLOS.
Besides, we cannot define a method such as `obj.attr(arg)=`, which can take another argument in the attribute assignment fashion.
Index Notation
obj[i] # obj.[](i)
The above will be translated into a method call for `[]`. Array and hash access are also implemented with this device.
obj[i] = val # obj.[]=(i, val)
Index assignment fashion. This is translated into a call for a method named `[]=`.
`super`
We relatively often have a situation where we want add a little bit to the behaviour of an already existing method rather than replacing it. Here a mechanism to call a method of the superclass when overwriting a method is required. In Ruby, that’s `super`.
class A def test puts 'in A' end end class B < A def test super # invokes A#test end end
Ruby’s `super` differs from the one in Java. This single word means “call the method with the same name in the superclass”. `super` is a reserved word.
When using `super`, be careful about the difference between `super` with no arguments and `super` whose arguments are omitted. The `super` whose arguments are omitted passes all the given parameter variables.
class A def test( *args ) p args end end class B < A def test( a, b, c ) # super with no arguments super() # shows [] # super with omitted arguments. Same result as super(a, b, c) super # shows [1, 2, 3] end end B.new.test(1,2,3)
Visibility
In Ruby, even when calling the same method, it can be or cannot be called depending on the location (meaning the object). This functionality is usually called “visibility” (whether it is visible). In Ruby, the below three types of methods can be defined.
- `public`
- `private`
- `protected`
`public` methods can be called from anywhere in any form. `private` methods can only be called in a form “syntactically” without a receiver. In effect they can only be called by instances of the class in which they were defined and in instances of its subclass. `protected` methods can only be called by instances of the defining class and its subclasses. It differs from `private` that methods can still be called from other instances of the same class.
The terms are the same as in C++ but the meaning is slightly different. Be careful.
Usually we control visibility as shown below.
class C public def a1() end # becomes public def a2() end # becomes public private def b1() end # becomes private def b2() end # becomes private protected def c1() end # becomes protected def c2() end # becomes protected end
Here `public`, `private` and `protected are method calls without parentheses. These aren’t even reserved words.
`public` and `private` can also be used with an argument to set the visibility of a particular method. But its mechanism is not interesting. We’ll leave this out.
Module functions
Given a module ‘M’. If there are two methods with the exact same content
- `M.method_name`
- `M#method_name`(Visibility is `private`)
then we call this a module function.
It is not apparent why this should be useful. But let’s look at the next example which is happily used.
Math.sin(5) # If used for a few times this is more convenient include Math sin(5) # If used more often this is more practical
It’s important that both functions have the same content. With a different `self` but with the same code the behavior should still be the same. Instance variables become extremely difficult to use. Hence such method is very likely a method in which only procedures are written (like `sin`). That’s why they are called module “functions”.
Iterators
Ruby’s iterators differ a bit from Java’s or C++’s iterator classes or ‘Iterator’ design pattern. Precisely speaking, those iterators are called exterior iterators, Ruby’s iterators are interior iterators. Regarding this, it’s difficult to understand from the definition so let’s explain it with a concrete example.
arr = [0,2,4,6.8]
This array is given and we want to access the elements in order. In C style we would write the following.
i = 0 while i < arr.length print arr[i] i += 1 end
Using an iterator we can write:
arr.each do |item| print item end
Everything from `each do` to `end` is the call to an iterator method. More precisely `each` is the iterator method and between `do` and `end` is the iterator block. The part between the vertical bars are called block parameters, which become variables to receive the parameters passed from the iterator method to the block.
Saying it a little abstractly, an iterator is something like a piece of code which has been cut out and passed. In our example the piece `print item` has been cut out and is passed to the `each` method. Then `each` takes all the elements of the array in order and passes them to the cut out piece of code.
We can also think the other way round. The other parts except `print item` are being cut out and enclosed into the `each` method.
i = 0 while i < arr.length print arr[i] i += 1 end arr.each do |item| print item end
Comparison with higher order functions
What comes closest in C to iterators are functions which receive function pointers, it means higher order functions. But there are two points in which iterators in Ruby and higher order functions in C differ.
Firstly, Ruby iterators can only take one block. For instance we can’t do the following.
# Mistake. Several blocks cannot be passed. array_of_array.each do |i| .... end do |j| .... end
Secondly, Ruby’s blocks can share local variables with the code outside.
lvar = 'ok' [0,1,2].each do |i| p lvar # Can acces local variable outside the block. end
That’s where iterators are convenient.
But variables can only be shared with the outside. They cannot be shared with the inside of the iterator method ( e.g. `each`). Putting it intuitively, only the variables in the place which looks of the source code continued are visible.
Block Local Variables
Local variables which are assigned inside a block stay local to that block, it means they become block local variables. Let’s check it out.
[0].each do i = 0 p i # 0 end
For now, to create a block, we apply `each` on an array of length 1 (We can fully leave out the block parameter). In that block, the `i` variable is first assigned .. meaning declared. This makes `i` block local.
It is said block local, so it should not be able to access from the outside. Let’s test it.
% ruby -e ' [0].each do i = 0 end p i # Here occurs an error. ' -e:5: undefined local variable or method `i' for #<Object:0x40163a9c> (NameError)
When we referenced a block local variable from outside the block, surely an error occured. Without a doubt it stayed local to the block.
Iterators can also be nested repeatedly. Each time the new block creates another scope.
lvar = 0 [1].each do var1 = 1 [2].each do var2 = 2 [3].each do var3 = 3 # Here lvar, var1, var2, var3 can be seen end # Here lvar, var1, var2 can be seen end # Here lvar, var1 can be seen end # Here only lvar can be seen
There’s one point which you have to keep in mind. Differing from nowadays’ major languages Ruby’s block local variables don’t do shadowing. Shadowing means for instance in C that in the code below the two declared variables `i` are different.
{ int i = 3; printf("%d\n", i); /* 3 */ { int i = 99; printf("%d\n", i); /* 99 */ } printf("%d\n", i); /* 3 (元に戻った) */ }
Inside the block the i
inside overshadows the i
outside.
That’s why it’s called shadowing.
But what happens with block local variables of Ruby where there’s no shadowing. Let’s look at this example.
i = 0 p i # 0 [0].each do i = 1 p i # 1 end p i # 1 the change is preserved
Even when we assign i
inside the block,
if there is the same name outside, it would be used.
Therefore when we assign to inside i
, the value of outside i
would be
changed. On this point there
came many complains: “This is error prone. Please do shadowing.”
Each time there’s nearly flaming but till now no conclusion was reached.
The syntax of iterators
There are some smaller topics left.
First, there are two ways to write an iterator. One is the `do` ~ `end` as used above, the other one is the enclosing in braces. The two expressions below have exactly the same meaning.
arr.each do |i| puts i end arr.each {|i| # The author likes a four space indentation for puts i # an iterator with braces. }
But grammatically the precedence is different. The braces bind much stronger than `do`~`end`.
m m do .... end # m(m) do....end m m { .... } # m(m() {....})
And iterators are definitely methods, so there are also iterators that take arguments.
re = /^\d/ # regular expression to match a digit at the beginning of the line $stdin.grep(re) do |line| # look repeatedly for this regular expression .... end
`yield`
Of course users can write their own iterators. Methods which have a `yield` in their definition text are iterators. Let’s try to write an iterator with the same effect as `Array#each`:
# adding the definition to the Array class class Array def my_each i = 0 while i < self.length yield self[i] i += 1 end end end # this is the original each [0,1,2,3,4].each do |i| p i end # my_each works the same [0,1,2,3,4].my_each do |i| p i end
yield
calls the block. At this point control is passed to the block,
when the execution of the block finishes it returns back to the same
location. Think about it like a characteristic function call. When the
present method does not have a block a runtime error will occur.
% ruby -e '[0,1,2].each' -e:1:in `each': no block given (LocalJumpError) from -e:1
`Proc`
I said, that iterators are like cut out code which is passed as an argument. But we can even more directly make code to an object and carry it around.
twice = Proc.new {|n| n * 2 } p twice.call(9) # 18 will be printed
In short, it is like a function. As might be expected from the fact it is
created with new
, the return value of Proc.new
is an instance
of the Proc
class.
Proc.new
looks surely like an iterator and it is indeed so.
It is an ordinary iterator. There’s only some mystic mechanism inside Proc.new
which turns an iterator block into an object.
Besides there is a function style method lambda
provided which
has the same effect as Proc.new
. Choose whatever suits you.
twice = lambda {|n| n * 2 }
Iterators and `Proc`
Why did we start talking all of a sudden about Proc
? Because there
is a deep relationship between iterators and Proc
.
In fact, iterator blocks and Proc
objects are quite the same thing.
That’s why one can be transformed into the other.
First, to turn an iterator block into a Proc
object
one has to put an &
in front of the parameter name.
def print_block( &block ) p block end print_block() do end # Shows something like <Proc:0x40155884> print_block() # Without a block nil is printed
With an &
in front of the argument name, the block is transformed to
a Proc
object and assigned to the variable. If the method is not an
iterator (there’s no block attached) nil
is assigned.
And in the other direction, if we want to pass a Proc
to an iterator
we also use &
.
block = Proc.new {|i| p i } [0,1,2].each(&block)
This code means exactly the same as the code below.
[0,1,2].each {|i| p i }
If we combine these two, we can delegate an iterator block to a method somewhere else.
def each_item( &block ) [0,1,2].each(&block) end each_item do |i| # same as [0,1,2].each do |i| p i end
Expressions
“Expressions” in Ruby are things with which we can create other expressions or statements by combining with the others. For instance a method call can be another method call’s argument, so it is an expression. The same goes for literals. But literals and method calls are not always combinations of elements. On the contrary, “expressions”, which I’m going to introduce, always consists of some elements.
`if`
We probably do not need to explain the if
expression. If the conditional
expression is true, the body is executed. As explained in Part 1,
every object except nil
and false
is true in Ruby.
if cond0 then .... elsif cond1 then .... elsif cond2 then .... else .... end
`elsif`/`else`-clauses can be omitted. Each `then` as well.
But there are some finer requirements concerning then
.
For this kind of thing, looking at some examples is the best way to understand.
Here only thing I’d say is that the below codes are valid.
# 1 # 4 if cond then ..... end if cond then .... end # 2 if cond; .... end # 5 if cond # 3 then if cond then; .... end .... end
And in Ruby, `if` is an expression, so there is the value of the entire `if` expression. It is the value of the body where a condition expression is met. For example, if the condition of the first `if` is true, the value would be the one of its body.
p(if true then 1 else 2 end) #=> 1 p(if false then 1 else 2 end) #=> 2 p(if false then 1 elsif true then 2 else 3 end) #=> 2
If there’s no match, or the matched clause is empty,
the value would be nil
.
p(if false then 1 end) #=> nil p(if true then end) #=> nil
`unless`
An if
with a negated condition is an unless
.
The following two expressions have the same meaning.
unless cond then if not (cond) then .... .... end end
unless
can also have attached else
clauses but any elsif
cannot be
attached.
Needless to say, then
can be omitted.
unless
also has a value and its condition to decide is completely the same as
`if`. It means the entire value would be the value of the body of the matched
clause. If there’s no match or the matched clause is empty,
the value would be nil
.
`and && or ||`
The most likely utilization of the and
is probably a boolean operation.
For instance in the conditional expression of an if
.
if cond1 and cond2 puts 'ok' end
But as in Perl, `sh` or Lisp, it can also be used as a conditional branch expression. The two following expressions have the same meaning.
if invalid?(key) invalid?(key) and return nil return nil end
&&
and and
have the same meaning. Different is the binding order.
method arg0 && arg1 # method(arg0 && arg1) method arg0 and arg1 # method(arg0) and arg1
Basically the symbolic operator creates an expression which can be an argument (`arg`). The alphabetical operator creates an expression which cannot become an argument (`expr`).
As for and
, if the evaluation of the left hand side is true,
the right hand side will also be evaluated.
On the other hand or
is the opposite of and
. If the evaluation of the left hand
side is false, the right hand side will also be evaluated.
valid?(key) or return nil
or
and ||
have the same relationship as &&
and and
. Only the precedence is
different.
The Conditional Operator
There is a conditional operator similar to C:
cond ? iftrue : iffalse
The space between the symbols is important. If they bump together the following weirdness happens.
cond?iftrue:iffalse # cond?(iftrue(:iffalse))
The value of the conditional operator is the value of the last executed expression. Either the value of the true side or the value of the false side.
`while until`
Here’s a `while` expression.
while cond do .... end
This is the simplest loop syntax. As long as cond
is true
the body is executed. The do
can be omitted.
until io_ready?(id) do sleep 0.5 end
until
creates a loop whose condition definition is opposite.
As long as the condition is false it is executed.
The do
can be omitted.
Naturally there is also jump syntaxes to exit a loop.
break
as in C/C++/Java is also break
,
but continue
is next
.
Perhaps next
has come from Perl.
i = 0 while true if i > 10 break # exit the loop elsif i % 2 == 0 i *= 2 next # next loop iteration end i += 1 end
And there is another Perlism: the redo
.
while cond # (A) .... redo .... end
It will return to (A) and repeat from there.
What differs from next
is it does not check the condition.
I might come into the world top 100, if the amount of Ruby programs
would be counted, but I haven’t used redo
yet. It does not seem to be
necessary after all because I’ve lived happily despite of it.
`case`
A special form of the if
expression. It performs branching on a series of
conditions. The following left and right expressions are identical in meaning.
case value when cond1 then if cond1 === value .... .... when cond2 then elsif cond2 === value .... .... when cond3, cond4 then elsif cond3 === value or cond4 === value .... .... else else .... .... end end
The threefold equals ===
is, as the same as the ==
, actually a method call.
Notice that the receiver is the object on the left hand side. Concretely,
if it is the `===` of an `Array`, it would check if it contains the `value`
as its element.
If it is a `Hash`, it tests whether it has the `value` as its key.
If its is an regular expression, it tests if the value
matches.
And so on.
Since `case` has many grammatical elements,
to list them all would be tedious, thus we will not cover them in this book.
Exceptions
This is a control structure which can pass over method boundaries and transmit errors. Readers who are acquainted to C++ or Java will know about exceptions. Ruby exceptions are basically the same.
In Ruby exceptions come in the form of the function style method `raise`. `raise` is not a reserved word.
raise ArgumentError, "wrong number of argument"
In Ruby exception are instances of the Exception
class and it’s
subclasses. This form takes an exception class as its first argument
and an error message as its second argument. In the above case
an instance of ArgumentError
is created and “thrown”. Exception
object would ditch the part after the raise
and start to return upwards the
method call stack.
def raise_exception raise ArgumentError, "wrong number of argument" # the code after the exception will not be executed puts 'after raise' end raise_exception()
If nothing blocks the exception it will move on and on and
finally it will reach the top level.
When there’s no place to return any more, ruby
gives out a message and ends
with a non-zero exit code.
% ruby raise.rb raise.rb:2:in `raise_exception': wrong number of argument (ArgumentError) from raise.rb:7
However an exit
would be sufficient for this, and for an exception there
should be a way to set handlers.
In Ruby, begin
~rescue
~end
is used for this.
It resembles the try
~catch
in C++ and Java.
def raise_exception raise ArgumentError, "wrong number of argument" end begin raise_exception() rescue ArgumentError => err then puts 'exception catched' p err end
rescue
is a control structure which captures exceptions, it catches
exception objects of the specified class and its subclasses. In the
above example, an instance of ArgumentError
comes flying into the place
where ArgumentError
is targeted, so it matches this rescue
.
By =>err
the exception object will be assigned to the local variable
err
, after that the rescue
part is executed.
% ruby rescue.rb exception catched #<ArgumentError: wrong number of argument>
When an exception is rescued, it will go through the `rescue` and it will start to execute the subsequent as if nothing happened, but we can also make it retry from the `begin`. To do so, `retry` is used.
begin # the place to return .... rescue ArgumentError => err then retry # retry your life end
We can omit the =>err
and the then
after rescue
. We can also leave
out the exception class. In this case, it means as the same as when the
StandardError
class is specified.
If we want to catch more exception classes, we can just write them in line. When we want to handle different errors differently, we can specify several `rescue` clauses.
begin raise IOError, 'port not ready' rescue ArgumentError, TypeError rescue IOError rescue NameError end
When written in this way, a `rescue` clause that matches the exception class is
searched in order from the top. Only the matched clause will be executed.
For instance, only the clause of IOError
will be executed in the above case.
On the other hand, when there is an else
clause, it is executed
only when there is no exception.
begin nil # Of course here will no error occur rescue ArgumentError # This part will not be executed else # This part will be executed end
Moreover an ensure
clause will be executed in every case:
when there is no exception, when there is an exception, rescued or not.
begin f = File.open('/etc/passwd') # do stuff ensure # this part will be executed anyway f.close end
By the way, this begin
expression also has a value. The value of the
whole begin
~end
expression is the value of the part which was executed
last among begin
/rescue
/else
clauses.
It means the last statement of the clauses aside from `ensure`.
The reason why the ensure
is not counted is probably because
ensure
is usually used for cleanup (thus it is not a main line).
Variables and Constants
Referring a variable or a constant. The value is the object the variable points to. We already talked in too much detail about the various behaviors.
lvar @ivar @@cvar CONST $gvar
I want to add one more thing.
Among the variables starting with $
,
there are special kinds.
They are not necessarily global variables and
some have strange names.
First the Perlish variables $_
and $~
. $_
saves the return
value of gets
and other methods, $~
contains the last match
of a regular expression.
They are incredible variables which are local variables and simultaneously
thread local variables.
And the $!
to hold the exception object when an error is occured,
the $?
to hold the status of a child process,
the $SAFE
to represent the security level,
they are all thread local.
Assignment
Variable assignments are all performed by `=`. All variables are typeless. What is saved is a reference to an object. As its implementation, it was a `VALUE` (pointer).
var = 1 obj = Object.new @ivar = 'string' @@cvar = ['array'] PI = 3.1415926535 $gvar = {'key' => 'value'}
However, as mentioned earlier `obj.attr=val` is not an assignment but a method call.
Self Assignment
var += 1
This syntax is also in C/C++/Java. In Ruby,
var = var + 1
it is a shortcut of this code.
Differing from C, the Ruby +
is a method and thus part of the library.
In C, the whole meaning of +=
is built in the language processor itself.
And in `C++`, +=
and *=
can be wholly overwritten,
but we cannot do this in Ruby.
In Ruby +=
is always defined as an operation of the combination of +
and assignment.
We can also combine self assignment and an attribute-access-flavor method. The result more looks like an attribute.
class C def i() @i end # A method definition can be written in one line. def i=(n) @i = n end end obj = C.new obj.i = 1 obj.i += 2 # obj.i = obj.i + 2 p obj.i # 3
If there is `=` there might also be `` but this is not the case. Why is that so? In Ruby assignment is dealt with on the language level. But on the other hand methods are in the library. Keeping these two, the world of variables and the world of objects, strictly apart is an important peculiarity of Ruby. If @@ were introduced the separation might easily be broken. That’s why there’s no @+@
Some people don’t want to go without the brevity of ++
. It has been
proposed again and again in the mailing list but was always turned down.
I am also in favor of ++
but not as much as I can’t do without,
and I have not felt so much needs of ++
in Ruby in the first place,
so I’ve kept silent and decided to forget about it.
`defined?`
defined?
is a syntax of a quite different color in Ruby. It tells whether an
expression value is “defined” or not at runtime.
var = 1 defined?(var) #=> true
In other words it tells whether a value can be obtained from the expression received as its argument (is it okay to call it so?) when the expression is evaluated. That said but of course you can’t write an expression causing a parse error, and it could not detect if the expression is something containing a method call which raises an error in it.
I would have loved to tell you more about defined?
but it will not appear again in this book. What a pity.
Statements
A statement is what basically cannot be combined with the other syntaxes, in other words, they are lined vertically.
But it does not mean there’s no evaluated value. For instance there are return values for class definition statements and method definition statements. However this is rarely recommended and isn’t useful, you’d better regard them lightly in this way. Here we also skip about the value of each statement.
The Ending of a statement
Up to now we just said “For now one line’s one statement”. But Ruby’s statement ending’s aren’t that straightforward.
First a statement can be ended explicitly with a semicolon as in C. Of course then we can write two and more statements in one line.
puts 'Hello, World!'; puts 'Hello, World once more!'
On the other hand, when the expression apparently continues, such as just after opened parentheses, dyadic operators, or a comma, the statement continues automatically.
# 1 + 3 * method(6, 7 + 8) 1 + 3 * method( 6, 7 + 8)
But it’s also totally no problem to use a backslash to explicitly indicate the continuation.
p 1 + \ 2
The Modifiers `if` and `unless`
The `if` modifier is an irregular version of the normal `if` The programs on the left and right mean exactly the same.
on_true() if cond if cond on_true() end
The `unless` is the negative version. Guard statements ( statements which exclude exceptional conditions) can be conveniently written with it.
The Modifiers `while` and `until`
`while` and `until` also have a back notation.
process() while have_content? sleep(1) until ready?
Combining this with `begin` and `end` gives a `do`-`while`-loop like in C.
begin res = get_response(id) end while need_continue?(res)
Class Definition
class C < SuperClass .... end
Defines the class `C` which inherits from `SuperClass`
We talked quite extensively about classes in Part 1.
This statement will be executed, the class to be defined will
become self
within the statement, arbitrary expressions can be written within. Class
definitions can be nested. They form the foundation of Ruby execution
image.
Method Definition
def m(arg) end
I’ve already written about method definition and won’t add more. This section is put to make it clear that they also belong to statements.
Singleton method definition
We already talked a lot about singleton methods in Part 1. They do not belong to classes but to objects, in fact, they belong to singleton classes. We define singleton methods by putting the receiver in front of the method name. Parameter declaration is done the same way like with ordinary methods.
def obj.some_method end def obj.some_method2( arg1, arg2, darg = nil, *rest, &block ) end
Definition of Singleton methods
class << obj .... end
From the viewpoint of purposes, it is the statement to define some singleton methods in a bundle. From the viewpoint of measures, it is the statement in which the singleton class of `obj` becomes `self` when executed. In all over the Ruby program, this is the only place where a singleton class is exposed.
class << obj p self #=> #<Class:#<Object:0x40156fcc>> # Singleton Class 「(obj)」 def a() end # def obj.a def b() end # def obj.b end
Multiple Assignment
With a multiple assignment, several assignments can be done all at once. The following is the simplest case:
a, b, c = 1, 2, 3
It’s exactly the same as the following.
a = 1 b = 2 c = 3
Just being concise is not interesting. in fact, when an array comes in to be mixed, it becomes something fun for the first time.
a, b, c = [1, 2, 3]
This also has the same result as the above. Furthermore, the right hand side does not need to be a grammatical list or a literal. It can also be a variable or a method call.
tmp = [1, 2, 3] a, b, c = tmp ret1, ret2 = some_method() # some_method might probably return several values
Precisely speaking it is as follows.
Here we’ll assume obj
is (the object of) the value of the left hand side,
- `obj` if it is an array
- if its `to_ary` method is defined, it is used to convert `obj` to an array.
- `[obj]`
Decide the right-hand side by following this procedure and perform assignments. It means the evaluation of the right-hand side and the operation of assignments are totally independent from each other.
And it goes on, both the left and right hand side can be infinitely nested.
a, (b, c, d) = [1, [2, 3, 4]] a, (b, (c, d)) = [1, [2, [3, 4]]] (a, b), (c, d) = [[1, 2], [3, 4]]
As the result of the execution of this program, each line will be `a=1 b=2 c=3 d=4`.
And it goes on. The left hand side can be index or parameter assignments.
i = 0 arr = [] arr[i], arr[i+1], arr[i+2] = 0, 2, 4 p arr # [0, 2, 4] obj.attr0, obj.attr1, obj.attr2 = "a", "b", "c"
And like with method parameters,
*
can be used to receive in a bundle.
first, *rest = 0, 1, 2, 3, 4 p first # 0 p rest # [1, 2, 3, 4]
When all of them are used all at once, it’s extremely confusing.
Block parameter and multiple assignment
We brushed over block parameters when we were talking about iterators. But there is a deep relationship between them and multiple assignment. For instance in the following case.
array.each do |i| .... end
Every time when the block is called,
the `yield`ed arguments are multi-assigned to `i`.
Here there’s only one variable on the left hand side, so it does not look like multi assignment.
But if there are two or more variables, it would a little more look like it.
For instance, Hash#each
is an repeated operation on the pairs of keys and values,
so usually we call it like this:
hash.each do |key, value| .... end
In this case, each array consist of a key and a value is `yield`ed from the hash.
Hence we can also does the following thing by using nested multiple assignment.
# [[key,value],index] are yielded hash.each_with_index do |(key, value), index| .... end
`alias`
class C alias new orig end
Defining another method `new` with the same body as the already defined method `orig`. `alias` are similar to hardlinks in a unix file system. They are a means of assigning multiple names to one method body. To say this inversely, because the names themselves are independent of each other, even if one method name is overwritten by a subclass method, the other one still remains with the same behavior.
`undef`
class C undef method_name end
Prohibits the calling of `C#method_name`. It’s not just a simple revoking of the definition. If there even were a method in the superclass it would also be forbidden. In other words the method is exchanged for a sign which says “This method must not be called”.
`undef` is extremely powerful, once it is set it cannot be deleted from the Ruby level because it is used to cover up contradictions in the internal structure. Only one left measure is inheriting and defining a method in the lower class. Even in that case, calling `super` would cause an error occurring.
The method which corresponds to `unlink` in a file system is `Module#remove_method`. While defining a class, `self` refers to that class, we can call it as follows (Remember that `Class` is a subclass of `Module`.)
class C remove_method(:method_name) end
But even with a `remove_method` one cannot cancel the `undef`. It’s because the sign put up by `undef` prohibits any kind of searches.
((errata: It can be redefined by using `def`))
Some more small topics
Comments
# examples of bad comments. 1 + 1 # compute 1+1. alias my_id id # my_id is an alias of id.
From a `#` to the end of line is a comment. It doesn’t have a meaning for the program.
Embedded documents
=begin This is an embedded document. It's so called because it is embedded in the program. Plain and simple. =end
An embedded document stretches from an `=begin` outside a string at the beginning of a line to a `=end`. The interior can be arbitrary. The program ignores it as a mere comment.
Multi-byte strings
When the global variable $KCODE
is set to either EUC
, SJIS
or UTF8
, strings encoded in euc-jp, shift_jis, or utf8 respectively can be
used in a string of a data.
And if the option -Ke
, -Ks
or -Ku
is given to the ruby
command multibyte strings can be used within the Ruby code.
String literals, regular expressions and even operator names
can contain multibyte characters. Hence it is possible to do
something like this:
def 表示( arg ) puts arg end 表示 'にほんご'
But I really cannot recommend doing things like that.