Ruby Hacking Guide
Chapter 17: Dynamic evaluation
Overview
The previous chapter finished the description of the evaluator's mechanism. In this chapter, we add the parser to the picture and examine the whole as “the evaluator in a broad sense”. There are three targets: `eval`, `Module#module_eval` and `Object#instance_eval`.
`eval`
I’ve already described `eval`, but here I’ll introduce a few more details about it.
By using `eval`, you can compile and evaluate a string at runtime, on the spot. Its return value is the value of the last expression of the program.
p eval("1 + 1") # 2
You can also refer to the variables of the surrounding scope from inside the string passed to `eval`.
    lvar = 5
    @ivar = 6
    p eval("lvar + @ivar")   # 11
Readers who have come this far cannot simply skim past the words “its scope”. For instance, you are curious about what the “scope” of constants is, aren’t you? I am. To give the conclusion first, you can basically think of `eval` as directly inheriting the environment outside it.
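As a small check of the claim about constants, here is a minimal sketch that evaluates a string inside a method of a module and refers to a constant defined in that module. The names `Outer`, `LIMIT` and `check` are made up for this illustration.

```ruby
# Hypothetical names (Outer, LIMIT, check), used only for illustration.
module Outer
  LIMIT = 10

  def self.check
    # The string is evaluated in the environment of this method, so the
    # constant is looked up as if the code were written directly here.
    eval("LIMIT * 2")
  end
end

p Outer.check   # 20
```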
You can also define methods and classes.
    def a
      eval('class C;  def test() puts("ok") end   end')
    end

    a()          # define class C and C#test
    C.new.test   # shows ok
Moreover, as mentioned briefly in the previous chapter, when you pass a `Proc` as the second argument, the string is evaluated in the environment of that `Proc`.
    def new_env
      n = 5
      Proc.new { nil }   # turn the environment of this method into an object and return it
    end

    p eval('n * 3', new_env())   # 15
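The example above relies on `eval` accepting a `Proc`, which is the behavior of the `ruby` version read in this book. As far as I know, later versions of Ruby expect a `Binding` object in that position instead. Assuming such a version, the same idea can be sketched with `Kernel#binding`:

```ruby
def new_env
  n = 5
  binding   # capture the environment of this method as a Binding object
end

# The string is evaluated in the captured environment, so it can see n.
p eval('n * 3', new_env)   # 15
```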
`module_eval` and `instance_eval`
When a `Proc` is passed as the second argument of `eval`, the evaluation is done in its environment. `module_eval` and `instance_eval` are limited (or shortcut) versions of this. With `module_eval`, you can evaluate in an environment that is as if you were inside a module statement or a class statement.
    lvar = "toplevel lvar"   # a local variable to confirm this scope

    module M
    end
    M.module_eval(<<'EOS')   # a suitable situation to use here-document
        p lvar   # referable
        p self   # shows M
        def ok   # define M#ok
          puts 'ok'
        end
    EOS
With `instance_eval`, you can evaluate in an environment that is like a singleton class statement, except that `self` is the object itself.
    lvar = "toplevel lvar"   # a local variable to confirm this scope

    obj = Object.new
    obj.instance_eval(<<'EOS')
        p lvar   # referable
        p self   # shows #<Object:0x40274f5c>
        def ok   # define obj.ok
          puts 'ok'
        end
    EOS
Additionally, `module_eval` and `instance_eval` can also be used as iterators; in that case the block is evaluated in the respective environment. For instance,
    obj = Object.new
    p obj                 # #<Object:0x40274fac>
    obj.instance_eval {
        p self            # #<Object:0x40274fac>
    }
Like this.
However, the behavior around local variables differs between the string case and the block case. For example, when a block is created in the `a` method and then `instance_eval`ed in the `b` method, the block refers to the local variables of `a`. When a string is created in the `a` method and then `instance_eval`ed in the `b` method, the code inside the string refers to the local variables of `b`. The scope of local variables is decided “at compile time”, and the results differ because a string is compiled every time it is evaluated, whereas a block is compiled when the file is loaded.
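The difference described above can be observed with a small experiment. This is only a sketch: the class `Demo` and the methods `a_block`, `a_string`, `b_block` and `b_string` are hypothetical names that play the roles of `a` and `b`.

```ruby
# All names here (Demo, a_block, a_string, b_block, b_string) are hypothetical.
class Demo
  def a_block
    v = "local of a"
    proc { v }          # compiled together with a_block: v is a's variable
  end

  def a_string
    v = "local of a"
    "v"                 # just a String; nothing has been compiled yet
  end

  def b_block(blk)
    v = "local of b"
    Object.new.instance_eval(&blk)
  end

  def b_string(src)
    v = "local of b"
    Object.new.instance_eval(src)   # compiled here, inside b_string
  end
end

d = Demo.new
p d.b_block(d.a_block)     # "local of a"
p d.b_string(d.a_string)   # "local of b"
```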
`eval`
`eval()`
The `eval` of Ruby branches many times depending on which parameters are present. Let’s assume the form of the call is limited to the following:
eval(prog_string, some_block)
Under this assumption the actual interface function `rb_f_eval()` becomes almost meaningless, so we’ll start with the function `eval()`, which is one level lower. The function prototype of `eval()` is:
static VALUE eval(VALUE self, VALUE src, VALUE scope, char *file, int line);
`scope` is the `Proc` of the second parameter. `file` and `line` are the file name and line number where the string passed to `eval` is supposed to be located. Now, let’s see the content:
    static VALUE
    eval(self, src, scope, file, line)
        VALUE self, src, scope;
        char *file;
        int line;
    {
        struct BLOCK *data = NULL;
        volatile VALUE result = Qnil;
        struct SCOPE * volatile old_scope;
        struct BLOCK * volatile old_block;
        struct RVarmap * volatile old_dyna_vars;
        VALUE volatile old_cref;
        int volatile old_vmode;
        volatile VALUE old_wrapper;
        struct FRAME frame;
        NODE *nodesave = ruby_current_node;
        volatile int iter = ruby_frame->iter;
        int state;

        if (!NIL_P(scope)) {  /* always true now */
            Data_Get_Struct(scope, struct BLOCK, data);
            /* push BLOCK from data */
            frame = data->frame;
            frame.tmp = ruby_frame; /* to prevent from GC */
            ruby_frame = &(frame);
            old_scope = ruby_scope;
            ruby_scope = data->scope;
            old_block = ruby_block;
            ruby_block = data->prev;
            old_dyna_vars = ruby_dyna_vars;
            ruby_dyna_vars = data->dyna_vars;
            old_vmode = scope_vmode;
            scope_vmode = data->vmode;
            old_cref = (VALUE)ruby_cref;
            ruby_cref = (NODE*)ruby_frame->cbase;
            old_wrapper = ruby_wrapper;
            ruby_wrapper = data->wrapper;

            self = data->self;
            ruby_frame->iter = data->iter;
        }

        PUSH_CLASS();
        ruby_class = ruby_cbase;  /* == ruby_frame->cbase */

        ruby_in_eval++;
        if (TYPE(ruby_class) == T_ICLASS) {
            ruby_class = RBASIC(ruby_class)->klass;
        }
        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            NODE *node;

            result = ruby_errinfo;
            ruby_errinfo = Qnil;
            node = compile(src, file, line);
            if (ruby_nerrs > 0) {
                compile_error(0);
            }
            if (!NIL_P(result)) ruby_errinfo = result;
            result = eval_node(self, node);
        }
        POP_TAG();
        POP_CLASS();
        ruby_in_eval--;
        if (!NIL_P(scope)) {  /* always true now */
            int dont_recycle = ruby_scope->flags & SCOPE_DONT_RECYCLE;

            ruby_wrapper = old_wrapper;
            ruby_cref  = (NODE*)old_cref;
            ruby_frame = frame.tmp;
            ruby_scope = old_scope;
            ruby_block = old_block;
            ruby_dyna_vars = old_dyna_vars;
            data->vmode = scope_vmode; /* save the modification of the visibility scope */
            scope_vmode = old_vmode;
            if (dont_recycle) {
                /* ……copy SCOPE BLOCK VARS…… */
            }
        }

        if (state) {
            if (state == TAG_RAISE) {
                /* ……prepare an exception object…… */
                rb_exc_raise(ruby_errinfo);
            }
            JUMP_TAG(state);
        }

        return result;
    }

(eval.c)
If this function were shown without any preamble, you would probably feel “oww!”. But we’ve defeated many functions of `eval.c` by now, so this one is no match for us. This function just keeps saving and restoring the stacks. There are only three points we need to care about:
- unusually, `FRAME` is also replaced (not copied and pushed)
- `ruby_cref` is replaced (?) by `ruby_frame->cbase`
- only `scope_vmode` is not simply restored but is also written back to `data`.
And the main parts are the calls to `compile()` and `eval_node()` located around the middle. You may have already forgotten `eval_node()`: it is the function that starts the evaluation of the tree given as the parameter `node`. It was also used in `ruby_run()`.
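Before moving on, the role of the `file` and `line` parameters can be observed from the Ruby level. The following is a minimal sketch assuming a Ruby version in which the second argument of `eval` is a `Binding`; the method name `virtual_position` and the values `"virtual.rb"` and `100` are arbitrary choices for this illustration.

```ruby
# "virtual.rb" and 100 are arbitrary values chosen for illustration.
def virtual_position
  # The string is compiled as if it were located at virtual.rb, line 100.
  eval("[__FILE__, __LINE__]", binding, "virtual.rb", 100)
end

p virtual_position   # ["virtual.rb", 100]
```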
Here is `compile()`.
    static NODE*
    compile(src, file, line)
        VALUE src;
        char *file;
        int line;
    {
        NODE *node;

        ruby_nerrs = 0;
        Check_Type(src, T_STRING);
        node = rb_compile_string(file, src, line);

        if (ruby_nerrs == 0) return node;
        return 0;
    }

(eval.c)
`ruby_nerrs` is the variable incremented in `yyerror()`. In other words, if this variable is non-zero, at least one parse error has occurred. `rb_compile_string()` was already discussed in Part 2: it is the function that compiles a Ruby string into a syntax tree.
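Seen from the Ruby level, a failed compilation shows up as an exception raised by `eval`, and nothing in the string is evaluated. Here is a minimal sketch; the helper name `try_eval` is hypothetical.

```ruby
# try_eval is a hypothetical helper used only for this illustration.
def try_eval(src)
  eval(src)
rescue SyntaxError => e
  # The parser reported an error, so the string was never evaluated.
  e.class
end

p try_eval("1 + 1")   # 2
p try_eval("1 +")     # SyntaxError
```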
One thing that becomes a problem here is local variables. As we’ve seen in Chapter 12: Syntax tree construction, local variables are managed by using `lvtbl`. However, since a `SCOPE` (and possibly also `VARS`) already exists, we need to parse in a way that writes over and adds to it. This is in fact the heart of `eval()`, and the most difficult part. Let’s go back to `parse.y` again and complete this investigation.
`top_local`
I’ve mentioned that the functions `local_push()` and `local_pop()` are used when pushing `struct local_vars`, the management table of local variables, but there is actually one more pair of functions that pushes the management table: `top_local_init()` and `top_local_setup()`. They are called in this sort of way:
    program :   { top_local_init(); }
              compstmt
                { top_local_setup(); }
Of course, in reality various other things are also done here, but they are all cut because they are not important. And this is the content of the two functions:
    static void
    top_local_init()
    {
        local_push(1);
        lvtbl->cnt = ruby_scope->local_tbl?ruby_scope->local_tbl[0]:0;
        if (lvtbl->cnt > 0) {
            lvtbl->tbl = ALLOC_N(ID, lvtbl->cnt+3);
            MEMCPY(lvtbl->tbl, ruby_scope->local_tbl, ID, lvtbl->cnt+1);
        }
        else {
            lvtbl->tbl = 0;
        }
        if (ruby_dyna_vars)
            lvtbl->dlev = 1;
        else
            lvtbl->dlev = 0;
    }

(parse.y)
This means that `local_tbl` is copied from `ruby_scope` to `lvtbl`. As for block local variables, since it’s better to see them all at once later, we’ll focus on ordinary local variables for the time being. Next, here is `top_local_setup()`.
    static void
    top_local_setup()
    {
        int len = lvtbl->cnt;  /* the number of local variables after parsing */
        int i;                 /* the number of local variables before parsing */

        if (len > 0) {
            i = ruby_scope->local_tbl ? ruby_scope->local_tbl[0] : 0;

            if (i < len) {
                if (i == 0 || (ruby_scope->flags & SCOPE_MALLOC) == 0) {
                    VALUE *vars = ALLOC_N(VALUE, len+1);
                    if (ruby_scope->local_vars) {
                        *vars++ = ruby_scope->local_vars[-1];
                        MEMCPY(vars, ruby_scope->local_vars, VALUE, i);
                        rb_mem_clear(vars+i, len-i);
                    }
                    else {
                        *vars++ = 0;
                        rb_mem_clear(vars, len);
                    }
                    ruby_scope->local_vars = vars;
                    ruby_scope->flags |= SCOPE_MALLOC;
                }
                else {
                    VALUE *vars = ruby_scope->local_vars-1;
                    REALLOC_N(vars, VALUE, len+1);
                    ruby_scope->local_vars = vars+1;
                    rb_mem_clear(ruby_scope->local_vars+i, len-i);
                }
                if (ruby_scope->local_tbl &&
                    ruby_scope->local_vars[-1] == 0) {
                    free(ruby_scope->local_tbl);
                }
                ruby_scope->local_vars[-1] = 0;  /* NODE is not necessary anymore */
                ruby_scope->local_tbl = local_tbl();
            }
        }
        local_pop();
    }

(parse.y)
Since `local_vars` can be either on the stack or in the heap, the code is somewhat complex. However, all it does is update `local_tbl` and `local_vars` of `ruby_scope`. (When `SCOPE_MALLOC` is set, `local_vars` was allocated by `malloc()`.) And here, because there is no point in using `alloca()`, the allocation method is forcibly changed to `malloc()`.
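The visible consequence of updating the tables of the existing scope is that a variable first assigned inside one evaluated string stays in that environment afterwards. The following sketch assumes a later Ruby version in which the environment is passed as a `Binding` and `Binding#local_variable_defined?` is available; the names `grow_environment`, `env` and `fresh` are made up for the illustration.

```ruby
# grow_environment, env and fresh are hypothetical names.
def grow_environment
  env = binding                # the environment shared by both strings

  eval("fresh = 42", env)      # `fresh` does not exist before this parse
  later = eval("fresh + 1", env)   # a later parse finds it in the same environment

  [later, env.local_variable_defined?(:fresh)]
end

p grow_environment   # [43, true]
```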
Block Local Variable
By the way, what about block local variables? To think about this, we first have to go back to the entry point of the parser, `yycompile()`.
    static NODE*
    yycompile(f, line)
    {
        struct RVarmap *vars = ruby_dyna_vars;
             :
        n = yyparse();
             :
        ruby_dyna_vars = vars;
    }
This looks like a mere save-restore, but the point is that it does not clear `ruby_dyna_vars`. This means that the parser, too, directly adds elements to the linked list of `RVarmap` created in the evaluator.
However, according to the previous description, the structure of `ruby_dyna_vars` differs between the parser and the evaluator. How does it deal with the difference in the way the header (the `RVarmap` whose `id=0`) is attached?
What helps here is the “1” of `local_push(1)` in `top_local_init()`. When the argument of `local_push()` is true, it does not attach the first header of `ruby_dyna_vars`. That is, the list looks like Figure 1. This ensures that we can refer to the block local variables of the outer scope from inside a string passed to `eval`.
Figure 1: `ruby_dyna_vars` inside `eval`
Well, we can certainly refer to them, but didn’t you say that `ruby_dyna_vars` is entirely freed in the parser? What happens if the links created in the evaluator are freed? … Readers who noticed this can be relieved by reading the next part.
    vp = ruby_dyna_vars;
    ruby_dyna_vars = vars;
    lex_strterm = 0;
    while (vp && vp != vars) {
        struct RVarmap *tmp = vp;
        vp = vp->next;
        rb_gc_force_recycle((VALUE)tmp);
    }

(parse.y)
It is designed so that the loop stops when it reaches the links created in the evaluator (`vars`).
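The net effect of this section, that a string passed to `eval` can see the block local variables of the enclosing block, can be checked with a short sketch. The method name `eval_in_block` and the variable `blockvar` are hypothetical.

```ruby
# eval_in_block and blockvar are hypothetical names.
def eval_in_block
  [1, 2].map do |i|
    blockvar = i * 10        # a block local variable
    eval("blockvar + i")     # the string sees blockvar and the block parameter i
  end
end

p eval_in_block   # [11, 22]
```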
`instance_eval`
The Whole Picture
The substance of `Module#module_eval` is `rb_mod_module_eval()`, and the substance of `Object#instance_eval` is `rb_obj_instance_eval()`.
    VALUE
    rb_mod_module_eval(argc, argv, mod)
        int argc;
        VALUE *argv;
        VALUE mod;
    {
        return specific_eval(argc, argv, mod, mod);
    }

    VALUE
    rb_obj_instance_eval(argc, argv, self)
        int argc;
        VALUE *argv;
        VALUE self;
    {
        VALUE klass;

        if (rb_special_const_p(self)) {
            klass = Qnil;
        }
        else {
            klass = rb_singleton_class(self);
        }

        return specific_eval(argc, argv, klass, self);
    }

(eval.c)
These two methods share a common part, namely “a method that swaps in a new `self` and `class`”, and that part is defined as `specific_eval()`. Figure 2 shows it together with the functions described below. The names in parentheses are calls through function pointers.
Figure 2: Call Graph
Both `instance_eval` and `module_eval` accept either a block or a string, so each branches into a `yield` path and an `eval` path respectively. However, most of those two paths is again common, and this part is extracted as `exec_under()`.
But for the reader, having to face 2 times 2 = 4 paths simultaneously is not a good plan. Therefore, here we assume only the case where
- it is an `instance_eval`
- which takes a string as its argument

Then we expand every function under `rb_obj_instance_eval()` inline, fold the constants, and read the result.
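Before narrowing down to this one case, it may help to see from the Ruby level what the different `klass` arguments given to `specific_eval()` mean: a `def` inside `module_eval` lands in the module itself, while a `def` inside `instance_eval` lands in the singleton class of the object. This is only a sketch, and `Widget`, `widget_demo` and the method names are hypothetical.

```ruby
# Widget, widget_demo and the method names are hypothetical.
class Widget; end

def widget_demo
  # klass == the class itself: an ordinary instance method is defined.
  Widget.module_eval("def from_module_eval; :instance_method; end")

  # klass == the singleton class of w: only w gets the method.
  w = Widget.new
  w.instance_eval("def from_instance_eval; :singleton_method; end")

  [Widget.new.from_module_eval,
   w.singleton_methods.map(&:to_s),
   Widget.new.respond_to?(:from_instance_eval)]
end

p widget_demo   # [:instance_method, ["from_instance_eval"], false]
```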
After Absorption
The result is far easier to follow than the version before absorption.
    static VALUE
    instance_eval_string(self, src, file, line)
        VALUE self, src;
        const char *file;
        int line;
    {
        VALUE sclass;
        VALUE result;
        int state;
        int mode;

        sclass = rb_singleton_class(self);

        PUSH_CLASS();
        ruby_class = sclass;
        PUSH_FRAME();
        ruby_frame->self       = ruby_frame->prev->self;
        ruby_frame->last_func  = ruby_frame->prev->last_func;
        ruby_frame->last_class = ruby_frame->prev->last_class;
        ruby_frame->argc       = ruby_frame->prev->argc;
        ruby_frame->argv       = ruby_frame->prev->argv;
        if (ruby_frame->cbase != sclass) {
            ruby_frame->cbase = rb_node_newnode(NODE_CREF, sclass, 0,
                                                ruby_frame->cbase);
        }
        PUSH_CREF(sclass);

        mode = scope_vmode;
        SCOPE_SET(SCOPE_PUBLIC);
        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            result = eval(self, src, Qnil, file, line);
        }
        POP_TAG();
        SCOPE_SET(mode);

        POP_CREF();
        POP_FRAME();
        POP_CLASS();
        if (state) JUMP_TAG(state);

        return result;
    }
As we can see, this pushes the singleton class of the object onto `CLASS`, `CREF` and `ruby_frame->cbase`. The main process is a single call to `eval()`. It is unusual that things such as initializing `FRAME` by a struct copy are missing, but this does not make much of a difference either.
Before Absorption
I said the absorbed version is easier to read, but perhaps the code was already simple before absorption. Let’s check what has actually been simplified in comparison to the original.
The first one is `specific_eval()`. Since this function exists to share the code of the interface to Ruby, almost all of it is parameter parsing. Here is the result of cutting all of that.
    static VALUE
    specific_eval(argc, argv, klass, self)
        int argc;
        VALUE *argv;
        VALUE klass, self;
    {
        if (rb_block_given_p()) {

            return yield_under(klass, self);
        }
        else {

            return eval_under(klass, self, argv[0], file, line);
        }
    }

(eval.c)
As you can see, this branches cleanly in two depending on whether there is a block, and neither route ever influences the other. Therefore, when reading, we should read them one at a time. This is the first point on which the absorbed version is an improvement.
And `file` and `line` are irrelevant when reading `yield_under()`, so once the `yield` route is absorbed into the main body, it becomes obvious that we don’t have to think about the parsing of these parameters at all.
Next, we’ll look at `eval_under()` and `eval_under_i()`.
    static VALUE
    eval_under(under, self, src, file, line)
        VALUE under, self, src;
        const char *file;
        int line;
    {
        VALUE args[4];

        if (ruby_safe_level >= 4) {
            StringValue(src);
        }
        else {
            SafeStringValue(src);
        }
        args[0] = self;
        args[1] = src;
        args[2] = (VALUE)file;
        args[3] = (VALUE)line;
        return exec_under(eval_under_i, under, under, args);
    }

    static VALUE
    eval_under_i(args)
        VALUE *args;
    {
        return eval(args[0], args[1], Qnil, (char*)args[2], (int)args[3]);
    }

(eval.c)
In this function, in order to pass the arguments as a single value, they are stored in the `args` array and that array is passed. We can imagine that `args` exists as a temporary container for passing values from `eval_under()` to `eval_under_i()`, but we cannot be sure that it is truly so. It’s possible that `args` is modified inside `exec_under()`.
As a way to share code, this is perfectly proper. But for those who read it, this kind of indirect passing is hard to follow. In particular, because there are extra casts on `file` and `line` to fool the compiler, it is hard to tell what their actual types were. These parts disappeared entirely in the absorbed version, so you don’t have to worry about getting lost.
However, it’s too much to say that absorbing and expanding always makes things easier to understand. For example, when calling `exec_under()`, `under` is passed as both the second and third arguments, but is it really all right to expand both of the corresponding parameter variables on the `exec_under()` side into `under`? The second and third arguments of `exec_under()` in fact indicate the `CLASS` and the `CREF` that should be pushed. `CLASS` and `CREF` are “different things”, so it might be better to use different variables. In the absorbed version above too, for this point alone,
    VALUE sclass = .....;
    VALUE cbase = sclass;
I considered writing it this way, but thought it would give a strange impression if only these variables were abruptly left in, so both were expanded into `sclass`. In other words, this was done only for the sake of the flow of the text.
By now I have expanded arguments and functions many times, and each time I explained the reason for expanding. The reasons are
- there are only a few possible patterns
- the behavior can slightly change

I am definitely not saying “Expanding various things in whatever way always makes things simpler”.
In every case, what has first priority is comprehensibility for ourselves, not adherence to a methodology. When expanding makes things simpler, expand. When we feel that not expanding, or conversely bundling things into a procedure, makes things easier to understand, let us do that. As for `ruby`, I often expanded because the original is written properly, but if the source code had been written by a poor programmer, aggressively bundling it into functions would often be a good choice.