
\section{Encapsulating the Hacks}
\label{sec:encap-hacks}
Imagine jumping into a project in order to make a simple modification and then
seeing the code in \jsref{lst:prot-share}. This is a far cry from the simple
protected member declarations in traditional classical object-oriented
languages. In fact, there comes a point where the hacks discussed in the
previous sections become unmaintainable messes that add a great deal of
boilerplate code with little use other than to distract from the actual
software itself.

However, we do not have to settle for those messy implementations. Indeed, we
can come up with some fairly elegant and concise solutions by encapsulating the
hacks we have discussed into a classical object-oriented framework, library or
simple helper functions. Let's not get ahead of ourselves, though; we will
start by exploring basic helper functions before diving into a full, reusable
framework.

This section is intended for educational and experimental purposes. Before using
these examples to develop your own class system for ECMAScript, ensure that none
of the existing systems satisfy your needs; your effort is better spent on the
advancement of existing projects than on the segregation caused by the
introduction of additional, specialty frameworks.\footnote{That is not to
discourage experimentation. Indeed, one of the best, most exciting and fun ways
to learn about these concepts is to implement them yourself.} Existing systems
are discussed a bit later.

\subsection{Constructor/Prototype Factory}
\label{sec:ctor-factory}
Section~\ref{sec:extending} offered one solution to the problem of creating an
extensible constructor, allowing it to be used both to instantiate new objects
and as a prototype. Unfortunately, as \jsref{lst:ctor-extend} demonstrated, the
solution adds a bit of noise to the definition that will also be duplicated for
each constructor. The section ended with the promise of a cleaner, reusable
implementation. Perhaps we can provide that.

Consider once again the issue at hand. The constructor, when called
conventionally with the \operator{new} operator to create a new instance, must
perform all of its construction logic. However, if we wish to use it as a
prototype, it is unlikely that we want to run \emph{any} of that logic --- we
are simply looking to have an object containing each of its members to use as a
prototype without the risk of modifying the prototype of the constructor in
question. Now consider how this issue is handled in other classical languages:
the \keyword{extends} keyword.

ECMAScript has no such keyword, so we will have to work on an implementation
ourselves. We cannot use the name \func{extends()}, as \keyword{extends} is a
reserved word;\footnote{Perhaps for future versions of ECMAScript.} as such, we
will start with a simple \func{Class} factory function with which we can create
new ``classes'' without supertypes. We can then provide a \func{Class.extend()}
method to define a ``class'' \emph{with} a supertype.

\lstinputlisting[%
  label=lst:ctor-factory,
  caption=Constructor factory,
  lastline=60,
]{lst/ctor-factory.js}

\jsref{lst:ctor-factory} demonstrates one such possible implementation of a
constructor factory. Rather than thinking of ``creating a class'' and ``creating
a class with a supertype'' as two separate processes, it is helpful to consider
them one and the same; instead, we can consider the former to be ``creating a
class \emph{with an empty supertype}''. As such, invoking \func{Class()} simply
calls \func{Class.extend()} with \keyword{null} for the base (on line 6),
allowing \func{Class.extend()} to handle the creation of a new constructor
without a supertype.

Both \func{Class()} and \func{Class.extend()} accept a \var{dfn} argument, which
we will refer to as the \dfn{definition object}; this object is to contain each
member that will appear on the prototype of the new constructor. The \var{base}
parameter, defined on \func{Class.extend()}, denotes the constructor from which
to extend (the constructor that will be instantiated and used as the prototype).
Line 11 will default \var{base} to an empty function if one has not been
provided (mainly, to satisfy the \func{Class()} call on line 6).

With that, we can now continue on to creating our constructor, beginning on line
16. Section~\ref{sec:extending} introduced the concept of using an
\var{extending} flag to let the constructor know when to avoid all of its
construction logic if being used only as a prototype (see
\jsref{lst:ctor-extend}). The problem with this implementation, as discussed,
was that it required that \emph{each} constructor that wishes to use this
pattern implement it itself, violating the DRY\footnote{``Don't repeat
yourself'', \emph{The Pragmatic Programmer}.} principle. There were two main
areas of code duplication in \jsref{lst:ctor-extend} --- the checking of the
\var{extending} flag in the constructor and the setting (and resetting) of the
flag in \func{F.asPrototype()}. In fact, we can eliminate the
\func{asPrototype()} method altogether once we recognize that its entire
purpose is to set the flags before and after instantiation.

To address the first code duplication issue --- the checking of the flag in the
constructor --- we must remove the need to perform the check manually for each
and every constructor. The solution, as demonstrated in
\jsref{lst:ctor-factory}, is to separate our generic constructor logic (shared
between all constructors that use the factory) from the logic that can vary
between each constructor. \var{ctor} on line 16 accomplishes this by first
performing the \var{extending} check (lines 19--22) and then forwarding all
arguments to a separate function (\func{\_\_construct()}), if defined, using
\func{Function.apply()} (lines 25--28). One could adopt any name for the
constructor method; it is not significant.\footnote{The \code{\_\_construct}
name was taken from PHP.} Note that the first argument to
\func{Function.apply()} is important, as it will ensure that \keyword{this} is
properly bound within the \func{\_\_construct()} method.

To address the second code duplication issue and remove the need for
\func{asPrototype()} in \jsref{lst:ctor-extend} entirely, we can take advantage
of the implications of \func{Class.extend()} in \jsref{lst:ctor-factory}. The
only time we wish to use a constructor as a prototype and skip
\func{\_\_construct()} is during the process of creating a new constructor. As
such, we can simply set the \var{extending} flag to \keyword{true} when we begin
creating the new constructor (see line 14, though this flag could be placed
anywhere before line 31) and then reset it to \keyword{false} once we are done
(line 38). With that, we have eliminated the code duplication issues associated
with \jsref{lst:ctor-extend}.

The remainder of \jsref{lst:ctor-factory} is simply an abstraction around the
manual process we have been performing since section~\ref{sec:proto} --- setting
the prototype, properly setting the constructor and extending the prototype
with our own methods. Recall section~\ref{sec:prot} in which we had to manually
assign each member of the prototype for subtypes in order to ensure that we did
not overwrite the existing prototype members (e.g. \func{M.prototype.push()} in
\jsref{lst:prot-share}). The very same issue applies here: Line 31 first sets
the prototype to an instance of \var{base}. If we were to then set
\code{ctor.prototype = dfn}, we would entirely overwrite the benefit gained from
specifying \var{base}. In order to automate this manual assignment of each
additional prototype member of \var{dfn}, \func{copyTo()} is provided (defined
on line 43 and called on line 34). It accepts two arguments: a destination
object \var{dest} and a source object \var{members}, each member of which is
copied to \var{dest}.

Like the examples provided in section~\ref{sec:hack-around}, we
use a self-executing function to hide the implementation details of our
\func{Class} function from the rest of the world.

To demonstrate use of the constructor factory, \jsref{lst:ctor-factory-ex}
defines two classes\footnote{The reader should take care in noting that the term
``class'', as used henceforth, will refer to a class-like object created using
the systems defined within this article. ECMAScript does not support classes, so
the use of the term ``class'' in any other context is misleading.} --- \var{Foo}
and \var{SubFoo}. Note how, by placing the curly braces on their own line,
we can create the illusion that \func{Class()} is a language construct:

\lstinputlisting[%
  label=lst:ctor-factory-ex,
  caption=Demonstrating the constructor factory,
  firstline=62,
  firstnumber=last
]{lst/ctor-factory.js}

The reader should note that an important assertion has been omitted for brevity
in \jsref{lst:ctor-factory}. Consider, for example, what may happen in the case
of the following:

\begin{verbatim}
Class.extend( "foo", {} );
\end{verbatim}

It is apparent that \code{"foo"} is not a function and therefore cannot be used
with the \operator{new} operator. Given that, consider line 31, which blindly
invokes \code{base()} without consideration for the very probable scenario that
the user mistakenly (due to their own unfamiliarity or a simple bug) provided us
with a non-constructor for \var{base}. The user would then be presented with a
valid, but not necessarily useful error --- did the error occur because of user
error, or due to a bug in the factory implementation?

To avoid confusion, it would be best to perform a simple assertion before
invoking \var{base} (or wrap the invocation in a try/catch block, although doing
so is not recommended in case \func{base()} throws an error of its own):

\begin{verbatim}
if ( typeof base !== 'function' )
{
    throw TypeError( "Invalid base provided" );
}
\end{verbatim}

Note also that, although this implementation will work with any constructor as
\var{base}, only those created with \func{Class()} will have the benefit of
being able to check the \var{extending} flag. As such, when using
\func{Class.extend()} with third-party constructors, the issue of extensible
constructors may still remain and is left instead in the hands of the developer
of that base constructor.

\subsubsection{Factory Conveniences}
Although our constructor factory described in section~\ref{sec:ctor-factory} is
thus far very simple, one should take the time to realize what a powerful
abstraction has been created: it allows us to inject our own code in any part of
the constructor creation process, giving us full control over our class-like
objects. Indeed, this abstraction will be used as a strong foundation going
forward throughout all of section~\ref{sec:encap}. In the meantime, we can take
advantage of it in its infancy to provide a couple of additional conveniences.

First, consider the syntax of \func{Class.extend()} in \jsref{lst:ctor-factory}.
It requires the extending of a constructor to be done in the following manner:

\begin{verbatim}
var SubFoo = Class.extend( Foo, {} );
\end{verbatim}

Would it not be more intuitive to instead be able to extend a constructor in the
following manner?

\begin{verbatim}
var SubFoo = Foo.extend( {} );
\end{verbatim}

The above two statements are semantically equivalent --- they define a subtype
\var{SubFoo} that extends from the constructor \var{Foo} --- but the latter
example is more concise and natural. Adding support for this method is trivial,
involving only a slight addition to \jsref{lst:ctor-factory}'s \func{C.extend()}
method, perhaps around line 30:

\lstinputlisting[%
  label=lst:ctor-factory-sextend,
  caption=Adding a static \func{extend()} method to constructors,
  firstnumber=31
]{lst/ctor-factory-sextend.js}

Of course, one should be aware that this implementation is exploitable in that,
for example, \func{Foo.extend()} could be reassigned at any point. As such,
using \func{Class.extend()} is the safe implementation, unless you can be
certain that such a reassignment is not possible. Alternatively, in ECMAScript 5
and later environments, one can use \func{Object.defineProperty()}, as discussed
in sections~\ref{sec:encap-naive} and \ref{sec:encap-proper}, to make the method
read-only.
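
As a minimal sketch of that alternative, the method could be defined as a
non-writable, non-configurable property. The helper name \func{defineExtend()}
and the \func{stubExtend()} function standing in for \func{Class.extend()} are
hypothetical, introduced only for illustration; an ECMAScript 5 environment is
assumed.

```javascript
// Hypothetical helper: attaches a read-only extend() to a constructor.
// extendFn stands in for Class.extend( base, dfn ).
function defineExtend( ctor, extendFn )
{
    Object.defineProperty( ctor, 'extend', {
        value: function( dfn )
        {
            return extendFn( ctor, dfn );
        },
        writable:     false,  // the method cannot be reassigned
        configurable: false,  // nor deleted or redefined
        enumerable:   false   // nor listed when enumerating the constructor
    } );
}

// Stub used only to keep this sketch self-contained
function stubExtend( base, dfn )
{
    return { base: base, dfn: dfn };
}

function Foo() {}
defineExtend( Foo, stubExtend );

var SubFoo = Foo.extend( {} );
```

Outside of strict mode, an attempt to reassign \code{Foo.extend} is silently
ignored; in strict mode, it throws a \var{TypeError}.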

Now consider the instantiation of our class-like objects, as was demonstrated in
\jsref{lst:ctor-factory-ex}:

\begin{verbatim}
var inst = new Foo( "Name" );
\end{verbatim}

We can make our code even more concise by eliminating the \operator{new}
operator entirely, allowing us to create a new instance as such:

\begin{verbatim}
var inst = Foo( "Name" );
\end{verbatim}

Of course, our constructors do not yet support this, but why may we want such a
thing? Firstly, for consistency --- the core ECMAScript constructors do not
require the use of the keyword, as has been demonstrated throughout this article
with the various \var{Error} types. Secondly, the omission of the keyword would
allow us to jump immediately into calling a method on an object without dealing
with awkward precedence rules: \code{Foo( "Name" ).getName()} vs. \code{( new
Foo( "Name" ) ).getName()}. However, those reasons exist more to offer syntactic
sugar; they do little to persuade those who want, or do not mind, the
\operator{new} operator.

The stronger argument against the \operator{new} operator is what happens should
someone \emph{omit} it, which would not be at all uncommon since the keyword is
not required for the core ECMAScript constructors. Recall that \keyword{this},
from within the constructor, is bound to the new instance when invoked with the
\operator{new} operator. As such, we expect to be able to make assignments to
properties of \keyword{this} from within the constructor without any problems.
What, then, happens if the constructor is invoked \emph{without} the keyword?
\keyword{this} would instead be bound (according to the ECMAScript
standard\cite{es5-call}) to ``the global object'',\footnote{In most browser
environments, the global object is \var{window}.} unless in strict mode. This is
dangerous:

\lstinputlisting[%
  label=lst:new-global,
  caption=Introducing unintended global side-effects with constructors
]{lst/new-global.js}

Consider \jsref{lst:new-global} above. Function \func{Foo()}, if invoked with
the \operator{new} operator, results in an object with a \var{Boolean} property
equal to \keyword{true}. However, if we were to invoke \func{Foo()}
\emph{without} the \operator{new} operator, this would end up \emph{overwriting
the built-in global \var{Boolean} object reference}. To solve this problem,
while at the same time providing the consistency and convenience of being able
to either include or omit the \operator{new} operator, we can add a small block
of code to our generated constructor \var{ctor} (somewhere around line 23 of
\jsref{lst:ctor-factory}, after the extend check but before
\func{\_\_construct()} is invoked):

\lstinputlisting[%
  label=lst:new-global-fix,
  caption=Allowing for omission of the \operator{new} operator,
  firstnumber=24
]{lst/new-global-fix.js}

The check, as demonstrated in \jsref{lst:new-global-fix}, is as simple as
ensuring that \keyword{this} is properly bound to a \emph{new instance of our
constructor \var{ctor}}. If not, the constructor can simply return a new
instance of itself through a recursive call.

Alternatively, the reader may decide to throw an error instead of automatically
returning a new instance. This would require the use of the \operator{new}
operator for instantiation, while still ensuring that the global scope will not
be polluted with unnecessary values. If the constructor is in strict mode, then
the pollution of the global scope would not be an issue and the error would
instead help to point out inconsistencies in the code. However, for the reason
that the keyword is optional for many core ECMAScript constructors, the author
recommends the implementation in \jsref{lst:new-global-fix}.
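
For comparison, the error-throwing alternative might be sketched as follows
for a hand-written constructor; within the factory, the same check would
presumably sit in \var{ctor} after the \var{extending} check. The constructor
and the error message are assumptions made for illustration.

```javascript
function Foo( name )
{
    // Without new, this is the global object (or undefined in strict
    // mode), neither of which is an instance of Foo.
    if ( !( this instanceof Foo ) )
    {
        throw new TypeError( "Foo must be invoked with the new operator" );
    }

    this.name = name;
}

var inst = new Foo( "Name" );
```

Because the check precedes any assignment to \keyword{this}, the global object
is never modified, even outside of strict mode.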

\subsection{Private Member Encapsulation}
Section~\ref{sec:encap} discussed the encapsulation of private member data
by means of private property and method objects, avoiding the performance impact
of privileged members (see section~\ref{sec:privileged}). In order to avoid
memory leaks, the private data was stored on the instance itself rather than a
truly encapsulated object. The amount of code required for this implementation
is relatively small, but it is still repeated unnecessarily between all
constructors.

The private member implementation had two distinct pieces --- private
properties, as demonstrated in \jsref{lst:encap-inst}, and private methods, as
demonstrated in \jsref{lst:method-priv}. This distinction is important, as
private methods should not be redefined for each new instance (see
\fref{fig:proto-priv-cmp}). Properties, however, \emph{must} have their values
copied for each new instance to prevent references from being shared between
multiple instances (see \jsref{lst:proto-reuse}; this is not an issue for
scalars). For the time being, we will focus on the method implementation and
leave the manual declaration of private properties to the \func{\_\_construct()}
method.

The listings in section~\ref{sec:encap} were derived from a simple concept ---
the private member objects were within the scope of the prototype members.
However, if we are encapsulating this hack within our constructor factory, then
the members (the definition object) would be declared \emph{outside} the scope
of any private member objects that are hidden within our factory. To expose the
private ``prototype'' object, we could accept a function instead of a definition
object, which would expose a reference to the object (\jsref{lst:prot-func}).
However, this would be very unnatural and unintuitive. To keep our ``class''
declarations simple, another method is needed.

Consider the private member concept in a classical sense --- the data should be
available only to the methods of the class, but should not be accessible outside
of them. That is, given any class \code{C} with private property \code{C.\_priv}
and public method \code{C.getPrivValue()}, and an instance \code{i} of class
\code{C}, \code{i.\_priv} should not be defined unless within the context of
\code{i.getPrivValue()}. Consider then the only means of exposing that data to
the members of the prototype in ECMAScript without use of closures: through the
instance itself (\keyword{this}). This naturally suggests an implementation that
had not been previously considered due to the impracticality of its use without
an automated factory --- exposing private members before a method invocation and
revoking them after the method has returned.

To accomplish this, the factory must be able to intelligently determine when a
method is being invoked. This leads us into a somewhat sensitive topic ---
function wrapping. In order to perform additional logic on invocation of a
particular method, it must be wrapped within another function. This
\dfn{wrapper} would expose the private data on \keyword{this}, invoke the
original function associated with the method call, remove the reference and then
return whatever value was returned by the original function. This creates the
illusion of invoking the method directly.\footnote{This is the same concept used
to emulate \code{Function.bind()} in pre-ECMAScript 5 environments. This concept
can also be easily extended to create \dfn{partially applied functions}.}

\lstinputlisting[%
  label=lst:func-wrap,
  caption=Wrapping a function by returning a \emph{new} function which calls the
  original,
  lastline=16
]{lst/func-wrap.js}

\jsref{lst:func-wrap} demonstrates the basic concept of a function wrapper.
\func{wrap()} accepts a single argument, \var{func}, and returns a new anonymous
function which invokes \var{func}, returning its value with a prefix and suffix.
Note how all arguments are forwarded to \var{func}, allowing us to invoke our
wrapped function as if it were the original. Also note the context in which
\var{func} is being called (the first argument of \func{apply()}). By binding
\keyword{this} of \var{func} to \keyword{this} of our wrapper, we are
effectively forwarding it. This detail is especially important if we are using
a wrapper within a prototype, as we \emph{must} bind \keyword{this} to the
instance that the method is being invoked upon. Use of \func{wrap()} with a
prototype is demonstrated in \jsref{lst:func-wrap-ex} below.

\lstinputlisting[%
  label=lst:func-wrap-ex,
  caption=Using \func{wrap()} from \jsref{lst:func-wrap} with prototypes,
  firstnumber=last,
  firstline=20
]{lst/func-wrap.js}

It is this concept that will be used to implement method wrapping in our
constructor factory. For each function $f$ of definition object $D$, $f'$ will
be created using a method similar to \jsref{lst:func-wrap-ex}. $f'$ will invoke
$f$ after setting the private member object on \keyword{this}, then reset it
after $f$ returns. Finally, the return value of $f$ will be returned by $f'$. It
should be noted that $f'$ must exist even if $f$ is public, since public methods
may still need access to private members.\footnote{As we will see in the
examination of \fref{fig:func-wrap-perf}, the performance impact of this
decision is minimal.}
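
The construction of $f'$ from $f$ can be sketched as follows. This is an
illustration under stated assumptions rather than the factory's actual code:
the name \func{wrapMethod()}, the \code{\_priv} property used to expose the
private member object, and the use of try/finally are all hypothetical. The
previous value of the property is restored rather than simply deleted so that
one wrapped method may call another without revoking access prematurely.

```javascript
// Returns f', a wrapper that exposes privMembers on this while f runs.
// The property name _priv is an assumption made for this sketch.
function wrapMethod( f, privMembers )
{
    return function()
    {
        var prev = this._priv;

        // expose the private members for the duration of the call
        this._priv = privMembers;

        try
        {
            // forward this and all arguments; return f's return value
            return f.apply( this, arguments );
        }
        finally
        {
            // revoke access, even if f throws
            if ( prev === undefined )
            {
                delete this._priv;
            }
            else
            {
                this._priv = prev;
            }
        }
    };
}

function Counter() { this.count = 0; }

var priv = {
    bump: function( self ) { self.count++; }
};

Counter.prototype.increment = wrapMethod( function()
{
    this._priv.bump( this );
    return this.count;
}, priv );

var c     = new Counter();
var first = c.increment();
```

After \code{c.increment()} returns, \code{c.\_priv} is once again undefined.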

\begin{figure*}[t]
\center
\subfloat[%
  Wrapper performance \emph{(invocation only)}. Operations per second rounded to
  millions.\cite{jsperf-func-wrap} Numbers in parentheses indicate percent
  change between the two values, indicating a significant performance loss.
]{
  \input{data/func-wrap-invoke.tex}
  \label{fig:func-wrap-perf-invoke}
}
\quad
\subfloat[%
  Wrapper performance \emph{with business logic}
  (\code{(new Array(100)).join(',|').split('|')}); performance
  impact is negligible. Operations per second.\cite{jsperf-func-wrap-blogic}
]{
  \input{data/func-wrap-blogic.tex}
  \label{fig:func-wrap-perf-blogic}
}
\caption{Function wrapping performance considerations. When measuring invocation
performance, the wrapper appears to be a terrible solution to any problem.
However, when considering the business logic the remainder of the software is
likely to contain, the effects of the wrapper are negligible. As such, worrying
about the wrapper is likely to be a micro-optimization, unless dealing with
call stack limitations. The wrapper in these tests simply invokes the wrapped
method with \code{Function.apply()}, forwarding all arguments.}
\label{fig:func-wrap-perf}
\end{figure*}

Many readers are likely to be concerned about a decision that wraps every
function of our definition object, as this will require two function calls each
time a method is invoked. \fref{fig:func-wrap-perf-invoke} shows why this detail
is likely to be a concern --- invoking our wrapped function is so slow in
comparison to invoking the original function directly that the solution seems
prohibitive. However, one must consider how functions are \emph{actually} used
--- to perform some sort of business logic. It is rare that we would invoke
bodiless functions continuously in a loop. Rather, we should take into
consideration the \emph{percent change between function invocations that contain
some sort of business logic}. This is precisely what
\frefpg{fig:func-wrap-perf-blogic} takes into consideration, showing that our
invocation worry would actually be a micro-optimization. For example, in
software that performs DOM manipulation, the performance impact of
wrapper invocation is likely to be negligible due to repaints being highly
intensive operations.

One legitimate concern of our wrapper implementation, however, is limited
call stack space. The wrapper will effectively cut the remaining stack space in
half if dealing with recursive operations on itself, which may be a problem
for environments that do not support tail call optimizations, or for algorithms
that are not written in such a way that tail call optimizations can be
performed.\footnote{Another concern is that the engine may not be able to
perform tail call optimization because the function may recurse on the wrapper
instead of itself.} In such a situation, we can avoid the problem entirely by
recommending that heavy recursive algorithms do not invoke wrapped methods;
instead, the recursive operation can be performed using ``normal'' (unwrapped)
functions and its result returned by a wrapped method call.
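
A minimal sketch of that recommendation follows. The \func{wrap()} function
here is a hypothetical stand-in for whatever wrapper the factory generates,
and \var{MathUtil} and \func{factorial()} are names invented for the example.

```javascript
// Plain helper: its recursion never passes through a wrapper, so each
// level of recursion costs only a single stack frame.
function factorial( n )
{
    return ( n <= 1 ) ? 1 : n * factorial( n - 1 );
}

// Hypothetical stand-in for the wrapper generated by the factory
function wrap( func )
{
    return function()
    {
        return func.apply( this, arguments );
    };
}

function MathUtil() {}

// Only this one outer call incurs the wrapper's additional frame
MathUtil.prototype.factorial = wrap( function( n )
{
    return factorial( n );
} );

var result = ( new MathUtil() ).factorial( 10 );
```

Had the wrapped method recursed on \code{this.factorial()} instead, every level
of recursion would have consumed two frames rather than one.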

\begin{figure}[t]
  \center
  \input{data/stack-limits.tex}
  \caption{Call stack limits of various common browsers. \cite{oreilly-hpj}
  Determining the call stack limit for your own environment is as simple as
  incrementing a counter for each recursive call until an error is thrown.}
  \label{fig:stack-limits}
\end{figure}

That said, call stack sizes for ECMAScript environments are growing increasingly
large. Call stack limits for common browsers (including historical versions for
comparison) are listed in \frefpg{fig:stack-limits}. Should this limit be
reached, another alternative is to use \func{setTimeout()} to reset the stack
and continue the recursive operation. This can also have the benefit of making
the operation asynchronous.
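
The \func{setTimeout()} approach might be sketched as follows. The function
\func{sumTo()}, the chunk size of 1000 calls and the optional \var{schedule}
parameter (which defaults to \func{setTimeout()}) are assumptions made for
illustration; a suitable chunk size depends on the environment's stack limit.

```javascript
// Recursively sums the integers 1..n, handing control back to the
// scheduler every `chunk` calls so that the stack depth stays bounded.
// The result is delivered to the done callback.
function sumTo( n, done, schedule )
{
    // hypothetical injection point; setTimeout is used by default
    schedule = schedule || function( fn ) { setTimeout( fn, 0 ); };

    var chunk = 1000;

    function step( i, acc, depth )
    {
        if ( i > n )
        {
            done( acc );
            return;
        }

        if ( depth >= chunk )
        {
            // let the current stack unwind and resume on a fresh one
            schedule( function() { step( i, acc, 0 ); } );
            return;
        }

        step( i + 1, acc + i, depth + 1 );
    }

    step( 1, 0, 0 );
}

sumTo( 100000, function( total )
{
    console.log( total );
} );
```

Since the result is no longer available when \func{sumTo()} returns, callers
must be prepared to receive it through a callback.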

Factoring this logic into the constructor factory is further complicated by our
inability to distinguish between members intended to be public and those
intended to be private. In section~\ref{sec:encap}, this issue was not a concern
because the members could be explicitly specified separately per implementation.
With the factory, we are provided only a single definition object; asking for
multiple would be confusing, messy and unnatural to those coming from other
classical object-oriented languages. Therefore, our second task shall be to
augment \func{copyTo()} in \jsref{lst:ctor-factory} to distinguish between
public and private members.

Section~\ref{sec:privileged} mentioned the convention of using a single
underscore as a prefix for member names to denote a private member (e.g.
\code{this.\_foo}). We will adopt this convention for our definition object, as
it is both simple and performant (only a single-character check). Combining this
concept with the wrapper implementation, we arrive at
\jsref{lst:ctor-factory-priv}.

\lstinputlisting[%
  label=lst:ctor-factory-priv,
  caption=Altering the constructor factory in \jsref{lst:ctor-factory} to
  support private methods in a manner similar to \jsref{lst:method-priv},
  lastline=97
]{lst/ctor-factory-priv.js}

(INCOMPLETE.)