\section{Class-Like Objects in ECMAScript}
\label{sec:class-like}
JavaScript is a multi-paradigm scripting language standardized as ECMAScript,
incorporating object-oriented, functional and imperative styles. The
object-oriented paradigm itself supports two sub-paradigms, prototypal and
classical, the latter of which is popular in languages such as Java, C++,
Python, Perl, Ruby, Lisp, PHP and Smalltalk, among many others. ECMAScript itself
is prototypal.
The creation of objects in ECMAScript can be as simple as using an object
literal, as defined by curly braces:
\begin{verbatim}
var obj = { foo: "bar" };
\end{verbatim}
In a classical sense, object literals can be thought of as anonymous
singletons~\cite{gof}; that is, they have no name (they are identified by
the variable to which they are assigned) and only one instance of the literal
will exist throughout the life of the software.\footnote{Technically, one could
set the prototype of a constructor to be the object defined by the literal (see
\jsref{lst:proto-reuse}), however the resulting instances would be prototypes,
not instances of a common class shared by the literal and each subsequent
instance.} For example, calling a function that returns the same object literal
will return a distinct, entirely unrelated object for each invocation:
\begin{verbatim}
function createObj()
{
    return { name: "foo" };
}

createObj() !== createObj();
\end{verbatim}
Using this method, we can create basic objects that act much like class
instances, as demonstrated in \jsref{lst:singleton}:
\begin{lstlisting}[%
label=lst:singleton,
caption=A ``singleton'' with properties and methods
]
var obj = {
    name: "Foo",

    setName: function( val )
    {
        obj.name = val;
    },

    getName: function()
    {
        return obj.name;
    }
};

obj.getName(); // "Foo"
obj.setName( "Bar" );
obj.getName(); // "Bar"
\end{lstlisting}
\subsection{Prototypes}
\label{sec:proto}
We could re-use \var{obj} in \jsref{lst:singleton} as a \dfn{prototype},
allowing instances to inherit its members. For example:
\begin{lstlisting}[%
label=lst:proto-reuse,
caption=Re-using objects as prototypes \bf{(bad)},
firstnumber=last
]
function Foo() {}
Foo.prototype = obj;

var inst1 = new Foo(),
    inst2 = new Foo();

inst2.setName( "Bar" );

inst1.getName(); // "Bar"
inst2.getName(); // "Bar"
\end{lstlisting}
In \jsref{lst:proto-reuse} above, we define \var{Foo} to be a
\dfn{constructor}\footnote{A ``constructor'' in ECMAScript is simply any
function intended to be invoked, often (but not always) with the \operator{new}
operator, that returns a new object whose members are derived from the
function's \var{prototype} property.} with our previous object \var{obj} as its
prototype. Unfortunately, as shown in \jsref{lst:singleton}, \var{name} is being
set on \var{obj} itself, which is a prototype shared between both instances.
Setting the name on one object therefore changes the name on the other (and,
indeed, all instances of \var{Foo}). To illustrate this important concept,
consider \jsref{lst:proto-mod} below, which continues from
\jsref{lst:proto-reuse}:
\begin{lstlisting}[%
label=lst:proto-mod,
caption=The effect of prototypes on instances,
firstnumber=last
]
obj.foo = "bar";
inst1.foo; // "bar"
inst2.foo; // "bar"
\end{lstlisting}
Clearly, this is not how one would expect class-like objects to interact; each
object is expected to have its own state. When accessing a property of an
object, the members of the object itself are first checked. If the member
is not defined on the object itself,\footnote{Note that ``not defined'' does not
imply \emph{undefined}; \code{undefined} is a value.} then the prototype chain
is traversed. Therefore, we can give objects their own individual state by
defining the property on the individual instances, rather than the prototype, as
shown in \jsref{lst:inst-prop}.\footnote{Also demonstrated in
\jsref{lst:inst-prop} is the effect of the \keyword{delete} keyword, which
removes a member from an object, allowing the values of the prototype to ``peek
through'' as if a hole exists in the object. Setting the value to
\code{undefined} will not have the same effect, as it does not produce the
``hole''; the property would return \code{undefined} rather than the value on
the prototype.}
\begin{lstlisting}[%
label=lst:inst-prop,
caption=Setting properties per-instance,
firstnumber=last
]
inst1.foo = "baz";
inst1.foo; // "baz"
inst2.foo; // "bar"
delete inst1.foo;
inst1.foo; // "bar"
\end{lstlisting}
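
The difference between deleting a property and setting it to \code{undefined}
can be seen in isolation in the following minimal sketch. The names
\var{proto}, \var{Thing} and \var{inst} are hypothetical and chosen only for
illustration; \code{hasOwnProperty()} is used to show whether the property
exists on the instance itself.

```javascript
// Hypothetical names (proto, Thing, inst), for illustration only.
var proto = { foo: "bar" };

function Thing() {}
Thing.prototype = proto;

var inst = new Thing();

// shadow the prototype's value with a property on the instance itself
inst.foo = "baz";
var shadowed = inst.foo;

// assigning undefined keeps the instance property in place, so the
// prototype's value remains hidden
inst.foo = undefined;
var afterUndefined = inst.foo;
var ownAfterUndefined = inst.hasOwnProperty( "foo" );

// delete removes the instance property, so the lookup falls through to
// the prototype once more
delete inst.foo;
var afterDelete = inst.foo;
var ownAfterDelete = inst.hasOwnProperty( "foo" );
```

Here \var{afterUndefined} is \code{undefined} while \var{afterDelete} is once
again the prototype's \code{"bar"}.
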
This does not entirely solve our problem. As shown in \jsref{lst:singleton}, our
\var{obj} prototype's methods (\func{getName()} and \func{setName()}) reference
\code{obj.name}, a property of the prototype itself. \jsref{lst:proto-ref}
demonstrates the problem this causes when attempting to give each instance its
own state with regard to the \var{name} property:
\begin{lstlisting}[%
label=lst:proto-ref,
caption=Referencing prototype values in \var{obj} causes problems with
per-instance data,
firstnumber=last
]
// ...
inst1.name = "My Name";
inst1.getName(); // "Foo"
\end{lstlisting}
ECMAScript solves this issue with the \keyword{this} keyword. When a
method\footnote{A \dfn{method} is simply an invokable property of an object (a
function).} of an instance's prototype is invoked, \keyword{this} is bound, by
default,\footnote{One can override this default behavior with
\func{Function.prototype.call()} or \func{Function.prototype.apply()}.} to that
instance. Therefore, we can replace \var{obj} in \jsref{lst:singleton} with the
prototype definition in \jsref{lst:proto-proper} to solve the issue
demonstrated in \jsref{lst:proto-ref}:
\begin{lstlisting}[%
label=lst:proto-proper,
caption=Re-using objects as prototypes \bf{(good)}
]
function Foo( name )
{
    this.name = name;
}

Foo.prototype = {
    setName: function( name )
    {
        this.name = name;
    },

    getName: function()
    {
        return this.name;
    }
};

var inst = new Foo( "Bar" );
inst.name;          // "Bar"
inst.getName();     // "Bar"

inst.setName( "Baz" );
inst.getName();     // "Baz"

inst.name = "Foo";
inst.getName();     // "Foo"
\end{lstlisting}
\jsref{lst:proto-proper} shows that \keyword{this} is also bound to the new
instance from within the constructor; this allows us to initialize any
properties on the new instance before it is returned to the caller.\footnote{It
is worth mentioning that one can explicitly return an object from the
constructor, which will be returned in place of a new instance.} Evaluation of
the example yields an additional concern --- the observation that all object
members in ECMAScript are public.\footnote{That is not to say that encapsulation
is not possible; this statement is merely emphasizing that properties of objects
do not support access modifiers. We will get into the topic of encapsulation a
bit later.} Even though the \var{name} property was initialized within the
constructor, it is still accessible outside of both the constructor and the
prototype. Addressing this concern will prove to be an arduous process that
will be covered at great length in the following sections. For the time being,
we will continue discussion of conventional techniques, bringing us to the
concept of \dfn{privileged members}.
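
Before moving on, two of the behaviors mentioned in the footnotes above can be
sketched briefly: overriding the default binding of \keyword{this} with
\code{call()}, and explicitly returning an object from a constructor. The
names \var{other} and \var{Replaced} are hypothetical and exist only for this
illustration.

```javascript
function Foo( name )
{
    this.name = name;
}

Foo.prototype.getName = function()
{
    return this.name;
};

var inst  = new Foo( "Bar" ),
    other = { name: "Other" };   // hypothetical unrelated object

// default binding: this is the instance the method was invoked upon
var viaInstance = inst.getName();

// explicit binding: call() substitutes another object for this
var viaCall = inst.getName.call( other );

// hypothetical constructor that explicitly returns an object; that object
// takes the place of the instance that new would otherwise produce
function Replaced()
{
    this.name = "discarded";
    return { name: "explicit" };
}

var replaced     = new Replaced(),
    replacedName = replaced.name,
    isInstance   = replaced instanceof Replaced;
```

Note that the explicitly returned object is not an instance of
\var{Replaced}; it does not inherit from \code{Replaced.prototype}.
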
\subsection{Privileged Members}
\label{sec:privileged}
The concept of \dfn{encapsulation} is a cornerstone of classical object-oriented
programming. Unfortunately, as \jsref{lst:proto-proper} demonstrates, it becomes
difficult to encapsulate data if all members of a given object are accessible
publicly. One means of addressing this issue is to take advantage of the fact
that functions introduce scope, allowing us to define a local variable (or use
an argument) within the constructor that is only accessible to the
\dfn{privileged member} \func{getName()}.
\lstinputlisting[%
label=lst:privileged,
caption=Using privileged members to encapsulate data
]{lst/privileged-members.js}
If \var{name} in \jsref{lst:privileged} is encapsulated within the constructor,
our methods that \emph{access} that encapsulated data must \emph{also} be
declared within the constructor;\footnote{One may mix prototypes and privileged
members.} otherwise, if placed within the prototype, \var{name} would be out of
scope. This implementation has an unfortunate consequence --- our methods are
now being \emph{redeclared} each and every time a new instance of \var{Foo} is
created, which has obvious performance penalties (see
\fref{fig:proto-priv-cmp}).\footnote{As a general rule of thumb, one should only
use privileged members for methods that access encapsulated data; all other
members should be part of the prototype.}
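
As a minimal sketch of the technique (not necessarily identical to the
external listing referenced above), the constructor argument \var{name} below
is reachable only through the two closures created within the constructor.
The parameter name \var{newname} is an assumption made for this example.

```javascript
// Sketch only; this is not the contents of lst/privileged-members.js.
function Foo( name )
{
    // name is local to this invocation of the constructor; the two
    // closures below are the only code able to reach it
    this.getName = function()
    {
        return name;
    };

    this.setName = function( newname )
    {
        name = newname;
    };
}

var inst1 = new Foo( "Bar" ),
    inst2 = new Foo( "Baz" );

inst1.setName( "Changed" );

var name1  = inst1.getName(),   // state is per-instance
    name2  = inst2.getName(),
    direct = inst1.name;        // no such property exists on the instance

// each instance carries its own copies of the methods, which is the
// source of the overhead discussed above
var sharedMethod = ( inst1.getName === inst2.getName );
```

Since \var{sharedMethod} is \code{false}, every instantiation allocates two
new function objects, whereas methods on a prototype are allocated only once.
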
\begin{figure}
\centering
\begin{tabular}{r|r|r|r|}
\cline{2-4}
& Heap Usage
& \multicolumn{1}{|c|}{Inst. Time}
& \multicolumn{1}{|c|}{Call Time} \\
\hline
\multicolumn{1}{|r|}{\jsref{lst:proto-proper}}
& 49.7M & 234ms & 17ms \\
\multicolumn{1}{|r|}{\jsref{lst:privileged}}
& 236.0M & 1134ms & 28ms \\
\hline
\multicolumn{1}{|r|}{\% Change} & 374.8\% & 384.6\% & 64.7\% \\
\hline
\end{tabular}
\caption{Comparing performance of privileged member and prototype
implementations under V8. The Heap Usage column represents the
heap usage after instantiating \var{Foo} under the respective implementation $n$
times, and the Inst.\ Time column reflects the amount of time spent instantiating
the $n$ objects. The Call Time column reflects the amount of time spent invoking
\emph{each} member of \emph{one} instance $n$ times. $n = 1{,}000{,}000$. Lower
numbers are better. Different environments may have different results.}
\label{fig:proto-priv-cmp}
\end{figure}
Due to these performance concerns, it is often undesirable to use privileged
members; many developers will instead prefix, with an underscore, members
intended to be private (e.g. \code{this.\_name}) while keeping all methods on
the prototype.\footnote{One example of a library that uses underscores in place
of privileged members is Dojo at http://dojotoolkit.org.} This serves as a clear
indicator that the API is not public, is subject to change in the future and
should not be touched. It also allows the property to be accessed by
subtypes,\footnote{The term ``subtype'' is not truly the correct term here.
Rather, the term in this context was meant to imply that an instance of the
constructor was used as the prototype for another constructor, acting much like
a subtype (child class).} acting like a protected member. Unfortunately, this
does not encapsulate the data, so the developer must trust that the user will
not tamper with it.
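
The underscore convention might look like the following sketch, which assumes
a \var{Foo} similar to that of \jsref{lst:proto-proper} but storing its data
in \code{this.\_name}. It shows both the benefit (methods are shared through
the prototype) and the cost (the data is not actually protected).

```javascript
function Foo( name )
{
    // the underscore prefix marks the property as private by convention
    // only; nothing in the language enforces it
    this._name = name;
}

Foo.prototype = {
    getName: function()
    {
        return this._name;
    }
};

var a = new Foo( "A" ),
    b = new Foo( "B" );

// methods live on the prototype and are shared by all instances
var sharedMethod = ( a.getName === b.getName );

// the "private" property can still be modified from outside
a._name = "tampered";
var leaked = a.getName();
```
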
\subsection{Subtypes and Polymorphism}
In classical terms, \dfn{subtyping} (also known as \dfn{subclassing}) is the act of
extending a \dfn{supertype} (creating a \dfn{child} class from a \dfn{parent})
with additional functionality. The subtype is said to \dfn{inherit} its members
from the supertype.\footnote{In the case of languages that support access
modifiers, only public and protected members are inherited.} Based on our prior
examples in section~\ref{sec:proto}, one could clearly see how the prototype of
any constructor could be replaced with an instance of another constructor,
indefinitely, to achieve an inheritance-like effect. This useful consequence of
the prototype model is demonstrated in \jsref{lst:subtype}.\footnote{Unfortunately, a
responsible implementation is not nearly so elegant in practice.}
\begin{lstlisting}[%
label=lst:subtype,
caption=Extending prototypes (creating subtypes) in ECMAScript
]
var SubFoo = function( name )
{
    // call parent constructor
    Foo.call( this, name );
};
SubFoo.prototype = new Foo();
SubFoo.prototype.constructor = SubFoo;

// build upon (extend) Foo
SubFoo.prototype.hello = function()
{
    return "Hello, " + this.name;
};

var inst = new SubFoo( "John" );
inst.getName(); // "John"
inst.hello();   // "Hello, John"
\end{lstlisting}
Consider the implications of \jsref{lst:subtype} with a close eye. This
extension of \var{Foo} is rather verbose. The first consideration (and a rather
unpleasant one that may be terribly confusing to those fairly inexperienced with
ECMAScript) is \var{SubFoo}'s constructor. Note how the supertype
(\var{Foo}) must be invoked \emph{within the context of
\var{SubFoo}}\footnote{If \func{Function.prototype.call()} or
\func{Function.prototype.apply()} is not used, \keyword{this} will, depending on
the environment, be bound to the global object, which is absolutely not what one
wants. In strict mode, \keyword{this} is instead \code{undefined}, so the
assignment throws an error rather than polluting the global object, but the
result is still not what we want.} in order to initialize the
variables.\footnote{If the constructor accepts
more than a few arguments, one could simply do: \code{Foo.apply( this, arguments
);}} However, once properly deciphered, this call is very similar to invocation
of parent constructors in other languages.
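
The importance of invoking the supertype constructor with the proper context
can be sketched as follows. The constructors \var{BadSubFoo} and
\var{GoodSubFoo} are hypothetical names for this example, and \var{Foo} is
declared as a strict mode function here so that the mistake surfaces as an
error instead of silently writing to the global object.

```javascript
function Foo( name )
{
    "use strict";
    this.name = name;
}

// hypothetical subtype that forgets to supply a context
function BadSubFoo( name )
{
    // plain invocation: within the strict function Foo, this is
    // undefined, so the property assignment throws a TypeError
    Foo( name );
}

// hypothetical subtype that invokes Foo within its own context
function GoodSubFoo( name )
{
    Foo.call( this, name );
}

var error = null;

try
{
    new BadSubFoo( "John" );
}
catch ( e )
{
    error = e;
}

var good = new GoodSubFoo( "John" );
```

In this sketch, \var{error} holds a \code{TypeError}, while \code{good.name}
is initialized as intended.
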
Following the definition of \var{SubFoo} is its prototype (line 6). Note from
section~\ref{sec:proto} that the prototype must contain the members that are to
be accessible to any instances of the constructor. If we were to simply assign
\var{Foo} to the prototype, this would have two terrible consequences, the
second of which will be discussed shortly. The first consequence would be that
all members of \var{Foo} \emph{itself} would be made available to instances of
\var{SubFoo}. In particular, you would find that \code{( new SubFoo()
).prototype === Foo.prototype}, which is hardly your intent. As such, we must
use a new instance of \var{Foo} for our prototype, so that the prototype
contains the appropriate members.
We follow the prototype assignment with another alien declaration --- the
setting of \code{SubFoo.prototype.constructor} on line 7. To understand why this
is necessary, one must first understand that, given any object \var{o} such that
\code{var o = new O()}, \code{o.constructor === O}.\footnote{One could apply
this same concept to other core ECMAScript objects. For example, \code{(
function() \{\}).constructor === Function}, \code{[].constructor === Array},
\code{\{\}.constructor === Object}, \code{true.constructor === Boolean} and
so forth.} Recall from section~\ref{sec:proto} that values ``peek through
holes'' in the prototype chain. In this case, without our intervention,
\code{SubFoo.prototype.constructor === Foo} because \code{SubFoo.prototype = new
Foo()}. The \var{constructor} property is useful for reflection, so it is
important that we properly set this value to the appropriate constructor ---
\var{SubFoo}. Since \var{SubFoo.prototype} is an \emph{instance} of \var{Foo}
rather than \var{Foo} itself, the assignment will not directly affect \var{Foo}.
This brings us to the aforementioned second consequence of assigning
\var{Foo} itself to \code{SubFoo.prototype} rather than a \emph{new} instance of
\var{Foo}: extending the prototype by adding to or altering existing values
would change the supertype's constructor, which would be an unintentional
side-effect that could have drastic consequences on the software.
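
The behavior of the \var{constructor} property described above can be
observed directly. The following sketch uses empty constructors for brevity
and records the value of \var{constructor} before and after the assignment.

```javascript
function Foo() {}
function SubFoo() {}

SubFoo.prototype = new Foo();

// without intervention, the lookup falls through the prototype chain to
// Foo.prototype.constructor
var before = ( new SubFoo() ).constructor;

SubFoo.prototype.constructor = SubFoo;

// the property now exists on SubFoo.prototype itself
var after = ( new SubFoo() ).constructor;

// Foo is unaffected, because SubFoo.prototype is an instance of Foo
// rather than Foo.prototype itself
var parent = ( new Foo() ).constructor;
```
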
As an example of extending the prototype (we have already demonstrated
overwriting the \var{constructor} and this concept can be applied to overriding
any members of the supertype), method \var{hello()} has been included in
\jsref{lst:subtype} on line 10. Note that \keyword{this} will be bound to the
instance that the method is being invoked upon, since it is referenced within
the prototype. Also note that we are assigning the function in a slightly
different manner than in \jsref{lst:proto-proper}; this is necessary to ensure
that we do not overwrite the prototype we just declared. Any additional members
must be declared explicitly in this manner, which has the negative consequence
of further increasing the verbosity of the code.
An instance of a subtype can be used in place of any of its supertypes in a
concept known as \dfn{polymorphism}. \jsref{lst:poly} demonstrates this concept
with \func{getFooName()}, a function that will return the name of any object of
type \var{Foo}.\footnote{Please note that the \operator{typeof} operator is not
appropriate in this situation, as both instances of \var{Foo} and \var{SubFoo}
would be considered typeof ``object''. The \operator{instanceof} operator is
appropriate when determining types of objects in terms of their
constructor.}
\begin{lstlisting}[%
label=lst:poly,
caption=Polymorphism in ECMAScript
]
function getFooName( foo )
{
    if ( !( foo instanceof Foo ) )
    {
        throw TypeError(
            "Expected instance of Foo"
        );
    }

    return foo.getName();
}

var inst_parent = new Foo( "Parent" ),
    inst_child  = new SubFoo( "Child" );

getFooName( inst_parent ); // "Parent"
getFooName( inst_child );  // "Child"
getFooName( {} );          // throws TypeError
\end{lstlisting}
The concepts demonstrated in this section could be easily used to extend
prototypes indefinitely, creating what is called a \dfn{prototype chain}. In the
case of an instance of \var{SubFoo}, the prototype chain of most environments
would likely be: \code{SubFoo.prototype}, \code{Foo.prototype},
\code{Object.prototype} (that is,
\code{Object.getPrototypeOf( new SubFoo() ) === SubFoo.prototype}, and so
forth).\footnote{ECMAScript 5 introduces \code{Object.getPrototypeOf()}, which
allows retrieving the prototype of an object (instance). Some environments also
support the non-standard \var{\_\_proto\_\_} property, which is a JavaScript
extension.} Keep in mind, however, that the further down the prototype chain the
engine must traverse in order to find a given member, the greater the
performance impact.
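
The chain can be walked explicitly with \code{Object.getPrototypeOf()}, as in
the following sketch. Empty constructors are used for brevity; the variables
\var{p1} through \var{p4} are introduced only for this example.

```javascript
function Foo() {}
function SubFoo() {}

SubFoo.prototype = new Foo();
SubFoo.prototype.constructor = SubFoo;

var inst = new SubFoo();

// each step moves one link further along the prototype chain
var p1 = Object.getPrototypeOf( inst ),   // SubFoo.prototype
    p2 = Object.getPrototypeOf( p1 ),     // Foo.prototype
    p3 = Object.getPrototypeOf( p2 ),     // Object.prototype
    p4 = Object.getPrototypeOf( p3 );     // null; the chain ends here
```
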
Due to the method used to ``extend'' prototypes, it should also be apparent that
multiple inheritance is unsupported by ECMAScript, as each constructor may
only have one \var{prototype} property.\footnote{Multiple inheritance is
well-known for its problems. As an alternative, styles of programming similar to
the use of interfaces and traits/mixins in other languages are recommended and
are possible in ECMAScript.}
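
One possible shape of the mixin style mentioned in the footnote is sketched
below. The helper \func{mixin()} and the objects \var{Named} and
\var{Greeter} are hypothetical; this is merely one way of copying members
onto a prototype, not a prescribed implementation.

```javascript
// hypothetical helper: copies the own enumerable members of src onto dest
function mixin( dest, src )
{
    for ( var key in src )
    {
        if ( Object.prototype.hasOwnProperty.call( src, key ) )
        {
            dest[ key ] = src[ key ];
        }
    }

    return dest;
}

// hypothetical groups of behavior to be combined
var Named = {
    getName: function()
    {
        return this.name;
    }
};

var Greeter = {
    hello: function()
    {
        return "Hello, " + this.getName();
    }
};

function Person( name )
{
    this.name = name;
}

// combine both sets of members on a single prototype
mixin( Person.prototype, Named );
mixin( Person.prototype, Greeter );

var greeting = ( new Person( "John" ) ).hello();
```

Note that members are copied rather than inherited, so \operator{instanceof}
cannot be used to test for a mixin.
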
\subsubsection{Extensible Constructors}
\label{sec:ext-ctor}
Before moving on from the topic of extending prototypes, the assignment of
\code{SubFoo.prototype} deserves some additional discussion. Consider the
implications of this assignment; particularly, the invocation of the
constructor \var{Foo}. ECMAScript does not perform assignments to prototypes
differently than any other assignment, meaning all the logic contained within
the constructor \var{Foo} will be executed. In our case, this does not have any
terrible consequences --- \var{name} will simply be initialized to
\code{undefined}, which will be overridden once \var{SubFoo} is invoked.
However, consider what may happen if \var{Foo} performed checks on its
arguments.
\begin{lstlisting}[%
label=lst:ctor-problem,
caption=Potential constructor problems for prototype assignments
]
function Foo( name )
{
    if ( typeof name !== 'string' )
    {
        throw TypeError( "Invalid name" );
    }

    this.name = name;
}

// ...
SubFoo.prototype = new Foo(); // TypeError
\end{lstlisting}
As \jsref{lst:ctor-problem} shows, we can no longer use a new instance of
\var{Foo} as our prototype, unless we were to provide dummy data that will pass
any type checks and validations that the constructor performs. Dummy data is not
an ideal solution --- it muddies the code and will cause subtypes to break
should any validations be added to the supertype in the future.\footnote{Of
course, if the constructor of the supertype changes, there are always BC
(backwards-compatibility) concerns. However, in the case of validations in the
constructor, they may simply enforce already existing docblocks, which should
have already been adhered to.} Furthermore, all constructor logic will still be
performed. What if \var{Foo} were to do something considerably more intensive
--- perform vigorous data validations or initialize a database connection,
perhaps?\footnote{Constructors should take care in limiting what actions they
perform, especially if they produce side-effects.} Not only would we have to
provide potentially complicated dummy data or dummy/stubbed objects, our
prototype assignment would also incur an unnecessary performance hit. Indeed,
the construction logic would be performed \(n + 1\) times --- once for the
prototype and once for each instance, which would overwrite the results of the
previous constructor (or duplicate, depending on implementation).
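
The \(n + 1\) invocations can be made visible with a simple counter, as in
the following sketch. The \var{calls} counter and the choice of \code{n = 3}
are assumptions made purely for illustration.

```javascript
// hypothetical counter recording each invocation of Foo
var calls = 0;

function Foo( name )
{
    calls++;
    this.name = name;
}

function SubFoo( name )
{
    Foo.call( this, name );
}

// the prototype assignment invokes Foo once
SubFoo.prototype = new Foo();
SubFoo.prototype.constructor = SubFoo;

var callsForPrototype = calls;

// each instantiation invokes Foo once more
var n = 3;

for ( var i = 0; i < n; i++ )
{
    new SubFoo( "inst" + i );
}
```

After the loop, \var{calls} is equal to \code{n + 1}.
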
How one goes about solving this problem depends on the needs of the constructor.
Let us first consider a very basic solution --- ignoring constructor logic if
the provided argument list is empty, as is demonstrated in
\jsref{lst:ctor-ignore-empty}.
\begin{lstlisting}[%
label=lst:ctor-ignore-empty,
caption=Ignoring construction logic if provided with an empty argument list
]
function Foo( name )
{
    if ( arguments.length === 0 )
    {
        return;
    }

    // ...
    this.name = name;
}

// ...
SubFoo.prototype = new Foo(); // OK
\end{lstlisting}
This solution has its own problems. The most apparent issue is that one could
simply omit all constructor arguments to bypass constructor logic, which is
certainly undesirable.\footnote{Constructors allow us to initialize our object,
placing it in a consistent and predictable state. Allowing a user to bypass this
logic could not only introduce unintended consequences during the life of the
object, but would mandate additional checks during method calls to ensure the
current state is sane, which will add unnecessary overhead.} Secondly --- what
if \var{Foo}'s \var{name} parameter was optional and additional construction
logic needed to be performed regardless of whether or not \var{name} was
provided? Perhaps we would want to provide a default value for \var{name} in
addition to generating a random hash that can be used to uniquely identify each
instance of \var{Foo}. If we are immediately returning from the constructor when
all arguments are omitted, then such an implementation is not possible. Another
solution is needed in this case.\footnote{That is not to say that our first
solution --- immediately returning if no arguments are provided --- is useless.
This is a commonly used method that you may find useful for certain
circumstances.}
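
The first of these problems can be sketched by combining the check of
\jsref{lst:ctor-ignore-empty} with the validation of
\jsref{lst:ctor-problem}. Invalid arguments are still rejected, but omitting
the arguments altogether yields an uninitialized object. The variable names
below are chosen only for this example.

```javascript
function Foo( name )
{
    if ( arguments.length === 0 )
    {
        return;
    }

    if ( typeof name !== 'string' )
    {
        throw TypeError( "Invalid name" );
    }

    this.name = name;
}

// omitting all arguments bypasses both validation and initialization
var bypassed = new Foo(),
    hasName  = bypassed.hasOwnProperty( "name" );

// an invalid argument is still rejected as before
var threw = false;

try
{
    new Foo( 42 );
}
catch ( e )
{
    threw = ( e instanceof TypeError );
}
```

Here \var{hasName} is \code{false}: the object exists, but was never placed
in a consistent state.
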
A solution that satisfies all needs involves a more complicated hack that we
will defer to section~\ref{sec:extending}.\footnote{One may ask why, given all of
the complications of extending prototypes, one doesn't simply set
\code{SubFoo.prototype = Foo.prototype}. The reason for this is simple --- we
would not be able to extend the prototype without modifying the original, as
they would share references to the same object.}
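
The shared-reference problem described in the footnote is easily
demonstrated. In the following sketch, which uses empty constructors for
brevity, a method intended only for \var{SubFoo} becomes available to every
instance of \var{Foo}.

```javascript
function Foo() {}
function SubFoo() {}

// both constructors now refer to the very same prototype object
SubFoo.prototype = Foo.prototype;

// intended as an extension of SubFoo only...
SubFoo.prototype.hello = function()
{
    return "Hello";
};

// ...but the supertype has been modified as well
var samePrototype = ( SubFoo.prototype === Foo.prototype ),
    leaked        = typeof ( new Foo() ).hello;
```
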
\subsection{Shortcomings}
ECMAScript's prototype model is highly flexible, but leaves much to be desired:
\begin{description}
\item[Access Modifiers]
Classical OOP permits, generally, three common access modifiers: public,
protected and private. These access modifiers permit encapsulating data that
is unique \emph{per instance} of a given type, without the performance
penalties of privileged members (see \jsref{lst:privileged}).
Not only are access modifiers unsupported, but the concept of protected
members is difficult to express in ECMAScript. In order for a member to be
accessible to other objects higher up on the prototype chain (``subtypes''),
it must be public. Using privileged members would encapsulate the data
within the constructor, forcing the use of public methods to access the data
and disallowing method overrides, effectively destroying any chances of a
protected API.\footnote{As ease.js will demonstrate, protected APIs are
possible through a clever hack that would otherwise lead to terrible,
unmaintainable code.}
\item[Intuitive Subtyping]
Consider the verbosity of \jsref{lst:subtype}. Now imagine how much
duplicate code is required to maintain many subtypes in a large piece of
software. This only serves to distract developers from the actual business
logic of the prototype, forcing them to think in detailed terms of
prototypes rather than in terms of the problem domain.\footnote{The ability
to think within the problem domain rather than abstract machine concepts is
one of the key benefits of classical object-oriented programming.}
Furthermore, as discussed in section~\ref{sec:ext-ctor}, creating extensible
constructors requires considerable thought that must be handled on a
case-by-case basis, or requires disproportionately complicated hacks (as
will be demonstrated in section~\ref{sec:extending}).
\end{description}
Fortunately,\footnote{Well, fortunately in the sense that ECMAScript is flexible
enough that we can work around the issues. It is, however, terribly messy. In
ECMAScript's defense --- this is a consequence of the prototypal model; our
desire to use class-like objects instead of conventional prototypes produces the
necessity for these hacks.} those issues can be worked around with clever hacks,
allowing us to move closer toward a classical development model.