@c This document is part of the Liza Data Collection Framework manual.
@c Copyright (C) 2017 R-T Specialty, LLC.
@c
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
@c Texts. A copy of the license is included in the section entitled ``GNU
@c Free Documentation License''.
@node Bucket
@chapter Bucket
@helpwanted
@menu
* Value Assignment:Bucket Assignment. Writing data to the Bucket.
* Bucket Diff:: Representing bucket changes.
* Calculated Values:: Dynamic data derived from other values.
* Metabucket:: Bucket holding document metadata.
@end menu
@c TODO
@node Bucket Assignment
@section Bucket Value Assignment
@helpwanted
@node Bucket Diff
@section Bucket Diff
@cindex Bucket diff
Changes to the bucket are represented by an array with certain conventions:
@enumerate
@item A change to some index@tie{}@math{k} is represented by the same
index@tie{}@math{k} in the diff.
@item A value of @code{undefined} indicates that the respective index
has not changed.
Holes in the array (indexes not assigned any value) are treated in
the same manner as @code{undefined}.
@item A @code{null} in the last index of the diff marks a
truncation@tie{}point:
the vector is truncated at that index,
deleting it and every index that follows.
Any preceding @code{null} values are treated as if they were
@code{undefined}.@footnote{
The reason for this seemingly redundant (and inconvenient) choice
is that JSON serializes @code{undefined} array elements as @code{null}.
Consequently, when serializing a diff, @code{undefined}s are lost.
To address this,
any @code{null} that is not in the tail position is treated as
@code{undefined}.
We cannot truncate at the first @code{null},
because @samp{[null,null,null]} may actually represent
@samp{[undefined,undefined,null]}.}
@end enumerate
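The conventions above can be sketched as a small function.
This is only an illustration under stated assumptions:
@code{applyDiff} is a hypothetical helper, not part of Liza,
and it ignores everything except the three rules listed above.

```javascript
// Hypothetical helper illustrating the diff conventions; not Liza's API.
function applyDiff(original, diff) {
    const result = original.slice();
    const last = diff.length - 1;

    for (let i = 0; i < diff.length; i++) {
        const value = diff[i]; // reading a hole yields undefined

        // A null in the tail position truncates the vector at that index.
        if (value === null && i === last) {
            result.length = Math.min(result.length, i);
            break;
        }

        // undefined, holes, and non-tail nulls all mean "unchanged".
        if (value === undefined || value === null) {
            continue;
        }

        // Anything else replaces the value at the same index.
        result[i] = value;
    }

    return result;
}
```

For instance, @code{applyDiff(["foo", "bar", "baz"], [undefined, null])}
keeps index@tie{}0 and drops indexes 1 and@tie{}2.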
Diffs are only tracked at the vector (array of scalars) level@mdash{
}if there is a change to a nested structure assigned to an index,
that index of the outer vector will be tracked as modified.
Nested values are, however, compared recursively to determine whether a
modification @emph{has taken place}.@footnote{
See @srcrefjs{bucket,StagingBucket} method @code{#_parseChanges}.}
Examples appear in @ref{f:diff-ex}.
@float Figure, f:diff-ex
@multitable @columnfractions 0.30 0.30 0.40
@headitem Original @tab Diff @tab Interpretation
@item @samp{["foo", "bar"]}
@tab @samp{["baz", "quux"]}
@tab Index@tie{}0 changed to @samp{baz}.
Index@tie{}1 changed to @samp{quux}.
@item @samp{["foo", "bar"]}
@tab @samp{[undefined, "quux"]}
@tab Index@tie{}0 did not change.
Index@tie{}1 changed to @samp{quux}.
@item @samp{["foo", "bar"]}
@tab @samp{[, "quux"]}
@tab Index@tie{}0 did not change.
Index@tie{}1 changed to @samp{quux}.
@item @samp{["foo", "bar"]}
@tab @samp{["baz", null]}
@tab Index@tie{}0 changed to @samp{baz}.
Index@tie{}1 was removed.
@item @samp{["foo", "bar", "baz"]}
@tab @samp{[undefined, null]}
@tab Index@tie{}0 was not changed.
Index@tie{}1 was removed.
Index@tie{}2 was removed.
@item @samp{["foo", "bar", "baz"]}
@tab @samp{[null, undefined, null]}
@tab Index@tie{}0 was not changed.
Index@tie{}1 was not changed.
Index@tie{}2 was removed.
@item @samp{["foo", "bar", "baz"]}
@tab @samp{[null, null, null]}
@tab Index@tie{}0 was not changed.
Index@tie{}1 was not changed.
Index@tie{}2 was removed.
@end multitable
@caption{Bucket diff examples.}
@end float
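The last two rows of the figure are interpreted identically because a
diff may pass through JSON before it is applied.
The following short demonstration uses only the standard
@code{JSON.stringify} and @code{JSON.parse};
the variable names are illustrative.

```javascript
// A diff meaning "indexes 0 and 1 unchanged, truncate at index 2".
const diff = [undefined, undefined, null];

// JSON has no undefined, so array elements are written as null.
const wire = JSON.stringify(diff);

// After parsing, the unchanged markers and the truncation marker
// are indistinguishable; only the tail position tells them apart.
const received = JSON.parse(wire);

// Holes are serialized the same way as undefined.
const sparseWire = JSON.stringify([, "quux"]);

console.log(wire, received, sparseWire);
```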
Diffs are generated by @srcrefjs{bucket,StagingBucket}.
@code{null} truncation is understood both by @code{StagingBucket}
and by @srcrefjs{bucket,QuoteDataBucket}.
A diff is applied to the underlying bucket by invoking
@code{StagingBucket#commit}.
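To make the vector-level tracking concrete,
here is a rough sketch of how a diff might be derived from two vectors.
It is not the @code{StagingBucket} implementation:
@code{deepEqual} and @code{makeDiff} are hypothetical names,
and the sketch assumes that vector values are never themselves
@code{null} or @code{undefined}.

```javascript
// Hypothetical helpers; the real logic lives in StagingBucket.

// Recursively compare two values (scalars, arrays, or plain objects).
function deepEqual(a, b) {
    if (a === b) {
        return true;
    }
    if (typeof a !== "object" || typeof b !== "object" ||
        a === null || b === null) {
        return false;
    }
    if (Array.isArray(a) !== Array.isArray(b)) {
        return false;
    }
    const keysA = Object.keys(a);
    const keysB = Object.keys(b);
    return keysA.length === keysB.length &&
        keysA.every((k) =>
            Object.prototype.hasOwnProperty.call(b, k) &&
            deepEqual(a[k], b[k]));
}

// Build a diff that turns `current` into `next`.
function makeDiff(current, next) {
    const diff = [];

    for (let i = 0; i < next.length; i++) {
        // A nested change marks the whole outer index as modified;
        // unchanged indexes are left as holes.
        if (!deepEqual(current[i], next[i])) {
            diff[i] = next[i];
        }
    }

    // A shorter vector is expressed as a null in the tail position.
    if (next.length < current.length) {
        diff[next.length] = null;
    }

    return diff;
}
```

Note that the recursion only decides @emph{whether} an index changed;
the diff still records the entire new value for that index.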
@node Calculated Values
@section Calculated Values
@helpwanted
@node Metabucket
@section Metabucket
@cindex Metabucket
The @dfn{metabucket} is a bucket-like key/value store
separate from the data bucket.@footnote{
It is stored in the @code{meta} field on the Mongo document.}
Use it to store data that must be accessible to the server
but never to the client.
Data must still be formatted as a vector,
but unlike the data bucket,
vector values may be structured data rather than strings.
@devnote{A standard still needs to be devised to provide guidance for
when storing structured data is appropriate,
rather than a vector of strings.}
The client has no means by which to access the metabucket.
Custom fields can be populated by server-side DataAPIs
(@pxref{Server-Side Data API Calls}).
Any fields prefixed with the string @samp{liza_} are reserved and are
populated automatically by the Server.
They are shown in @ref{t:liza-meta}.
@float Table, t:liza-meta
@table @code
@cindex Initial rated date
@item liza_timestamp_initial_rated
A Unix timestamp representing the first time a document was acted
upon by a rating service.
This value is set once and is never updated or cleared.
@end table
@caption{Metabucket fields populated automatically by the Server.}
@end float
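The rules in this section can be illustrated with a small sketch.
Everything in it is an assumption made for illustration:
@code{setCustomField} and @code{markInitialRated} are hypothetical
helpers that do not exist in Liza,
the metabucket is modeled as a plain object,
and the timestamp is assumed to be stored as a single-element vector.
Liza itself may not enforce the reserved prefix in this way.

```javascript
// Hypothetical model of the metabucket rules; not Liza's API.
const RESERVED_PREFIX = "liza_";

// Store a custom field, refusing reserved names and non-vector values.
function setCustomField(meta, name, vector) {
    if (name.startsWith(RESERVED_PREFIX)) {
        throw new Error("reserved metabucket field: " + name);
    }
    if (!Array.isArray(vector)) {
        throw new TypeError("metabucket values must be vectors");
    }
    meta[name] = vector;
    return meta;
}

// Record the first rating time; later calls leave the value untouched.
function markInitialRated(meta, unixTime) {
    const field = "liza_timestamp_initial_rated";
    if (meta[field] === undefined) {
        meta[field] = [unixTime]; // assumed single-element vector
    }
    return meta;
}
```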