Proposal for Collections enhancement: atomic media

Steven Robertson Sat, 26 Jan 2008 23:49:04 -0800

To implement part of a forthcoming Really Neat Idea(tm), I would need
an enhancement to the Collections concept.  I am, of course, willing to
develop a patch for this concept, but as the Collections concept itself
required a ton of discussion, I feel that just making a patch and
trying to get it upstream with no prior discussion disrespects all of
you who spent so much time engineering the concept, so I'll pitch it
with words before I do it with code.


DISCLAIMER: I haven't done a thorough audit of the existing code.  Some
of the ideas expressed may make me sound like an idiot.  That isn't a
new thing for me, so don't hesitate to let me know. ;)  Oh, and sorry
about the length.

== Definition

Atomic media (I'm open to suggestions for the name as well as
everything else) is an ordered set of media which behaves like a single
article of media in every way.  This means:

* The atomic media (AM) is identified by a single mediaid.
* The AM has properties that are visible to filter operators.

In addition, AM are subject to the following restrictions, which make
implementing them worlds easier:

* AM cannot be saved.
* AM must be split before they can be played.

== Uhh... why?

My undying love of whole albums, of course.

To be more specific: an AM can make certain kinds of excruciatingly
complicated DAGs - and in fact some DAGs not previously possible
without resorting to complicated, client-side-sorted idlists -
remarkably simple.

=== Example time!

"Party shuffle of whole albums by Pinback, Menomena, or Radiohead with
running time greater than 42 minutes (2520000 msec) and an average
rating of at least four, which I've listened to an average of at least
three times, sorted by number of tracks."

It's possible to construct this list by manually manipulating idlists
with client code, but it would be a complicated operation involving
many queries.  The alternative is to do the following (expressed using
something similar to cli collections syntax - it should be clear how
this would translate into real code).

----

1. Query: artist:Pinback OR artist:Menomena OR artist:Radiohead album

    Returns: idlist.  Contents obvious.

2. Fuse: key:album sum:duration max:tracks avg:rating,timesplayed

    Returns: idlist.  Each mediaid is a virtual id (explained later).
    It should be noted that all of the values specified above would be
    done that way by default; they were manually specified for clarity.

3. Query: duration>2520000 rating>=4 timesplayed>=3

    Returns: idlist.  Each mediaid is still a virtual id.

4. Party shuffle.

    Returns: idlist.  Each mediaid is still a virtual id.

5. Split: limit:1

    Returns: idlist.  The top mediaids are the real mediaids of the
    individual tracks in the first album on the list.  The remaining
    ones are still virtual mediaids representing whole albums.

----

This example - for use in a party shuffle playlist - is the ugliest
example I could come up with, because party shuffle already duplicates
some of this functionality.  I have many more
use-cases, as well as ideas regarding ways of expressing the concept of
AM to the user in a familiar and usable way, that helped when designing
the concept, but for brevity's sake I'm excluding those for now.

=== How this is better

It adds a unified method for performing selections on aggregate
statistics, and a straightforward (and client-side-logic-free) method
for computing said statistics.  This has several benefits:

* You can operate on things like albums without ever leaving the
  collections context.  It's possible to achieve some of these things
  manually, but it is disruptive because it must be accomplished 
  programmatically instead of as operations on a DAG.
* Consistent behavior for these features across clients.
* Implementation of more complex random algorithms - tournament
  selection, weighted selection - and sorting on album-wide properties
  becomes essentially free.

In the description for the aforementioned Really Neat Idea(tm), it
should become clear how easily this new functionality can be exposed to
the user.  It might help my case to cite the quote that actually
started this train of thought:

13:37 < anders> Fill my mp3player with random songs, but I prefer
                synth, and make sure the latest added albums are there,
                exept this really crappy album, and btw, the most played
                metal-albums would be nice too. And Hey! I just want
                full albums, don't splitt them, except these songs,
                which are the only good ones on their albums.

The fact that we can express this set of music in natural language much
more easily than in Collections notation indicates that there is room for
improvement.  I believe that AM can greatly enhance the ability of a
Collection query to achieve the expressiveness of natural language.


== OK, hot-shot, so how you gonna implement the thing?

I propose the following changes.  Obviously this is very much
pseudocode; the names etc. are designed for readability, not direct
implementation.

* The medialib is given the following new method: 

add_virtual_mediaid ( proplist )
-> int mediaid

Adds the virtual mediaid with the given list of properties to a
temporary table.  Returns a previously unassigned mediaid from the top
of the range of int32.  Some form of garbage collection or flushing
should be considered; I need to work on this aspect a little bit more.

It should be noted that this wouldn't break existing clients or require
further modifications to the code.  Simply exclude virtual mediaids
from the 'universe' collection.
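
A minimal Python sketch of the allocation scheme described above, not
the actual medialib code.  The class name, the in-memory dict standing
in for the temporary table, and the is_virtual/universe helpers are all
assumptions made for illustration; only add_virtual_mediaid comes from
the proposal.

```python
INT32_MAX = 2**31 - 1  # top of the signed int32 range


class VirtualMedialib:
    """Hypothetical stand-in for the medialib's temporary table."""

    def __init__(self):
        self._next_id = INT32_MAX   # ids are handed out from the top down
        self._virtual = {}          # virtual mediaid -> properties

    def add_virtual_mediaid(self, proplist):
        """Store the properties and return a previously unassigned id."""
        mediaid = self._next_id
        self._next_id -= 1
        self._virtual[mediaid] = dict(proplist)
        return mediaid

    def is_virtual(self, mediaid):
        return mediaid in self._virtual

    def universe(self, ids):
        """Everything except virtual ids, so unaware clients never see them."""
        return [m for m in ids if not self.is_virtual(m)]
```

Counting down from the top keeps virtual ids far away from real ones,
which are assumed here to grow upward from small values.  Garbage
collection is left out, as in the proposal.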


* Two new Collections filters are created:

FUSE
FIELD: yes.i.know.this.is.gross.and.it.needs.work.i.am.still.thinking
VALUE: "key:property [sort:property] min:property[,property]..."

Takes a collection.  Sorts it by key, then by sort - let's say Album,
then Track - and fuses media with the same unique key into one virtual
mediaid.  Property x of the virtual mediaid is determined by an
operation on the values of that property from the real mediaids it
comprises.  Let v be the list of values of the given property, and let
vn be the same list with mathematical nulls (non-numerical strings,
empty strings, missing values) removed.

min: min(vn)
max: max(vn)
sum: sum(vn)
avg: sum(vn) / len(vn)
realavg: sum(vn) / len(v)
concat: '\n'.join(v)

More may be added as I think of them.  They'd be pretty trivial to
implement, of course.

Additionally, the property 'metamediaid' (a better name is needed)
would be added, containing the ordered list of mediaids in the virtual
mediaid.
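
To make the FUSE semantics concrete, here is a rough Python sketch
under some assumptions: tracks are plain dicts with an 'id' entry, the
operations are passed as a {property: operation} dict rather than
through FIELD/VALUE, and an aggregate over an empty vn yields None.
The function names are hypothetical, and for concat a missing value is
treated as an empty string.

```python
from itertools import groupby


def to_number(value):
    """Return value as a float, or None if it is a mathematical null."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def aggregate(op, v):
    """Apply one of the proposed operations to the list of values v."""
    if op == "concat":
        return "\n".join("" if x is None else str(x) for x in v)
    vn = [n for n in map(to_number, v) if n is not None]
    if not vn:
        return None  # assumption: nothing numeric to aggregate
    if op == "min":
        return min(vn)
    if op == "max":
        return max(vn)
    if op == "sum":
        return sum(vn)
    if op == "avg":
        return sum(vn) / len(vn)
    if op == "realavg":
        return sum(vn) / len(v)
    raise ValueError("unknown operation: " + op)


def fuse(tracks, key, sort, ops):
    """Sort by key, then by sort, and fuse equal keys into one entry."""
    ordered = sorted(tracks, key=lambda t: (t[key], t[sort]))
    fused = []
    for key_value, group in groupby(ordered, key=lambda t: t[key]):
        members = list(group)
        props = {key: key_value,
                 "metamediaid": [t["id"] for t in members]}
        for prop, op in ops.items():
            props[prop] = aggregate(op, [t.get(prop) for t in members])
        fused.append(props)
    return fused
```

In a real implementation each fused property set would presumably be
handed to add_virtual_mediaid; the sketch returns the property dicts
directly to stay self-contained.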


SPLIT
FIELD: [property, usually mediaid]
VALUE: [as normal]

Takes the (probably ordered) collection, splits the virtual ids that
match the property/value pair, and returns the new collection.
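
A corresponding sketch of SPLIT, again in Python and again only an
illustration.  It assumes virtual ids are looked up in a dict of
property sets that carry 'metamediaid', expresses the property/value
match as a predicate, and models the limit:1 from the example as an
optional limit argument; none of these names come from existing code.

```python
def split(idlist, virtual, matches=lambda props: True, limit=None):
    """Expand matching virtual ids in place, keeping list order.

    idlist  -- ordered list of real and virtual mediaids
    virtual -- dict: virtual mediaid -> properties incl. 'metamediaid'
    matches -- predicate deciding which virtual ids to split
    limit   -- split at most this many virtual ids (None = no limit)
    """
    result = []
    remaining = limit
    for mediaid in idlist:
        props = virtual.get(mediaid)
        can_split = (props is not None
                     and matches(props)
                     and (remaining is None or remaining > 0))
        if can_split:
            result.extend(props["metamediaid"])
            if remaining is not None:
                remaining -= 1
        else:
            # Real ids and unmatched virtual ids pass through untouched.
            result.append(mediaid)
    return result
```

With limit=1 only the first matching album is expanded into its tracks
and the rest of the list stays virtual, which is the behavior step 5 of
the example relies on.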

=== Pros of this design

It's a simple design, backwards-compatible*, and easily implemented.
What more could you want?

* If the default is, for instance, for "title" to be concatenated with
newlines, a client will properly render the title of an album in a
playlist/queue/whatever as a list of the individual track titles.
Similar sensible behavior would be available for all properties for
clients that choose not to become aware of AMs.

=== Items that need to be addressed

First and foremost, the FUSE filter's parameter passing is uggggggly.
The syntax isn't bad, but it's a misuse of the FIELD and VALUE
parameters.  Some alternatives spring to mind but none strike me as
elegant.  Any thoughts would be much appreciated.

Saving and loading playlists (and other medialists) with AMs would also
require an extra step, though a simple one: if a program is doing
an album view, put a SPLIT on the end of the chain when you save the
playlist, and a FUSE when you load it.  This avoids much of the
complexity of saving and garbage-collecting virtual mediaids, but it is
still an extra step.




*ducks under desk, covers back of neck with hands*

All right, flame away.
Steven

OH CRAP THAT DRILL IS FOR TORNAD^@
