Saturday, March 17, 2012

ZUNI PuSH - App or API?

I was just answering some questions from Beatrice -- our developer at Trento -- about the ZUNI PuSH system that they are fast completing. We were talking about the filtering feature and how complex or simple it should be. Naturally, I said it should be simple, mostly because I think simple search features are more powerful. Simplicity accommodates a vastly larger diversity of queries, and their intentions, than do complex search systems (contrary to theory). However, I made a point to Beatrice that got me thinking. I said that we "should keep in mind that this process provides the subscriber with a filtered feed for them to process. That means that we do not have to, and even probably should not, provide" a complex filter. The point of the system, I went on briefly to say, is that the subscriber can do what they like with the data once they are PuSHed (sent) the filtered feed data. In this way, what we are doing is "much more like an API than an APP", I said.

So this is what got me thinking. We have been treating this system as though it were an App (a Web Application), which it partially is. However, it is an App that allows you to work with feed data as though it were an API (an Application Programming Interface). APIs allow an App (App_1) to communicate with another App or system, or many other Apps or systems, so that data from these Apps or systems can be used, modified and reapplied in App_1. APIs also allow data to be passed from App_1 back to the other Apps or systems. APIs are everywhere on the web, but they act in the background, behind the front end of an App that the user engages with, so users don't usually know they are there.

The ZUNI PuSH system, however, is a bit of a hybrid. It is a front-end App for people to publish, subscribe to and filter feeds, and in this way it is nothing unusual. But it then sends the subscriber the data from the filtered feed for the subscriber to process, just as an API would. In this way, the ZUNI PuSH system is like an API.
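
To make the hybrid idea concrete, here is a minimal sketch in Python of a hub that applies a deliberately simple keyword filter and then hands each matching entry to the subscriber's own code. It is an illustration only, not the ZUNI PuSH implementation: the names `Hub`, `Subscription`, `subscribe` and `publish`, and the use of plain dictionaries for feed entries, are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation of a feed entry: a plain dict of strings.
Entry = Dict[str, str]


@dataclass
class Subscription:
    keyword: str                       # the whole "filter" is one keyword
    callback: Callable[[Entry], None]  # subscriber-side processing


@dataclass
class Hub:
    subscriptions: List[Subscription] = field(default_factory=list)

    def subscribe(self, keyword: str, callback: Callable[[Entry], None]) -> None:
        """Register a subscriber with a simple, case-insensitive keyword filter."""
        self.subscriptions.append(Subscription(keyword.lower(), callback))

    def publish(self, entry: Entry) -> None:
        """Push an entry to every subscriber whose keyword appears in it."""
        text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
        for sub in self.subscriptions:
            if sub.keyword in text:
                sub.callback(entry)  # the subscriber decides what happens next


# Usage: the subscriber just collects what it is sent; it could equally
# transform the entry or store it in a local system.
received: List[Entry] = []
hub = Hub()
hub.subscribe("pottery", received.append)
hub.publish({"title": "Zuni pottery jar", "summary": "Polychrome olla"})
hub.publish({"title": "Woven basket", "summary": "Coiled basketry"})
```

The filter stays trivial on purpose. Anything more elaborate, such as ranking, combining or recontextualising entries, is left to the callback, which is the sense in which the system behaves like an API.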

This may seem like a "technicality", and it is in one sense, but it is actually critical to understanding what it is we are trying to do. The fact that it is both an App and an API shows how this approach is fundamentally different from other access systems such as readers, portals or catalogues. These traditional access systems see information as something that is accessed and referred to. The ZUNI PuSH system sees information more as it is used in mashups -- as something that is used, transformed and recontextualised by the user. The ZUNI PuSH system sees information as a resource, not as a product, and this is a critical difference with far reaching implications.

Thursday, July 21, 2011

What is wrong with PubSubHubBub.

I have been having a great set of Skype meetings with our team in Trento (Fabio, Marcos and Beatrice) about the Publish and Subscribe system. We are making excellent progress and have the system fleshed out already. However, from the beginning we have all been aware of some limitations of the usual publish and subscribe models for both RSS/ATOM and PubSubHubBub. We are working with Trento because of three limitations in particular: the size of feeds, the lack of security and the need to filter output. Today, in anticipation of our meeting, Beatrice posted her list of PubSubHubBub limitations, which I think is the best to date from anywhere. With her permission, I am posting it here. By the by, for these reasons, we are not using the Google Code PubSubHubBub implementation, but will be using Apache ActiveMQ.

PubSubHubbub limitations

by Beatrice Valeri, Trento University Computer Science, Italy

PubSubHubbub is a protocol for pushing updates of Atom feeds. It is not useful for the UCLA Push project because it is designed for very simple use cases and it doesn't cover many of our requirements.

1.   When a new subscriber arrives, he should also receive all messages written before his arrival. This is not completely supported by PubSubHubbub. If the subscriber subscribes before the feed is published, then, after the feed is published, the subscriber receives all messages written before publishing and all the following updates. Once the feed is published, new subscribers receive only the updates.
2.   There is no security on the hub. Anyone can subscribe and publish.
3.   Subscribers have no way to subscribe to only some messages from a feed. Filtering has to be done by the subscriber after messages are received.
4.   PubSubHubbub is not able to manage feeds that are already big at the moment of publishing. When a feed is published, the hub reads it completely and parses it. If the feed is too big, the hub is not able to parse it. Feeds have to be broken into pieces and each piece has to be published.
5.   The subscriber has to know the feed URL in order to subscribe to it. This is not a real pub-sub system, since the subscriber has to be aware of publishers and has to subscribe again if a new interesting feed is published.
6.   With PubSubHubbub, a feed can be published only if there is already a subscriber waiting for it. This is not what we want.
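
As an illustration of the workaround mentioned in point 4, the following Python sketch breaks one Atom feed into smaller feeds that could each be published separately. It is a sketch under stated assumptions rather than part of the PubSubHubbub protocol or of our system: the helper name `split_feed` and the sample feed are made up for the example, and it relies only on the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Serialize Atom elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", ATOM)


def split_feed(feed_xml: str, chunk_size: int) -> list:
    """Split one Atom feed into several feeds of at most chunk_size entries.

    Hypothetical helper: feed-level elements (title, id, updated, ...) are
    repeated in every piece so that each piece is a feed in its own right.
    """
    root = ET.fromstring(feed_xml)
    entry_tag = f"{{{ATOM}}}entry"
    entries = root.findall(entry_tag)
    header = [child for child in root if child.tag != entry_tag]

    pieces = []
    for start in range(0, len(entries), chunk_size):
        piece = ET.Element(root.tag, root.attrib)
        piece.extend(header)
        piece.extend(entries[start:start + chunk_size])
        pieces.append(ET.tostring(piece, encoding="unicode"))
    return pieces


# Usage with a made-up five-entry feed, split into pieces of two entries.
sample = (
    f'<feed xmlns="{ATOM}"><title>Collection</title>'
    + "".join(f"<entry><title>Object {i}</title></entry>" for i in range(5))
    + "</feed>"
)
pieces = split_feed(sample, 2)
```

Note that `ET.fromstring` loads the whole feed into memory, so this only shows the shape of the splitting step. A genuinely large feed would more likely need an incremental parser such as `ET.iterparse`.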

    Sunday, May 8, 2011

    Why we are not interested in Portals - 2

    From my last post, I have worked up a model of a Portal implementation so we can now compare our model with that of a Portal. Of course, there is a lot more functionality in your average Portal than I have modeled, but the point remains as this model fits the fundamental structure of the vast majority of Portals.

    If we look at the Portal model we see that all of the information flow is from the User to the Portal. There is no information moving back to the User, except what they are looking at on the Portal. The control of the presentation, organization and interface is with the Portal and stays with the Portal. Even the comments don't move back to the core Institutional database, but remain within the Portal. As the Portal is almost always under the control of the Institution anyway, this means that not only does information move from the Users to the Institution (no change there), but the control of the way the information is managed, presented, accessed and ordered remains with the Institution as well (again, no change there).

    If we look at our model, there are three fundamental differences. First, the information that goes to the hub is not organized, presented or ordered there, but simply PuSHed from there to the Subscribers' servers. It is at the Subscribers' servers that the information is organized, presented and ordered many times over, and in the local context. Second, comments are not simply attached to a Portal instance, but are available to return to the Institution as part of the object's primary record. Most fundamental of all, though, is that our model is reversible. Any Subscriber in our model can become a Publisher, and any Publisher a Subscriber. Knowledge developed through the local use of Institutional information can be PuSHed back to the Institution to enhance their documentation of the object. A Portal cannot be reversed as it is not a distribution system, but a broadcast system.

    Sunday, May 1, 2011

    Why we are not interested in Portals.

    I thought we would lay down the gauntlet now, even though we are a bit of a way off demonstrating our work. I have uploaded two models for our systems. The first is a User Model, which is a demonstrative diagram for a general audience that shows what our system is intended to do. The second is an Object Model, using UML, which is more for developers. The two more or less depict the same system, though. I should emphasize, for those developers reading this, that the Object Model is not a full Class Model, but more of a "conceptual" Object Model.

    What I want to show, however, is not simply what we intend to do, but to also explain why this is different from a Web Portal. In a recent discussion between the IT Officer for Anthropology at the American Museum of Natural History (New York), Jim and I, it became clear that what we are doing could easily be confused with a Portal. Or, worse in my mind, that it could be assumed that there would be little difference between what we are doing and a Portal. In fact, I think that what we are doing is fundamentally different, and even opposed, to what Portals do. Here is why.

    If you will pardon me drawing a definition from Wikipedia, a Web Portal is "a web site that functions as a point of access to information on the World Wide Web. A portal presents information from diverse sources in a unified way." (Wikipedia, emphasis added). Wikipedia goes on to say that a Portal "provide[s] a way for enterprises to provide a consistent look and feel with access control and procedures for multiple applications and databases, which otherwise would have been different entities altogether." This is the key difference between what we are trying to do and what Web Portals are trying to do. Whereas a Web Portal takes a diverse set of resources, centralizes them and gives them a single "enterprise" identity, what we are trying to do is the opposite. We are trying to take a diverse set of resources and distribute them as filtered sets to diverse expert communities, so that these filtered sets of resources can be localized and used in completely different ways.

    The difference between a Web Portal and our approach is not simply superficial, but goes right down to our understanding of what Knowledge is. Where the assumption about knowledge in a Web Portal, and most "knowledge systems," is that knowledge is an accumulated resource, a set of commodities that gain their power as knowledge through their packaging or their organizing, we accept a different, less colonial, view of knowledge. We see knowledge not as a set of prescriptively ordered and presented resources, but as a personal, local and community achievement. Knowledge, for us, is something you do, and do skillfully, not something you acquire, proffer or stockpile.

    So the difference may seem subtle, even trivial, but is in fact fundamental. A Web Portal seeks to share information resources between individuals and communities through a unified, prescribed and centralized system -- an enterprise system much like a museum or archive. However, what we are trying to do is to share information resources between individuals and communities by distributing those resources into the diverse local systems so that they can be directly used to build local knowledge. While a Web Portal is, by its very nature, a system that creates a unified identity for information, and information use, through its "enterprise" identity, our system seeks to fundamentally undermine this universalizing and commodifying approach to knowledge by radically replacing unity and centralization with diversity and localization.


    Thursday, March 10, 2011

    Cost-Share Admin Details

    I am posting a copy of the letter that Ramesh sent out to each of the project partners that describes the details of tracking each institution's cost-sharing commitments. Feel free to reference this for preparing upcoming cost-share letters. -KB

    "I am writing to clarify a few administrative details you should be aware of regarding tracking and documenting the cost sharing that each of you as partners committed to during the original submission of the federally funded IMLS grant for which I serve as the Principal Investigator at UCLA. First, let me say I appreciate your work and financial cost sharing, which has contributed to a successful completion of work performed during the first year of the project. Your partnership has been critical to this!

    It is important that your financial office track the dollar values for each type of cost share (people, purchases, etc.) since UCLA as a public institution of higher education is required to follow federal guidelines to substantiate cost share committed by each organization/institution. To achieve this goal, we will request at the end of each year a cost share report from each of you that accounts for how you met the total cost share commitment for your organization/institution with a brief description of how the cost related to the project. Please share this information with your respective financial offices.

    Sample ($10,000 cost share)
    - J. Smith (dbase manager) = $50,000 Annual Salary x 10% effort (or equivalent hours, if hourly) + $1,000 benefits = $6,000
    - Software Purchase for data analysis = $2,500
    - Materials and supplies for survey development/instruments = $1,500
    Total = $10,000

    The authorized organizational leader signs a letter containing the elements of the sample above specific to their cost share certifying that the cost share commitment was met as described in the original proposal. Detailed payroll and/or non-payroll documentation is then maintained by the partner should any additional questions arise in the future. The documentation should be maintained by the partner for a period of five (5) years after the final end date of the grant. We appreciate your diligence and cooperation to document the cost share in the most straightforward manner with minimal time investment beyond what you already do to meet your own financial standards.

    Should you have any questions about the contents required for the cost share report, please contact Tracy Nguyen-Phan at (310) 825-4426.

    Ramesh Srinivasan"

    Saturday, January 15, 2011

    Interview with Jussi Parikka

    Hi all, I was recently interviewed by Jussi Parikka (media archaeologist and digital theorist). We talked about the past, present and future of archives. Might be of interest. You can find the interview here.

    Thursday, December 23, 2010

    More notes from meeting 11-30-10 - system details & focus group ideas

    This is a continuation of my sketchy notes from our mini-meeting at Zuni on November 30, 2010. We spent much of the afternoon in a discussion about sharing protocols and other details of the Zuni local system, and the protocols that will drive the links between the different local systems. Lastly, we had a discussion of the focus groups for evaluating the system.

    We brainstormed about user profiles, and the kinds of information that might work to identify different users and drive the protocols of access. After coming up with a lengthy list, it seemed like a pretty invasive & complex amount of information to gather from users -- might discourage people from signing up.

    - Probably only 1-2% of objects need that level of careful restriction
    -- just mark it 'unavailable without assistance'
    --- can only access it at AAMHC with help
    -- for right now, just set sensitive things aside

    we can run a permission set out of FileMaker
    another idea - restricting access to local IP addresses

    - Robin also emphasized that we need a word that doesn't subjugate the new stuff to the existing catalog
    - we discussed different kinds of updates that might emerge, and whether we should distinguish between different updates.
    Categories we brainstormed: (which could also emerge from the actual work)
    - correctives / corrections
    - events
    - new / additional information
    - research
    - relations / genealogies
    - disputes / discussions

    - idea - part of the system evaluation could be from the self-ID of additions - does a pattern emerge?

    - impact could be just access, or it could be making flexible systems, or user-generated content

    - how do we roll out the system?
    -- events in the community
    -- getting adults on board
    -- permanent kiosks at IHS, etc

    Photos / videos / audio - what if young people don't know the protocols about taking photos or video?

    Focus groups - guiding questions
    themes of what we're interested in
    - experience of the system
    - access to patrimony
    - community dialogue

    question ideas (for anonymous q's in system)
    How easy is this system to use? (answered on a scale of easy to hard)
    Tell us about it...
    How much do you feel you've learned from the system?