WWW 2008 / Refereed Track: Web Engineering - Applications
April 21-25, 2008 · Beijing, China
Organizing and Sharing Distributed Personal Web-Service Data
Roxana Geambasu, Cherie Cheung, Alexander Moshchuk, Steven D. Gribble, and Henry M. Levy
Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA 98195
{roxana, cherie, anm, gribble, levy}@cs.washington.edu
ABSTRACT
The migration from desktop applications to Web-based services is scattering personal data across a myriad of Web sites, such as Google, Flickr, YouTube, and Amazon S3. This dispersal poses new challenges for users, making it more difficult for them to: (1) organize, search, and archive their data, much of which is now hosted by Web sites; (2) create heterogeneous, multi-Web-service object collections and share them in a protected way; and (3) manipulate their data with standard applications or scripts. In this paper, we show that a Web-service interface supporting standardized naming, protection, and object-access services can solve these problems and can greatly simplify the creation of a new generation of object-management services for the Web. We describe the implementation of Menagerie, a proof-of-concept prototype that provides these services for Web-based applications. At a high level, Menagerie creates an integrated file and object system from heterogeneous, personal Web-service objects dispersed across the Internet. We present several object-management applications we developed on Menagerie to show the practicality and benefits of our approach.
Figure 1: PCs vs. Web services. (a) Data integrated into the desktop file system; (b) data isolated across separate Web services. In the desktop-centric world, users can organize and share their application data through the file system. In today's Web, data is increasingly trapped inside the Web service that operates on it.
Categories and Subject Descriptors
H.3.4 [Systems and Software]: Distributed systems; H.3.5 [Online Information Systems]: Web-based services, Data sharing
General Terms
Design, Performance
Keywords
Web services, Menagerie, data sharing, data organization
1. INTRODUCTION
The Web is catalyzing a transition from PC-based software and file systems to Internet-based applications and Web services. In the past, users relied solely on their desktop systems to execute applications and store their personal data. Today, many desktop applications have feature-rich "software-as-a-service" counterparts, including Web-based email systems, media editing tools, and office productivity suites. Similarly, services such as Flickr, YouTube, Blogger, and Amazon's S3 allow users to store, edit, and share their data via the Web.
Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2008, April 21-25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.
Web-based services offer compelling advantages over traditional desktop software. Specifically, users can access their Web services and data through multiple devices from anywhere in the world. The Web eliminates administrative tasks such as software installation and update, and it facilitates the network effects that come from having a large community of connected users.

However, PC-based systems have compelling advantages of their own, many of which arise from the functions provided by the desktop operating system and file system. The OS supports a set of common, beneficial services that we take for granted. Users can name, organize, and access all of their files within a single hierarchical namespace, irrespective of which applications natively operate on them (Figure 1a). Similarly, applications written by different software vendors can interact with each other through the protected sharing interfaces exposed by the OS, providing users with new composite functions.

As the transition to the Web continues, we risk losing the advantages enjoyed by desktop systems. Users' personal data and objects are frequently trapped inside the different Web services that they use (Figure 1b). Consequently, users and services face a set of significant new challenges:

Data organization and management. On the desktop, a user can create a folder to hold related files and objects. On the Web, users' data is scattered across the Internet, where it is housed by a myriad of independent Web services. Given this, how can she organize, manage, archive, or search her Web objects and files as a unit?

Protected data sharing. While publishing is greatly simplified in the Web service environment, protected sharing, particularly at a fine grain, becomes more difficult. For example, how should one user share a specific subset of her objects with another user? Does the other user need to create accounts on all relevant Web services, and if so, do all of these services support the restricted sharing of only a select object subset?
Data manipulation and processing. Web services restrict the operations that can be performed on their objects: they typically export a limited API and expose only a small set of user commands through the browser. In contrast, the power of a system such as Unix derives, in part, from its simple data-processing commands (cat, grep, etc.) that can be composed together or extended to manipulate data in new ways. How should we balance the need for Web services to retain ownership over the data and functions they provide, with the benefits that would be gained by allowing third parties to extend services?

This paper examines these challenges. First, we discuss the principles and requirements that must underlie any solution. Next, we discuss the design and implementation of Menagerie, a proof-of-concept system that embodies our solution principles. Menagerie consists of two primary components: (1) the Menagerie Service Interface (MSI), an API that facilitates inter-Web-service communication and access control, and (2) the Menagerie File System (MFS), a software layer that allows "composite Web services" to integrate remote Web objects into a local file system namespace, reducing the engineering effort required to access and manipulate remote data.

To demonstrate the value of our approach, we have prototyped several new Web applications on top of Menagerie. Our experience shows that it is possible to combine the ease of use, publishing, and ubiquitous access advantages of Web services with the organizational, protected sharing, and data processing advantages of desktop systems.
2. MOTIVATION
In this section, we use a simple motivating scenario to expose some of the shortcomings of the Web. From this scenario we derive a set of required properties that a solution must have to overcome these limitations.

2.1 A Simple Scenario
Figure 2 illustrates our simple motivating scenario. We consider Ann, a product manager for a small company. Ann has moved wholeheartedly to Web services for both her personal and business data and information processing needs. Specifically, Ann uses Flickr to manage her photo albums, Google Docs for spreadsheets and word processing files, Hotmail to communicate with colleagues and family, and Schwab to view an interactive stock ticker and maintain her personal financial information.

Ann likes to keep her data well organized. In the past, she used her PC's desktop manager to create folders in which her related files were stored or linked. Since many of her documents are now Web-based, she would like to create virtual Web folders that are populated with links to the appropriate Web objects and collections. For example, she would like to collect all of her product marketing resources into a single folder, in spite of the fact that they are spread across many Web services.

Ann also wishes to securely share some of her virtual folders with her colleagues, granting them access to view and edit the folders' contents. However, she does not want her colleagues to have access to all of her business files or to her personal files. In addition, not all of her colleagues have accounts on the same Web services as Ann.

Finally, Ann is extremely careful with her valuable data and wants to protect against accidental deletion or an operational Web service failure. She would therefore like to use a third-party archival service to maintain historical versions of all of her Web objects and virtual folders.

Figure 2: Motivating Scenario. Ann would like to create a new folder that links to some of her Flickr, Hotmail, Google Docs, and Schwab objects. She also wants to share the folder and its contents with her colleagues, who do not have accounts on all of these services.

2.2 Challenges
Given the limitations of today's Web, it is extremely difficult for Ann to accomplish her goals or for third-party Web services to help her. Ann faces three classes of obstacles:

Naming. The Web services in our example provide users with the abstraction of objects that can be manipulated in various ways. Unfortunately, not all of the services expose objects with a predictable, stable URL; instead, some objects are externally presented by the Web service as a diffuse collection of HTML elements, images, frames, and JavaScript, whose URLs might be dynamically generated. Accordingly, users and third-party services have no easy way to name each of the objects that Ann wishes to collect into her virtual folders.

Protection. Ann needs to share some of her objects with her colleagues and with the third-party archival service, but she faces several protection-related impediments. Each Web service has implemented its own particular authentication, authorization, and sharing scheme. Thus, Ann's colleagues may need to create accounts on all services to fully access her shared objects. Even if single-sign-on accounts existed across the Web, many services fail to offer flexible and fine-grained protection. In some cases, sharing is all-or-nothing. For such services, allowing Ann's colleagues access to her professional objects may also reveal her personal data. Sharing also may be limited in some ways; for example, some Web services do not allow the sharing of different subsets of objects with different subsets of users. Finally, some services provide secure URLs that the user can hand out to grant object access, but many of these services do not support the selective granting of write access or the revocation of rights.

Ann wants to grant her associates access to a single virtual folder, implicitly giving them access to all of the objects within it. Unfortunately, those objects are scattered across many different services, each with its own authorization scheme. Short of Ann giving a third-party aggregation service all of her Web credentials and trusting that service with her objects, such sharing cannot be achieved.

Externalization and embedded rendering. Most Web services do not expose object data directly to users and third party services. Instead, they graphically present objects and interaction controls as embedded elements within Web pages. In contrast, on desktop systems, the filesystem permits many programs, including file managers, file sharing applications, editors, archivers, and security scanners, to process the same data objects.

To realize our scenario, Web services must provide additional functions that most of them lack today. In particular, they must export externalized representations of their objects to allow third-party services, such as archival or indexing services, to operate on that data. For simple third-party services, the structure and semantics of the externalized representation do not matter: the object can be exported as an opaque set of bytes. For richer services, a standardized or well-known representation, such as MIME for email, would be more valuable.

Finally, Ann and her colleagues rely on a third-party service to create and access virtual folders, and to browse the files within them. To support this, origin Web services should provide useful metadata and facilitate composite graphical interfaces that would allow the objects to be rendered and operated on within arbitrary Web pages. Flash movies exported by sites such as YouTube are good examples of this.
2.3 Requirements of a Solution
In the PC-centric world, the operating system provides abstractions, system call interfaces, and utilities to help applications and users overcome the challenges we describe above. In the Web, there is no single trusted layer that users, browsers, and services can rely on. We therefore believe that a new service interface must be defined and adopted to provide the interoperability and integration needed to realize even our simple motivating scenario. This service interface could be defined via conventions on top of the HTTP protocol (e.g., REST [8]), or new special-purpose protocols could be designed for this purpose. Regardless, the challenges we described motivate three clear requirements that the service interface must support:

1. Uniform object namespace. Addressing the naming challenge described above requires a single global namespace in which all personal data objects are embedded. That is, all of the objects and object collections that users manipulate should have a permanent, globally unique name within this namespace, allowing the Web service, its users, and third-party composite services to discover and depend upon these names.

2. Fine-grained protection. To support data sharing and composite services, a Web service must provide fine-grained protection of objects and collections. It should be possible for the user to share only a portion of her objects from a service, while keeping the other objects private. It should also be simple to aggregate and share collections of distributed objects.

3. Unified minimal object access. The combination of a global, hierarchical namespace and fine-grained, protected sharing of personal data allows users and services to find and share objects with each other. To be useful, however, the objects must support some standard set of access functions. As we argued above, the minimal set must include the ability for objects to be embedded and rendered within an arbitrary Web page, and for object data to be externalizable.

The next section presents the architecture and implementation of Menagerie, a proof-of-concept prototype we have developed to meet the challenges we have described. Menagerie allows us to experiment with new Web applications that support the organization and sharing of collections of heterogeneous Web service objects. We will describe those applications in Section 4.

Figure 3: The Menagerie Prototype. The figure shows two Web services that export Ann's objects, a composite Web application built using the MFS layer, and the MSI capabilities (c1 and c2) that the application uses to access the objects.

3. THE MENAGERIE PROTOTYPE
This section describes the structure and implementation of our Menagerie prototype. Menagerie consists of two principal elements: the Menagerie Service Interface and the Menagerie File System. We briefly introduce these elements here and then describe them in more depth in the remainder of this section.

The Menagerie Service Interface (MSI) is an inter-Web-service communications API that comprises object naming, protection, and access operations. MSI defines a uniform, hierarchical name space into which Web services export the names of their objects. MSI supports fine-grained sharing of Web objects through the use of hybrid capabilities. This protection scheme allows users without service accounts to name and access objects, while also giving services the ability to limit the actions of such users. MSI also specifies a standard set of object-independent access functions for Web services. These functions support object reading and writing, rendering, and metadata export. While our goal is to design an interface that Web services can easily adopt, our prototype implementation also shows that Menagerie is deployable even without Web service support.

The Menagerie File System (MFS) simplifies the development of new, composite Web applications. MFS mounts remote MSI object hierarchies into a local file system name space, allowing an application to access remote Web objects through a standard file system interface. Figure 3 depicts a composite Web application that uses MFS to access the Web objects exported by two MSI-capable Web services.

The remainder of this section describes in detail MSI's naming, protection, and content operations. Figure 4 shows the functions we have implemented to date. This small set was sufficient to build our example applications; as we gain more experience, we expect the interface to evolve and grow.
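Section 3 describes MFS as mounting remote MSI object hierarchies into a local file system name space, so that applications reach remote Web objects through a standard file system interface. A minimal sketch of what this could mean for a composite application follows: the search function uses only ordinary file operations and knows nothing about MSI. The directory layout and file names are hypothetical, and a temporary directory stands in for an MFS mount point so that the example runs without Menagerie.

```python
import os
import tempfile


def grep_tree(root, needle):
    """Return sorted relative paths of files under root whose bytes contain needle."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            path = os.path.join(dirpath, fname)
            with open(path, "rb") as f:
                if needle in f.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


# Stand-in for a mounted hierarchy (hypothetical layout, not a real MFS mount).
mount = tempfile.mkdtemp()
os.makedirs(os.path.join(mount, "docs"))
os.makedirs(os.path.join(mount, "mail"))
with open(os.path.join(mount, "docs", "market.txt"), "wb") as f:
    f.write(b"product launch plan")
with open(os.path.join(mount, "mail", "inbox.txt"), "wb") as f:
    f.write(b"lunch on Friday")

print(grep_tree(mount, b"product"))
```

If the same function were pointed at a real mount point, each `open` and `read` would presumably be translated by the file system layer into MSI calls; that translation is not shown here.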
3.1 Object Naming
We designed naming in Menagerie with two goals in mind. First, users must be provided with meaningful object names that correspond to the way users name objects inside of a Web service. Second, composite applications must be provided with global, unique identifiers for the objects they access, even though those objects are scattered across heterogeneous Web services.

In Menagerie, each Web service exports an object name hierarchy for each of its users. This hierarchy contains the user-readable names of all objects that each user can access. The structure of this hierarchy and the granularity of each object within it are left entirely up to the service, but it typically imitates the logical structure that the service exposes to its users. For example, Flickr offers its users abstractions associated with sets of objects (photo albums) and objects within each set (photos); therefore, Flickr could choose to export a three-level name hierarchy (e.g., Ann/Disneyland-album/Mickey-photo).

Namespace functions
    list(capa, object_ID)                   returns list of object names and IDs
    mkdir(capa, parent_ID, name)
    getattr(capa, object_ID)                returns object attributes
Protection functions
    create_capa(capa, object_ID, rights)    returns new capa
    revoke_capa(object_capa, revoke_capa)
Content and Metadata functions
    read(capa, object_ID)                   returns byte[]
    write(capa, object_ID, name, content)
    get_summary(capa, object_ID)            returns string
    get_URL(capa, object_ID)                returns string

Figure 4: The MSI interface. This table shows the parameters and return types of each function. MSI services must support the naming and protection-related functions, and may optionally support the others.

[Figure 5 diagram: a capability token given out by the service, consisting of a Root Node global ID and a Password (the diagram labels field widths of 64 bits and 128 bits); capability validation looks the token up in the CapTable stored at the service, whose entries hold Root Node global ID, Password, Open-access Rights, and Closed-access Rights.]

Figure 5: Hybrid Capability Protection. A capability provides access to objects within a sub-hierarchy rooted in the object identified by Root Node ID. Open-access rights allow direct object access on the basis of a valid capability, while closed-access operations also require user authentication.

Each object in Menagerie is identified using a service-local ObjectID, which is unique within the service and independent of the object's location in the hierarchy. Using the service-local ObjectIDs, Menagerie mints globally unique object identifiers by combining the service-local ObjectIDs with services' DNS names. By making ObjectIDs unique on each service (as opposed to globally unique), we give services the liberty to create and name new objects independently. By making an object's ID independent of the object's location within the service's hierarchy, we ensure that caching and other optimization opportunities are preserved even if the object can be reached via multiple paths.

Three functions in MSI support name hierarchy operations: list, getattr, and mkdir. Given the unique ID of a collection node in a hierarchy, list returns the names of all the children of that node, as well as their unique IDs. Getattr returns the attributes of the object with the given ID, including the type of object, a capability for the object (see Section 3.2), the size of the object in bytes, and various additional metadata. Mkdir adds a collection object to the hierarchy. Individual objects are created using the MSI write function, as we will see in Section 3.3.
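The naming scheme can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Menagerie's implementation: the `Service` and `Node` classes, the integer ObjectIDs, the `dns_name/object_id` string format for global identifiers, and the domain `flickr.example` are all invented for the example. Only the operation names `list`, `mkdir`, and `getattr` come from MSI, and the capability argument that MSI passes to each call is omitted here.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    object_id: int            # service-local ObjectID, independent of path
    name: str                 # user-readable name
    is_collection: bool
    children: list = field(default_factory=list)  # child ObjectIDs


class Service:
    """Hypothetical MSI-style service exporting one name hierarchy."""

    def __init__(self, dns_name):
        self.dns_name = dns_name
        self._ids = count(1)
        self.nodes = {}
        self.root_id = self._add("/", True)

    def _add(self, name, is_collection):
        oid = next(self._ids)
        self.nodes[oid] = Node(oid, name, is_collection)
        return oid

    def global_id(self, object_id):
        # Assumed format: the service's DNS name combined with the local ObjectID.
        return f"{self.dns_name}/{object_id}"

    def mkdir(self, parent_id, name):
        """Add a collection object under parent_id and return its ObjectID."""
        oid = self._add(name, True)
        self.nodes[parent_id].children.append(oid)
        return oid

    def list(self, object_id):
        """Return (name, ObjectID) pairs for the children of a collection."""
        return [(self.nodes[c].name, c) for c in self.nodes[object_id].children]

    def getattr(self, object_id):
        """Return a few attributes of the object (a subset, for illustration)."""
        node = self.nodes[object_id]
        return {
            "name": node.name,
            "type": "collection" if node.is_collection else "object",
            "global_id": self.global_id(object_id),
        }


flickr = Service("flickr.example")
ann = flickr.mkdir(flickr.root_id, "Ann")
album = flickr.mkdir(ann, "Disneyland-album")
print(flickr.list(ann))
print(flickr.getattr(album))
```

Because the ObjectID is assigned once and never derived from the path, the same identifier would remain valid if the album were also linked from a second collection, which is the property the text relies on for caching.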
3.2 Protection
While designing Menagerie's protection model, we considered the two fundamental access control mechanisms: capabilities and access control lists (ACLs). These mechanisms generally lie at opposite ends of a spectrum. Capabilities simplify sharing, while ACLs enable tight access control and user access tracking. While our goal is to simplify fine-grained, distributed object sharing, we must also provide services with the ability to control and track access to their data. Menagerie therefore adopts a hybrid capability-based protection system, which combines the benefits of both mechanisms. A Menagerie capability is an unforgeable token that contains the globally unique ID for an object and a set of access rights. Possession of a capability gives the holder the right to access the object in the specified ways. Capabilities support sharing because they are easy to pass from user to user: Menagerie's capabilities are encoded in URLs that can be emailed or embedded in Web pages. However, a Menagerie capability is also subject to control by the Web service whose object it names. A service can divide its object rights into two types: open-access rights and closed-access rights. An open-access right gives the holder of the capability direct access to the specified operation without further authentication; e.g., if the
right allows the user to read the object, then the service will return the object's contents when presented with a capability with the read bit set. Since a capability is not associated with any principal, an "open-access" operation cannot be attributed to a particular user. A closed-access right, however, requires additional authentication. To perform an operation associated with a closed-access right, a capability with that right enabled is necessary but not sufficient: the user must also authenticate himself before the service will perform the operation. In most cases, this will require an account on that service. By "closing access" to an operation, the service can track the user that invokes the operation, or enhance the user's experience with personalized functions.

To implement capabilities, we use the password-capability model [4, 24]. The structure of a Menagerie capability is shown in Figure 5. The capability specifies a globally unique ID of a node in a service's hierarchy and it authorizes access to the entire subhierarchy rooted in that node. The capability also contains a long "password", a random field chosen from an astronomically large number space. The password is generated by the service at capability creation time and ensures that the capability cannot be guessed. A service stores information about all capabilities it creates in a table called CapTable, whose structure is also shown in Figure 5. Because the service stores the capability rights, they cannot be forged by users.

As seen in Figure 4, every MSI method call passes at least two parameters: a capability token for an ancestor of the accessed object within the service's hierarchy and the object's ObjectID. Upon an MSI invocation, the service checks that the ancestor relationship holds and that a corresponding (root node ID, password) pair can be found in its CapTable. If not, the capability is invalid and the operation fails.
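The validation steps just described (CapTable lookup, ancestor check, then open-access versus closed-access rights) can be sketched as follows. The class and method names, the parent map used for the ancestor check, the `Right` flag values, and the use of a 128-bit random hex string as the password are assumptions made for illustration. Treating the capability's root node itself as covered by the capability is also an assumption.

```python
import secrets
from dataclasses import dataclass
from enum import Flag


class Right(Flag):
    READ = 1
    WRITE = 2
    REVOCATION = 4


@dataclass
class CapEntry:
    open_rights: Right      # usable with the capability alone
    closed_rights: Right    # additionally require user authentication


class CapService:
    """Hypothetical service-side capability store and checker."""

    def __init__(self, parent_of):
        self.parent_of = parent_of   # ObjectID -> parent ObjectID (root -> None)
        self.cap_table = {}          # (root node ID, password) -> CapEntry

    def mint(self, root_id, open_rights, closed_rights=Right(0)):
        password = secrets.token_hex(16)  # 128 random bits (assumed width)
        self.cap_table[(root_id, password)] = CapEntry(open_rights, closed_rights)
        return (root_id, password)

    def _covers(self, root_id, object_id):
        # Walk up from the object until the capability's root node is found.
        node = object_id
        while node is not None:
            if node == root_id:
                return True
            node = self.parent_of.get(node)
        return False

    def check(self, capa, object_id, right, authenticated=False):
        root_id, _password = capa
        entry = self.cap_table.get(capa)
        if entry is None or not self._covers(root_id, object_id):
            return False                      # unknown capability or wrong subtree
        if right in entry.open_rights:
            return True                       # capability alone suffices
        return authenticated and right in entry.closed_rights


# Hierarchy: 1 is the root; 2 and 4 are its children; 3 is a child of 2.
svc = CapService(parent_of={1: None, 2: 1, 3: 2, 4: 1})
capa = svc.mint(2, open_rights=Right.READ, closed_rights=Right.WRITE)
print(svc.check(capa, 3, Right.READ))
print(svc.check(capa, 3, Right.WRITE, authenticated=True))
```

Keeping the rights in the service-side table rather than in the token is what makes them unforgeable in this model: a holder can copy the token but cannot change what it authorizes.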
MSI provides functions for creating and revoking capabilities: create_capa and revoke_capa. When a user requests a capability from a service (using create_capa), the service returns a URL that embeds the new capability. In this way, capability sharing is similar to URL sharing in the Web. Revocation of a capability simply zeroes the rights fields in the capability's CapTable entry. To prevent arbitrary users from revoking capabilities, revocation requires a valid capability to the same object with the REVOCATION right enabled. Several current Web services already use slight variations of a hybrid-capability protection model, which confirms the applicability of our approach. As one example, Flickr and other Yahoo! services provide "browser-based authentication [33]," which is essentially a capability-based scheme; it allows users to obtain a "token" for an object, specify a set of rights enabled by that token, and pass the token to an application. As another example, Google Calendar offers users "secret URLs" to their calendars, which they can give
to friends. These URLs are a type of capability that can be used to view, but not modify, the user's calendar. To share a calendar with update rights, the user must add the sharee to the service's ACL. Our hybrid-capability protection scheme meets our fine-grained sharing goal: it simplifies limited sharing of objects and collections, while providing services with control over more important operations. Menagerie's protection system is flexible enough to support all of the protection policies we encountered in the Web.
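Capability creation and revocation, as described in this section, can be sketched in a few lines. This is an illustration only: the URL layout, the query parameter names `id` and `pw`, the host `service.example`, and the bitmask encoding of rights are assumptions, and the open-access versus closed-access split is left out. Unlike MSI's `create_capa(capa, object_ID, rights)`, the simplified `create_capa` below does not take an authorizing capability.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

READ, WRITE, REVOCATION = 1, 2, 4   # rights as bit flags (assumed encoding)

cap_table = {}   # (object_id, password) -> rights bitmask


def create_capa(object_id, rights, base="https://service.example/msi"):
    """Mint a capability and return it embedded in a URL."""
    password = secrets.token_hex(16)
    cap_table[(object_id, password)] = rights
    return f"{base}?{urlencode({'id': object_id, 'pw': password})}"


def parse_capa(url):
    """Recover the (object_id, password) pair from a capability URL."""
    query = parse_qs(urlparse(url).query)
    return (int(query["id"][0]), query["pw"][0])


def revoke_capa(target_url, revoker_url):
    """Zero the target's rights if the revoker holds REVOCATION on the same object."""
    target, revoker = parse_capa(target_url), parse_capa(revoker_url)
    same_object = target[0] == revoker[0]
    if not same_object or not cap_table.get(revoker, 0) & REVOCATION:
        return False
    if target not in cap_table:
        return False
    cap_table[target] = 0   # the entry stays, but it no longer grants anything
    return True


owner = create_capa(42, READ | WRITE | REVOCATION)
shared = create_capa(42, READ)
print(revoke_capa(owner, shared))   # the shared capability lacks REVOCATION
print(revoke_capa(shared, owner))
```

Since the URL is the capability, anyone who obtains it holds the rights it names until the service zeroes the corresponding table entry.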
3.3 The Object Content Access Interface
MSI provides composite Web applications with two different ways to access objects. First, for mashup-style applications, Menagerie permits a composite application to embed an object from a remote service within a Web page. The remote service is responsible for the presentation and interaction controls of that embedded object, similar to how YouTube provides embeddable, interactive objects for displaying video. To support building expressive composite GUIs, MSI defines a set of metadata access functions, including get_summary and get_URL. The get_summary function returns an HTML snippet that describes the object visually. For example, get_summary returns an <img> tag for a Flickr photo's thumbnail, an