Introducing OFS – a python "bucket"/object storage library

Many distributed storage systems – such as Amazon’s S3 service or Riak’s key-value store – are similar in the way data is labelled and subsequently retrieved. This is often because the systems themselves use a distributed hash table, or a similar distribution algorithm, to disperse and later locate the data they store.

OFS is a Python library that seeks to capitalise on these similarities – providing a single, general API to put and get files from any of these services while hiding the specifics of each implementation from the user. This allows for local testing and development before transitioning to one of the cloud services, which typically cost real money and slow down testing because every operation has to travel over an internet connection.
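To illustrate the bucket/label idea, here is a minimal sketch of such a common storage API. The class and method names below are illustrative assumptions for this post, not the actual OFS interface – see the project link at the end for the real API.

```python
import io
import json


class MemoryStore:
    """Illustrative in-memory backend (a stand-in for a real service):
    files live under (bucket, label) keys, alongside a JSON-encodable
    metadata dict, mirroring the bucket/label mechanism described above."""

    def __init__(self):
        self._blobs = {}     # (bucket, label) -> bytes
        self._metadata = {}  # (bucket, label) -> dict

    def put_stream(self, bucket, label, stream, params=None):
        self._blobs[(bucket, label)] = stream.read()
        # Round-trip through JSON so only JSON-encodable metadata is accepted.
        self._metadata[(bucket, label)] = json.loads(json.dumps(params or {}))

    def get_stream(self, bucket, label):
        return io.BytesIO(self._blobs[(bucket, label)])

    def get_metadata(self, bucket, label):
        return dict(self._metadata[(bucket, label)])

    def list_labels(self, bucket):
        return [label for (b, label) in self._blobs if b == bucket]


# Usage: the same calls would work unchanged against a local or remote backend.
store = MemoryStore()
store.put_stream("mybucket", "hello.txt", io.BytesIO(b"hello"),
                 {"mime": "text/plain"})
print(store.get_stream("mybucket", "hello.txt").read())  # b'hello'
```

The point of the sketch is that application code only ever sees `put_stream`/`get_stream` against a bucket and label; swapping the in-memory dict for S3 or a local pairtree store would not change the calling code.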

Characteristics of OFS:

  • Uses a ‘bucket/label’ mechanism to identify individual files
  • Provides a list of the contents of a given bucket (as best the service can provide)
  • Provides per-file metadata insofar as the service supports it (key-value or JSON-encodable data)
  • Current backend plugins:
    • Local storage – based on the pairtree specification, which optimises file distribution across a native file-system to handle large quantities of files. Uses JSON to encode arbitrary metadata about the files in a given bucket.
    • Remote storage (S3 and Archive plugins written by Friedrich Lindenberg (pudo) who has also made large contributions to the codebase):
      • Amazon S3
      • Archive.org
      • Riak (in progress)
  • Also in progress – a REST client by Friedrich Lindenberg (pudo)
  • One key goal is to provide opaque sharding – breaking very large files into pieces spread across buckets, or even across systems, to improve performance and broaden the range of services and backends OFS can make use of.
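The sharding idea in the last bullet can be sketched in a few lines. This is not OFS code – just an illustration, under the assumption that each chunk would be stored under its own label (or bucket) and reassembled on retrieval:

```python
def shard(data: bytes, chunk_size: int) -> list:
    """Split a byte string into fixed-size chunks; each chunk could then be
    stored under its own label (or bucket) by any backend."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def unshard(chunks: list) -> bytes:
    """Reassemble chunks fetched back in order."""
    return b"".join(chunks)


# A 1 MB chunk size, say, would turn a 10 MB file into ten independent objects.
chunks = shard(b"abcdefgh", 3)
print(chunks)               # [b'abc', b'def', b'gh']
print(unshard(chunks))      # b'abcdefgh'
```

Because the split happens above the storage API, it stays opaque to the caller: the library can fan chunks out across buckets or services without the user's code changing.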

It is plain that being able to write storage code in a common way, while making use of local as well as remote ‘cloud’ storage, is of great benefit. It encourages file storage to be codified in a distributable manner so that scaling later on is easier.

This is a work in progress, but the local implementation is intended to be both a reference implementation and a useful testing – or even production – backend for storage. Other backends will potentially have less comprehensive metadata support for individual files, but these limits will be surfaced as optional warnings or exceptions once we have a handle on what they are.

Please comment or give feedback on this library. Also, we would welcome any patches for other backend support to the library!

http://bitbucket.org/okfn/ofs


One Response to Introducing OFS – a python "bucket"/object storage library

  1. Abe says:

    This looks like an interesting project. I’ve been trying to set up a distributed storage solution on Hadoop using inexpensive servers ($40-50). I have it working, but the limited RAM (128MB) is a problem. I was hoping to do something in Python instead. I went to the link and the project is gone. Did it relocate somewhere else?

    http://www.socialtodolist.com/shared/project/installing-and-using-hadoop
