Data Storage
Ways in which to store data created in Plone
Object-Oriented - Data.fs
Plone stores its data in an object database called the ZODB.
- http://en.wikipedia.org/wiki/Zope_Object_Database
There are various storage backends, but the default one used by Plone is File Storage. This can seem like a fairly scary thing: basically, everything lives in a single file. However, there is no real need to be frightened of it. It is pretty simple, so not much can go wrong.
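File Storage works by appending: each transaction is written to the end of the file, and older revisions of an object stay there until the database is packed. To make "basically a single file" less mysterious, here is a toy model of that idea. It is an illustration only: the one-JSON-line-per-write format and the function names are hypothetical and have nothing to do with the real Data.fs format.

```python
import json
import os
import tempfile


def append_record(path, oid, state):
    """Append one object revision to the end of the file (never rewrite in place)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"oid": oid, "state": state}) + "\n")


def load_current(path):
    """Replay the file from the start; a later record for an oid replaces earlier ones."""
    current = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            current[record["oid"]] = record["state"]
    return current


def pack(path):
    """Rewrite the file keeping only the latest revision of each object."""
    current = load_current(path)
    tmp = path + ".pack"
    with open(tmp, "w", encoding="utf-8") as f:
        for oid, state in current.items():
            f.write(json.dumps({"oid": oid, "state": state}) + "\n")
    os.replace(tmp, path)  # swap the packed file in with a single rename


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "Toy.fs")
    append_record(path, "front-page", {"title": "Welcome"})
    append_record(path, "front-page", {"title": "Welcome to Plone"})
    append_record(path, "news", {"title": "News"})
    print(load_current(path)["front-page"]["title"])  # Welcome to Plone
    pack(path)
    with open(path, encoding="utf-8") as f:
        print(len(f.readlines()))  # 2
```

Because nothing is overwritten in place, a crash can at worst leave an incomplete record at the very end, which is one reason a single file is less fragile than it sounds. It is also why the file keeps growing until you pack.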
You'll find the File Storage for your Zope instance in [your Zope Instance]/var/filestorage. The file is called Data.fs and, when first created with a single Plone site in it, is usually about 4MB.
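If you want to keep an eye on how big it is getting, a few lines of Python will do. This sketch assumes the standard var/filestorage layout described above; if your installation keeps Data.fs somewhere else, adjust the path. The function names are hypothetical.

```python
from pathlib import Path


def datafs_path(instance_dir):
    """Return the conventional location of Data.fs inside a Zope instance."""
    return Path(instance_dir) / "var" / "filestorage" / "Data.fs"


def datafs_size_mb(instance_dir):
    """Return the size of Data.fs in megabytes (1 MB = 1024 * 1024 bytes)."""
    return datafs_path(instance_dir).stat().st_size / (1024 * 1024)
```

For example, `print(f"{datafs_size_mb('/path/to/instance'):.1f} MB")`, where the instance path is a placeholder for your own.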
Is it reliable?
Yes. We've been using File Storage for about 5 years now and have had just three database errors. All three were user error on our part (arising from a rather complicated setup), and we fixed them by rolling back to the previous night's backup. You'll find that most Plone integrators consider the Data.fs to be the most stable part of the system.
It feels like a black box
Yes, it is, because you'll need to use Python to really see what you've got in there. On the other hand, many CMSs running on relational databases treat their datastore as a black box too (don't dare go near it with SQL). If you need to know what's going on, or have to run complex reports on the data in your Plone site, you can mirror it out to a relational database using ore.contentmirror. I've tried this once and found it pretty easy to set up, but I'd get familiar with Zope and Plone first.
Doesn't it get massive?
Yes, it can get quite big, but on the whole this hasn't been an issue for us, and there are ways of dealing with it. If you have a number of sites you can split your data storage into several Data.fs files, or even do this for a single site. This is a fairly advanced concept, primarily documented here:
If you are handling large numbers of binary files, consider adjusting your binary file content types so that the files are stored separately on the file system. There's a module to do this which we've used successfully on our Medical Sciences Intranet; if you have access, you can look at MSDFile on our Subversion server to see how we did it. It involves a little bit of content type development work.
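The general idea is that the object in the database only holds a small pointer, while the bytes themselves live in an ordinary file. Here is a minimal sketch of that pattern, assuming content-addressed naming (files named after their SHA-256 digest). It is not how MSDFile or plone.app.blob work internally; `ExternalFileStore` and `FileItem` are hypothetical names.

```python
import hashlib
from pathlib import Path


class ExternalFileStore:
    """Keep file contents on disk; content objects store only the returned key."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # Name the file after its digest, so identical uploads share one file.
        key = hashlib.sha256(data).hexdigest()
        target = self.root / key[:2] / key
        if not target.exists():
            target.parent.mkdir(exist_ok=True)
            target.write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key[:2] / key).read_bytes()


class FileItem:
    """Hypothetical content object: metadata lives in the database, bytes do not."""

    def __init__(self, title, store, data):
        self.title = title
        self.size = len(data)
        self.blob_key = store.put(data)
```

The trade-off is that the database and the file directory now have to be backed up together, since neither is complete on its own.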
In Plone 4, storage of large files on the file system should be available out-of-the-box (plone.app.blob). You can see progress on this on the product page on plone.org. If you're making an assessment of Plone now, it would be worth getting hold of the first beta and playing with Plone 4.
How do I back up and restore?
Just copy the Data.fs file. For something more systematic, Zope comes with a script called Repozo which handles full and incremental backups. Packing the database (discarding old object revisions) is a separate job, done for example from the Control Panel in the ZMI. You can set up cron jobs to run both, and it's advisable to pack regularly. It's nicely documented here
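Here is a sketch of both approaches. The plain copy is safest with Zope stopped; for a live site Repozo is the better tool. The Repozo helper only builds the command line, using Repozo's `-B` (backup), `-r` (repository directory), `-f` (file to back up) and `-F` (force a full backup) options; it assumes a `repozo` script on your PATH (buildout installs often put it in `bin/`). Both function names are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_datafs(datafs, backup_dir, now=None):
    """Copy Data.fs into backup_dir under a timestamped name and return the new path."""
    now = now or datetime.now()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"Data-{now:%Y%m%d-%H%M%S}.fs"
    shutil.copy2(datafs, target)  # copy2 also keeps the file's timestamps
    return target


def repozo_backup_command(datafs, repository, full=False):
    """Build a Repozo backup command; without -F Repozo does an incremental backup when it can."""
    cmd = ["repozo", "-B", "-r", str(repository), "-f", str(datafs)]
    if full:
        cmd.append("-F")
    return cmd
```

A nightly cron job could then call something like `subprocess.run(repozo_backup_command("var/filestorage/Data.fs", "/backups/plone"), check=True)`, with both paths adjusted to your own setup.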
You can use Repozo to restore as well, but if you have a full backup of a Data.fs it is straightforward to stop Zope, replace your Data.fs with the backup and restart.
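The "replace your Data.fs" step might look like this sketch, run between stopping and restarting Zope. Moving the current file aside instead of deleting it, and moving Data.fs.index aside too, are our own precautions: the index belongs to the old file, and File Storage rebuilds a missing index on startup. `restore_datafs` is a hypothetical name.

```python
import shutil
from pathlib import Path


def restore_datafs(backup, datafs):
    """Swap a backup copy in as Data.fs. Run only while Zope is stopped."""
    datafs = Path(datafs)
    # Move the current Data.fs and its index aside (to *.old) rather than deleting them.
    for name in (datafs.name, datafs.name + ".index"):
        current = datafs.with_name(name)
        if current.exists():
            current.replace(current.with_name(name + ".old"))
    shutil.copy2(backup, datafs)
```

With Repozo, the equivalent is a recover run (`-R`), pointing `-o` at the Data.fs you want it to write.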
Is my data stuck in this Data.fs thing?
No, it's just that you can't see it as easily as you can using SQL. In addition to the mirroring and file system storage mentioned above:
- You can use WebDAV
- You can move chunks of data from one site to another using ZEXP export and import
- You can write scripts to export as CSV, XML or JSON. There are nice Python modules to take the pain out of all of this: simplejson, xml.dom.minidom
- XML import/export is supposed to appear in one of the Plone 4 releases. It will be based on collective.transmogrifier (look at quintagroup.transmogrifier for a Plone specific implementation of this).
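As an example of the scripting route, here is a sketch that writes the same records out as JSON, CSV and XML. In a real site the records would come from your content (a catalog query, say); here they are hard-coded and hypothetical. It uses the standard library `json` module, which offers the same `dumps` call as simplejson, plus `csv` and `xml.dom.minidom`.

```python
import csv
import io
import json
from xml.dom import minidom

# Hypothetical records; in practice gathered from the site's content.
items = [
    {"id": "front-page", "title": "Welcome", "type": "Document"},
    {"id": "news", "title": "News", "type": "Folder"},
]


def to_json(records):
    return json.dumps(records, indent=2)


def to_csv(records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    doc = minidom.Document()
    root = doc.appendChild(doc.createElement("items"))
    for record in records:
        node = root.appendChild(doc.createElement("item"))
        for key, value in record.items():
            # One child element per field, e.g. <title>Welcome</title>
            field = node.appendChild(doc.createElement(key))
            field.appendChild(doc.createTextNode(str(value)))
    return doc.toprettyxml(indent="  ")
```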
Relational
An object-oriented database is quite a good way to model a website. It's fairly easy to envisage folderish objects containing other objects, in the same way as a file system tree. Actually, this is a lot easier for your users and editors to imagine than a relational system (it can get confusing when things seem to live in more than one place).
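To make the folder analogy concrete, here is a toy model in plain Python: folderish objects hold their children by name, every object has exactly one parent (so it lives in one place), and a path is resolved by walking down the tree, much like a URL in a site. The class and function names are hypothetical, not Zope's API.

```python
class Item:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class Folder(Item):
    """A folderish object: it contains other objects, like a directory."""

    def __init__(self, name, parent=None):
        super().__init__(name, parent)
        self.children = {}

    def add(self, child):
        child.parent = self
        self.children[child.name] = child
        return child

    def traverse(self, path):
        """Walk a slash-separated path down from this folder."""
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]  # KeyError if there is no such child
        return node


def physical_path(obj):
    """Build the object's path by following parent links back to the root."""
    parts = []
    while obj is not None:
        parts.append(obj.name)
        obj = obj.parent
    return "/" + "/".join(reversed(parts))
```

For example, after `site = Folder("site")`, `news = site.add(Folder("news"))` and `item = news.add(Item("launch"))`, `site.traverse("news/launch")` returns `item` and `physical_path(item)` is `"/site/news/launch"`.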
If you're used to thinking relationally, it takes some time to think yourself into an object-oriented approach. It might be that a relational database really is the right way to model your data, but it is well worth investigating whether the object-oriented approach will do the job for you. Here are some notes:
Can I stick a relational database behind Plone?
Yes and no.
You can't replace the Data.fs with a relational database (although there have been experiments to try this, none have really come to fruition yet).
If you reckon that your data can be modelled in an object-oriented fashion but you need the SQL convenience to run reports, then consider ore.contentmirror to serialize your Plone content to a relational database (it works one way only - you can't write back via the relational database).
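To illustrate what "one way only" means, here is a hand-rolled sketch using Python's built-in sqlite3 as the relational side. It is not ore.contentmirror's API or how it works internally; the `content` table, its columns and the `mirror` function are hypothetical. Content is copied out and overwritten on each run, and nothing ever flows back.

```python
import sqlite3


def mirror(conn, items):
    """Copy content items into a relational table (one-way: object database -> SQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS content ("
        " uid TEXT PRIMARY KEY, path TEXT, portal_type TEXT, title TEXT)"
    )
    # INSERT OR REPLACE refreshes rows that were mirrored before.
    conn.executemany(
        "INSERT OR REPLACE INTO content (uid, path, portal_type, title) "
        "VALUES (:uid, :path, :portal_type, :title)",
        items,
    )
    conn.commit()
```

Reports are then ordinary SQL, for instance `SELECT portal_type, COUNT(*) FROM content GROUP BY portal_type`.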
If you have a legacy relational database that you need to access, it's perfectly possible to integrate it into a Plone site at the templating level. ZSQLMethods let you create SQL queries, and the resulting data rows can then easily be integrated into a template using the standard ZPT templating language. Once you're familiar with the concepts, it is easy enough to build an interface that both reads from and writes to an SQL database. There are various options for building the forms to do this:
- PloneFormGen - a through-the-web (TTW) form builder; a very stable product, almost part of the core. It is convenient for web editors and well documented, though you might find it slicker to use a template.
- Zope Templates - investigate the search form template in CMFPlone/skins/plone_forms to see how this is done.
- z3c.form - find your feet as a developer first, but then this is nice.
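Whichever form option you pick, the SQL side splits into a write path and a read path. ZSQLMethods take their arguments from the request and quote them for you (with `dtml-sqlvar`); the sketch below shows the same split in plain Python with parameterized queries, with sqlite3 standing in for the real database. The `bookings` table and the function names are hypothetical.

```python
import sqlite3


def create_bookings_table(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS bookings (course_id TEXT, email TEXT)")


def add_booking(conn, course_id, email):
    """Write path: form values are passed as parameters, never pasted into the SQL."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO bookings (course_id, email) VALUES (?, ?)", (course_id, email)
        )


def bookings_for(conn, course_id):
    """Read path: a list a template could loop over."""
    rows = conn.execute(
        "SELECT email FROM bookings WHERE course_id = ? ORDER BY email", (course_id,)
    )
    return [email for (email,) in rows]
```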
On www.medsci.ox.ac.uk/skillstraining we used a Plone content type for each training course, storing some attributes in the ZODB and using the template for that content type to look up some extra information from a MySQL database. We used ZMySQLDA and ZSQLMethods.
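The pattern there is "object attributes plus a lookup keyed on one of them". Here is a sketch of that combination, with a dataclass standing in for the content type and sqlite3 standing in for MySQL. The `Course` class, the `sessions` table and its columns are all hypothetical; they are not taken from our actual site.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Course:
    """Hypothetical content object: these attributes would live in the ZODB."""
    course_code: str
    title: str


def course_details(conn, course):
    """Combine the object's own attributes with extra rows looked up in SQL."""
    rows = conn.execute(
        "SELECT start_date, places_left FROM sessions"
        " WHERE course_code = ? ORDER BY start_date",
        (course.course_code,),
    ).fetchall()
    return {
        "title": course.title,
        "sessions": [{"start": start, "places_left": left} for start, left in rows],
    }
```

Keeping the shared key (here `course_code`) on the object is what lets the template join the two worlds.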
It would also be worth investigating Chapter 12 in Plone Professional Development (although bear in mind you may need to do some digging around and googling to come up with the latest versions and optimal combination of the products used there). This looks at the more advanced process of object-relational mapping.
Can I model relationships in the Plone OO database?
Yes. You can make simple relationships between content items, and don't forget that nesting one piece of content within another is a relationship in itself: you can always find the parent of an object, and inherit attributes from a parent. The related content option you see with each of the out-of-the-box content types also uses relationships.
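Inheriting attributes from a parent is what Zope calls acquisition. Here is a plain-Python toy of the lookup order only: check the object itself, then each parent in turn. It is not the Acquisition package's API, and `Node` and `acquire` are hypothetical names.

```python
class Node:
    def __init__(self, parent=None, **attrs):
        self.parent = parent
        self.attrs = attrs

    def acquire(self, name):
        """Look for an attribute on this object, then on each parent in turn."""
        node = self
        while node is not None:
            if name in node.attrs:
                return node.attrs[name]
            node = node.parent
        raise AttributeError(name)
```

So a page inside a department folder picks up the department's contact address if it has none of its own, and the site-wide one if the department has none either.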
If you've got your own data to model, you'll want to build your own content types, which is a more advanced, integrator-level task (again, not difficult once you've got the hang of it). http://intranet.medsci.ox.ac.uk uses a set of related content types (an officer, a committee, a process and an office location); if you've got access to our Subversion repository you'll find them in MSDMso. Here are the basics:
- You can define a reference field on any content type, make it multi or single valued and restrict it to one or a group of content types.
- The referencebrowser widget makes it easy for editors to browse the site to add references. You have considerable control over how it is configured; the readme file will tell you more.
- There are methods to locate the references from and to an object (getRefs and getBRefs)
- Look at the computeRelatedItems script in CMFPlone/skins/plone_scripts to see how this is done (this will probably get converted to a more sophisticated "browser view" in the future). This script usefully checks to see whether the visitor has permission to see each of the related items, to avoid unnecessary authorization errors in the template.
- Chapter 12 of Plone Professional Development gives you a bit more information.
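`getRefs` and `getBRefs` are the real method names, but the registry below is a hypothetical toy, not the Archetypes reference machinery. It shows what the two directions mean (references from an object versus back-references to it) and where a permission filter like the one in computeRelatedItems fits. The relationship name `member_of` and the object ids are made up for the example.

```python
class ReferenceRegistry:
    """Toy store of (source, relationship, target) links, queryable both ways."""

    def __init__(self):
        self._forward = {}
        self._backward = {}

    def add(self, source, target, relationship):
        self._forward.setdefault((source, relationship), []).append(target)
        self._backward.setdefault((target, relationship), []).append(source)

    def refs(self, obj, relationship):
        """Objects that obj points to (the idea behind getRefs)."""
        return list(self._forward.get((obj, relationship), []))

    def brefs(self, obj, relationship):
        """Objects that point at obj (the idea behind getBRefs)."""
        return list(self._backward.get((obj, relationship), []))


def visible_related(registry, obj, relationship, can_view):
    """Drop related items the visitor may not see, so the template never hits them."""
    return [item for item in registry.refs(obj, relationship) if can_view(item)]
```

For example, after `reg.add("officer-smith", "finance-committee", "member_of")`, `reg.brefs("finance-committee", "member_of")` lists `"officer-smith"`; in Plone the `can_view` check would be a real permission check for the current user.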
