Cleaning datastore with checker tool


We are currently trying to reduce our database to a more manageable size. As we now understand it, entries are never deleted from the datastore automatically; this must be done manually. We use DbDataStore as the datastore implementation. The database table has grown to over 50 GB. I tried running the checker tool with the cleands command, but it seems to get stuck after a while. After about 2 hours of running, I saw the following log entry, and then nothing more was printed for the next 6 hours.

10:41:40 Loaded 5436000 nodes

Running it locally on a smaller database, I see the tool properly finishing:

10:39:58 Loaded 115000 nodes
10:39:58 Removed 0 binaries
10:39:58 Shutting down repository
10:39:58 Repository has been shut down

Any idea what could have gone wrong here? Are there, for instance, any hardware requirements we need to take into consideration, like a certain amount of available memory? I didn’t see any error messages.

I also saw that Apache Jackrabbit has a GarbageCollector class, which can remove unreferenced entries from the datastore. I think we could use this in a Groovy updater script. Would this have the same effect as using the cleands command of the checker tool? Or is there any reason to prefer the tool over programmatic use of the garbage collector?

Thanks for your time!

I don’t think GarbageCollector works any differently, and you would be running it within the same process as your CMS, which increases the chances of crashing the server.

Running for a number of hours might be normal, so I would just leave the cleaner process running.
In addition, you might run other cleanups: cleaning the journaling table and logs might help bring the size down. Ideally, the journaling table should be cleaned up every couple of weeks.
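
If it helps to tell a slow run from a stuck one, a small script can watch the tool's output. This is only a minimal sketch, assuming progress lines of the form `HH:MM:SS Loaded N nodes` as quoted in the question. The function names and the 30-minute threshold are made up for illustration and are not part of the checker tool. A quiet log is only a hint: the tool may be busy in a phase that prints nothing.

```python
import re
from datetime import datetime, timedelta

# Matches progress lines such as "10:41:40 Loaded 5436000 nodes".
PROGRESS = re.compile(r"^(\d{2}:\d{2}:\d{2}) Loaded (\d+) nodes$")


def last_progress(lines):
    """Return (time, node_count) of the last progress line, or None."""
    last = None
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S").time()
            last = (stamp, int(match.group(2)))
    return last


def is_stalled(lines, now, limit=timedelta(minutes=30)):
    """True if the last progress line is older than `limit`.

    Assumption: the log line and `now` fall on the same calendar day,
    because the log shows only a time of day, not a date.
    """
    progress = last_progress(lines)
    if progress is None:
        return False  # no progress line yet, so nothing to judge
    stamp, _count = progress
    silent_for = now - datetime.combine(now.date(), stamp)
    return silent_for > limit
```

Called with the log line from the question and a clock reading six hours later, `is_stalled` reports a stall. With a clock reading a few minutes later, it does not.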

Hi Mackak,
thanks for your response. We’ll stick to using the checker tool. I increased the CPU, memory and storage of the environment the database and tool were running in, and this time cleaning and optimizing the datastore succeeded, halving the size of the datastore. The checker tool may have finished successfully before, but the OPTIMIZE TABLE operation could not complete due to lack of storage space, so I was unable to confirm this.

We are planning to add regular cleanup, also of the journaling table, to our infrastructure later on. Still have some work to do before we get there.
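
For anyone hitting the same OPTIMIZE TABLE problem: a quick free-space check before starting might save a failed run. This is a rough sketch, assuming MySQL with InnoDB, where OPTIMIZE TABLE rebuilds the table and so temporarily needs room for roughly a second copy. The function names, the 1.2 safety margin and the data directory path in the usage note are illustrative assumptions, not values from our setup.

```python
import shutil


def enough_space_for_rebuild(table_bytes, free_bytes, margin=1.2):
    """True if free space covers a full copy of the table plus a margin.

    `margin` is an arbitrary safety factor chosen for illustration.
    """
    return free_bytes >= table_bytes * margin


def free_bytes_at(path):
    """Free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free
```

For a 50 GB table this would be called as `enough_space_for_rebuild(50 * 1024**3, free_bytes_at("/var/lib/mysql"))`, with the path replaced by wherever the database actually stores its data.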