Hi,
We are currently trying to reduce our database to a more manageable size. As we now understand it, entries are never deleted from the datastore automatically; this has to be done manually. We use DbDataStore as our datastore implementation. The database table has grown to over 50 GB. I tried running the checker tool with the cleands command, but it seems to get stuck after a while. After about 2 hours of running, I saw the following log entry, and then nothing more was printed for the next 6 hours:
10:41:40 Loaded 5436000 nodes
Running it locally against a smaller database, I see the tool finish properly:
10:39:58 Loaded 115000 nodes
10:39:58 Removed 0 binaries
10:39:58 Shutting down repository
10:39:58 Repository has been shut down
Any idea what could be going wrong here? Are there any hardware requirements we should take into account, for instance a certain amount of available memory? I didn't see any error messages.
I also saw that Apache Jackrabbit has a GarbageCollector class, which can remove unreferenced entries from the datastore. I think we could call this from a Groovy updater script. Would that have the same effect as running the cleands command of the checker tool, or is there a reason to prefer the tool over using the garbage collector programmatically?
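For completeness, this is roughly what I had in mind, sketched in Java so it is easy to port to the updater script. It is an untested draft based on the datastore garbage collection example in the Jackrabbit documentation; the class and method names (DataStoreCleanup, cleanDataStore) are mine, and the cast to SessionImpl assumes the session comes directly from Jackrabbit Core:

import javax.jcr.RepositoryException;
import javax.jcr.Session;

import org.apache.jackrabbit.api.management.DataStoreGarbageCollector;
import org.apache.jackrabbit.core.SessionImpl;

public class DataStoreCleanup {

    // Mark-and-sweep over the datastore: mark() scans the repository
    // for all referenced binaries, sweep() deletes the unreferenced rest.
    public static void cleanDataStore(Session session) throws RepositoryException {
        // createDataStoreGarbageCollector() lives on Jackrabbit's
        // SessionImpl, not on the plain JCR Session interface.
        DataStoreGarbageCollector gc =
                ((SessionImpl) session).createDataStoreGarbageCollector();
        try {
            gc.mark();                 // collect all referenced binary identifiers
            int removed = gc.sweep();  // delete datastore records not marked above
            System.out.println("Removed " + removed + " binaries");
        } finally {
            gc.close();                // release the collector's resources
        }
    }
}

My understanding is that mark() records every binary still referenced by the repository and sweep() then removes everything else, which sounds like the same mark-and-sweep pass the cleands command performs, but I would like to be sure before running this against production.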
Thanks for your time!