HippoCMS with Redis instead of file storage

Hello,

I’m wondering if it’s possible to replace the Lucene filesystem storage with something else, such as an in-memory cache like Redis. Creating the Lucene index (the storage folder) takes a long time for bigger websites during warmup; in some cases I experienced 40-minute warmups caused by the indexing operation. I think that moving the index to Redis might speed up startup.

Is this possible at all? I found that there is a DirectoryManager class, and that we can provide our own implementation and override getDirectoryManager() to change the default behaviour and return our own manager implementation.
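
Roughly the shape I have in mind is below. This is just a sketch: I haven’t verified the exact DirectoryManager method signatures against the Jackrabbit version Hippo ships, and a real Redis-backed version would also need a custom org.apache.lucene.store.Directory implementation (here I just use RAMDirectory as a stand-in). If I read the SearchIndex configuration right, such a class could be plugged in via the directoryManagerClass parameter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.jackrabbit.core.query.lucene.SearchIndex;
import org.apache.jackrabbit.core.query.lucene.directory.DirectoryManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Sketch of a custom DirectoryManager; the method set is my reading of the
// Jackrabbit 2.x interface, so double-check it against the actual sources.
public class RedisDirectoryManager implements DirectoryManager {

    // Stand-in store: a real implementation would return a Directory that keeps
    // its files in Redis instead of this in-process map of RAMDirectory instances.
    private final Map<String, Directory> directories = new ConcurrentHashMap<String, Directory>();

    public void init(SearchIndex handler) {
        // read Redis connection settings, e.g. from SearchIndex parameters or system properties
    }

    public boolean hasDirectory(String name) {
        return directories.containsKey(name);
    }

    public Directory getDirectory(String name) {
        Directory dir = directories.get(name);
        if (dir == null) {
            dir = new RAMDirectory(); // placeholder for a hypothetical Redis-backed Directory
            directories.put(name, dir);
        }
        return dir;
    }

    public String[] getDirectoryNames() {
        return directories.keySet().toArray(new String[0]);
    }

    public boolean delete(String name) {
        return directories.remove(name) != null;
    }

    public boolean rename(String from, String to) {
        Directory dir = directories.remove(from);
        if (dir == null) {
            return false;
        }
        directories.put(to, dir);
        return true;
    }

    public void dispose() {
        directories.clear();
    }
}
```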

Do you have any experience with this? Do you think it is possible, and worth the effort, to keep this index in memory to speed up startup and make it possible to share the index between instances in a cluster?

Maybe Lucene already supports this out of the box and I’m just not able to find it?

Thank you very much in advance for any help/tips.

There’s a Lucene index export add-on which allows you to download the index export from one of the instances in the cluster (enterprise offering):
https://documentation.bloomreach.com/library/enterprise/enterprise-features/lucene-index-export/lucene-index-export.html.

You could use this export to start up faster.

Indexing is a one-time operation, and you don’t need to re-index after each restart.

I know, but this is a system running in the cloud, and instances can be removed or added at any time.

As Baris already mentioned, we solve that problem with the Lucene index backup and restore solution (in the enterprise stack).

I once wrote some utility scripts to back up and restore the Lucene index easily in that scenario, using the enterprise Lucene export endpoint:

Its dockerization is probably outdated now, since v13 ships its own Docker support at the product level, but the scripts (the *.sh files) might still be useful to download/back up, restore, or re-initialize the Lucene index folder on startup (when invoked by setenv.sh).
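
For the non-dockerized case, the essential restore step those scripts perform could also be sketched in plain Java: download the index export ZIP and unpack it into the local index folder before the repository starts. The URL and target path below are placeholders; the real endpoint, authentication and folder layout depend on your enterprise module configuration and repository setup.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch: restore a previously exported Lucene index before startup.
public class LuceneIndexRestore {

    public static void main(String[] args) throws Exception {
        String exportUrl = "https://cms.example.com/lucene-index-export.zip"; // hypothetical URL
        File indexDir = new File("storage/workspaces/default/index");         // adjust to your layout

        try (InputStream in = new URL(exportUrl).openStream();
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                File target = new File(indexDir, entry.getName());
                if (entry.isDirectory()) {
                    target.mkdirs();
                    continue;
                }
                target.getParentFile().mkdirs();
                try (OutputStream out = new FileOutputStream(target)) {
                    byte[] buffer = new byte[8192];
                    int len;
                    while ((len = zip.read(buffer)) > 0) {
                        out.write(buffer, 0, len);
                    }
                }
            }
        }
    }
}
```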

In Lucene 3.6, which Jackrabbit currently depends on, I see only FSDirectory and RAMDirectory as the essential Directory implementations, and each of those is mapped to a Jackrabbit manager implementation: FSDirectoryManager or RAMDirectoryManager.
So it doesn’t seem that Redis is supported even at the Lucene (3.6) level, which means it will be very difficult.

According to the warning in the JavaDoc of RAMDirectory, it doesn’t sound suitable for normal production environments:

Maybe in the latest version, but not in v3.6 as far as I can see.

As Apache Jackrabbit v2 is in maintenance mode and mostly maintained for API compatibility with Apache Jackrabbit Oak, I don’t expect any big new features or improvements in that area in Jackrabbit v2.

Regards,

Woonsan

Thanks, I’ll check if we can migrate to Enterprise

@woonsanko
If we use the Lucene index export/import and we want to use auto-scaling in the cloud, how do we set the cluster id for each instance? Based on the documentation on the Hippo site, the cluster id is set to hostname-whoami, and the hostname, when deployed through ECS or EKS, may not be the same every time there is a restart or autoscaling?

The online documentation is not written specifically for that kind of cloud environment, but in principle it just says:

  • A repository instance must have a unique cluster id, and the cluster id should be the same as before when restarting; otherwise, it will resync again on startup.
  • As an example, you can use the combination of hostname and whoami.

If you cannot use that combination, you can find a different way. I’m not sure, but for example you could figure out unique id generation in that specific environment, or you could even pass a unique id in somehow… but those options are cloud-platform specific.
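
Just as an illustration of the principle (not a recommendation for any particular platform): you could generate a unique id once, persist it somewhere that survives restarts of the same instance, and feed it to Jackrabbit. I believe the system property Jackrabbit falls back to when no id is configured in repository.xml is org.apache.jackrabbit.core.cluster.node_id, but double-check that against your version; in practice you would pass it as a -D option from setenv.sh rather than set it in code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Sketch: reuse a persisted cluster node id across restarts; generate a new one
// only when the instance is genuinely new (e.g. scaled out for the first time).
public class ClusterNodeId {

    public static void main(String[] args) throws IOException {
        Path idFile = Paths.get("storage/cluster-node-id.txt"); // keep on a persistent volume

        String nodeId;
        if (Files.exists(idFile)) {
            nodeId = new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        } else {
            nodeId = UUID.randomUUID().toString();
            Files.createDirectories(idFile.getParent());
            Files.write(idFile, nodeId.getBytes(StandardCharsets.UTF_8));
        }

        // Assumed property name; verify against the Jackrabbit/Hippo clustering docs.
        System.setProperty("org.apache.jackrabbit.core.cluster.node_id", nodeId);
        System.out.println("Using cluster node id: " + nodeId);
    }
}
```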

Regards,

Woonsan