Lucene Content Index - shared folder

Hi there, I can see there is a way to export an index, but we are running in a Docker container and currently the index is rebuilt every time the container starts up.

What I would like to know is whether it is possible to keep the index in a shared folder, or outside the file system altogether (say, in Elasticsearch or something else outside the container), so we do not have these startup issues.

I feel the Lucene exporter does not really solve this.

J

You cannot share an index directly between instances, but indexes can be copied from one instance to another [1] [2]. For Docker, you can mount a volume where you keep the index, and there are likely many strategies that work for that. A fairly simple one is to copy an index to a location that the Docker container can mount. Please also make sure to clean up the REPOSITORY_LOCAL_REVISIONS table: if you are starting new containers, you are likely creating a unique node id for each instance.

[1] Backup and Restore Strategy - Bloomreach Experience - Headless Digital Experience Platform
[2]

Hi, I tried mounting a volume for the index location but that caused all sorts of problems and crashed the site.

Is it possible to configure the index to live somewhere other than the main storage path? Then I could expose that and mount it as a volume easily.

Also, would there be any read/write contention if two nodes attempted to update the index?

We currently have two nodes running…

J

Each node needs its own index. You will get index corruption if nodes share the same index.

Your index is stored under the location defined by repo.path. If you store it on a volume such as /mydisk/storage, you can mount that in Docker with docker run -v /mydisk/storage:/some/location … The directory can be empty at first, in which case an index will be built, or it can already contain an index. Your storage location is likely set in setenv.sh, but in any case it is controlled by the repo.path property. Set this to /some/location.
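As a minimal sketch, a Tomcat bin/setenv.sh fragment that points repo.path at the mounted directory could look like the following. It assumes repo.path is picked up as a JVM system property passed through CATALINA_OPTS; check how your own setenv.sh sets it. /some/location is the placeholder mount point from the reply above, and the REPO_PATH variable is hypothetical.

```shell
# Sketch of a bin/setenv.sh fragment (assumption: repo.path is read as a
# JVM system property passed via CATALINA_OPTS).
# REPO_PATH is a hypothetical variable; it defaults to the mount point that
# `docker run -v /mydisk/storage:/some/location ...` provides.
REPO_PATH="${REPO_PATH:-/some/location}"

# Append the property to whatever options are already set.
CATALINA_OPTS="${CATALINA_OPTS:-} -Drepo.path=${REPO_PATH}"
export CATALINA_OPTS
```

The container would then be started with the docker run -v /mydisk/storage:/some/location … command described above, so that repo.path resolves to the mounted volume.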

OK, that makes sense.

With multiple nodes, it would be tricky to configure a shared folder for each node, as each node knows nothing about any other node. These are Docker containers, not physical servers, so I'm not sure how this is realistically going to work.

How does Bloomreach Cloud cope with this issue?

Thanks
J

I’m not an expert with Docker. I know you can also use docker cp, even on a stopped container, I believe. I’ll see if I can get someone with more knowledge to chime in. You might want to contact our support to see if you can get an infra specialist involved.

You just have to copy the index into the container before Tomcat starts up.
We have created documentation that discusses these points. It covers not just Lucene index export but also repository maintenance: On-Premise Kubernetes Setup - Bloomreach Experience Manager (PaaS/Self-Hosted) - The Fast and Flexible Headless CMS
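A minimal sketch of that copy step, for an entrypoint script that runs before Tomcat. It assumes the exported index has already been unpacked into a seed directory (for example a read-only volume) and that the repository keeps its workspace index under ${REPO_PATH}/workspaces/default/index; both paths are assumptions to check against your own setup, and seed_index is a hypothetical helper. Each container copies into its own repository directory, so no two nodes write to the same index, and an index that already exists is left alone.

```shell
# Copy a seed index into this node's private index directory, but only if
# no index exists there yet. Paths are assumptions, not brXM defaults.
seed_index() {
  seed_src="$1"   # directory holding an unpacked index export
  seed_dest="$2"  # this node's index dir, e.g. ${REPO_PATH}/workspaces/default/index

  if [ ! -d "$seed_src" ]; then
    echo "seed_index: no seed at $seed_src, index will be rebuilt" >&2
    return 0
  fi
  # Leave an existing, non-empty index untouched.
  if [ -d "$seed_dest" ] && [ -n "$(ls -A "$seed_dest" 2>/dev/null)" ]; then
    echo "seed_index: $seed_dest already has an index, skipping" >&2
    return 0
  fi
  mkdir -p "$seed_dest" || return 1
  # Copy the contents of the seed directory (dotfiles included).
  cp -R "$seed_src"/. "$seed_dest"/ || return 1
}

# Example entrypoint order (hypothetical /index-seed mount):
# seed_index /index-seed "${REPO_PATH}/workspaces/default/index"
# exec /usr/local/tomcat/bin/catalina.sh run
```

The guard means a restarted container keeps the index it already built instead of having it overwritten by an older export.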

Hi, awesome! I was thinking along the same lines before I read your comment. I will check out the docs!

Thanks
J

Hi, so I exported the index, configured a volume, put the contents in there, and in the entry point copied it into the index folder before Tomcat starts up.

However… I then got this error:
13.03.2021 15:22:16 ERROR main [RepositoryServlet.init:249] Error while setting up JCR repository:
web_1 | javax.jcr.RepositoryException: unchecked exception: java.lang.IllegalStateException: Index already present
and it all fell over big style…

Thoughts?

Hi all, has anyone got a suggestion as to how to import an index before the site starts up, and how to stop it blowing up like it is at the moment?

As mentioned, the index is currently stored in the Docker container. So on startup I copy it from a volume mount into the target folder, then start Tomcat; however, we are getting the JCR error…

Is there a config item I need to change?

Hey up, is there anyone that can guide me here please…?

Current site startup is 20 minutes! So we kind of need to resolve this…

Hi,

Unfortunately, the documentation linked above is probably better advice than any I can give. If nobody here has an easy answer, the best advice I can give you is to contact Bloomreach for formal support, though it may be more a question of Docker knowledge. In any case, I have never seen that error. Possibly you are adding the index volume after Tomcat has started?

Hi, I appreciate the response; however, the index is copied in before Tomcat starts.
In the Dockerfile, it runs this command:

exec /usr/local/tomcat/bin/catalina.sh run

This is the last command to run…

Thanks
J

So I found the actual section that I needed.

However, I am getting this error:
/brxm/bin/docker-entrypoint.sh: line 38: unzip: command not found

I have tried installing unzip into the image but it keeps failing…
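If installing unzip into the image keeps failing, one workaround is to have the entrypoint fall back to another extractor that may already be in the image. This is a hedged sketch: extract_zip is a hypothetical helper, not part of the brXM entrypoint, and it only works if at least one of unzip, the JDK's jar tool, or python3 (via python3 -m zipfile -e) is present in your image.

```shell
# Extract a zip archive into a directory using whichever tool is available.
# extract_zip is a hypothetical helper, not part of brXM's entrypoint.
extract_zip() {
  zip_file="$1"
  out_dir="$2"

  [ -f "$zip_file" ] || { echo "extract_zip: $zip_file not found" >&2; return 1; }
  mkdir -p "$out_dir" || return 1

  if command -v unzip >/dev/null 2>&1; then
    unzip -q -o "$zip_file" -d "$out_dir"
  elif command -v jar >/dev/null 2>&1; then
    # jar extracts into the current directory, so use an absolute archive path.
    zip_abs="$(cd "$(dirname "$zip_file")" && pwd)/$(basename "$zip_file")"
    (cd "$out_dir" && jar xf "$zip_abs")
  elif command -v python3 >/dev/null 2>&1; then
    python3 -m zipfile -e "$zip_file" "$out_dir"
  else
    echo "extract_zip: no unzip, jar or python3 available" >&2
    return 1
  fi
}
```

Installing unzip at image build time is still the simpler fix when it works; the right package command depends on your base image.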

Hi, so we got it all working.

Next question: is there a way to pass the authentication as a header instead of using basic auth? We have basic auth enabled on our test sites, so we get a double basic auth prompt, and I want to automate the index extraction.

Thanks
John

You can take a look at com.onehippo.cms7.index.IndexExportModule to see how the authentication is done. Possibly you could override that part and plug in your own authentication method. However, I don’t follow your reasoning in "as we have basic auth enabled on our test sites": the Lucene index export is part of the CMS webapp, so I'm not sure why the site is related.
(Also, if you could create new posts for different topics, it would be easier for other people to find answers.)
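For the automation part, note that basic auth is itself just an Authorization request header ("Basic " followed by the base64 encoding of user:password), so a script can send credentials directly without any prompt. A minimal sketch of building that header for a scripted download follows; basic_auth_header is a hypothetical helper, and the export URL is left as a placeholder because it depends on your CMS setup. If the outer basic auth layer and the CMS expect different credentials on the same header, this alone will not help, and the IndexExportModule approach above is the place to look.

```shell
# Build an HTTP Basic Authorization header: "Basic " + base64("user:password").
# basic_auth_header is a hypothetical helper for scripting the index download.
basic_auth_header() {
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Example use (EXPORT_URL, CMS_USER and CMS_PASSWORD are placeholders):
# curl -fsS -H "$(basic_auth_header "$CMS_USER" "$CMS_PASSWORD")" \
#      -o index.zip "$EXPORT_URL"
```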

Hi ya, sorry, I got this working two weeks ago after your comment, and we also had some time with one of your consultants. I appreciate your help.

With regards to different topics, I kind of thought it was the same… just further along… :smiley: