Site brought down (potentially by indexing error)

Hi,

Our website https://www.nhbc.co.uk/ was brought down by what seemed to be a search index issue. At the end of the error logs we had the following exception:

25.02.2021 03:00:04 INFO  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [AbstractBundlePersistenceManager.logCacheStats:876] cachename=defaultBundleCache[ConcurrentCache@3f91d09a], elements=103364, usedmemorykb=262077, maxmemorykb=262144, access=62479174, miss=261914
25.02.2021 11:35:36 INFO  pool-7-thread-1 [AbstractBundlePersistenceManager.logCacheStats:876] cachename=defaultBundleCache[ConcurrentCache@3f91d09a], elements=103392, usedmemorykb=262142, maxmemorykb=262144, access=62479428, miss=262090
25.02.2021 11:39:18 INFO  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [AbstractBundlePersistenceManager.logCacheStats:876] cachename=defaultBundleCache[ConcurrentCache@3f91d09a], elements=103385, usedmemorykb=262116, maxmemorykb=262144, access=62479555, miss=262109
25.02.2021 11:47:34 WARN  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [SearchIndex.updateNodes:729] Exception while creating document for node: 46f20fbc-8210-4b3d-bc30-823589bd14b4: javax.jcr.ItemNotFoundException: org.apache.jackrabbit.core.state.NoSuchItemStateException: 46f20fbc-8210-4b3d-bc30-823589bd14b4/{http://www.onehippo.org/jcr/hippostd/nt/2.0}content
25.02.2021 11:47:34 INFO  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [AbstractBundlePersistenceManager.logCacheStats:876] cachename=defaultBundleCache[ConcurrentCache@3f91d09a], elements=103391, usedmemorykb=262139, maxmemorykb=262144, access=62479682, miss=262137
25.02.2021 11:47:39 INFO  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [AbstractBundlePersistenceManager.logCacheStats:876] cachename=versionBundleCache[ConcurrentCache@4a24b7ce], elements=43845, usedmemorykb=59600, maxmemorykb=65536, access=975741, miss=44066
25.02.2021 11:50:04 WARN  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [SearchIndex.updateNodes:729] Exception while creating document for node: 5722c838-fbe6-4284-bebd-9f2f01f25026: javax.jcr.RepositoryException: Missing child node entry for node with id: 5722c838-fbe6-4284-bebd-9f2f01f25026
25.02.2021 11:50:04 INFO  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [AbstractBundlePersistenceManager.logCacheStats:876] cachename=defaultBundleCache[ConcurrentCache@3f91d09a], elements=103387, usedmemorykb=262143, maxmemorykb=262144, access=62480063, miss=262196
25.02.2021 11:51:55 WARN  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [SearchIndex.updateNodes:729] Exception while creating document for node: 9c55b168-e9fc-4818-90a6-8adaa6b8538e: javax.jcr.ItemNotFoundException: org.apache.jackrabbit.core.state.NoSuchItemStateException: 9c55b168-e9fc-4818-90a6-8adaa6b8538e/{http://www.onehippo.org/jcr/hippostd/nt/2.0}content
25.02.2021 11:51:55 INFO  ClusterNode-nhbcwebsite-7dd946cb4f-mpplp [AbstractBundlePersistenceManager.logCacheStats:876] cachename=defaultBundleCache[ConcurrentCache@3f91d09a], elements=103373, usedmemorykb=262142, maxmemorykb=262144, access=62480571, miss=262285

Any information as to why this would bring the site down and how we can stop it from happening again would be greatly appreciated.

We were able to fix it by restoring an earlier version of the database and re-scaling the server, but as we’re unsure of the exact cause, we would like to know how to stop this from happening again.

Best regards.

This could be your index being corrupted. There was no need to restore the database; instead you could have just deleted the index on that node and restarted it. That causes the index to be rebuilt. Alternatively, there is an index consistency check which you can enable on startup; it should be faster than rebuilding the index but isn’t guaranteed to fix every issue. For repository problems you can run the repository checker. Restoring a database means data loss, which you want to avoid.

As to why your index is corrupted, that’s hard to say. Possibly you have custom code that writes to the repository and somehow messed it up? Any importers?

Thanks for your reply. With regards to deleting the index, is this just a file somewhere, or is it done within the console? Thanks for the documentation links for the consistency checks and repository checker. I have also raised this question as a support ticket, and it was indicated that the index was unlikely to be what brought the site down, so I’ll hold fire on the index checker for now. We don’t have any custom code that writes to the repository, or any importers, so I think we’re good there. I have improved the logging on the box itself, so if it does happen again we should have more info.

For local development this is under target/storage, unless you pass a repo.path parameter, in which case it will be at the path you give (relative to the project root if a relative path is given). On servers you are likely setting repo.path in a variable in setenv.sh. Otherwise it will be in your Tomcat folder.
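
If it helps, here is a rough Python sketch of the lookup rules above, so you can see which directory you would be dealing with before deleting anything. The function names are made up for illustration. Searching for directories literally named "index" is an assumption about how the storage folder is laid out, so check it against your own setup. The script only lists candidates and deletes nothing; stop the node before removing an index by hand.

```python
from pathlib import Path
from typing import List, Optional


def resolve_storage_dir(project_root: str, repo_path: Optional[str] = None) -> Path:
    """Return the repository storage directory following the rules described above."""
    root = Path(project_root)
    if not repo_path:
        # No repo.path given: the local development default.
        return root / "target" / "storage"
    candidate = Path(repo_path)
    # An absolute repo.path is used as-is; a relative one is taken
    # relative to the project root.
    return candidate if candidate.is_absolute() else root / candidate


def find_index_dirs(storage_dir: Path) -> List[Path]:
    """List directories named 'index' below the storage directory.

    Assumption: the search index lives in a directory called 'index'
    somewhere under the storage directory. Verify before deleting.
    """
    if not storage_dir.is_dir():
        return []
    return sorted(p for p in storage_dir.rglob("index") if p.is_dir())


if __name__ == "__main__":
    # Hypothetical invocation from the project root with no repo.path set.
    storage = resolve_storage_dir(".")
    print("storage directory:", storage)
    for index_dir in find_index_dirs(storage):
        print("candidate index directory:", index_dir)
```

On a server you would pass whatever value repo.path is given in setenv.sh as the second argument instead of relying on the default.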