Content storage strategy during development

Hello,

Our team has built a few small demo/proof-of-concept projects on BloomReach over the past couple months, and we’re still disagreeing over the correct strategy for storing our content during development. We also need to understand what, if anything, needs to change in our content strategy as we transition from development to deployment/go-live.

I might start to ramble a bit as I try to reason through these questions, so I apologize in advance. I’ve divided the topics into sections below to help organize them. I am hoping that some community members can help validate my thinking.

Context regarding our team

A bit of context: our team’s experience is primarily with AEM/CQ5. In AEM, our content is generally excluded from the version control system (git) and managed with content packages.

Some members of our team have attempted to follow this same model with packages consisting of YAML exported from the Hippo Console, but this feels incorrect. In fact, it has proven to be problematic. Since these packages don’t have include/exclude patterns, dependency management, or subpackages, it is very easy to accidentally overwrite data, import to the wrong repository path, or miss necessary packages.

Content vs Config

In BloomReach, it seems as though content belongs in the VCS repository (YES PLEASE!), or at least the bootstrap content does. We understand the bootstrap concept, but I’m unsure whether we really want the content to be limited to bootstrapping during development, or whether it makes more sense to treat the content (documents) as config and update it on every local cargo start.

The reason I believe the documents should be treated as config is that, in my mind, every dev should have the complete project contents every time they run their local Hippo instance.

For example, if another developer makes changes to a component and includes content changes with that, then we would expect all developers to have the updated content when they pull in those changes and restart cargo.

If documents are marked as content, then anybody running Hippo with an external repo.storage path would have to start the project with the bootstrap flag set to true in order to update the content in their repository. Without an external repo.storage path, developers would still have to be sure to mvn clean. Are there any reasons you would advise against storing documents as config? Is there a better way to solve this?

Application vs Development

Now, as for location of content in the VCS repository, I understand that repository-data/application is for data which should go everywhere (including production), and repository-data/development is for data which will be excluded from the distributables.

When we create a project from the archetype, content is stored under repository-data/application. This is somewhat surprising to me.

I suppose it makes sense, since content bootstrapping will ensure that only new content is ever added to the JCR. However, this conflicts with my previous assertion that documents should be stored as config during development (in which case, documents should definitely be stored in repository-data/development).

Merge conflicts from combined folder and document YAML

One of the team’s biggest pain points in developing with documents stored as content has been merge conflicts.

It seems that the content for all documents in a directory will be combined into a single YAML file (/content/sample-document and /content/another-document are both stored in their parent folder’s content.yaml). Splitting the documents into their own files solves 90% of the issues that we have had with the content storage so far (the extensive merge conflicts).

I have tried to separate that content into three YAML files (content.yaml, content/sample-document.yaml, content/another-document.yaml), but I received errors when I attempted to start cargo with that data. I was, however, successful in splitting these files when I converted the documents from content to config. Unfortunately, new documents would still be exported to the parent folder’s YAML file, but at least we could manually split that out whenever we create a new document.

So did I just do something wrong when I split the YAML files? Is there actually a way to store documents as content, with each document in a YAML file of its own? And better yet, is there a way to force BloomReach to export documents to their own YAML files, instead of including them in the folder’s YAML?

Runtime reloading for config YAML

Finally, I have been wondering about hot-reload/sync for hcm-config files (I would call it auto-reload, but that refers to the auto-reload module for forcing browsers to reload).

This does not appear to exist; it seems that the server must be restarted in order to load config changes into the repository. I’m curious why this is, and whether anybody has tried to implement auto-reload for config files. It’s great that we can use auto-export to sync changes made in the CMS onto our filesystems, but I frequently find that I would rather alter YAML on my filesystem than modify configurations in the CMS Console.

I attempted to do this using the webfiles module, with the use of a symlink to bypass the SubDirectoriesWatcher#DIRECTORY_FILTER. It worked, but the WebFilesWatcher and related classes explicitly treat the watched files as webfiles and write them to the repository as such. Scanning through the related classes, it seems conceivable that a similar ConfigFilesWatcher module could be implemented in order to support the watch and sync of config files.

I suppose the biggest concerns would be:

  • Infinite loop created by auto-export and ConfigFilesWatcher
  • Attempts to reload invalid YAML (if an IDE saves automatically, a YAML file may be in an invalid state when it tries to sync/reload)
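
Both concerns could be handled by a small gate in front of the reload step. The following is only a minimal Python sketch of that idea, not Hippo code: `ReloadGate`, `record_export`, and `should_reload` are hypothetical names, and the YAML parser is injected as a callable because nothing here is tied to a particular library. It assumes the watcher can be told what auto-export last wrote, and that a byte-identical file means "nothing new to import".

```python
import hashlib
from typing import Callable, Dict


class ReloadGate:
    """Decides whether a changed config file should be re-imported.

    Hypothetical helper: it only models the two checks discussed above.
    """

    def __init__(self, parse: Callable[[bytes], object]) -> None:
        # parse must raise ValueError when the file content is invalid.
        self._parse = parse
        # Maps file path -> digest of the last content known to be in sync.
        self._last_seen: Dict[str, str] = {}

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def record_export(self, path: str, data: bytes) -> None:
        """Call when auto-export writes a file, so the watcher ignores the echo."""
        self._last_seen[path] = self._digest(data)

    def should_reload(self, path: str, data: bytes) -> bool:
        digest = self._digest(data)
        if self._last_seen.get(path) == digest:
            # Unchanged, or written by auto-export itself: breaks the loop.
            return False
        try:
            self._parse(data)
        except ValueError:
            # Half-saved or invalid file: skip and wait for the next change.
            return False
        self._last_seen[path] = digest
        return True
```

A watcher would call `should_reload` for every file event and only import when it returns `True`. An invalid intermediate save is simply skipped, and the next valid save of the same file is picked up because the stored digest is only updated after a successful parse.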

Thank you in advance!
Dave


I’m interested in what others have to say about this as well.

I’ve come to the conclusion that I don’t want to count on development having all of the content, but when I want to test/build new content, it is nice to work on it in development first.

If there is already documentation on this, then a link would be great.

thanks also,

John

Code/configuration/content flows are described here:

Normally we keep only a few documents for development purposes.


To expand on what machak linked (see also link [1] below), content does not belong in the project except for development content and possibly some initial bootstrap content for new features. Content belongs to the production repository only. This may be copied to acceptance/test servers, but shouldn’t be used for local development generally.

Content is created by authors, not devs. Their environment is the CMS. This is controlled by workflow and is versioned in the repository itself (upon publish). Given this, there is little need for a hot deploy scenario. I would even consider this dangerous. Config changes should be done on development, then tested on test and acceptance environments before reaching production. Only for emergencies should you do config changes on a running system, in which case it will be console work, preferably first on acceptance, then production. While it should be possible to do hot reload of config, it may still not work, as it sometimes references classes or services that are initialized only on startup.

As for autoexport [2], you have some limited control over its behavior. Generally speaking though, it is impossible for it to conform to every possible division of content. It also has to be predictable to the engine, as it needs to be able to match a node to a file. You can file improvement requests on the behavior, but it is up to engineering and product management to determine whether any feature is desirable and/or worth the effort. It is not a trivial exercise to get it to work in the first place.

You most likely did do something wrong in creating the content YAML files. The syntax there is slightly different. Please refer to the documentation found under [1].

See also:
[1] https://www.onehippo.org/library/concepts/configuration-management/introduction.html
[2] https://www.onehippo.org/library/development/automatic-export-add-on.html

Thanks for the input. I should have included these links in the original post, because I had already reviewed both in depth. That is largely how I arrived at my questions/uncertainties, especially from [1] and the related documentation.

I feel like this furthers my argument in the section Application vs Development. If content should not be included in deployments, why does the archetype store the sample content in Application instead of Development? It seems to me that the best practice would be for all of /content to reside inside Development.

Production content is created by authors, but my questions are focussed on development. During development, developers do need to create content along with their components/features, for the purpose of being able to demonstrate the populated components. It’s preferable that every member of the dev team should have all of that content, all the time.

And yes, hot deploy of config does not make sense for production, but I think it would be very valuable during development. The console can be slow and clunky to use in situations where I know the location of the YAML in my repository and know which changes are necessary. It’s in these cases that config sync/hot deploy could improve development efficiency. This becomes even more apparent if you start treating content during development as configuration files (assuming content changes would never be hot deployed, since that contends with the bootstrapping mechanism). As a developer, there may be times when I want to make bulk changes to development content, and it would be much quicker to find-and-replace across multiple files than to use the CMS UI. I can still do this, but without hot reload, I need to restart cargo to apply the changes.

And yes, some things may not be reloadable, if classes/services are initialized on startup and do not reinitialize when their config changes. But that’s not related to hot deploy, it is already an issue with configuration changes made in the console. For example, updating the configuration of most (maybe all) modules does not reinitialize them. Updating the config for the WebFilesWatcher module logs an error:

HippoServiceException: A service was already registered with name...

[INFO] [talledLocalContainer] 16.11.2018 14:07:45 WARN ObservationManager [ObservationDispatcher.run:163] EventConsumer org.onehippo.repository.modules.AbstractReconfigurableDaemonModule$ModuleConfigurationListener threw exception
[INFO] [talledLocalContainer] org.onehippo.cms7.services.HippoServiceException: A service was already registered with name org.onehippo.cms7.services.webfiles.watch.WebFilesWatcherService
[INFO] [talledLocalContainer] at org.onehippo.cms7.services.HippoServiceRegistry.registerNamedServiceInternal(HippoServiceRegistry.java:274) ~[hippo-services-4.6.0.jar:4.6.0]

I’ve reviewed [2] once again, but it’s not clear if any of the available configuration allows us to specify how content should be divided between files. The section File Structure is the only place that I could find any mention of how nodes are serialized to files. Unfortunately it offers very little detail as to how that is decided, instead just stating that it follows best-practice conventions.

Nope, I tried this again, and I’m pretty sure the issue lies in the way Hippo keeps track of content files which have already been imported. When we try to split the documents out of their folders’ YAML files, we get an exception during startup unless we start with a clean JCR repository.

For example, I used the archetype’s sample banners content and split it into three YAML files: banners.yaml (the original, for the folder), banners/banner1.yaml, and banners/banner2.yaml. If I start Hippo with a clean JCR repository, that content all gets bootstrapped correctly. If I change one of the documents, the changes are auto-exported to the correct file (banner 1 changes export to banners/banner1.yaml).

However, if I create a new document in the CMS (call it banner3), auto-export serializes it to the folder’s YAML file (banners.yaml) again. If we move that content to a new file, banners/banner3.yaml, then try to restart Hippo (even without bootstrapping enabled), we get this exception on startup:

ItemExistsException: Node already exists at path /content/documents/xumakcom/banners/banner3

[INFO] [talledLocalContainer] 16.11.2018 15:10:59 ERROR localhost-startStop-1 [ConfigurationContentService.apply:174] Processing 'APPEND' action for content node '/content/documents/xumakcom/banners/banner3' failed.
[INFO] [talledLocalContainer] javax.jcr.ItemExistsException: Node already exists at path /content/documents/xumakcom/banners/banner3
[INFO] [talledLocalContainer] at org.onehippo.cm.engine.JcrContentProcessor.validateAppendAction(JcrContentProcessor.java:110) ~[hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.onehippo.cm.engine.JcrContentProcessor.apply(JcrContentProcessor.java:135) ~[hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.onehippo.cm.engine.ConfigurationContentService.apply(ConfigurationContentService.java:157) [hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.onehippo.cm.engine.ConfigurationContentService.apply(ConfigurationContentService.java:87) [hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.onehippo.cm.engine.ConfigurationServiceImpl.applyContent(ConfigurationServiceImpl.java:666) [hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.onehippo.cm.engine.ConfigurationServiceImpl.init(ConfigurationServiceImpl.java:216) [hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.onehippo.cm.engine.ConfigurationServiceImpl.start(ConfigurationServiceImpl.java:122) [hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at com.onehippo.repository.HippoEnterpriseRepository.initializeConfiguration(HippoEnterpriseRepository.java:178) [hippo-enterprise-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.hippoecm.repository.LocalHippoRepository.initialize(LocalHippoRepository.java:292) [hippo-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at com.onehippo.repository.HippoEnterpriseRepository.create(HippoEnterpriseRepository.java:63) [hippo-enterprise-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at com.onehippo.repository.HippoEnterpriseRepository.create(HippoEnterpriseRepository.java:53) [hippo-enterprise-repository-engine-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_91]
[INFO] [talledLocalContainer] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_91]
[INFO] [talledLocalContainer] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_91]
[INFO] [talledLocalContainer] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_91]
[INFO] [talledLocalContainer] at org.hippoecm.repository.HippoRepositoryFactory.getHippoRepository(HippoRepositoryFactory.java:147) [hippo-repository-connector-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.hippoecm.repository.RepositoryServlet.init(RepositoryServlet.java:184) [hippo-repository-servlets-5.6.0.jar:5.6.0]
[INFO] [talledLocalContainer] at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1144) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1091) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:983) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4978) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5290) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:985) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1857) [catalina.jar:8.5.34]
[INFO] [talledLocalContainer] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_91]
[INFO] [talledLocalContainer] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
[INFO] [talledLocalContainer] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
[INFO] [talledLocalContainer] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
[INFO] [talledLocalContainer] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]

I have figured out that adding hcm-actions to reload each node which we have split into a new file resolves that issue. Marking these paths as reload also causes them to be re-imported to the repo on startup (overrides bootstrapping and imports every time), which may not always be desired. Unfortunately, action-lists don’t seem to support wildcards, so every single document we create would have to be added to this list.

hcm-actions.yaml

action-lists:
- 1:
    /content/documents/xumakcom/banners/banner3: reload
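
Since the action list apparently has to name every path, the entries could be generated rather than maintained by hand. Here is a minimal Python sketch under two assumptions: the split files mirror the node hierarchy (so `banners/banner3.yaml` under the content root corresponds to the `banner3` node), and the output layout simply copies the snippet above. The function names and the relative file paths are hypothetical, and the version key `1` is taken as-is from the example rather than verified against the engine.

```python
from pathlib import PurePosixPath
from typing import Iterable


def jcr_path_for(yaml_file: str) -> str:
    """Map a content file path (relative to the content source root) to a JCR path.

    Assumes the file layout mirrors the node hierarchy, so that
    content/documents/xumakcom/banners/banner3.yaml becomes
    /content/documents/xumakcom/banners/banner3.
    """
    relative = PurePosixPath(yaml_file).with_suffix("")
    return str(PurePosixPath("/") / relative)


def build_action_list(version: str, yaml_files: Iterable[str]) -> str:
    """Render hcm-actions text with one 'reload' entry per split document file."""
    lines = ["action-lists:", f"- {version}:"]
    # Deduplicate and sort so the generated file is stable across runs.
    for path in sorted({jcr_path_for(f) for f in yaml_files}):
        lines.append(f"    {path}: reload")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    files = [
        "content/documents/xumakcom/banners/banner3.yaml",
        "content/documents/xumakcom/banners/banner1.yaml",
    ]
    print(build_action_list("1", files), end="")
```

This only removes the manual bookkeeping. It does not change the side effect described above: every listed path is still re-imported on each startup.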

For content which is needed during development, it still feels like the best option is to treat the content as config, since we do want that content reloaded every time the server is started, and because then we can separate nodes into their own YAML files without this added hcm-actions overhead. Am I nuts?

@marsdev, I am curious how you have approached this on your projects. Does anything in this post stand out to you as either highly desirable or completely wrong?

Thanks again,
Dave

@dhughes-xumak I’m very new to Hippo and am still working on our site. I don’t have a strong opinion yet. But I appreciate you sharing your thoughts.

So I think you make good points. I think I agree with you for a large part.

Yes, at this point the nodes themselves are marked as coming from a specific file. But if autoexport has to create files then it needs to make a decision on how to split them. This behavior could be different, but a choice was made. It is something that needs to be absolutely correct, so perhaps that influenced design.

No, what you say makes sense. At this point all I can say is I will bring it to the attention of our PM/Engineering, if they aren’t reading along already.

Our development team is having the same problem with the data and files in the local environment. We are interested in this thread.

It has been almost 13 months since I raised this thread.

I’m just checking in to see if brXM 13 or 14 has included any changes spurred by the questions/concerns raised in this discussion. Does anybody know of any changes in the last two major releases which relate to this discussion?

Thanks!
Dave

Coming from AEM too, this was an interesting read, and I am also still wondering about how to migrate content. Believe it or not… in the real world, the business expects full-blown demos of content pages as they were discussed in the web designs. So all content must already be there! In AEM we could manage this using their package system. But for brXM I am not so optimistic at the moment.

A disadvantage of content moving strictly from PROD to ACC/TEST/DEV is that you might want to have some “development” data to run test suites like Cypress or Selenium against. This could, of course, be done locally by individual developers, but I think it generally makes more sense to run those from (and against) a more centralized machine, like a TEST server. At this point in time, “test data” created on TEST will be overwritten whenever we try to make our TEST environment more representative by copying over repository data from PROD. Which is… not ideal :)