Relevance: Custom collector & characteristic, etc or not?

Dennis_Nijssen · December 14, 2018, 12:43pm

Given the following use case:

As a webmaster I want to be able to personalise my website for visitors that came (now or in a previous session) via a link in my email newsletter. Links in my email newsletter have specific utm parameters to recognize them. If the visitor has not come via a newsletter link in x days, we no longer consider them to be a subscriber.

In the past we had enabled the RequestParamsCollector, collecting the utm parameters, hoping this would be enough to enable the webmasters to personalise specific components for the visitors. However, as the documentation page “Add the Collectors Bundle to a Project” says: “This collector only collects request data and will not update visitor data. As a result you cannot use this collector for personalization and a characteristic and UI plugin are not available.”

So I’ve been wondering: what is this collector’s use? It only seems to save the parameters as part of the collectorData inside the requestLogEntries, but what is that used for?

Extending and overriding the collector does not seem possible either, in my opinion, as the #updateTargetingData(...) method needs to return a VoidTargetingData, and I can’t add anything to that. And extending VoidTargetingData seems like bad practice to me.

So what is the best possible solution for my use case? Do I need to create a new collector, together with a characteristic, scorer and UI plugin, in order for our webmasters to personalise on the utm parameters? I was thinking of creating a targetingData object as follows:

"utmParameters": {
  "newsletter": true,
  "blogAlert": false
}

However… how will it forget these values after ~90 days? Do I need to save a timestamp of the latest moment a utm parameter was collected? Something like:

"utmParameters": {
  "newsletter": { 
    "value": true,
    "timestamp": 1544790513133
  },
  "blogAlert": {
    "value": false,
    "timestamp": 1544790513133
  }
}

Or am I thinking completely in the wrong direction? I wanted to check this before we implement it; if it turns out to be the wrong solution after all, we will be left with a lot of garbage in our targeting data store.
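The timestamp idea above can be sketched in plain Java. This is only an illustration under stated assumptions: `UtmFlagStore`, `record` and `isActive` are hypothetical names, the 90-day window is taken from the "~90 days" mentioned in the question, and nothing here uses or represents the Relevance module's own targeting data API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Relevance API: remembers when each
// utm-derived flag was last seen and treats it as expired after a fixed period.
class UtmFlagStore {

    private final Map<String, Instant> lastSeen = new HashMap<>();
    private final Duration maxAge;

    UtmFlagStore(Duration maxAge) {
        this.maxAge = maxAge;
    }

    // Record that a link carrying this flag (e.g. "newsletter") was followed at 'when'.
    void record(String flag, Instant when) {
        lastSeen.put(flag, when);
    }

    // A flag is active only if it was seen and the last sighting is not older than maxAge.
    boolean isActive(String flag, Instant now) {
        Instant seen = lastSeen.get(flag);
        return seen != null && !seen.plus(maxAge).isBefore(now);
    }

    public static void main(String[] args) {
        UtmFlagStore store = new UtmFlagStore(Duration.ofDays(90));
        Instant visit = Instant.ofEpochMilli(1544790513133L);
        store.record("newsletter", visit);

        System.out.println(store.isActive("newsletter", visit.plus(Duration.ofDays(30)))); // true
        System.out.println(store.isActive("newsletter", visit.plus(Duration.ofDays(91)))); // false
        System.out.println(store.isActive("blogAlert", visit));                            // false
    }
}
```

Storing only the last-seen timestamp and deriving the boolean at read time avoids having to "reset" a stored `true` value later; whether that fits the targeting data store is a separate design question.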

woonsanko · December 28, 2018, 9:16am

Hi Dennis,

I don’t think you can reuse or extend the RequestParamsCollector for your use case.
Personally, I think the concepts of our Relevance module, or the way they have been explained, confuse many developers, which makes the module hard to apply to real use cases.

Basically, personalization of pages/components based on the Relevance module is really about defining, inferring and evaluating classifications, or category variables, based on various inputs such as visitor-specific data (e.g. request parameters, cookies, URL, authentication info, geolocation, …) or non-visitor-specific data (weather, day of week, …). These are explained very well in [1] and [2].

Anyway, from your use case, you need to do the following:

Define your category variable for business users so that they can select value(s) of that variable in the component settings / variant settings to define the rules for component rendering.
For example, your category variable might be “visitor_via_newslinke”, which can technically be determined from a request that has the “specific utm parameters” and arrived within “x days”.
Please note that the category variable should be the targeting data itself, or the main part of it.
If you build everything from scratch (Collector, CollectorPlugin, …; probably 4+ Java classes, 2+ Ext JavaScript classes, etc.), it will take a lot of effort, in both initial implementation and maintenance, for that simple goal.
If you use EIRE [1] instead, I’d argue you can fulfill your use case within a couple of days.

HTH,

Woonsan

[1] https://www.onehippo.org/library/enterprise/services-features/inference-engine/introduction.html
[2] https://www.onehippo.org/library/enterprise/services-features/inference-engine/expressions-in-inference-rules.html

Dennis_Nijssen · January 12, 2019, 7:52am

Hello Woonsan,

Thank you for your comment.
We have recently started installing the EIRE plugin (BrXM 12.6.0) and writing our first JEXL (Inference Rules) script.

However, I noticed a few things for which I really wish I could just create the SERVPLUG issues in Jira myself…
But since that’s not possible, I’ll try to summarize them as well as I can in this post, and maybe you can create them in the SERVPLUG project (and hopefully also fix them).

#1:
When opening the “Document Types” tab in the Content Perspective, a warning is logged:

10.01.2019 09:09:21 WARN  http-nio-8080-exec-4 [ObservableTreeModel$ExpandedNode.get:183] Unable to find child inferenceengine[1] in observable tree model for org.hippoecm.frontend.model.JcrNodeModel@2d9cc769[
  itemModel=org.hippoecm.frontend.model.JcrItemModel@5d50e7aa[
  path=/hippo:namespaces
]
]

#2:
$.request.getParameterValues(parameterName) returns an instance of Arrays.ArrayList which is not whitelisted.

Looking at the implementation of DefaultGenericServletRequestModel#getParameterValues(String name), I see the usage of Arrays.asList(...), which returns a (java.util.)Arrays.ArrayList instead of the regular (java.util.)ArrayList. This ArrayList is however not whitelisted, and therefore we can’t call methods on it like .contains(...).
It was a hard one to find out, because I kept getting a warning along the lines of “unsolvable method .contains”, which didn’t point me in the right direction straight away.
However, I’m not sure Arrays.ArrayList can even be whitelisted (I tried, but didn’t succeed…), since it’s a private static class.
So it’s probably better to return a regular (java.util.)ArrayList instead.

Also be aware that a few lines above, Collections.emptyList() is used as well; this returns a (java.util.)Collections.EmptyList, which is also a private static class (and not whitelisted??)…

For now I’ve fixed this by copying the values of the (incorrect) Arrays.ArrayList to a new ArrayList like:
new("java.util.ArrayList", $.request.getParameterValues(parameterName))

This is however a quick fix, and probably not the best one either, so I think this is worth an issue. Also, to prevent other developers from wasting time, you might want to check whether more of these cases appear around the EIRE plugin, and maybe write some extra documentation about it (for cases where developers use custom classes that use the same “utils” and return instances of (java.util.)Arrays.ArrayList).
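The difference between the list types discussed in #2 can be reproduced in plain Java, outside of EIRE. The sketch below only prints the concrete class names and applies the same copy-into-`ArrayList` workaround; `ListTypesDemo` and `toPlainArrayList` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ListTypesDemo {

    // Same idea as the script-level workaround: copy into a plain java.util.ArrayList.
    static List<String> toPlainArrayList(List<String> values) {
        return new ArrayList<>(values);
    }

    public static void main(String[] args) {
        List<String> fromAsList = Arrays.asList("newsletter", "blogalert");
        List<String> empty = Collections.emptyList();
        List<String> copy = toPlainArrayList(fromAsList);

        // All three implement java.util.List, but their concrete classes differ.
        System.out.println(fromAsList.getClass().getName()); // java.util.Arrays$ArrayList
        System.out.println(empty.getClass().getName());      // java.util.Collections$EmptyList
        System.out.println(copy.getClass().getName());       // java.util.ArrayList
        System.out.println(copy.contains("blogalert"));      // true
    }
}
```

Note the `$` in the binary names of the nested classes; that is the form a name-based whitelist would need.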

#3:
This is more a question than an issue: we noticed that an Inference Rules document always creates a characteristic UI with checkboxes. However, what is the use of these checkboxes when the “data” stored in the targeting data is only a single value (e.g. “blogalert” in our situation):

"newsletters": {
	"collectorId": "newsletters",
	"data": "blogalert",
	"extraData": null
}

Our Inference Rules document / characteristic defines two goals, namely “General Newsletter” and “Blog Alert”. It is possible for a visitor to match both, but the “data” of the targeting data is just a single value, right?

Do we need to create a combined goal value like “General Newsletter & Blog Alert”? That would not be feasible once we have a lot more goal values, because then every combination would need to be created… So is it better to create separate Inference Rules for the two goals instead? That set might grow as well.

Also, I’ve read that returning “null” does not store (or reset) anything in the targeting data, so to “reset” a goal value it’s better to return the string “unknown”. Am I correct?

Kind regards,
Dennis Nijssen

woonsanko · January 12, 2019, 12:57pm

Hi Dennis,

That’s great to hear! I’d like to help as much as I can!

Alternatively, you can create a customer project ticket with this information, but it’s fine to communicate here like this.

Dennis_Nijssen:

#1:
When opening the “Document Types” tab in the Content Perspective a warning is thrown

Did you start seeing this warning just after you began using EIRE? The inference rule document type was designed using only very simple standard fields and External Document Picker fields.
I’ll look into it further whenever possible to see whether it’s caused by EIRE or something else. If you find any more clues, let me know.

Dennis_Nijssen:

#2:
$.request.getParameterValues(parameterName) returns an instance of Arrays.ArrayList which is not whitelisted.


I will add “java.util.Arrays$ArrayList” to the built-in whitelisted types in com.onehippo.cms7.inference.engine.core.jexl3.Jexl3InferenceEngine. I have filed a ticket for this in our issue tracker (login required).

It already has the following as built-in whitelisted types:

public class Jexl3InferenceEngine extends AbstractInferenceEngine implements ComponentManagerAware {

    private static final String[] DEFAULT_WHITE_ARRAY = {
            Object.class.getName(),
            Object[].class.getName(),
            String.class.getName(),
            String[].class.getName(),
            boolean.class.getName(),
            boolean[].class.getName(),
            Boolean.class.getName(),
            Boolean[].class.getName(),
            byte.class.getName(),
            byte[].class.getName(),
            Byte.class.getName(),
            Byte[].class.getName(),
            char.class.getName(),
            char[].class.getName(),
            Character.class.getName(),
            Character[].class.getName(),
            Number.class.getName(),
            Number[].class.getName(),
            short.class.getName(),
            short[].class.getName(),
            Short.class.getName(),
            Short[].class.getName(),
            int.class.getName(),
            int[].class.getName(),
            Integer.class.getName(),
            Integer[].class.getName(),
            long.class.getName(),
            long[].class.getName(),
            Long.class.getName(),
            Long[].class.getName(),
            float.class.getName(),
            float[].class.getName(),
            Float.class.getName(),
            Float[].class.getName(),
            double.class.getName(),
            double[].class.getName(),
            Double.class.getName(),
            Double[].class.getName(),
            BigInteger.class.getName(),
            BigInteger[].class.getName(),
            BigDecimal.class.getName(),
            BigDecimal[].class.getName(),
            Date.class.getName(),
            Date[].class.getName(),
            Calendar.class.getName(),
            Calendar[].class.getName(),
            GregorianCalendar.class.getName(),
            Collection.class.getName(),
            "java.util.Collections$UnmodifiableCollection",
            Iterator.class.getName(),
            List.class.getName(),
            ArrayList.class.getName(),
            LinkedList.class.getName(),
            "java.util.Collections$EmptyList",
            "java.util.Collections$UnmodifiableRandomAccessList",
            "java.util.Collections$UnmodifiableList",
            Set.class.getName(),
            HashSet.class.getName(),
            LinkedHashSet.class.getName(),
            "java.util.Collections$EmptySet",
            "java.util.Collections$UnmodifiableSet",
            Map.class.getName(),
            HashMap.class.getName(),
            LinkedHashMap.class.getName(),
            ConcurrentHashMap.class.getName(),
            "java.util.Collections$EmptyMap",
            "java.util.Collections$UnmodifiableMap",
            Locale.class.getName(),
            TimeZone.class.getName(),
            ResourceBundle.class.getName(),
            ListResourceBundle.class.getName(),
            PropertyResourceBundle.class.getName(),
            "org.hippoecm.hst.resourcebundle.SimpleListResourceBundle",
            "org.hippoecm.hst.resourcebundle.CompositeResourceBundle",
            "org.springframework.context.support.MessageSourceResourceBundle",
            Pattern.class.getName(),
            Matcher.class.getName(),
            Cookie.class.getName(),
            GenericBuiltinModel.class.getName(),
            GenericLoggerModel.class.getName(),
            GenericContentBeanModel.class.getName(),
            GenericServletRequestModel.class.getName(),
            GenericRequestContextModel.class.getName(),
            GenericTimeModel.class.getName(),
            GenericCollectorContextModel.class.getName(),
            GenericGeoLocationModel.class.getName(),
            DefaultGenericBuiltinModel.class.getName(),
            DefaultGenericLoggerModel.class.getName(),
            DefaultGenericContentBeanModel.class.getName(),
            DefaultGenericServletRequestModel.class.getName(),
            DefaultGenericRequestContextModel.class.getName(),
            DefaultGenericTimeModel.class.getName(),
            DefaultGenericCollectorContextModel.class.getName(),
            DefaultGenericGeoLocationModel.class.getName(),
            StringUtils.class.getName(),
            ArraysUtils.class.getName(),
            ArrayUtils.class.getName(),
            LocaleUtils.class.getName(),
            DateUtils.class.getName(),
            DateFormatUtils.class.getName(),
            DurationFormatUtils.class.getName(),
            NumberUtils.class.getName(),
            RandomUtils.class.getName(),
            CollectionUtils.class.getName(),
            EnumerationUtils.class.getName(),
            IteratorUtils.class.getName(),
            ListUtils.class.getName(),
            MapUtils.class.getName(),
            SetUtils.class.getName(),
            CounterUtils.class.getName(),
            ResourceBundleUtils.class.getName(),
            RegexUtils.class.getName(),
            GenericGeoLocationUtils.class.getName(),
            JsonUtils.class.getName(),
            JSON.class.getName(),
            JSONObject.class.getName(),
            JSONArray.class.getName(),
            JSONNull.class.getName(),
            YamlUtils.class.getName(),
            org.yaml.snakeyaml.nodes.Node.class.getName(),
            org.yaml.snakeyaml.nodes.NodeId.class.getName(),
            org.yaml.snakeyaml.nodes.AnchorNode.class.getName(),
            org.yaml.snakeyaml.nodes.CollectionNode.class.getName(),
            org.yaml.snakeyaml.nodes.MappingNode.class.getName(),
            org.yaml.snakeyaml.nodes.SequenceNode.class.getName(),
            org.yaml.snakeyaml.nodes.ScalarNode.class.getName(),
            org.yaml.snakeyaml.nodes.NodeTuple.class.getName()
            };
// SNIP

For security, the JEXL sandbox simply checks the concrete class name of the invocation target before invoking methods of a public interface, such as java.util.List#contains(o). You can actually use col.contains(o) in EIRE scripts on objects from Collections#emptyList(), because Jexl3InferenceEngine already has “java.util.Collections$EmptyList” in the whitelisted types.
So, if we add “java.util.Arrays$ArrayList”, your problem will be resolved. I missed that type.
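The behaviour described above, where the concrete class name decides rather than the implemented interface, can be illustrated with a small stand-alone Java sketch. This is not JEXL's or EIRE's actual code; `ConcreteTypeWhitelist` and `permits` are hypothetical names, and the check is reduced to a plain set lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a name-based check in the spirit of the sandbox described above.
class ConcreteTypeWhitelist {

    private final Set<String> allowed = new HashSet<>();

    ConcreteTypeWhitelist(String... classNames) {
        allowed.addAll(Arrays.asList(classNames));
    }

    // The runtime (concrete) class name decides, not the interfaces the object implements.
    boolean permits(Object target) {
        return target != null && allowed.contains(target.getClass().getName());
    }

    public static void main(String[] args) {
        ConcreteTypeWhitelist before = new ConcreteTypeWhitelist(
                "java.util.List", "java.util.ArrayList");
        ConcreteTypeWhitelist after = new ConcreteTypeWhitelist(
                "java.util.List", "java.util.ArrayList", "java.util.Arrays$ArrayList");

        List<String> values = Arrays.asList("a", "b");

        System.out.println(before.permits(values));                  // false
        System.out.println(before.permits(new ArrayList<>(values))); // true
        System.out.println(after.permits(values));                   // true
    }
}
```

Whitelisting `java.util.List` alone does not help in this model, which matches the symptom reported in #2.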

By the way, I found this a bit inconvenient too, even though I understand the need to balance security and productivity. I asked for an improvement in JEXL (JEXL-253, “Permissions by super type in JexlSandbox”, in the ASF JIRA), but I don’t think it’s an easy choice anyway.

Therefore, in principle, we should add all the practically used types there, or allow projects to add any types at project level, as explained in the documentation page “Extend Expressional Inference Rule Engine”, like the following:

  <!--
    For security reason, you must explicitly add any custom class types here to use in expressions.
  -->
  <bean id="inferenceEngineWhiteSet"
        class="org.springframework.beans.factory.config.ListFactoryBean">
    <property name="sourceList">
      <list>
        <!-- add custom types here... to add to whitelist -->
      </list>
    </property>
  </bean>

java.util.Arrays$ArrayList qualifies as a built-in whitelisted type.

Please stay tuned on that ticket.

Dennis_Nijssen:

#3:
This is more a question than an issue: we noticed that an Inference Rules document always creates a characteristic UI with checkboxes. However, what is the use of these checkboxes when the “data” stored in the targeting data is only a single value (e.g. “blogalert” in our situation):
"newsletters": {
	"collectorId": "newsletters",
	"data": "blogalert",
	"extraData": null
}

EIRE simplifies relevance-engine based personalization use cases in DXP as follows:

Business users define different variants for a component in Channel Manager, and they associate a variant with one or more classification variables. They then select which value(s) of each variable should be used to filter visitors and determine whether a visitor falls into that specific variant. That’s the inference rule defining process.
A visitor is categorized by a single value for a given variable (that is, for a single collector), but the rule that defines a variant can include multiple possible values. For example, I’m Korean, but suppose my favorite site needs to know whether a visitor comes from Africa, Asia, Europe or America for business reasons. One day, the business users define 4 variants, each mapped to one of the 4 continents. Later, they realize they only need two variants: “Europe” vs. “Non-Europe”. They now create one variant with the “Europe” item checked, and another with the remaining 3 items checked: “Africa”, “Asia”, “America”.
Therefore, all the possible options are displayed automatically as checkboxes in EIRE’s CharacteristicPlugin, so that business users can create different variants by selecting different sets of option values. A visitor is still classified by a single value. EIRE/Relevance can determine which variant the visitor belongs to, based entirely on the business users’ own inference rule definitions.
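The relationship between a visitor's single value and a variant's set of checked values can be sketched as a simple set-membership test. This is a conceptual model only, not how Relevance scores visitors internally; `Variant`, `selectedValues` and `matches` are hypothetical names, and the continents are taken from the example above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model: a variant holds the option values a business user checked.
class Variant {

    final String name;
    final Set<String> selectedValues;

    Variant(String name, String... selectedValues) {
        this.name = name;
        this.selectedValues = new LinkedHashSet<>(Arrays.asList(selectedValues));
    }

    // The visitor carries exactly one value; the variant matches if that value was checked.
    boolean matches(String visitorValue) {
        return visitorValue != null && selectedValues.contains(visitorValue);
    }

    public static void main(String[] args) {
        Variant europe = new Variant("Europe", "Europe");
        Variant nonEurope = new Variant("Non-Europe", "Africa", "Asia", "America");

        String visitorContinent = "Asia"; // single classified value for this visitor

        System.out.println(europe.matches(visitorContinent));    // false
        System.out.println(nonEurope.matches(visitorContinent)); // true
    }
}
```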

I’d suggest you start thinking from the business user’s perspective. That is, which variable, with which possible values of that variable, do you want to use in your rule definition process? Which one would be most intuitive?

In a sense, “General Newsletter” or “Blog Alert” might be just a secondary one, not the primary interest for the business user.
So, perhaps you need to reframe it for business users.

Just as an example,

I could define a primary variable named revisitFrom, which can have the values “General Newsletter”, “Blog Alert”, “Weekly shopping suggestion” and “unknown”.
The primary goal value is then determined by an EIRE script that returns one of those values.
Business users will see 4 checkboxes when defining the variant rules.
They could create one rule (variant) named “General Newsletter and Blog” with two checkboxes selected. For those visitors, the business users want to show specific content, which is their intention. Another variant displays the default content.

Yes. null has a special meaning–nothing to store as targeting data.
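As a rough Java sketch of the decision such a script would make, the method below maps collected campaign values to exactly one revisitFrom value and falls back to “unknown” instead of null. Everything specific here is an assumption for illustration: the class and method names, the campaign strings ("newsletter", "blogalert", "weekly"), and the first-match-wins priority. A real EIRE rule would be written as a JEXL script, not Java.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical resolver: picks one goal value for the revisitFrom variable.
class RevisitFromResolver {

    // First match wins (assumed priority); "unknown" is returned instead of null
    // so that a value is always produced.
    static String resolve(List<String> campaignValues) {
        List<String> values = campaignValues == null
                ? Collections.<String>emptyList()
                : campaignValues;
        if (values.contains("newsletter")) {
            return "General Newsletter";
        }
        if (values.contains("blogalert")) {
            return "Blog Alert";
        }
        if (values.contains("weekly")) {
            return "Weekly shopping suggestion";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(resolve(Arrays.asList("blogalert")));               // Blog Alert
        System.out.println(resolve(Arrays.asList("blogalert", "newsletter"))); // General Newsletter
        System.out.println(resolve(Collections.<String>emptyList()));          // unknown
    }
}
```

Because the visitor ends up with one value, combinations such as “General Newsletter and Blog” are expressed on the variant side by checking several boxes, not by inventing combined goal values.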

Regards,

Woonsan

woonsanko · January 14, 2019, 5:35am

Hi @Dennis_Nijssen,

Please upgrade the module dependency to 2.0.7, which is now available. Your problem will disappear.
Leaving out java.util.Arrays$ArrayList was my oversight when I tried to add every practical concrete collection class name from the JRE for convenience (even though the list can be extended with custom class names).

I will try to improve the documentation later on, as you suggest.

Regards,

Woonsan

Dennis_Nijssen · January 16, 2019, 7:56pm

Hello @woonsanko,

Thank you for the patch, we upgraded our dependencies yesterday and it works fine!

Kind regards,
Dennis
