How can I filter out documents that have empty string values using a constraint in the Fluent Search API?

I’m attempting to query some documents using the Fluent Search API, but I want to exclude documents which return an empty string for one of the String-valued properties.

            HstQuery hstQuery = HstQueryBuilder.apply(scopeBean)

When I build a query in this fashion, however, it seems that the notEqualTo constraint is completely ignored. The beans returned resolve this property to a mix of empty and non-empty string values, so I have to filter the beans myself after I get the results. This defeats the purpose of using the constraints.

I found the page on how to Add Constraints to a Query using the Fluent Search API, but other than that I was unable to find much documentation on these other constraint options, so I was left guessing at the behavior, and so far it isn’t working out in my favor. Any pointers on how to pull this off, if it is possible, would be helpful, as would any links to extended documentation on how to use the constraints.

[deleted nonsense]

I will forward your comments on our documentation to the relevant person. Your feedback is essential to the quality of our documentation. Of course I cannot promise any improvement or any timescale on improvements.

Never mind, you are using a FieldConstraint (from the FieldConstraintBuilder?). I wasn’t reading properly.

I don’t think that you can make a constraint like that. Underneath, queries below a certain length are ignored. Searching for “e” would return too many results in many scenarios and could be a significant performance issue, and thus an attack vector for a DoS.

In this case you’d have to do the filtering after retrieving the beans.
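
A minimal sketch of that post-filtering step in plain Java. `Doc` and its `text` field are hypothetical stand-ins for the real content bean and the String property in question; in practice the list would be filled from the beans returned by the query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PostFilter {

    // Hypothetical stand-in for a content bean with one String-valued property.
    static final class Doc {
        final String name;
        final String text;

        Doc(String name, String text) {
            this.name = name;
            this.text = text;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Keep only the documents whose text property is neither null nor empty.
    static List<Doc> withNonEmptyText(List<Doc> docs) {
        return docs.stream()
                .filter(d -> d.text != null && !d.text.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("a", "hello"),
                new Doc("b", ""),
                new Doc("c", null));
        System.out.println(withNonEmptyText(docs)); // prints [a]
    }
}
```

Note that filtering after retrieval interacts badly with query limits and paging, since the number of beans that survive the filter can be smaller than the page size.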

Hey @jasper.floor, would you happen to know where I could find API documentation, or even just the source code, for org.hippoecm.hst.content.beans.query.builder.FieldConstraintBuilder? I’m looking at methods like FieldConstraintBuilder.notLike, which seem to be completely undocumented from what I can tell, and I’m wondering, just as an example, what type of expression I’m supposed to pass to it.

With all of the historical material out there for this product, the separate repos, the lack of redirects, and the outdated documentation, it’s often not obvious where to find API documentation or the source code. As an added note, I noticed that the documentation site has been completely overhauled. The new appearance is a nice change in my opinion, but it’s a bit hard to tell the on-prem brXM material apart from the SaaS content offering at a glance. I’m hoping this can all be tied together neatly in the end, but for now I’ll settle for finding the related source code so I can figure out how these constraints are supposed to work beyond the incomplete documentation.

Ok, it took an unnecessary amount of hunting around, but I’ve at least found the source code for the previous patch version on GitHub.

Unfortunately this code doesn’t seem to have any javadoc comments either but at least I can see the implementation. It does appear however that the base interface ConstraintBuilder that FieldConstraintBuilder extends does have some additional documentation that I haven’t been able to find elsewhere. It’s inline HTML and markup so it’s not great to read but at least it’s something.

Here is what is said about like for example:

     * <p>
     *      This function is based on the LIKE predicate found in SQL. This method maps to <code>jcr:like</code> as
     *      <code>jcr:like($property as attribute(), $pattern as xs:string)</code>. Also see JCR spec 1.0 jcr:like
     * </p>
     * <p>
     *     <strong>usage:</strong> For example, the query “Find all documents whose <code>myproject:title</code> property starts with
     *     <code>hip</code>”,  is expressed as: <code>addLike("myproject:title","hip%")</code>.
     *     The <code>%</code> after <code>hip</code> is the wildcard.
     * </p>
     * <p>
     *     This method is particularly helpful in <i>key</i> kind of fields, where the <i>key</i> values contain chars on which
     *     Lucene text indexing tokenizes. For example, "give me all the documents that have a key that start with JIRA key
     *     <code>HSTTW0-23</code>" can be expressed as <code>addLike("myproject:key","HSTTW0-23%")</code>.
     *     This results in documents having key <code>HSTTW0-2345</code>, <code>HSTTW0-2357</code>, etc.
     * </p>
     * <p>
     *     <strong>DO NOT USE '%' AS A PREFIX</strong>. Thus do not use a query like
     *     <code>addLike("myproject:key","%HSTTW0-23%")</code>. Note the prefix '%'. Prefix wildcards blow up in memory and CPU
     *     as they cannot be efficiently done in Lucene.
     * </p>
     * @param value object that must be of type String. If
     *        the parameter {@code start} is {@code null}, this {@link ConstraintBuilder} is ignored (unless another
     *        constraint method is invoked without {@code null} value.
     * @return this {@link Constraint}
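
To make the wildcard behavior described in that Javadoc concrete, here is a small plain-Java sketch that mimics LIKE-style matching with a regular expression. It is only an illustration of the pattern semantics, not how the repository evaluates `jcr:like`; the class and method names are made up, and escape handling is left out.

```java
import java.util.regex.Pattern;

public class LikePatternSketch {

    // Translate a LIKE-style pattern into a regex: '%' matches any run of
    // characters, '_' matches exactly one character, everything else is literal.
    static Pattern toRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True when the whole value matches the LIKE-style pattern.
    static boolean like(String value, String likePattern) {
        return toRegex(likePattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("HSTTW0-2345", "HSTTW0-23%"));  // true
        System.out.println(like("hippo", "hip%"));              // true
        System.out.println(like("XHSTTW0-23", "HSTTW0-23%"));   // false: no leading wildcard
    }
}
```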

I’m planning now to look through this source code to get a better idea of the options that I have for filtering out properties with empty string values.

I still haven’t found any relevant browsable API documentation though after looking through the latest available at the following link.

To get back to my original attempt using the notEqualTo method, it appears that there may be some special treatment around null but there doesn’t seem to be any mention of empty string.

     * Adds a constraint that the value <code>fieldAttributeName</code> is NOT equal to <code>value</code>
     * @param value object that must be of type String, Boolean, Long, Double, {@link Calendar} or {@link Date}. If
     *        the parameter {@code value} is {@code null}, this {@link ConstraintBuilder} is ignored (unless another
     *        constraint method is invoked without {@code null} value.
     * @return this {@link Constraint}

From the code block below it does look as if a null value may be skipped over, since it is not an instance of Calendar or String, but I would guess that "" should still go through. I’m curious now whether the underlying value is actually null, or a non-existent property that is just surfacing in the results as an empty string value.

I tried the variation on the constraint shown below which uses exists but it still seemed to result in bean objects which had this property but with an empty string value.

            HstQuery hstQuery = HstQueryBuilder.apply(scopeBean)

Next I’ll see if I can come up with an XPath expression that I can use with the jcrExpression method to check for the empty string case. Perhaps by looking at length == 0 or something like that.

This is what the code has to say about jcrExpression.

Interesting. Looking more closely at the data through the JCR repository I see that the values for the field that I’ve been targeting are represented like this when filled with an empty string.

[name="abc:text"] = [ , ]

I then went back to the document definition and recalled that, late in the design, I had used the Type Editor tool to make this field a multiple field.

            jcr:primaryType: hipposysedit:field
            hipposysedit:mandatory: false
            hipposysedit:multiple: true
            hipposysedit:ordered: false
            hipposysedit:path: abc:text
            hipposysedit:primary: false
            hipposysedit:type: String

So the constraints may well have worked properly against a single string value all along. I was comparing a string value to what is essentially an array of values, which I’m guessing is why it wasn’t working.

Now, looking at the representation above, [ , ], I’m wondering how this is actually structured and how I would query it. My best guess is that I’d be best off trying to query this using jcrExpression along with Property.getLength, but the exact way of doing this eludes me at the moment.

So I suppose I need to rephrase my question. Is it possible to use constraints to filter out documents which have a property set to an array?

It works pretty much the same as for a single value. We are using it for filtering live/draft documents:

//*[@hippo:availability = 'live']

where hippo:availability is a multi-valued property.

//*[not(@hippo:availability)]  // no property
//*[@hippo:availability = ''] // empty values
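
As a rough mental model of those three predicates, here is a plain-Java sketch of how they would evaluate against a multi-valued property, assuming that an equality test on a multi-valued property holds when at least one of its values matches. This is an illustration only, not the repository's implementation; the class and method names are made up, and a `null` list stands for a property that is not set on the node.

```java
import java.util.Arrays;
import java.util.List;

public class MultiValueModel {

    // Models [@prop = 'x']: true when at least one value equals x
    // (assumed "any value matches" semantics).
    static boolean anyEquals(List<String> values, String x) {
        return values != null && values.stream().anyMatch(x::equals);
    }

    // Models [not(@prop)]: true when the property is absent.
    static boolean notExists(List<String> values) {
        return values == null;
    }

    public static void main(String[] args) {
        List<String> availability = Arrays.asList("live", "preview");
        List<String> emptyPair = Arrays.asList("", ""); // like [ , ] in the thread
        List<String> missing = null;

        System.out.println(anyEquals(availability, "live")); // true
        System.out.println(anyEquals(emptyPair, ""));        // true
        System.out.println(anyEquals(availability, ""));     // false
        System.out.println(notExists(missing));              // true
        System.out.println(notExists(emptyPair));            // false
    }
}
```

Under that assumption, a property holding only empty strings still counts as existing, which would explain why an exists check alone does not exclude such documents.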