Fulltext Search only finds complete words

Hi there,

I am currently implementing search functionality.
When I submit a query parameter to an EssentialsListComponent, documents are returned only when they contain the complete search term.

Example:
I have a document with the title “Providerdatabase”. It is not returned when I search for “Provider”, only when I search for “Providerdatabase”!

Is that the expected behaviour? If so, can I change it?

Thomas

Hi Thomas,

I assume you are referring to the text search in the CMS / HST (brXM)
stack, right? The built-in text search of the repository uses pretty
much the default Lucene analyzer, which has no logic for splitting
compound words. Compounds like this are uncommon in real English text,
unlike in, say, Dutch, where they are very common. You can plug in your
own Lucene Analyzer, but I haven’t heard of one that supports compound
splitting for English.

brSM provides smarter search with actual semantic understanding.

HTH

Regards Ard

Hi Thomas,

This is how the EssentialsListComponent works, although you could implement your own version by extending it or rebuilding it from scratch. If you look into this class you’ll find this method:

protected Filter createQueryFilter(HstRequest request, HstQuery query) throws FilterException {
    Filter queryFilter = null;
    String queryParam = this.getSearchQuery(request);
    if (!Strings.isNullOrEmpty(queryParam)) {
        log.debug("using search query {}", queryParam);
        queryFilter = query.createFilter();
        queryFilter.addContains(".", queryParam);
    }

    return queryFilter;
}

If you replace
queryFilter.addContains(".", queryParam);
with
queryFilter.addContains(".", queryParam+"*");
then your query for Provider will also return Providerdatabase.

Hope this helps you out!

PS: I could no longer find any documentation about this feature on the Bloomreach website; maybe it’s gone. But it’s important to know that you should only use the ‘*’ at the end of the query, never at the beginning!
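
If you want to be a bit more defensive than a plain queryParam+"*", here is a minimal sketch of a helper that always produces exactly one trailing wildcard and never a leading one. PrefixQueries and toPrefixTerm are names I made up for illustration; they are not part of the HST or Essentials API. The idea would be to pass the result to queryFilter.addContains(".", term) when it is not null.

```java
// Hypothetical helper for illustration only; not part of HST or Essentials.
final class PrefixQueries {

    private PrefixQueries() {
    }

    /**
     * Turns raw user input into a term with exactly one trailing wildcard.
     * Returns null when nothing searchable is left.
     */
    static String toPrefixTerm(String userInput) {
        if (userInput == null) {
            return null;
        }
        String term = userInput.trim();

        // Skip wildcards at the start: a leading wildcard is the case to avoid.
        int start = 0;
        while (start < term.length() && isWildcard(term.charAt(start))) {
            start++;
        }

        // Skip wildcards at the end so we never emit something like "Provider**".
        int end = term.length();
        while (end > start && isWildcard(term.charAt(end - 1))) {
            end--;
        }

        if (start == end) {
            return null; // input was empty or consisted of wildcards only
        }
        return term.substring(start, end) + "*";
    }

    private static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }
}
```

Note that for input with several words, only the last word gets the wildcard with this sketch.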

Cheers,
Jesper

Hi Jesper,

I saw the filter, but I had misunderstood what contains does :pensive:
I also tried adding “*” in the search form, but that did not work (I assume it is filtered out, is it?)

Thanks for your quick response!

Hello Thomas,

Your problem is probably in org.onehippo.cms7.essentials.components.CommonComponent#cleanupSearchQuery,
where you can see that the flag allowSingleNonLeadingWildCardPerTerm is always passed with the value false.

As it is a public method, you could override it in a custom class of your own and change the flag to true. In that case your wildcard will not be cleaned up.
Please do be cautious and careful with such changes as they could have performance implications.
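
To make clear what a flag like this implies, here is a simplified stand-in written in plain Java. It is not the HST implementation (the real cleanup does more than this); WildcardSanitizerSketch and sanitize are hypothetical names, and the rule it applies is my reading of the flag name: with false every wildcard is dropped, with true each term keeps at most one wildcard, and never in first position.

```java
// Simplified illustration only; the real cleanup in HST handles more cases.
final class WildcardSanitizerSketch {

    private WildcardSanitizerSketch() {
    }

    static String sanitize(String query, boolean allowSingleNonLeadingWildcardPerTerm) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            StringBuilder kept = new StringBuilder();
            boolean wildcardUsed = false;
            for (char c : term.toCharArray()) {
                boolean wildcard = c == '*' || c == '?';
                if (!wildcard) {
                    kept.append(c);
                } else if (allowSingleNonLeadingWildcardPerTerm
                        && kept.length() > 0
                        && !wildcardUsed) {
                    // Keep the first wildcard that follows at least one normal character.
                    kept.append(c);
                    wildcardUsed = true;
                }
                // Any other wildcard is silently dropped.
            }
            if (kept.length() > 0) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(kept);
            }
        }
        return out.toString();
    }
}
```

With the flag set to false, a “Provider*” typed into the search form comes out as “Provider”, which would explain why the wildcard from the form had no effect.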

Kind regards,
Lef

So, you would suggest not allowing wildcard search?

Thomas
October 5

In general I have never been a fan of Lucene wildcard searching. First of all, leading-wildcard searching typically tends to blow up the JVM for any decent-sized index, and trailing wildcards don’t play along with stemming, resulting in quirky behavior. Type-ahead, synonyms and suggestions are typically what really helps an end user; wildcards, as far as I have seen, never really result in a great user experience. Then again, I am just one user.

This is also one of the reasons why we have added support for the ‘content feed’, which makes the content suitable for consumption by brSM (our search part), so that intelligent search can be provided instead of the fairly basic search of the repository.

Hi Ard,

Thank you for your opinion.
I will take your thoughts into consideration!

Thomas