Filter duplicates (based on property) in HstQuery

Hi there,

We have the following use case. We have two scopes:

  1. containing documents of type x in the default language (e.g. “en_US”).
  2. for the locale (e.g. “nl_NL”) we have a folder containing translations of those documents from the default language as well as documents which only exists in this locale. Note: not all documents from the default language have translations in the other locale.

I want to do a query which gives me alle unique documents based on hippotranslation:id with a preference for the translated documents (“nl_NL”), so:

  1. all documents from the default language (“en_US”) which don’t have a translation in the other locale (“nl_NL”)
  2. all documents which have a translation (“nl_NL”)
  3. all documents in the other locale (“nl_NL”) which don’t exist in default language

I currently do an HstQuery on both scopes and retrieve all documents of a certain type. Afterwards I perform a custom method to eliminate duplicate documents. A more elegant solution would be to do so in ‘one go’. So i’m looking for a way to add a filter/constraint (e.g. with a custom jcrExpressiony/xpath query) to eliminate the duplicates in one go.

E.g.

final HstQuery hstQuery = HstQueryBuilder.create(baseScope, translationScope)
    .ofPrimaryTypes(doctype)
    .limit(queryLimit)
    .where(
        // pseudocode
            if (current hippotranslation:locale equals hippotranslation:locale of defaultScope){
                if (hippotanslatiohippotranslation:id does not exist in translationScope){
                valid;
                }
                else {
                    not  valid;
                }
            }
            else {
                valid;
            }
          )
    .build();

Or visualized…

Is this possible? And if so can you point me in the right direction?

Hope you can advise.

Thank you!

Yves

Hello Yves,

when im reading the conditions it is a bit confusing with regards what exactly you are trying to retrieve.

if i rely on your visualisation this is a query i would try out.

SELECT * FROM hippotranslation:translated WHERE hippotranslation:locale=“en_US”

__ AND hippotranslation:id NOT IN (SELECT * FROM hippotranslation:translated WHERE hippotranslation:locale=“nl_NL”)__

UNION

SELECT * FROM hippotranslation:translated WHERE hippotranslation:locale=“nl_NL”

i havent tested it in the repository and maybe the syntax is not right but you get the point.

Hope this helps.

Kind regards.

Thank you. It’s seems JCR SQL 1 doens’t allow for ‘NOT IN’ operations. Would it be possible using xpath you think?

Hello Yves,

I’ve tried possible approaches and combinations (both xpath and sql) but couldn’t come up with something that works. If someone else with extent knowledge could clarify if this is even possible.

Otherwise I guess your best choice here is programmatically as you described.

Kind regards

JCR sql syntax is very limited. SQL wasn’t designed for a tree structure but for relational data.
https://docs.adobe.com/docs/en/spec/jcr/1.0/8.5.3_SQL_EBNF.html

You can’t UNION or JOIN.

XPATH is mostly complete (afaik) but inherently more limited. It was built for a tree structure, but the limitations will be the same for the most part as SQL.

You can’t do comparison between node properties.

the following would cover case 2 and 3
//element(*, myhippoproject:newsdocument)[@hippotranslation:locale=‘nl_NL’ ]

Case 1 isn’t possible to query directly. You can query all english documents and get the translations from the bean via bean.getAvailableTranslations()getTranslation(“nl_NL”). Which would cover case 1 and 2. Though each call to getAvailableTranslations is actually a query in the background. So also not perfect.

Ok thank you all for thinking along. I will solve / have solved it programmatically.

Before asking the question I solved it by quering both scopes and remove the ‘duplicates’ afterwards. For this I used the getAvailableTranslations on every node so I was executing a lot of queries.

My new/current method of solving this is first use a query to get all hippotranslation:id’s from the translationScope. And use these id’s as constraint in the final query on both scopes. Not perfect, but now I only need two queries.

Perhaps can you live even with single query?

/jcr:root/content/documents//element(*, hippotranslation:translated)[@hippo:availability='live'] order by @hippotranslation:id, @hippotranslation:locale

Regards,

Woonsan

Thank you Woosan. But even if I query the entire repository I still need some way to programmatically filter the ‘duplicate’ translationid’s. For now i’m happy with the solution I’ve got.