Tracing InvalidItemStateException

We have a workflow that fills in some data the authors can’t or don’t want to fill in. It is apparently possible to create a document with garbage data (i.e. illegal characters in the document path, such as Chinese characters, smart quotes, or periods) in document A that causes the workflow in document B to throw an InvalidItemStateException, caused by an ItemNotFoundException, caused by a NoSuchItemStateException with a UUID that … can’t be found (so I don’t know which document “A” is).

  1. Is there a way to find documents with garbage data (paths, etc)?
  2. Can we query for a UUID that “can’t be found”? What about queries for UUIDs that start with a pattern?
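
To get hold of the UUID from the nested exceptions described above, one option is to walk the cause chain and pull out anything that looks like a UUID. This is only a minimal sketch: the class name CauseChainUuids is made up for illustration, and it assumes the UUID shows up in the exception messages in the usual 8-4-4-4-12 hex form, which may not hold for every repository version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hippo or JCR.
public class CauseChainUuids {
    // Canonical 8-4-4-4-12 hex UUID form (assumed message format).
    private static final Pattern UUID_PATTERN = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    /** Walks t and its causes, collecting each distinct UUID-like token found in the messages. */
    public static List<String> uuidsInCauseChain(Throwable t) {
        List<String> found = new ArrayList<>();
        int depth = 0;
        // The depth cap guards against a cyclic cause chain.
        for (Throwable cur = t; cur != null && depth < 50; cur = cur.getCause(), depth++) {
            String msg = cur.getMessage();
            if (msg == null) {
                continue;
            }
            Matcher m = UUID_PATTERN.matcher(msg);
            while (m.find()) {
                String uuid = m.group().toLowerCase(Locale.ROOT);
                if (!found.contains(uuid)) {
                    found.add(uuid);
                }
            }
        }
        return found;
    }
}
```

Calling uuidsInCauseChain(e) in the catch block around the workflow save would give a list of candidate UUIDs to log or to search for.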

I find it very frustrating that garbage data in one document can prevent other documents from being saved. Are there any suggestions for how we can recover?

I can create new documents, from both the content perspective and the document wizard, that have invalid characters such as Chinese characters and smart quotes (thank you Microsoft).
But I was unable to reproduce the workflow save problems with the InvalidItemState/ItemNotFound/NoSuchItemState exceptions locally. Trying another server with MySQL instead of localhost.

This sounds strange to me. Node names are sanitized to prevent using illegal characters. Chinese characters shouldn’t even be sanitized. If I create a document with smart quotes in the console this is accepted. Illegal node names would not even be possible to save.

The state of document A shouldn’t even affect document B unless your workflow connects the two somehow. Still, the errors you mention shouldn’t occur unless there is something wrong in the workflow code.

To answer your questions from the first comment:

  1. Is there a way to find documents with garbage data (paths, etc)?

That depends on what you mean by garbage data, but in any case it can’t be data that is illegal in JCR, because that should simply not be possible. Other than that, it is a case of querying.

  2. Can we query for a UUID that “can’t be found”? What about queries for UUIDs that start with a pattern?

A UUID that can’t be found can’t be queried for. That’s sort of part of the problem. You can query on a pattern:

//element(*, myhippoproject:newsdocument)[jcr:like(@jcr:uuid, '64ab4648%')]

This finds all “myhippoproject:newsdocument” nodes with a UUID that starts with 64ab4648. But even then I would not expect spectacular results. UUIDs aren’t meant to look like each other.
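
If you want to build that statement from code, something along these lines could work. It is a sketch only: UuidPrefixQuery is a hypothetical helper, and it just produces the XPath string. It refuses anything other than hex digits and dashes, so quotes and wildcards cannot end up in the statement.

```java
// Hypothetical helper, not part of Hippo or JCR.
public class UuidPrefixQuery {
    /**
     * Builds an XPath statement matching nodes of the given type whose
     * jcr:uuid starts with the given prefix.
     */
    public static String build(String nodeType, String uuidPrefix) {
        // Only hex digits and dashes can occur in a UUID; rejecting anything
        // else also keeps quotes and '%' wildcards out of the statement.
        if (!uuidPrefix.matches("[0-9a-fA-F-]+")) {
            throw new IllegalArgumentException("Not a UUID prefix: " + uuidPrefix);
        }
        return "//element(*, " + nodeType + ")[jcr:like(@jcr:uuid, '" + uuidPrefix + "%')]";
    }
}
```

With a javax.jcr.Session at hand, the resulting string could be passed to session.getWorkspace().getQueryManager().createQuery(statement, Query.XPATH) and executed, or simply pasted into whatever query tool you normally use.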

Is it possible to trace articles via /hippo:log/default?
Possibly querying for histories of a document?

If deleting via /cms/console, I recognize there won’t be hippo:log items, but might be able to find some history pertaining to a given UUID (that may no longer exist). Is there any documentation about the hippo:log path?

Regarding “garbage data” - the document wizard + create new document both allow some strange characters to come through: (examples of document paths created due to the document name)

  • buy-side-bonuses-may-be-a-‘disappointment’-for-many
  • “i-worked-at-a-chinese-bank-in-hk.-but-now-i’m-desperate-to-join-a-western-firm”
  • mix of Chinese characters plus commas, spaces (might have been Unicode comma/space)
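
For what it's worth, names like the ones above could be flagged with a simple check once the node names have been collected (for example from query results). This is a rough sketch under my own definition of “strange”: a period, comma, space, or any non-ASCII character, which covers smart quotes and Chinese characters. SuspiciousNames is a made-up class name, and these characters may well be legal in JCR; the check only mirrors the examples listed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hippo or JCR.
public class SuspiciousNames {
    /** True if the name contains a period, comma, space, or any non-ASCII code point. */
    public static boolean isSuspicious(String name) {
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (cp == '.' || cp == ',' || cp == ' ' || cp > 0x7F) {
                return true;
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return false;
    }

    /** Returns the names that isSuspicious flags, in their original order. */
    public static List<String> filter(List<String> names) {
        List<String> flagged = new ArrayList<>();
        for (String name : names) {
            if (isSuspicious(name)) {
                flagged.add(name);
            }
        }
        return flagged;
    }
}
```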