Can Ataccama do this?

Hi there,

Please, can we use Ataccama as the FIRST LINE of defense to AVOID the creation of duplicate people records?

Let’s imagine an API like this:

  • [Full Name] *Required
  • Threshold accuracy (Integer Min: 80 Max:100)
  • At least one is required:
    • Email (RFC 5322 - Regex)
    • Phone (E.164)
    • Document (Country: ISO 3166-1 alpha-2 + Type + Document Number)
  • 200 OK - returns the list of Person-IDs scoring equal to or greater than the provided threshold. Ex:

      {"person-id": "9b151ccc-b447-47ad-9efe-3f3f9d6a36e8", "rank": 1}
      {"person-id": "b4a20ad0-8e5d-4664-a4a2-1e5fc59e76a7", "rank": 2}

  • 400 BAD REQUEST - if any of the provided fields do not match the required format
  • 404 NOT FOUND - if no person is found with a score equal to or greater than the provided threshold
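
To make the request contract above concrete, here is a minimal validation sketch in Python. The field names (`full_name`, `threshold`, `email`, `phone`, `document`) and the function `validate_request` are hypothetical, and the email pattern is a deliberately loose stand-in, not a full RFC 5322 check. It says nothing about how Ataccama itself validates input.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # much looser than RFC 5322
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")             # "+" followed by at most 15 digits
COUNTRY_RE = re.compile(r"^[A-Z]{2}$")                 # shape of an ISO 3166-1 alpha-2 code


def validate_request(payload: dict) -> list[str]:
    """Return validation errors; an empty list means the request is well-formed."""
    errors = []

    full_name = payload.get("full_name")
    if not isinstance(full_name, str) or not full_name.strip():
        errors.append("full_name is required")

    threshold = payload.get("threshold")
    if (not isinstance(threshold, int) or isinstance(threshold, bool)
            or not 80 <= threshold <= 100):
        errors.append("threshold must be an integer between 80 and 100")

    email = payload.get("email")
    phone = payload.get("phone")
    document = payload.get("document")
    if email is None and phone is None and document is None:
        errors.append("at least one of email, phone, document is required")

    if email is not None and not (isinstance(email, str) and EMAIL_RE.match(email)):
        errors.append("email is not well-formed")
    if phone is not None and not (isinstance(phone, str) and E164_RE.match(phone)):
        errors.append("phone is not in E.164 format")
    if document is not None:
        if not isinstance(document, dict):
            errors.append("document must be an object")
        else:
            if not COUNTRY_RE.match(str(document.get("country", ""))):
                errors.append("document.country must be a two-letter country code")
            if not document.get("type") or not document.get("number"):
                errors.append("document.type and document.number are required")

    return errors
```

A non-empty result would map to the 400 BAD REQUEST case; an empty one means the search can proceed.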

Some of the basic aspects of this SEARCH/DEDUPE mechanism:

  • Real-time deduplication
  • High level of accuracy (Phonetic Search (Metaphone 3 or the like), Fuzzy Search, Ranking the results, Normalization of names)
  • Be able to sustain a high number of API calls per second with good performance and reliability
  • Normalize Full Name:
    • Name + Middle + Surname
    • Remove Double, Triple whitespaces, etc
    • Remove punctuation
    • Remove non-letters
    • Normalize Suffixes (Jr, II, III, IV, etc)
    • Normalize Honorifics (Sr, Mr, Ms, Dr, etc)
    • Normalize abbreviations (Robert → Rob, William → Will, etc)
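
The normalization steps listed above could look roughly like this in Python. This is only a sketch: the lookup tables are tiny hypothetical samples, the function name `normalize_full_name` is made up, and lower-casing is an extra assumption not in the list. It is not how Ataccama cleanses names.

```python
import re

# Tiny sample lookup tables; a real deployment would need far larger, curated lists.
HONORIFICS = {"sr", "mr", "mrs", "ms", "dr"}
SUFFIXES = {"junior": "jr", "jr": "jr", "ii": "ii", "iii": "iii", "iv": "iv"}
ABBREVIATIONS = {"robert": "rob", "william": "will"}


def normalize_full_name(raw: str) -> str:
    """Apply the listed clean-up steps and return a lower-case, single-spaced name."""
    # Replace every run of non-letters (punctuation, digits, whitespace) with one space.
    # [\W\d_] keeps Unicode letters such as "é" intact.
    letters_only = re.sub(r"[\W\d_]+", " ", raw)
    tokens = letters_only.lower().split()

    # Drop honorifics only at the start of the name.
    while tokens and tokens[0] in HONORIFICS:
        tokens.pop(0)

    # Map suffix variants and given-name abbreviations to one canonical form.
    tokens = [ABBREVIATIONS.get(t, SUFFIXES.get(t, t)) for t in tokens]
    return " ".join(tokens)


print(normalize_full_name("Dr.  Robert   Brown, Jr."))  # rob brown jr
print(normalize_full_name("Mr. William  Smith III"))    # will smith iii
```

Mapping every variant to a single canonical token (whichever direction is chosen) is what matters for matching, since both the incoming and the stored name go through the same function.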

If such a capability exists, where can I find the documentation, please?


Hello Jean,
You are asking about a very specific use case, but I do not understand the context of it. You have picked Ataccama ONE as a product, but are you using this product? Are you a customer of ours?
You are also asking a number of questions related to different operations, such as standardization of records and deduplication, and also about the performance of API calls. Some of these are unrelated to one another and cannot be answered without knowing the details of your solution. For example, “a high number of API calls per second” is not precise: we do not know what the number of API calls would be, and performance tuning depends on a variety of factors.
Individually, all these operations are possible. But taken as a whole, I cannot give an exact answer without knowledge of your project.


Thanks for the quick answer :smiling_face_with_three_hearts:. We are implementing Ataccama here as our core MDM, but I am not part of the core team working with it. So before engaging the (busy) team working with Ataccama here, I would like to understand the feasibility of creating what I need in Ataccama. That’s why I thought this public forum was an excellent place to start this conversation and hear from other users.

I can elaborate more about what I would like to use Ataccama for (in this context), and maybe you guys can clarify if Ataccama would be a good fit for this project as well or if I should build it using a different tool or even create it from “scratch”.

So the problem that I am trying to solve is to allow every Lead Source application (internal or external) to check the existence of a person (lead or customer) while the user is providing the data, instead of trying to find the golden record later (which still might be necessary). Having a SYNC API that is fast (< 2 seconds) and accurate enough (Fuzzy, Phonetic, Typos, etc) would be an easier-to-adopt approach for this problem. This API would return the master-id (UUID?) for this person as stored in Ataccama.

We could require the minimum amount of data, let’s say (First and Last Name, email and phone) and try to match this with the existing people in the MDM, so if the user is found (% of accuracy) then we could ask the person to log in.
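
To illustrate the kind of threshold-based lookup I have in mind, here is a rough Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for real fuzzy/phonetic matching, compares names only, and the function `find_matches` and the in-memory `people` dictionary are made up for the example. It is not a claim about how Ataccama scores matches.

```python
from difflib import SequenceMatcher


def find_matches(query_name: str, people: dict[str, str], threshold: int) -> list[dict]:
    """Rank stored people whose name similarity (0-100) meets the threshold."""
    scored = []
    for person_id, stored_name in people.items():
        ratio = SequenceMatcher(None, query_name.lower(), stored_name.lower()).ratio()
        score = round(ratio * 100)
        if score >= threshold:
            scored.append((score, person_id))

    # Best score first; ties broken by id so the ranking is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [{"person-id": pid, "rank": rank}
            for rank, (_, pid) in enumerate(scored, start=1)]


# Hypothetical stored records: id -> already-normalized full name.
people = {"a": "john smith", "b": "jon smith", "c": "jane doe"}
print(find_matches("John Smith", people, 90))
```

An empty list would correspond to the "not found" case, and a non-empty one is what would trigger asking the person to log in.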

Please let me know how I can give extra information so that you guys here (in the public forum) might be able to provide me with some tips.

I appreciate it.


Hey Jean,
I understand your use case now, thanks for the details.
In MDM we have a service called Identify, which does almost exactly what you need. The service answers the question “Is this record similar to any record stored in the hub?”. It cleanses the input data and tries to match the record to records in the hub using matching rules.
The Identify API is in fact a read-only attempt at inserting a record, so the input record is cleansed, standardized, and matched.
The service works not only with basic attributes; you can also submit related entities such as contact information or address.
You can read more about the service in our documentation, just search for “Identify Services”.