Sean
sad I’m disappointed

Drill-through disabled on duplicate stats

Is drill-through on duplicate statistics disabled in community edition? When I right-click on the Duplicate row in the Basic tab the drill-through menu option is greyed out.

  • Hi Sean,

    In the Counts table (just below the chart) on the Basic tab, drill-through is enabled only for Null and Non-null values. The reason is largely technical: to enable it for the remaining rows, we would have to extend the database table with a number of columns (thus reducing performance), while the same information can be obtained from the Frequency tab (where drill-through works everywhere).

    A big advantage of the Frequency tab is that it shows not only how many duplicates you have in the data, but also the actual duplicated values. You can then use drill-through to see the records corresponding to a particular value, rather than all duplicates shown together.
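
    To illustrate the idea outside the tool, here is a minimal Python sketch of a frequency count that keeps only duplicated values and can list the rows behind one value. It is not how DQ Analyzer is implemented; the function names and the sample column are made up for the example.

```python
from collections import Counter


def duplicated_values(values):
    """Return {value: count} for every value that occurs more than once."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


def rows_for_value(values, wanted):
    """Return the row numbers (0-based) holding a given value,
    roughly what a drill-through on one frequency row would show."""
    return [i for i, value in enumerate(values) if value == wanted]


# Hypothetical sample column
emails = ["a@x.com", "b@x.com", "a@x.com", "c@x.com", "b@x.com", "a@x.com"]

dups = duplicated_values(emails)
rows = rows_for_value(emails, "b@x.com")
print(dups)
print(rows)
```

    Here `dups` holds the two repeated addresses with their counts, and `rows` lists the positions of one of them, so each duplicated value can be inspected separately.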

    Another tool for detecting duplicates is Primary key analysis. It is more versatile than the simple "Duplicate" count, as you may use several columns within a single key and look for duplicates where all the values are the same. Primary key analysis supports drill-through on both Unique and Non-unique groups.
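
    The multi-column case can be sketched the same way. This is only an illustration of grouping records by a composite key, assuming records are plain dictionaries; the column names, the data, and the function `non_unique_groups` are hypothetical and not part of the product.

```python
from collections import defaultdict


def non_unique_groups(records, key_columns):
    """Group row numbers by the tuple of key column values and
    keep only the groups that contain more than one row."""
    groups = defaultdict(list)
    for row_number, record in enumerate(records):
        key = tuple(record[column] for column in key_columns)
        groups[key].append(row_number)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Hypothetical sample data
records = [
    {"first": "Ann", "last": "Lee", "city": "Oslo"},
    {"first": "Ann", "last": "Lee", "city": "Bergen"},
    {"first": "Ann", "last": "Ray", "city": "Oslo"},
    {"first": "Bob", "last": "Lee", "city": "Oslo"},
]

result = non_unique_groups(records, ["first", "last"])
print(result)
```

    Neither `first` nor `last` alone identifies the repeated pair, but the combined key does: only rows 0 and 1 share both values.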

    In light of the above two analyses, drill-through on "Duplicate" would be a seldom-used feature that costs performance. But you are absolutely right that it is natural to expect it here. In a future release, we will try to redesign the context menu on the Basic tab to avoid this confusion and lead the user to the Frequency tab.

  • Sean
    happy I’m enlightened
    Thanks, Pavel, but the Frequency tab only shows a limited number of results. Is there a way to show ALL values with counts > 1?

  • Hi Sean,
    Yes, there is a way; in fact, there are two settings to change.

    1) By default, only the 100 most common and least common values are shown in the Frequency tab. You can raise this value in the Profiler preferences: open Window / Preferences from the menu, find the node Ataccama DQ Analyzer / Profiler in the tree, and change the value Read limit (upper/bottom lines).


    2) There is also a second limitation. By default, only 1000 distinct values are written to the profile file, because with many distinct values the file may grow very large. If you are sure you need more values written to the profile file, you need to:
    a) Create a plan instead of profiling the data directly in the Create profile dialog.
    b) Open the Profiling step by double-clicking on it.
    c) Switch the Step dialog to Normal Layout by clicking the button in the upper-right corner.
    d) There you should see the Output limit setting, where you can override the number of distinct values written to the profile file.
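
    To see why a low read limit can hide duplicated values, here is a rough Python sketch of a "top N and bottom N" view. It only models the behaviour described above and is not the tool's code; the function `visible_and_hidden`, its parameter `read_limit`, and the sample data are assumptions made for the example.

```python
from collections import Counter


def visible_and_hidden(values, read_limit):
    """Model a view that shows only the read_limit most common and
    read_limit least common values, and report which duplicated
    values (count > 1) fall outside that view."""
    ranked = Counter(values).most_common()  # sorted by count, highest first
    top = ranked[:read_limit]
    bottom = ranked[-read_limit:] if read_limit else []
    shown = {value for value, _ in top} | {value for value, _ in bottom}
    hidden = {value: n for value, n in ranked if n > 1 and value not in shown}
    return shown, hidden


# Hypothetical column: "a" x5, "b" x4, "c" x3, "d" x2, "e" x1
column = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e"]

shown_small, hidden_small = visible_and_hidden(column, read_limit=1)
shown_large, hidden_large = visible_and_hidden(column, read_limit=3)
print(hidden_small)
print(hidden_large)
```

    With a limit of 1, the duplicated values `b`, `c`, and `d` sit in the middle of the ranking and are not shown; with a limit of 3, every value is visible. Raising the limit has the same effect on the Frequency tab, provided the profile file itself contains enough distinct values.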


    Feel free to ask us if you need any additional help.