Edit Metadata process does not complete for Excel spreadsheet of 200,000 records.
The dialog box "Analyzing Excel Workbook" displays for a few minutes, then closes. The Metadata process completes successfully with the same spreadsheet when it contains a much smaller number of rows. Using DQ Analayzer 7. Using the default db. I am a newbie. Like the product so far but need to work with tables of 500K records. Thanks for any help you can provide.
Help get this topic noticed by sharing it on
Twitter,
Facebook, or email.
Twitter,
Facebook, or email.
Edit Metadata process does not complete for Excel spreadsheet of 200,000 records.
-
Hello Steve,
thank you for sharing your problem.
I have probably found the cause of it and we may try and do something about it in development.
The cause isn't exactly the amount of records, so far I have several Excel files with 200k or 500k records and I can edit their metadata without a problem. But problematic is the representation of string values in Excel. You may know, the xlsx file is a zipped archive of several xml files and one of them named xl\sharedStrings.xml contains all different string values in the Excel table and its values are being referenced from another xml file. Parsing this file causes big memory leak and it's the cause of fail to open the Edit metadata dialog. It means there is no problem with reading or opening the Excel with many records as far they do not contain a lot of distinct string values, what is most probably your case.
As I said we can try and do something about it, we still need to parse the xml file to obtain header, but we may not have to read the file all at once.
Till we manage to do something, I can offer you 2 possible workarounds.
1) Let DQ Analyzer use more memory
What have I tried when I generated my own problematic xlsx file is to let DQ Analyzer use more heap memory. There is a dqa.ini file in the DQ Analyzer installation directory, where is a row containing -Xmx256m, which means there is 256MB of heap memory used for running the DQ Analyzer GUI (not for the running of jobs). Once I replaced 256 with 512, I was able to open Edit metadata dialog even it took a long time.
This workaround might and might not help you, it depends on how big your file really is. You may even try to use even a bigger number then 512 like 768, but it may end to be not working at all, it depends on configuration of your computer.
You may also need to give more memory for Launching the profile jobs, you can do that in Window - Preferences dialog, in the subnode "Launching" under the DQ Analyzer node.
2) Use the Edit Metadata dialog on a smaller file with the same columns
You do not exactly need to use Edit metadata on a file you are going to profile. Metadata is a description of Column names and their types and you may find the description is saved as a file with extension .metaData right next to the excel or txt file. So what you may do is to create the metadata in Edit metadata dialog on a subset of data and replace the excel file afterwords or rename the .metaData file to the filename of Excel file with .metaData extension. Beware that .metaData files are hidden in DQ Analyzer by default, but you can find them and all your files which are shown under DQ Projects node in workspace subdirectory in installation directory.
This workaround is probably harder to do, but it will definitely help you, if you manage to do that. You may still need to set more memory for Launching the profile jobs in Preferences, as I described sooner.
Nice to hear that you like the product and we are sorry that it has problems with your Excel files, reading and working with Excel files are still causing us some problems as our products are used to read text files or data directly from databases, where are no problems with such a representation with all records written in xml format. We'll try to do something about it and get you informed once we release new DQ Analyzer version where you may profile your excel files without any problems. -
-
Thanks for the very fast and very detailed response. Bad news: I tried suggestion 2 and expanded the memory to 1024 Mb (I have 2Gb on the machine). Also increased the threads to 10 (from 1). Creating the profile process seemed to stall / hang. Memory used on the machine never went past 1.1 Gb.
Good news: I exported the spreadsheet to .csv. I successfully edited the Meta data and created the profile!
Thanks again for your help!!
Steve -
Loading Profile...




EMPLOYEE
