Solved

RDM REST API - using count and offset to get all records

25 days ago
May 23, 2025
6 replies
32 views

may_kwok
Star Blazer L3

Let’s say I have a table that has 500,000 rows.

I need to query this table using REST API (because I want to get both published records and records still in edited/waiting for publish states).

I have some params to filter the data with, but even with params, the result set could be say, 10000.

I tried using the REST API using JSON call step in a plan, setting “count” to 10000, and the API says maximum is 1000.

I see that the REST API provides the offset param as well. Is there an example of how I can use that offset param to paginate and query all the records that I will need, but in one plan/component?

This component will be a part of a complex set of components as a larger operation, and I’d prefer to not have to create an extra .ewf and iterator for it, otherwise this whole thing gets too complicate for anyone else to understand…

Any ideas?

Best answer by AKislyakov

Hi @may_kwok,

I don’t have an RDM example readily available, but here is the approach I would use. RDM REST API gives you the total number of rows that you request will extract. That allows you to calculate number of pages you need to read through the response. So overall approach would be:
1. JSON Call #1 -- get total number of rows
2. Alter Format + Splitter + Column Assigner -- calculate required number of pages and generate rows for that, set offsets
3. JSON Call #2 -- get the data
Something similar to that:

View original

Did this topic help you find an answer to your question?

AKislyakov
Ataccamer
22 days ago
May 26, 2025

Hi @may_kwok,

AKislyakov
Ataccamer
22 days ago
May 26, 2025

Also, if total number or rows is reasonably low, you can just increase the ataccama.one.rdm.api.response.max-count parameter (RDM Application Properties :: Ataccama ONE). Although, that might lead to increased memory consumption

may_kwok
Star Blazer L3
19 days ago
May 28, 2025

Thanks @AKislyakov !

This is what I ended up with:

In the first Json call, I pass in my query parameters, and queried my table:

I create a data stream of “root” to get my data count:

For each input row, I calculate how many calls are actually required based on the data count returned:

The seq_list text file is my very quick and dirty way to get a sequence in 1 line. I know my source data and it’s highly unlikely to get to 30000 rows per combo:

Join this list together with my required calls:

Split the list comma separated:

Calculate the offset per “call”:

And then make essentially the same json call again, but with the offset param:

Not hundred percent happy with the dirty file generating the list (e.g. for 3, make list 1,2,3 ) but it worked. So it’ll do for now.

What magic do you do in the Alter Format to come up with the list so you could split it?

Thanks,

May

AKislyakov
Ataccamer
19 days ago
May 29, 2025

Sorry, should’ve shared that earlier

We had this code

toString(page_size) + 
iif(total_count < page_size, '',
	replicate(' ' + toString(page_size), (total_count div page_size) - 1) + iif(total_count % page_size > 0, ' ' + toString(total_count % page_size), ''))

For inputs e.g. total_count = 34 and page_size = 10 it will generate value of 10 10 10 4
which we than split by space, and re-number by descriptor builder and finally use it to calculate page_num and offset

may_kwok
Star Blazer L3
18 days ago
May 29, 2025

Thanks @AKislyakov this worked beautifully. I got rid of the text file. This replicate expression is gold!

For those interested, here’s my final plan looking something like this:

Set page size: just hard code an integer column to 1000
Get counts per page: the big expression that @AKislyakov shared above, that uses the replicate expression. I modified slightly to make it comma separated.
Then I use Regex splitter recognising comma as a pattern to split
Record descriptor builder to make sure my multiple input query parameters are considered as a group
Get page num is this: tointeger(right(rd, length(rd) - lastIndexOf(rd, ':') -1)) -1
Get offset is this: tostring(page_size * page_num)

Cansu
Community Manager
12 days ago
June 5, 2025

Thank you for sharing the plan and steps @may_kwok 🙌

Check out our Quick Start Guide to get started on the community 🙋‍♀️

Reply

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Reply

Related topics

Version 15.4 LTS is here! Updates to Data Observability, ONE AI, MDM & RDM

Sending RDM Workflow Updates to Microsoft Teams & Slack

14.5 Updates: MDM & RDM 🤟

Spice Up Your MDM with Emojis! 🎉

14.5 Updates: ONE 🟣

Most Liked this week

Sign up

Login to the Ataccama Community

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings