Skip to main content
Solved

RDM REST API - using count and offset to get all records


may_kwok
Star Blazer L3
Forum|alt.badge.img+4

Let’s say I have a table that has 500,000 rows.

I need to query this table using REST API (because I want to get both published records and records still in edited/waiting for publish states).

I have some params to filter the data with, but even with params, the result set could be say, 10000.

I tried using the REST API using JSON call step in a plan, setting “count” to 10000, and the API says maximum is 1000.

I see that the REST API provides the offset param as well. Is there an example of how I can use that offset param to paginate and query all the records that I will need, but in one plan/component?

This component will be a part of a complex set of components as a larger operation, and I’d prefer to not have to create an extra .ewf and iterator for it, otherwise this whole thing gets too complicate for anyone else to understand…

 

Any ideas?

Best answer by AKislyakov

Hi ​@may_kwok,

I don’t have an RDM example readily available, but here is the approach I would use. RDM REST API gives you the total number of rows that you request will extract. That allows you to calculate number of pages you need to read through the response. So overall approach would be:
1. JSON Call #1 -- get total number of rows
2. Alter Format + Splitter + Column Assigner -- calculate required number of pages and generate rows for that, set offsets
3. JSON Call #2 -- get the data
Something similar to that:
 

 

View original
Did this topic help you find an answer to your question?

Forum|alt.badge.img+2

Hi ​@may_kwok,

I don’t have an RDM example readily available, but here is the approach I would use. RDM REST API gives you the total number of rows that you request will extract. That allows you to calculate number of pages you need to read through the response. So overall approach would be:
1. JSON Call #1 -- get total number of rows
2. Alter Format + Splitter + Column Assigner -- calculate required number of pages and generate rows for that, set offsets
3. JSON Call #2 -- get the data
Something similar to that:
 

 


Forum|alt.badge.img+2

Also, if total number or rows is reasonably low, you can just increase the ataccama.one.rdm.api.response.max-count parameter (RDM Application Properties :: Ataccama ONE). Although, that might lead to increased memory consumption


may_kwok
Star Blazer L3
Forum|alt.badge.img+4
  • Star Blazer L3
  • May 28, 2025

Thanks ​@AKislyakov !

This is what I ended up with:

 

In the first Json call, I pass in my query parameters, and queried my table:

I create a data stream of “root” to get my data count:

 

For each input row, I calculate how many calls are actually required based on the data count returned:

The seq_list text file is my very quick and dirty way to get a sequence in 1 line. I know my source data and it’s highly unlikely to get to 30000 rows per combo:

Join this list together with my required calls:

Split the list comma separated:

Calculate the offset per “call”:

And then make essentially the same json call again, but with the offset param:

 

Not hundred percent happy with the dirty file generating the list (e.g. for 3, make list 1,2,3 ) but it worked. So it’ll do for now.

 

What magic do you do in the Alter Format to come up with the list so you could split it?

 

Thanks,

May


Forum|alt.badge.img+2

Sorry, should’ve shared that earlier

We had this code

toString(page_size) + 
iif(total_count < page_size, '',
	replicate(' ' + toString(page_size), (total_count div page_size) - 1) + iif(total_count % page_size > 0, ' ' + toString(total_count % page_size), ''))

For inputs e.g. total_count = 34 and page_size = 10 it will generate value of 10 10 10 4
which we than split by space, and re-number by descriptor builder and finally use it to calculate page_num and offset
 

 


may_kwok
Star Blazer L3
Forum|alt.badge.img+4
  • Star Blazer L3
  • May 29, 2025

Thanks ​@AKislyakov  this worked beautifully. I got rid of the text file. This replicate expression is gold!

 

For those interested, here’s my final plan looking something like this:

 

  1. Set page size: just hard code an integer column to 1000
  2. Get counts per page: the big expression that ​@AKislyakov shared above, that uses the replicate expression. I modified slightly to make it comma separated.
  3. Then I use Regex splitter recognising comma as a pattern to split
  4. Record descriptor builder to make sure my multiple input query parameters are considered as a group
  5. Get page num is this: tointeger(right(rd, length(rd) - lastIndexOf(rd, ':') -1)) -1
  6. Get offset is this: tostring(page_size * page_num)

 


Cansu
Community Manager
Forum|alt.badge.img+3
  • Community Manager
  • June 5, 2025

Thank you for sharing the plan and steps ​@may_kwok 🙌


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings