Added shallow search for data.table in tables() #7580

Open
manmita wants to merge 15 commits into master from feat/adding_list_search_to_tables

manmita (Contributor) commented Jan 9, 2026

Closes #2606

Added a depth argument to tables(); depth = 1L performs a shallow search:
if depth is 0, only top-level data.tables are reported
if depth is 1, we also loop through list-like objects (is.list is TRUE, but they are not a data.table or data.frame) and report the data.tables they contain
if depth > 1, we throw an error

A data.table found inside a list is named after its parent, as parent[[1]] or parent$child.
The info table is pre-allocated to avoid reallocation cost.
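
The depth rules above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: find_tables is a hypothetical helper, it returns only names rather than the full tables() output, and it appends to a vector for brevity instead of pre-allocating.

```r
library(data.table)

# Hypothetical helper (not part of data.table): names of the data.tables
# visible in `env`, optionally looking one level into plain lists.
find_tables = function(env = parent.frame(), depth = 0L) {
  if (depth > 1L) stop("depth > 1 is not yet supported")
  found = character()
  for (nm in ls(envir = env)) {
    obj = get(nm, envir = env)
    if (is.data.table(obj)) {
      found = c(found, nm)
    } else if (depth == 1L && is.list(obj) && !is.data.frame(obj)) {
      child_names = names(obj)
      for (i in seq_along(obj)) {
        if (!is.data.table(obj[[i]])) next
        # Named children are shown as parent$child, unnamed ones as parent[[i]]
        named = !is.null(child_names) && !is.na(child_names[i]) && nzchar(child_names[i])
        label = if (named) paste0(nm, "$", child_names[i]) else paste0(nm, "[[", i, "]]")
        found = c(found, label)
      }
    }
  }
  found
}

e = new.env()
e$A = list(data.table(a = 1, b = 4:6), data.table(a = 2, b = 7:10))
e$C = data.table(a = 1, b = 4:6)
e$D = list(d = data.table(a = 1, b = 4:6), x = 1:5)

top_level = find_tables(e)              # only C is a top-level data.table
shallow   = find_tables(e, depth = 1L)  # also looks inside the lists A and D
```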

manmita (Contributor, Author) commented Jan 9, 2026

Hello,

I created a new PR in replacement of #7568

Reasons: there was a git issue there and the merge became too complex. I also changed the algorithm, because I didn't previously know that rbind or cbind incurs a reallocation cost.

The current PR takes that into account and avoids appends.

Previous PR: created a separate data.table called info and used rbind at the end
This PR: pre-allocates a data.table of the total size and fills in the info
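
To illustrate the difference between the two approaches, here is a small sketch. It is not the code from either PR; the column names and the row count n are made up for the example.

```r
library(data.table)

n = 100L

# Growing: each rbind() allocates a new table and copies the rows accumulated so far.
grown = data.table(NAME = character(), NROW = integer())
for (i in seq_len(n)) {
  grown = rbind(grown, data.table(NAME = paste0("t", i), NROW = i))
}

# Pre-allocating: size the table once, then fill each row by reference with set().
filled = data.table(NAME = character(n), NROW = integer(n))
for (i in seq_len(n)) {
  set(filled, i, "NAME", paste0("t", i))
  set(filled, i, "NROW", i)
}

same_result = isTRUE(all.equal(grown, filled))
```

Both loops build the same table; the second one avoids the repeated copies.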

manmita (Contributor, Author) commented Jan 9, 2026

In reply to @jangorecki's previous comment:

An example of when this new feature could be useful?

To support lists of data.tables, such as those that arise from split.data.table or fread, like the following:

list(data.table(a = 1, b = 4:6),
     data.table(a = 2, b = 7:10))

The original code only found top-level data.tables; this change also finds data.tables inside a list when shallow_search = TRUE.
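
For instance, a list like the one above can come from split(). A small sketch, where DT and parts are names made up for the example:

```r
library(data.table)

DT = data.table(a = c(1, 1, 1, 2, 2, 2, 2), b = 4:10)

# split.data.table with by= returns a named list with one data.table per group
parts = split(DT, by = "a")

part_names = names(parts)
part_rows  = vapply(parts, nrow, integer(1L))
```

Here parts itself is a plain list, so it only shows up in a search that looks inside lists.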

manmita (Contributor, Author) commented Jan 9, 2026

An example of the original behavior and the new feature:

> A = list(data.table(a = 1, b = 4:6),
      data.table(a = 2, b = 7:10))
> B = list(data.table(a = 1, b = 4:6), 1:5)
> C = data.table(a = 1, b = 4:6)
> tables()
   NAME NROW NCOL MB COLS    KEY
1:    C    3    2  0  a,b [NULL]
Total: 0MB using type_size
> tables(shallow_search = TRUE)
     NAME NROW NCOL MB COLS    KEY
1: A[[1]]    3    2  0  a,b [NULL]
2: A[[2]]    4    2  0  a,b [NULL]
3: B[[1]]    3    2  0  a,b [NULL]
4:      C    3    2  0  a,b [NULL]
Total: 0MB using type_size
> D = list(d = data.table(a = 1, b = 4:6), x = 1:5)
> tables(shallow_search = TRUE)
     NAME NROW NCOL MB COLS    KEY
1: A[[1]]    3    2  0  a,b [NULL]
2: A[[2]]    4    2  0  a,b [NULL]
3: B[[1]]    3    2  0  a,b [NULL]
4:      C    3    2  0  a,b [NULL]
5:    D$d    3    2  0  a,b [NULL]
Total: 0MB using type_size

tables() works the same as before, and tables(shallow_search = TRUE) searches one level deep.

codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.03%. Comparing base (1bd88cb) to head (866820a).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7580   +/-   ##
=======================================
  Coverage   99.02%   99.03%           
=======================================
  Files          87       87           
  Lines       16896    16937   +41     
=======================================
+ Hits        16732    16773   +41     
  Misses        164      164           


github-actions bot commented Jan 9, 2026

  • HEAD=feat/adding_list_search_to_tables stopped early for isoweek improved in #7144
    Comparison Plot

Generated via commit 866820a

Download link for the artifact containing the test results: ↓ atime-results.zip

Task                                      Duration
R setup and installing dependencies       2 minutes and 59 seconds
Installing different package versions     22 seconds
Running and plotting the test cases       4 minutes and 8 seconds

xenv2$N = list(a = 1:5)
setkey(xenv2$M$b, a)
setindex(xenv2$M$b, b)
test(2360.1, tables(env = xenv2, shallow_search = TRUE)$NAME, c("DT", "L[[1]]", "L[[2]]", "M$b"))
Member

prefer saving the output to re-running tables(env = xenv2, shallow_search = TRUE) many times.

alternatively, just do one test like

test(2360.1, tables(env = xenv2, shallow_search = TRUE)[, .(NAME, NROW, NCOL)], data.table(...))

Contributor Author

making it one test

xenv2$M = list(b = data.table(a = 1, b = 4:6), a = 1:5)
xenv2$N = list(a = 1:5)
setkey(xenv2$M$b, a)
setindex(xenv2$M$b, b)
Member

nit: I would move setindex() closer to the index=TRUE tests

R/tables.R Outdated
}
}
else {
# the original code path when shallow_search=FALSE
Member

this comment doesn't make sense outside the context of this PR, "the original code path" will be a relic within a few months

R/tables.R Outdated
 tables = function(mb=type_size, order.col="NAME", width=80L,
-                  env=parent.frame(), silent=FALSE, index=FALSE)
+                  env=parent.frame(), silent=FALSE, index=FALSE,
+                  shallow_search=FALSE)
Member

I am thinking the best way to go about this is actually something like either depth=0 or recursive=FALSE. Then both cases share the same logic, except that the default cuts out after the shallow search.

If the code to do a recursive walk is proving intimidating, we can do depth=0, and this PR can support depth=1 and error for depth>1 "not yet supported" and leave it for future work.

WDYT?
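
For reference, a depth-limited recursive walk could look roughly like the sketch below. It is only an illustration of the idea: walk_tables is a hypothetical helper, not anything in data.table or in this PR, and it returns names only.

```r
library(data.table)

# Hypothetical recursive walker: display names of all data.tables reachable
# from `obj` within `depth` levels of list nesting.
walk_tables = function(obj, name, depth) {
  if (is.data.table(obj)) return(name)
  # Stop when the depth budget is used up or obj is not a plain list
  if (depth < 1L || !is.list(obj) || is.data.frame(obj)) return(character())
  child_names = names(obj)
  out = character()
  for (i in seq_along(obj)) {
    named = !is.null(child_names) && !is.na(child_names[i]) && nzchar(child_names[i])
    child = if (named) paste0(name, "$", child_names[i]) else paste0(name, "[[", i, "]]")
    out = c(out, walk_tables(obj[[i]], child, depth - 1L))
  }
  out
}

nested = list(data.table(a = 1), list(inner = data.table(b = 2)))
d0 = walk_tables(nested, "nested", depth = 0L)  # nested is not a data.table itself
d1 = walk_tables(nested, "nested", depth = 1L)  # finds the direct child only
d2 = walk_tables(nested, "nested", depth = 2L)  # also finds the doubly nested table
```

With this shape, depth = 0 and depth = 1 go through the same code path, and supporting deeper searches later would not need a separate branch.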

Contributor Author

yeah we can do a depth = 0, 1 and error at depth >1 for this PR

Development

Successfully merging this pull request may close these issues.

tables could look for en-list-ed data.tables as well

2 participants