Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-08-03 Thread via GitHub


egolearner commented on PR #47199:
URL: https://github.com/apache/arrow/pull/47199#issuecomment-3149258969

   The `wheel-windows-cp313-cp313t-amd64` failure seems unrelated to this PR:
   
   > RuntimeError: CFFI does not support the free-threaded build of CPython 3.13. Upgrade to free-threaded 3.14 or newer to use CFFI with the free-threaded build.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-08-01 Thread via GitHub


github-actions[bot] commented on PR #47199:
URL: https://github.com/apache/arrow/pull/47199#issuecomment-3143999063

   Revision: f0e1edfffe07a6ec3d2d51fe2c10b805f05fd57d
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-9e7c764ef8](https://github.com/ursacomputing/crossbow/branches/all?query=actions-9e7c764ef8)
   
   |Task|Status|
   |--|--|
   |wheel-windows-cp310-cp310-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-9e7c764ef8-github-wheel-windows-cp310-cp310-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/16672265116/job/47191028992)|
   |wheel-windows-cp311-cp311-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-9e7c764ef8-github-wheel-windows-cp311-cp311-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/16672264904/job/47191028428)|
   |wheel-windows-cp312-cp312-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-9e7c764ef8-github-wheel-windows-cp312-cp312-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/16672264927/job/47191028524)|
   |wheel-windows-cp313-cp313-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-9e7c764ef8-github-wheel-windows-cp313-cp313-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/16672265003/job/47191028628)|
   |wheel-windows-cp313-cp313t-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-9e7c764ef8-github-wheel-windows-cp313-cp313t-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/16672265005/job/47191028567)|
   |wheel-windows-cp39-cp39-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-9e7c764ef8-github-wheel-windows-cp39-cp39-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/16672264995/job/47191028545)|





Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-08-01 Thread via GitHub


rok commented on PR #47199:
URL: https://github.com/apache/arrow/pull/47199#issuecomment-3143989301

   @github-actions crossbow submit wheel-windows-*





Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-08-01 Thread via GitHub


rok commented on PR #47199:
URL: https://github.com/apache/arrow/pull/47199#issuecomment-3143977018

   This looks pretty good @egolearner. I'll start some more Python tests and 
merge if they pass.





Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-07-31 Thread via GitHub


egolearner commented on code in PR #47199:
URL: https://github.com/apache/arrow/pull/47199#discussion_r2246622702


##
python/pyarrow/tests/parquet/test_basic.py:
##
@@ -76,20 +76,16 @@ def test_set_data_page_size():
     _check_roundtrip(t, data_page_size=target_page_size)
 
 
-@pytest.mark.pandas
 def test_set_write_batch_size():

Review Comment:
   Done






Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-07-30 Thread via GitHub


rok commented on code in PR #47199:
URL: https://github.com/apache/arrow/pull/47199#discussion_r2242818334


##
python/pyarrow/tests/parquet/test_basic.py:
##
@@ -76,20 +76,16 @@ def test_set_data_page_size():
     _check_roundtrip(t, data_page_size=target_page_size)
 
 
-@pytest.mark.pandas
 def test_set_write_batch_size():

Review Comment:
   This now runs when pandas is not present, which is great, but it fails when numpy is not present.
   Can you try adding `@pytest.mark.numpy`?






Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-07-30 Thread via GitHub


rok commented on code in PR #47199:
URL: https://github.com/apache/arrow/pull/47199#discussion_r2242783733


##
python/pyarrow/tests/parquet/common.py:
##
@@ -121,6 +121,11 @@ def _test_dataframe(size=1, seed=0):
     return df
 
 
+def _test_table(size=1, seed=0):
+    df = _test_dataframe(size, seed)
+    return pa.Table.from_pandas(df, preserve_index=False)

Review Comment:
   That looks good, thanks!
   
   > Maybe we can deal with this in another issue? It seems numpy is still a must for a lot of test cases.
   
   Yeah, let's capture that in another issue and defer it until it's needed.






Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-07-30 Thread via GitHub


egolearner commented on code in PR #47199:
URL: https://github.com/apache/arrow/pull/47199#discussion_r2242686563


##
python/pyarrow/tests/parquet/common.py:
##
@@ -121,6 +121,11 @@ def _test_dataframe(size=1, seed=0):
     return df
 
 
+def _test_table(size=1, seed=0):
+    df = _test_dataframe(size, seed)
+    return pa.Table.from_pandas(df, preserve_index=False)

Review Comment:
   Thanks for your review, @rok.
   
   I have added a `_test_dict` function that holds the data generation logic for both `_test_dataframe` and `_test_table`. PTAL.
   
   > It might even be good to have fallback logic in _test_table for cases numpy is not available. This logic could use stdlib's random or some testing utility we have available in arrow c++.
   
   Maybe we can deal with this in another issue? It seems `numpy` is still a must for a lot of test cases.






Re: [PR] GH-47172: [Python][Test] Add function to create Arrow table instead of pandas df [arrow]

2025-07-29 Thread via GitHub


rok commented on code in PR #47199:
URL: https://github.com/apache/arrow/pull/47199#discussion_r2240170410


##
python/pyarrow/tests/parquet/common.py:
##
@@ -121,6 +121,11 @@ def _test_dataframe(size=1, seed=0):
     return df
 
 
+def _test_table(size=1, seed=0):
+    df = _test_dataframe(size, seed)
+    return pa.Table.from_pandas(df, preserve_index=False)

Review Comment:
   Doesn't `_test_dataframe` use pandas? Depending on pandas would run counter to the intent [stated here](https://github.com/apache/arrow/issues/47172):
   > This issue would move some of tests using _test_dataframe to use a new utility function and remove the @pytest.mark.pandas in this cases.
   
   You could move the numpy logic from `_test_dataframe` into `_test_table` and have `_test_dataframe` call it, like:
   
   ```python
   # I've not tested this
   import numpy as np
   import pyarrow as pa
   from pyarrow.tests import util

   def _test_table(size=1, seed=0):
       # Seed numpy's global RNG so the generated columns are reproducible.
       np.random.seed(seed)
       # pa.table() builds a Table from a mapping of column name -> values;
       # _random_integers is the existing helper in this module.
       return pa.table({
           'uint8': _random_integers(size, np.uint8),
           'uint16': _random_integers(size, np.uint16),
           'uint32': _random_integers(size, np.uint32),
           'uint64': _random_integers(size, np.uint64),
           'int8': _random_integers(size, np.int8),
           'int16': _random_integers(size, np.int16),
           'int32': _random_integers(size, np.int32),
           'int64': _random_integers(size, np.int64),
           'float32': np.random.randn(size).astype(np.float32),
           'float64': np.arange(size, dtype=np.float64),
           'bool': np.random.randn(size) > 0,
           'strings': [util.rands(10) for i in range(size)],
           'all_none': [None] * size,
           'all_none_category': [None] * size
       })

   def _test_dataframe(size=1, seed=0):
       # pandas is only needed by to_pandas(), so tests that call
       # _test_table directly can drop @pytest.mark.pandas.
       return _test_table(size, seed).to_pandas()
   ```
   
   Possibly out of scope:
   It might even be good to have fallback logic in `_test_table` for cases where numpy is not available. This logic could use stdlib's `random` or some testing utility we have available in Arrow C++.


