read_commandline supports polars engine (#1356)
Enable `read_commandline` to read command output into a `polars` DataFrame.
samukweku authored Jun 9, 2024
1 parent a672fef commit 2dff4f6
Showing 2 changed files with 23 additions and 6 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
 # Changelog

 ## [Unreleased]
+- [ENH] `read_commandline` function now supports polars - Issue #1352

 - [ENH] Improved performance for non-equi joins when using numba - @samukweku PR #1341
 - [ENH] Added a `clean_names` method for polars - it can be used to clean the column names, or clean column values . Issue #1343 @samukweku
28 changes: 22 additions & 6 deletions janitor/io.py
@@ -93,7 +93,7 @@ def read_csvs(
     return dfs_dict


-def read_commandline(cmd: str, **kwargs: Any) -> pd.DataFrame:
+def read_commandline(cmd: str, engine="pandas", **kwargs: Any) -> Mapping:
     """Read a CSV file based on a command-line command.
     For example, you may wish to run the following command on `sep-quarter.csv`
@@ -111,26 +111,42 @@ def read_commandline(cmd: str, **kwargs: Any) -> pd.DataFrame:
     ```
     This function assumes that your command line command will return
-    an output that is parsable using `pandas.read_csv` and StringIO.
-    We default to using `pd.read_csv` underneath the hood.
-    Keyword arguments are passed through to read_csv.
+    an output that is parsable using the relevant engine and StringIO.
+    This function defaults to using `pd.read_csv` underneath the hood.
+    Keyword arguments are passed through as-is.
     Args:
         cmd: Shell command to preprocess a file on disk.
+        engine: DataFrame engine to process the output of the shell command.
+            Currently supports both pandas and polars.
         **kwargs: Keyword arguments that are passed through to
-            `pd.read_csv()`.
+            the engine's csv reader.
     Returns:
-        A pandas DataFrame parsed from the stdout of the underlying
+        A DataFrame parsed from the stdout of the underlying
             shell.
     """

     check("cmd", cmd, [str])
+    if engine not in {"pandas", "polars"}:
+        raise ValueError("engine should be either pandas or polars.")
     # adding check=True ensures that an explicit, clear error
     # is raised, so that the user can see the reason for the failure
     outcome = subprocess.run(
         cmd, shell=True, capture_output=True, text=True, check=True
     )
+    if engine == "polars":
+        try:
+            import polars as pl
+        except ImportError:
+            import_message(
+                submodule="polars",
+                package="polars",
+                conda_channel="conda-forge",
+                pip_install=True,
+            )
+        return pl.read_csv(StringIO(outcome.stdout), **kwargs)
     return pd.read_csv(StringIO(outcome.stdout), **kwargs)
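
For readers who want to try the dispatch logic outside the library, the following is a minimal standalone sketch of the pattern in this change, not the janitor implementation itself. The names `run_shell` and `read_commandline_sketch` are hypothetical, the `check(...)` and `import_message(...)` helpers from janitor are left out, and both engines are imported lazily so that only the requested one needs to be installed.

```python
import subprocess
from io import StringIO
from typing import Any


def run_shell(cmd: str) -> str:
    """Run a shell command and return its captured stdout as text."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit
    # status, so a failing command surfaces as an explicit error.
    outcome = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, check=True
    )
    return outcome.stdout


def read_commandline_sketch(cmd: str, engine: str = "pandas", **kwargs: Any):
    """Hypothetical stand-in mirroring the engine dispatch in the diff."""
    # Validate the engine before running anything, as the patch does.
    if engine not in {"pandas", "polars"}:
        raise ValueError("engine should be either pandas or polars.")
    stdout = run_shell(cmd)
    if engine == "polars":
        # Assumption: polars is installed; the real code reports a
        # friendlier message via janitor's import_message helper.
        import polars as pl

        return pl.read_csv(StringIO(stdout), **kwargs)
    import pandas as pd

    return pd.read_csv(StringIO(stdout), **kwargs)
```

Because keyword arguments are forwarded unchanged, they must match the chosen reader: for instance, a delimiter is passed as `sep` to `pd.read_csv` but as `separator` to `pl.read_csv`.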

