Pandarallel
What is Pandarallel?

Pandarallel is a Python library that provides easy parallel computing for pandas operations. It lets you replace the standard pandas apply, map, and related functions with parallelized versions that use all CPU cores of your machine.

Installation

    pip install pandarallel

For full features (progress bars, etc.):
Performance Comparison

    import time

    import numpy as np
    import pandas as pd
    from pandarallel import pandarallel

    pandarallel.initialize()

    df = pd.DataFrame({'x': np.random.rand(500000)})

    def heavy_func(x):
        return sum(np.sin(x) * np.cos(x) for _ in range(100))

    # Standard pandas
    start = time.time()
    result_pd = df['x'].apply(heavy_func)
    print(f"Pandas: {time.time() - start:.2f}s")

    # Pandarallel
    start = time.time()
    result_pll = df['x'].parallel_apply(heavy_func)
    print(f"Pandarallel: {time.time() - start:.2f}s")

Common Issues & Solutions

1. PicklingError (lambdas with closures)

    external_var = 10  # value captured from the enclosing scope

    # This may fail with a PicklingError
    df.parallel_apply(lambda row: row['a'] + external_var, axis=1)

    # Solution: define a regular function
    def add_external(row):
        return row['a'] + external_var
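The pickling issue above can be reproduced without pandarallel. The following is a minimal sketch using only the standard library, assuming the function is handed to worker processes through pickle-style serialization (whether a given pandarallel version hits this depends on the serializer it uses). The helper `is_picklable` and the names `closure_lambda` and `bound_add` are hypothetical, introduced only for illustration. It shows that a lambda is rejected by `pickle`, while `functools.partial` wrapped around an importable function carries the external value along and survives a round trip.

```python
import pickle
from functools import partial
from operator import add


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


external_var = 10

# A lambda has no importable name, so pickle cannot serialize it by reference.
closure_lambda = lambda value: value + external_var

# partial() stores the importable function and the bound value explicitly.
bound_add = partial(add, external_var)

print(is_picklable(closure_lambda))
print(is_picklable(bound_add))

# The restored partial still adds external_var to its argument.
restored = pickle.loads(pickle.dumps(bound_add))
print(restored(5))
```

Binding the value with `partial` makes the dependency explicit instead of relying on a variable captured from the surrounding scope, which is usually easier to ship to another process.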
    pip install pandarallel[full]

Basic Usage

    import pandas as pd
    from pandarallel import pandarallel

    # Initialize (do this once before using parallel functions)
    pandarallel.initialize()

    # Optional: with progress bar and custom settings
    pandarallel.initialize(
        progress_bar=True,
        nb_workers=4,  # number of workers (default: all CPUs)
        verbose=1
    )

Key Parallel Functions

| Pandas Function | Pandarallel Equivalent |
|-----------------|------------------------|
| df.apply() | df.parallel_apply() |
| df.applymap() | df.parallel_applymap() |
| series.apply() | series.parallel_apply() |
| series.map() | series.parallel_map() |
| groupby.apply() | groupby.parallel_apply() |

Examples

1. Basic parallel_apply on DataFrame

    import pandas as pd
    from pandarallel import pandarallel

    pandarallel.initialize(progress_bar=True)
    df = pd.DataFrame({
        'a': range(1000000),
        'b': range(1000000, 2000000)
    })

    # Standard pandas (slow)
    df['c'] = df.apply(lambda row: row['a'] * row['b'], axis=1)

    # Pandarallel (fast)
    df['c'] = df.parallel_apply(lambda row: row['a'] * row['b'], axis=1)

2. Parallel map on Series

    def slow_function(x):
        return x ** 2 + x * 3

    series = pd.Series(range(100000))

    # Parallel version
    result = series.parallel_map(slow_function)

3. Parallel applymap for element-wise operations

    import numpy as np

    df = pd.DataFrame(np.random.rand(1000, 1000))

    def complex_func(x):
        return np.log(x + 1) * np.sin(x)

    # Apply to every element in parallel
    result = df.parallel_applymap(complex_func)

4. Parallel groupby-apply

    df = pd.DataFrame({
        'group': np.random.choice(['A', 'B', 'C'], 100000),
        'value': np.random.randn(100000)
    })

    def group_operation(group):
        return group['value'].mean() + group['value'].std()

    # Apply the function to each group in parallel
    result = df.groupby('group').parallel_apply(group_operation)
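To make the idea behind the parallel functions above more concrete, here is a rough conceptual sketch of a split, apply, and concatenate pattern written with plain pandas. It is not pandarallel's implementation: the helper `chunked_apply` and its `n_chunks` parameter are hypothetical, and a thread pool stands in for worker processes only to keep the example self-contained (threads will not speed up CPU-bound Python code). The point is that splitting rows into chunks, applying the function to each chunk, and concatenating the pieces gives the same result as a single `apply`.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def chunked_apply(df, func, n_chunks=4):
    """Split df by rows, apply func row-wise to each chunk, and reassemble."""
    # Split the positional row indices into contiguous, roughly equal blocks.
    blocks = np.array_split(np.arange(len(df)), n_chunks)
    chunks = [df.iloc[block] for block in blocks if len(block) > 0]

    # Each chunk is processed independently; order is preserved by map().
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(lambda chunk: chunk.apply(func, axis=1), chunks))

    # Concatenating the partial results restores the original row order.
    return pd.concat(parts)


df = pd.DataFrame({'a': range(1000), 'b': range(1000, 2000)})


def row_product(row):
    return row['a'] * row['b']


serial = df.apply(row_product, axis=1)
chunked = chunked_apply(df, row_product)

# The chunked result should match the serial one element for element.
print((serial.values == chunked.values).all())
```

A comparison like this on a small sample is also a cheap sanity check before switching a long-running `apply` to its parallel counterpart.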