Python API

All public functions are available directly from the top-level condastats package:

from condastats import overall, pkg_platform, data_source, pkg_version, pkg_python

By default, every function returns a pandas.Series of download counts, indexed by package name and, where applicable, by the grouping dimension and/or time. overall() and query_overall() return a pandas.DataFrame instead when complete=True.

S3-backed functions

These convenience functions read data from the public Anaconda S3 bucket via dask and return aggregated pandas results. They require dask and s3fs to be installed.
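Because dask and s3fs are required only by this group of functions, it can be useful to check for them up front. The following is a minimal sketch using the standard library; the helper name `missing_s3_deps` is hypothetical and not part of condastats.

```python
from importlib.util import find_spec


def missing_s3_deps(required=("dask", "s3fs")):
    """Return the names in `required` that cannot be found (hypothetical helper)."""
    return [name for name in required if find_spec(name) is None]


missing = missing_s3_deps()
if missing:
    print("Install before using the S3-backed functions:", ", ".join(missing))
```

The pure-pandas query functions described further down do not need this check.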

condastats.overall(package, month=None, start_month=None, end_month=None, monthly=False, complete=False, pkg_platform=None, data_source=None, pkg_version=None, pkg_python=None)

Get overall download counts for one or more conda packages.

Parameters

package : str or list of str

Package name(s) to query.

month : str or datetime, optional

Specific month in YYYY-MM format.

start_month : str or datetime, optional

Start of date range in YYYY-MM format. Must be used with end_month.

end_month : str or datetime, optional

End of date range in YYYY-MM format. Must be used with start_month.

monthly : bool, default False

If True, return a monthly breakdown instead of totals.

complete : bool, default False

If True, return the full DataFrame without aggregation.

pkg_platform : str, optional

Filter by platform (e.g., 'linux-64', 'osx-64', 'win-64').

data_source : str, optional

Filter by data source (e.g., 'anaconda', 'conda-forge').

pkg_version : str, optional

Filter by package version.

pkg_python : str or float, optional

Filter by Python version (e.g., '3.7' or 3.7).

Returns

pandas.Series or pandas.DataFrame

Download counts, either as a Series (aggregated) or DataFrame (complete).

Return type: DataFrame | Series
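As a usage sketch, a date-range query with a per-month breakdown could be wrapped as follows. The wrapper name `monthly_downloads` is hypothetical. Calling it requires condastats, dask, s3fs and network access, so the example call is left as a comment.

```python
def monthly_downloads(packages, start_month, end_month):
    """Per-month download totals for `packages` between two YYYY-MM months.

    Hypothetical wrapper around condastats.overall; it reads from S3 when called.
    """
    # Imported lazily so that defining the wrapper does not need dask or s3fs.
    from condastats import overall

    return overall(packages, start_month=start_month, end_month=end_month, monthly=True)


# Example call (not executed here because it reads from the public S3 bucket):
# monthly_downloads(["pandas", "dask"], "2024-01", "2024-03")
```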

condastats.pkg_platform(package, month=None, start_month=None, end_month=None, monthly=False)

Get download counts grouped by platform.

Parameters

package : str or list of str

Package name(s) to query.

month : str or datetime, optional

Specific month in YYYY-MM format.

start_month : str or datetime, optional

Start of date range in YYYY-MM format. Must be used with end_month.

end_month : str or datetime, optional

End of date range in YYYY-MM format. Must be used with start_month.

monthly : bool, default False

If True, return a monthly breakdown.

Returns

pandas.Series

Download counts grouped by platform.

Return type: Series

condastats.data_source(package, month=None, start_month=None, end_month=None, monthly=False)

Get download counts grouped by data source.

Parameters

package : str or list of str

Package name(s) to query.

month : str or datetime, optional

Specific month in YYYY-MM format.

start_month : str or datetime, optional

Start of date range in YYYY-MM format. Must be used with end_month.

end_month : str or datetime, optional

End of date range in YYYY-MM format. Must be used with start_month.

monthly : bool, default False

If True, return a monthly breakdown.

Returns

pandas.Series

Download counts grouped by data source.

Return type: Series

condastats.pkg_version(package, month=None, start_month=None, end_month=None, monthly=False)

Get download counts grouped by package version.

Parameters

package : str or list of str

Package name(s) to query.

month : str or datetime, optional

Specific month in YYYY-MM format.

start_month : str or datetime, optional

Start of date range in YYYY-MM format. Must be used with end_month.

end_month : str or datetime, optional

End of date range in YYYY-MM format. Must be used with start_month.

monthly : bool, default False

If True, return a monthly breakdown.

Returns

pandas.Series

Download counts grouped by package version.

Return type: Series

condastats.pkg_python(package, month=None, start_month=None, end_month=None, monthly=False)

Get download counts grouped by Python version.

Parameters

package : str or list of str

Package name(s) to query.

month : str or datetime, optional

Specific month in YYYY-MM format.

start_month : str or datetime, optional

Start of date range in YYYY-MM format. Must be used with end_month.

end_month : str or datetime, optional

End of date range in YYYY-MM format. Must be used with start_month.

monthly : bool, default False

If True, return a monthly breakdown.

Returns

pandas.Series

Download counts grouped by Python version.

Return type: Series
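The grouped results lend themselves to simple post-processing. The sketch below turns per-platform counts into per-platform shares. It uses a hand-built Series as a stand-in for a pkg_platform() result: the numbers are made up, and the index level names (`pkg_name`, `pkg_platform`) are an assumption based on the schema column names.

```python
import pandas as pd

# Toy stand-in for a pkg_platform() result; values and level names are assumed.
index = pd.MultiIndex.from_tuples(
    [("pandas", "linux-64"), ("pandas", "osx-64"), ("pandas", "win-64")],
    names=["pkg_name", "pkg_platform"],
)
by_platform = pd.Series([600, 100, 300], index=index, name="counts")

# Divide each count by its package total to get a per-platform share.
package_totals = by_platform.groupby(level="pkg_name").transform("sum")
share = by_platform / package_totals
print(share.round(2))
```

The same pattern would apply to the other grouped functions by swapping the second index level.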

Pure-pandas query functions

These functions operate on any pandas.DataFrame that follows the Anaconda package-data schema (columns: pkg_name, counts, time, pkg_platform, data_source, pkg_version, pkg_python).

They have no dependency on dask or s3fs and work anywhere pandas runs, including Pyodide.
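Before passing a custom DataFrame to these functions, it may help to confirm that it has the expected columns. A small sketch follows, using the column names from the schema listed above; the helper `missing_columns` and the sample rows are illustrative only.

```python
import pandas as pd

# Column names taken from the package-data schema described above.
SCHEMA_COLUMNS = [
    "pkg_name", "counts", "time", "pkg_platform",
    "data_source", "pkg_version", "pkg_python",
]


def missing_columns(df, required=SCHEMA_COLUMNS):
    """List the required columns that `df` lacks (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# A deliberately incomplete frame with made-up rows.
df = pd.DataFrame({
    "pkg_name": ["pandas", "numpy"],
    "counts": [10, 20],
    "time": ["2024-01", "2024-01"],
})
print(missing_columns(df))
```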

condastats.query_overall(df, package=None, monthly=False, complete=False, pkg_platform=None, data_source=None, pkg_version=None, pkg_python=None)

Get overall download counts from a pandas DataFrame.

Parameters

df : pandas.DataFrame

DataFrame with at least pkg_name and counts columns.

package : str or list of str, optional

Package name(s) to filter by. If None, all packages are included.

monthly : bool, default False

If True, return a monthly breakdown instead of totals.

complete : bool, default False

If True, return the full filtered DataFrame without aggregation.

pkg_platform : str, optional

Filter by platform (e.g., 'linux-64', 'osx-64', 'win-64').

data_source : str, optional

Filter by data source (e.g., 'anaconda', 'conda-forge').

pkg_version : str, optional

Filter by package version.

pkg_python : str or float, optional

Filter by Python version (e.g., '3.7' or 3.7).

Returns

pandas.Series or pandas.DataFrame

Download counts, either as a Series (aggregated) or DataFrame (complete).

Return type: DataFrame | Series
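To make the aggregation concrete, the sketch below approximates the documented behaviour with plain pandas on a toy frame. It is not the library's implementation: `sketch_overall` is a hypothetical stand-in that covers only the package and pkg_platform filters, and the rows and index level order are assumptions.

```python
import pandas as pd

# Made-up rows following the package-data schema (subset of columns).
df = pd.DataFrame({
    "pkg_name": ["pandas", "pandas", "pandas", "numpy"],
    "time": ["2024-01", "2024-01", "2024-02", "2024-01"],
    "pkg_platform": ["linux-64", "win-64", "linux-64", "linux-64"],
    "counts": [100, 40, 70, 500],
})


def sketch_overall(df, package=None, monthly=False, pkg_platform=None):
    """Plain-pandas approximation of the documented aggregation (not library code)."""
    if package is not None:
        names = [package] if isinstance(package, str) else list(package)
        df = df[df["pkg_name"].isin(names)]
    if pkg_platform is not None:
        df = df[df["pkg_platform"] == pkg_platform]
    # Group by package, adding the time column when a monthly breakdown is wanted.
    keys = ["pkg_name", "time"] if monthly else ["pkg_name"]
    return df.groupby(keys)["counts"].sum()


totals = sketch_overall(df, package="pandas")
linux_monthly = sketch_overall(df, package="pandas", monthly=True, pkg_platform="linux-64")
print(totals)
print(linux_monthly)
```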

condastats.query_grouped(df, column, package=None, monthly=False)

Get download counts grouped by a given dimension.

Parameters

df : pandas.DataFrame

DataFrame with pkg_name and counts columns, plus the column named by the column argument.

column : str

Column name to group by (e.g., 'pkg_platform', 'data_source').

package : str or list of str, optional

Package name(s) to filter by. If None, all packages are included.

monthly : bool, default False

If True, include a monthly breakdown.

Returns

pandas.Series

Aggregated download counts.

Return type: Series
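Grouping by a dimension can be pictured as a groupby over the package name and the chosen column. The following plain-pandas sketch illustrates that idea. `sketch_grouped` is a hypothetical stand-in rather than the library's code, and the order of the index levels is an assumption.

```python
import pandas as pd

# Made-up rows following the package-data schema (subset of columns).
df = pd.DataFrame({
    "pkg_name": ["pandas", "pandas", "pandas", "numpy"],
    "time": ["2024-01", "2024-02", "2024-02", "2024-01"],
    "data_source": ["anaconda", "anaconda", "conda-forge", "conda-forge"],
    "counts": [10, 20, 5, 40],
})


def sketch_grouped(df, column, package=None, monthly=False):
    """Plain-pandas approximation of grouping by `column` (not library code)."""
    if package is not None:
        names = [package] if isinstance(package, str) else list(package)
        df = df[df["pkg_name"].isin(names)]
    # Assumed level order: package, grouping column, then time if requested.
    keys = ["pkg_name", column] + (["time"] if monthly else [])
    return df.groupby(keys)["counts"].sum()


by_source = sketch_grouped(df, "data_source", package="pandas")
by_source_monthly = sketch_grouped(df, "data_source", package="pandas", monthly=True)
print(by_source)
```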

condastats.top_packages(df, n=20)

Get the top n most downloaded packages.

Parameters

df : pandas.DataFrame

DataFrame with pkg_name and counts columns.

n : int, default 20

Number of top packages to return.

Returns

pandas.Series

Top n packages sorted by total downloads (descending).

Return type: Series
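The ranking described above amounts to summing counts per package and keeping the largest n. Here is a plain-pandas sketch of that idea with made-up rows; `sketch_top_packages` is a hypothetical stand-in, not the library's implementation.

```python
import pandas as pd

# Made-up download rows; only the two required columns are present.
df = pd.DataFrame({
    "pkg_name": ["numpy", "pandas", "numpy", "dask", "pandas"],
    "counts": [500, 100, 250, 30, 60],
})


def sketch_top_packages(df, n=20):
    """Sum counts per package and keep the n largest totals (not library code)."""
    totals = df.groupby("pkg_name")["counts"].sum()
    return totals.sort_values(ascending=False).head(n)


top2 = sketch_top_packages(df, n=2)
print(top2)
```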

Common parameters

The S3-backed functions share a core set of parameters:

package

One or more package names. Pass a string for a single package or a list of strings for multiple packages.

month

A specific month in YYYY-MM format (e.g., "2024-01"). Mutually exclusive with start_month/end_month.

start_month / end_month

Define a date range. Both must be provided together, in YYYY-MM format.

monthly

When True, return a per-month breakdown instead of a single total. Adds a time level to the result index.
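The rules for the month arguments can be written down as a small pre-flight check. This is a sketch using only the standard library. `check_month_args` is a hypothetical helper, and whether condastats itself raises an error in these cases is not stated here.

```python
from datetime import datetime


def check_month_args(month=None, start_month=None, end_month=None):
    """Raise ValueError if the documented month rules are violated (hypothetical helper)."""
    if month is not None and (start_month is not None or end_month is not None):
        raise ValueError("month cannot be combined with start_month/end_month")
    if (start_month is None) != (end_month is None):
        raise ValueError("start_month and end_month must be given together")
    for value in (month, start_month, end_month):
        if isinstance(value, str):
            # Raises ValueError if the string does not parse as year-month.
            datetime.strptime(value, "%Y-%m")


# Both documented forms pass the check.
check_month_args(month="2024-01")
check_month_args(start_month="2024-01", end_month="2024-03")
```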

Return types

| Scenario | Return type |
| --- | --- |
| Default (aggregated) | pandas.Series with a pandas.Index or pandas.MultiIndex |
| overall(..., complete=True) or query_overall(..., complete=True) | pandas.DataFrame with all original columns |
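Code that accepts either return type can normalise it into a flat DataFrame. A minimal sketch follows. `as_frame` is a hypothetical helper, the sample Series is hand-built, and its level names are assumed from the schema.

```python
import pandas as pd


def as_frame(result):
    """Return a flat DataFrame whether `result` is a Series or a DataFrame."""
    if isinstance(result, pd.Series):
        # Move index levels into columns and name the value column "counts".
        return result.reset_index(name="counts")
    return result.reset_index(drop=True)


# Toy stand-in for an aggregated monthly result (values are made up).
index = pd.MultiIndex.from_tuples(
    [("pandas", "2024-01"), ("pandas", "2024-02")],
    names=["pkg_name", "time"],
)
monthly = pd.Series([140, 70], index=index, name="counts")

flat = as_frame(monthly)
print(list(flat.columns))
```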