nxbench.data package

Subpackages

Submodules

nxbench.data.constants module

nxbench.data.db module

class nxbench.data.db.BenchmarkDB(db_path=None)[source]

Bases: object

Database interface for storing and querying benchmark results.

Parameters:

db_path (str | Path | None)

__init__(db_path=None)[source]

Initialize the database connection.

Parameters:

db_path (str or Path, optional) – Path to the SQLite database file. If None, the default location is used.

delete_results(algorithm=None, backend=None, dataset=None, before_date=None)[source]

Delete benchmark results matching criteria.

Parameters:
  • algorithm (str, optional) – Delete results for this algorithm

  • backend (str, optional) – Delete results for this backend

  • dataset (str, optional) – Delete results for this dataset

  • before_date (str, optional) – Delete results before this date

Returns:

Number of records deleted

Return type:

int

get_results(algorithm=None, backend=None, dataset=None, start_date=None, end_date=None, as_pandas=True)[source]

Query benchmark results with optional filters.

Parameters:
  • algorithm (str, optional) – Filter by algorithm name

  • backend (str, optional) – Filter by backend

  • dataset (str, optional) – Filter by dataset

  • start_date (str, optional) – Filter results after this date (ISO format)

  • end_date (str, optional) – Filter results before this date (ISO format)

  • as_pandas (bool, default=True) – Return results as pandas DataFrame

Returns:

Filtered benchmark results

Return type:

DataFrame or list of dict
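
As an illustration of how optional filters like these can map onto a SQLite query, here is a minimal sketch. It is not the BenchmarkDB implementation: the table name `benchmarks`, the column names (including `timestamp`), and the inclusive date bounds are all assumptions made for the example. It returns a list of dicts, corresponding to the `as_pandas=False` case.

```python
import sqlite3


def query_results(conn, algorithm=None, backend=None, dataset=None,
                  start_date=None, end_date=None):
    """Build a parametrized SELECT from whichever filters are not None."""
    clauses, params = [], []
    # Equality filters: only add a clause when the caller supplied a value.
    for column, value in (("algorithm", algorithm),
                          ("backend", backend),
                          ("dataset", dataset)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    # ISO-format date strings sort lexicographically, so plain string
    # comparison works. Inclusive bounds are an assumption of this sketch.
    if start_date is not None:
        clauses.append("timestamp >= ?")
        params.append(start_date)
    if end_date is not None:
        clauses.append("timestamp <= ?")
        params.append(end_date)
    sql = "SELECT * FROM benchmarks"  # hypothetical table name
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return [dict(row) for row in conn.execute(sql, params)]


# Toy in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute(
    "CREATE TABLE benchmarks "
    "(algorithm TEXT, backend TEXT, dataset TEXT, timestamp TEXT, execution_time REAL)"
)
conn.executemany(
    "INSERT INTO benchmarks VALUES (?, ?, ?, ?, ?)",
    [
        ("pagerank", "networkx", "karate", "2024-01-15T00:00:00", 0.5),
        ("pagerank", "graphblas", "karate", "2024-03-01T00:00:00", 0.1),
        ("bfs", "networkx", "karate", "2024-03-02T00:00:00", 0.2),
    ],
)

rows = query_results(conn, algorithm="pagerank", start_date="2024-02-01")
all_rows = query_results(conn)
```

Passing values through `?` placeholders rather than string formatting keeps the filter values out of the SQL text itself.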

get_unique_values(column)[source]

Get unique values for a given column.

Return type:

list[str]

Parameters:

column (str)
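
Because SQL placeholders cannot stand in for column names, a lookup like this usually has to validate the column before building the query. The sketch below shows one way to do that. The whitelist, table name, and sort order are assumptions for illustration and are not taken from the BenchmarkDB source.

```python
import sqlite3

# Hypothetical whitelist; the columns BenchmarkDB actually accepts are not listed here.
ALLOWED_COLUMNS = {"algorithm", "backend", "dataset"}


def unique_values(conn, column):
    """Return the sorted distinct values of a whitelisted column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Unsupported column: {column!r}")
    # Safe to interpolate: the name was checked against the whitelist above.
    cursor = conn.execute(
        f"SELECT DISTINCT {column} FROM benchmarks ORDER BY {column}"
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE benchmarks (algorithm TEXT, backend TEXT, dataset TEXT)")
conn.executemany(
    "INSERT INTO benchmarks VALUES (?, ?, ?)",
    [
        ("pagerank", "networkx", "karate"),
        ("pagerank", "graphblas", "karate"),
        ("bfs", "networkx", "karate"),
    ],
)

backends = unique_values(conn, "backend")
```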

save_results(results, git_commit=None, machine_info=None, python_version=None, package_versions=None)[source]

Save benchmark results to database.

Parameters:
  • results (BenchmarkResult or list of BenchmarkResult) – Results to save

  • git_commit (str, optional) – Git commit hash for version tracking

  • machine_info (dict, optional) – System information

  • python_version (str, optional) – Python version used

  • package_versions (dict, optional) – Versions of key packages

Return type:

None

nxbench.data.loader module

class nxbench.data.loader.BenchmarkDataManager(data_dir=None)[source]

Bases: object

Manages loading and caching of networks for benchmarking.

Parameters:

data_dir (str | Path | None)

SUPPORTED_FORMATS: ClassVar[list[str]] = ['.edgelist', '.mtx', '.graphml', '.edges']

__init__(data_dir=None)[source]

Parameters:

data_dir (str | Path | None)

get_metadata(name)[source]
Return type:

dict[str, Any]

Parameters:

name (str)

async load_network(config, session=None)[source]

Load or generate a network based on config.

Return type:

tuple[Graph | DiGraph, dict[str, Any]]

load_network_sync(config)[source]
Return type:

tuple[Graph | DiGraph, dict[str, Any]]

Parameters:

config (DatasetConfig)

nxbench.data.repository module

class nxbench.data.repository.NetworkMetadata(name, category='Unknown', description=None, source='Unknown', directed=False, weighted=False, vertex_type='Unknown', edge_type='Unknown', collection='Unknown', tags=<factory>, citations=<factory>, network_statistics=None, download_url=None)[source]

Bases: object

Flexible metadata container for network datasets.

__init__(name, category='Unknown', description=None, source='Unknown', directed=False, weighted=False, vertex_type='Unknown', edge_type='Unknown', collection='Unknown', tags=<factory>, citations=<factory>, network_statistics=None, download_url=None)
Return type:

None

category: str = 'Unknown'
citations: list[str]
collection: str | None = 'Unknown'
description: str | None = None
directed: bool = False
download_url: str | None = None
edge_type: str | None = 'Unknown'
name: str
network_statistics: NetworkStats | None = None
source: str = 'Unknown'
tags: list[str] | None
vertex_type: str | None = 'Unknown'
weighted: bool = False
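
The `<factory>` placeholders shown for `tags` and `citations` are how Python renders dataclass fields declared with `default_factory`, which suggests NetworkMetadata is a dataclass. The sketch below uses a hypothetical stand-in class with only a few of the fields to show the effect: each instance gets its own fresh list instead of sharing one mutable default.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for illustration; not the real NetworkMetadata."""

    name: str
    category: str = "Unknown"
    # default_factory calls list() once per instance, so lists are not shared.
    tags: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)


a = MetadataSketch("karate")
b = MetadataSketch("dolphins", category="Social")
a.tags.append("social")  # mutates only a's list
```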
class nxbench.data.repository.NetworkRepository(data_home=None, scrape_delay=1.0, timeout=30, max_connections=10, max_keepalive_connections=5, keepalive_timeout=30)[source]

Bases: object

Asynchronous interface for downloading and working with networks from networkrepository.com.

Parameters:
  • data_home (str | Path | None)

  • scrape_delay (float)

  • timeout (int)

  • max_connections (int)

  • max_keepalive_connections (int)

  • keepalive_timeout (int)

__init__(data_home=None, scrape_delay=1.0, timeout=30, max_connections=10, max_keepalive_connections=5, keepalive_timeout=30)[source]

Initialize dataset loader with optional custom data directory.

Parameters:
  • data_home (str or Path, optional) – Directory for storing downloaded datasets. If None, defaults to ~/nxdata

  • scrape_delay (float, default=1.0) – Delay between scraping requests to avoid overloading the server.

  • timeout (int, default=30) – Timeout for HTTP requests in seconds.

  • max_connections (int, default=10) – Maximum number of concurrent HTTP connections.

  • max_keepalive_connections (int)

  • keepalive_timeout (int)
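
To make the roles of `scrape_delay` and `max_connections` concrete, here is a standard-library sketch of the general pattern: a semaphore caps how many requests are in flight, and each request waits briefly before completing. It does not reflect how NetworkRepository is implemented. The `fetch_all` helper is hypothetical, and `asyncio.sleep` stands in for a real HTTP call.

```python
import asyncio


async def fetch_all(names, scrape_delay=0.01, max_connections=2):
    """Pretend to fetch each name, never exceeding max_connections at once."""
    semaphore = asyncio.Semaphore(max_connections)
    active = 0  # requests currently in flight
    peak = 0    # highest concurrency observed

    async def fetch(name):
        nonlocal active, peak
        async with semaphore:  # blocks while max_connections are in use
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(scrape_delay)  # placeholder for request + polite delay
            active -= 1
            return name.upper()

    # gather preserves the order of the inputs in its results.
    results = await asyncio.gather(*(fetch(n) for n in names))
    return results, peak


results, peak = asyncio.run(fetch_all(["a", "b", "c", "d"]))
```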

async discover_networks_by_category()[source]

Asynchronously scrape network names from networkrepository.com for each category.

Return type:

dict[str, list[str]]

async extract_download_url(soup, name, base_url='https://networkrepository.com/')[source]
Return type:

str | None

Parameters:
  • soup (BeautifulSoup)

  • name (str)

  • base_url (str)

async fetch_with_retry(name)[source]

Attempt to fetch the metadata URL using alternative naming patterns.

Return type:

str | None

Parameters:

name (str)

get_category_for_network(network_name)[source]

Get the category for a given network name.

Parameters:

network_name (str) – Name of the network

Returns:

The category name if found, else None

Return type:

str or None

async get_network_metadata(name, category)[source]

Asynchronously fetch and parse the metadata for a specific network.

Parameters:
  • name (str) – Name of the network

  • category (str) – Category of the network

Returns:

The metadata object populated with information from the network’s page

Return type:

NetworkMetadata

async list_networks(category=None, collection=None, min_nodes=None, max_nodes=None, directed=None, weighted=None, limit=None)[source]

List available networks matching specified criteria asynchronously.

Return type:

list[NetworkMetadata]

Parameters:
  • category (str | None)

  • collection (str | None)

  • min_nodes (int | None)

  • max_nodes (int | None)

  • directed (bool | None)

  • weighted (bool | None)

  • limit (int | None)

async verify_url(url)[source]

Check if the URL is valid and reachable.

Return type:

bool

Parameters:

url (str)

class nxbench.data.repository.NetworkStats(n_nodes, n_edges, density, max_degree, min_degree, avg_degree, assortativity, n_triangles, avg_triangles, max_triangles, avg_clustering, transitivity, max_kcore, max_clique_lb)[source]

Bases: object

Network statistics that are consistently reported.

__init__(n_nodes, n_edges, density, max_degree, min_degree, avg_degree, assortativity, n_triangles, avg_triangles, max_triangles, avg_clustering, transitivity, max_kcore, max_clique_lb)
Return type:

None

assortativity: float
avg_clustering: float
avg_degree: float
avg_triangles: float
density: float
max_clique_lb: int
max_degree: int
max_kcore: int
max_triangles: int
min_degree: int
n_edges: int
n_nodes: int
n_triangles: int
transitivity: float

nxbench.data.synthesize module

nxbench.data.synthesize.generate_graph(generator_name, gen_params, directed=False)[source]

Generate a synthetic network using networkx generator functions.

Return type:

Graph

Parameters:
  • generator_name (str)

  • gen_params (dict)

  • directed (bool)

nxbench.data.utils module

nxbench.data.utils.detect_delimiter(file_path, sample_size=5)[source]

Detect the most common delimiter in the first few lines of a file.

Return type:

str

Parameters:
  • file_path (Path)

  • sample_size (int)
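
One plausible way to implement this kind of detection is to count candidate delimiters over the first `sample_size` lines and pick the most frequent. The sketch below takes that approach. The candidate list and the whitespace fallback are assumptions, and the real `detect_delimiter` may behave differently.

```python
import tempfile
from pathlib import Path

# Assumed candidate set; the delimiters nxbench actually considers are not documented here.
CANDIDATES = [",", "\t", " ", ";", "|"]


def detect_delimiter_sketch(file_path: Path, sample_size: int = 5) -> str:
    """Return the candidate delimiter seen most often in the first lines."""
    counts = {delim: 0 for delim in CANDIDATES}
    with file_path.open() as handle:
        # zip with range stops reading after sample_size lines.
        for _, line in zip(range(sample_size), handle):
            for delim in CANDIDATES:
                counts[delim] += line.count(delim)
    best = max(CANDIDATES, key=lambda delim: counts[delim])
    # Fall back to a single space when no candidate appears (assumption).
    return best if counts[best] > 0 else " "


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.edges"
    path.write_text("1,2\n2,3\n3,1\n")
    detected = detect_delimiter_sketch(path)
```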

nxbench.data.utils.fix_matrix_market_file(in_path)[source]
Return type:

Path

Parameters:

in_path (Path)

nxbench.data.utils.get_connected_components(G)[source]

Retrieve connected components of a graph.

Return type:

list

Parameters:

G (Graph)

nxbench.data.utils.lcc(G)[source]

Extract the largest connected component (LCC) of the graph.

Removes self-loops from the extracted subgraph.

Parameters:

G (nx.Graph) – The input graph.

Returns:

A subgraph containing the largest connected component without self-loops. If the input graph has no nodes, it returns the input graph.

Return type:

nx.Graph
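
The behaviour described above can be sketched without NetworkX, using a plain adjacency dict for an undirected graph. This is a dependency-free illustration of the idea (find components, keep the largest, drop self-loops), not the `lcc` implementation. The function name and the dict-of-sets representation are assumptions.

```python
from collections import deque


def lcc_sketch(adjacency):
    """Return the largest connected component of an undirected graph.

    `adjacency` maps each node to the set of its neighbours. Self-loops
    are dropped from the result; an empty graph is returned unchanged.
    """
    if not adjacency:
        return adjacency
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    # Rebuild the subgraph, filtering out self-loops.
    return {
        node: {nbr for nbr in adjacency[node] if nbr != node}
        for node in best
    }


graph = {1: {1, 2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
largest = lcc_sketch(graph)
```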

nxbench.data.utils.normalize_name(name)[source]

Normalize the network name for URL construction. Preserves the original casing and replaces special characters with hyphens. Collapses multiple hyphens into a single hyphen and strips leading/trailing hyphens.

Return type:

str

Parameters:

name (str)
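
The three steps in this description (replace, collapse, strip) can be sketched with two regular expressions. Which characters count as "special" is not specified above. This sketch assumes anything other than ASCII letters, digits, and hyphens, which may differ from the real `normalize_name`.

```python
import re


def normalize_name_sketch(name: str) -> str:
    """Hyphenate special characters, collapse hyphen runs, trim the ends."""
    # Assumption: "special" means anything other than A-Z, a-z, 0-9, or "-".
    hyphenated = re.sub(r"[^A-Za-z0-9-]", "-", name)
    collapsed = re.sub(r"-+", "-", hyphenated)
    return collapsed.strip("-")


example = normalize_name_sketch("  Bio CE_GT!! ")
```

Casing is left untouched throughout, matching the "preserves the original casing" note.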

nxbench.data.utils.safe_extract(filepath, extracted_path)[source]

Module contents