nxbench.data package¶

Subpackages¶

nxbench.data.scripts package

Submodules¶

nxbench.data.constants module¶

nxbench.data.db module¶

class nxbench.data.db.BenchmarkDB(conn_str=None)[source]¶

Bases: object

Database interface for storing and querying benchmark results in PostgreSQL.

Parameters:: conn_str (str | None)

__init__(conn_str=None)[source]¶

Initialize the database connection.

Parameters:: conn_str (str, optional) – PostgreSQL connection string. If None, a default connection string is used.

delete_results(algorithm=None, backend=None, dataset=None, before_date=None)[source]¶

Delete benchmark results matching criteria.

Parameters:

algorithm (str, optional) – Delete results for this algorithm.
backend (str, optional) – Delete results for this backend.
dataset (str, optional) – Delete results for this dataset.
before_date (str, optional) – Delete results before this date.

Returns:

Number of records deleted.

Return type:

int

get_results(algorithm=None, backend=None, dataset=None, start_date=None, end_date=None, as_pandas=True)[source]¶

Query benchmark results with optional filters.

Parameters:

algorithm (str, optional) – Filter by algorithm name.
backend (str, optional) – Filter by backend.
dataset (str, optional) – Filter by dataset.
start_date (str, optional) – Filter results after this date (ISO format).
end_date (str, optional) – Filter results before this date (ISO format).
as_pandas (bool, default=True) – Return results as a pandas DataFrame.

Returns:

Filtered benchmark results.

Return type:

DataFrame or list of dict
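
The optional filters above suggest a query whose WHERE clause grows only with the arguments that are actually supplied. The following is a minimal sketch of that pattern, not the library's implementation: it uses an in-memory SQLite table instead of PostgreSQL, the function name `build_query_sketch` and the column names (including `timestamp`) are hypothetical, and treating the date bounds as inclusive is an assumption.

```python
import sqlite3


def build_query_sketch(algorithm=None, backend=None, dataset=None,
                       start_date=None, end_date=None):
    """Build a parameterized SELECT; only supplied filters become clauses."""
    clauses, params = [], []
    # Equality filters: skipped entirely when the argument is None.
    for column, value in (("algorithm", algorithm),
                          ("backend", backend),
                          ("dataset", dataset)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    # Date bounds: ISO-8601 strings compare correctly as plain text.
    if start_date is not None:
        clauses.append("timestamp >= ?")  # inclusive bound is an assumption
        params.append(start_date)
    if end_date is not None:
        clauses.append("timestamp <= ?")  # inclusive bound is an assumption
        params.append(end_date)
    sql = "SELECT algorithm, backend, dataset, timestamp FROM benchmarks"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Stand-in table with hypothetical columns and toy rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE benchmarks "
    "(algorithm TEXT, backend TEXT, dataset TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO benchmarks VALUES (?, ?, ?, ?)",
    [
        ("pagerank", "networkx", "karate", "2024-01-05T00:00:00"),
        ("pagerank", "cugraph", "karate", "2024-02-10T00:00:00"),
        ("louvain", "networkx", "jazz", "2024-03-15T00:00:00"),
    ],
)

sql, params = build_query_sketch(algorithm="pagerank", start_date="2024-02-01")
rows = conn.execute(sql, params).fetchall()
```

Passing values as bound parameters rather than formatting them into the SQL string keeps user-supplied filter values from being interpreted as SQL.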

get_unique_values(column)[source]¶

Get unique values for a given column.

Return type:: list[str]
Parameters:: column (str)

save_results(results, git_commit=None, machine_info=None, package_versions=None)[source]¶

Save benchmark results to the database.

Parameters:

results (BenchmarkResult or list of BenchmarkResult) – Results to save.
git_commit (str, optional) – Git commit hash for version tracking.
machine_info (dict, optional) – System information.
package_versions (dict, optional) – Versions of key packages.

Return type:

None

truncate()[source]¶

Completely clear the benchmarks table and reset the serial counter.

Return type:: None

nxbench.data.loader module¶

class nxbench.data.loader.BenchmarkDataManager(data_dir=None)[source]¶

Bases: object

Manages loading and caching of networks for benchmarking.

Parameters:: data_dir (str | Path | None)

SUPPORTED_FORMATS: ClassVar[list[str]] = ['.edgelist', '.mtx', '.graphml', '.edges']¶

__init__(data_dir=None)[source]¶

Parameters:: data_dir (str | Path | None)

get_metadata(name)[source]¶

Return type:: dict[str, Any]
Parameters:: name (str)

async load_network(config, session=None)[source]¶

Load or generate a network based on config.

Return type:

tuple[Graph | DiGraph, dict[str, Any]]

Parameters:

config (DatasetConfig)
session (ClientSession | None)

load_network_sync(config)[source]¶

Return type:: tuple[Graph | DiGraph, dict[str, Any]]
Parameters:: config (DatasetConfig)
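
The page does not say how load_network_sync relates to the async load_network. One common way to offer a blocking counterpart to a coroutine is to drive it with `asyncio.run`; the sketch below shows that pattern under stated assumptions. The stub coroutine, the dict-based `config`, and the returned placeholder values are all hypothetical and stand in for the real loader.

```python
import asyncio


async def load_network_stub(config):
    """Hypothetical stand-in for an async loader; returns (graph, metadata)."""
    await asyncio.sleep(0)  # placeholder for real I/O such as a download
    graph = {"name": config["name"]}    # placeholder for a Graph object
    metadata = {"source": "stub"}       # placeholder for the metadata dict
    return graph, metadata


def load_network_sync_sketch(config):
    """Blocking wrapper: runs the coroutine to completion on a fresh loop."""
    return asyncio.run(load_network_stub(config))


graph, metadata = load_network_sync_sketch({"name": "karate"})
```

Note that `asyncio.run` raises a RuntimeError when called from a thread that already has a running event loop, so a wrapper written this way is only suitable for plain synchronous callers.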

nxbench.data.repository module¶

class nxbench.data.repository.NetworkMetadata(name, category='Unknown', description=None, source='Unknown', directed=False, weighted=False, vertex_type='Unknown', edge_type='Unknown', collection='Unknown', tags=<factory>, citations=<factory>, network_statistics=None, download_url=None)[source]¶

Bases: object

Flexible metadata container for network datasets.

Parameters:

name (str)
category (str)
description (str | None)
source (str)
directed (bool)
weighted (bool)
vertex_type (str | None)
edge_type (str | None)
collection (str | None)
tags (list[str] | None)
citations (list[str])
network_statistics (NetworkStats | None)
download_url (str | None)

__init__(name, category='Unknown', description=None, source='Unknown', directed=False, weighted=False, vertex_type='Unknown', edge_type='Unknown', collection='Unknown', tags=<factory>, citations=<factory>, network_statistics=None, download_url=None)¶

Parameters:

name (str)
category (str)
description (str | None)
source (str)
directed (bool)
weighted (bool)
vertex_type (str | None)
edge_type (str | None)
collection (str | None)
tags (list[str] | None)
citations (list[str])
network_statistics (NetworkStats | None)
download_url (str | None)

Return type:

None

category: str = 'Unknown'¶

citations: list[str]¶

collection: str | None = 'Unknown'¶

description: str | None = None¶

directed: bool = False¶

download_url: str | None = None¶

edge_type: str | None = 'Unknown'¶

name: str¶

network_statistics: NetworkStats | None = None¶

source: str = 'Unknown'¶

tags: list[str] | None¶

vertex_type: str | None = 'Unknown'¶

weighted: bool = False¶
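
The `<factory>` defaults shown for tags and citations are how a dataclass field declared with `default_factory` is rendered in a signature. The sketch below illustrates why such fields use a factory rather than a shared literal default. The class name `MetadataSketch` is hypothetical, only a few of the fields are reproduced, and using `list` as the factory is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSketch:
    """Trimmed, hypothetical analogue of a metadata container."""
    name: str
    category: str = "Unknown"
    # default_factory builds a new list per instance (assumed factory: list).
    tags: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)


first = MetadataSketch("karate")
second = MetadataSketch("jazz")
first.tags.append("social")  # mutates only the first instance's list
```

Because each instance gets its own list, appending a tag to one record leaves the other untouched.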

class nxbench.data.repository.NetworkRepository(data_home=None, scrape_delay=1.0, timeout=30, max_connections=10, max_keepalive_connections=5, keepalive_timeout=30)[source]¶

Bases: object

Asynchronous interface for downloading and working with networks from networkrepository.com.

Parameters:

data_home (str | Path | None)
scrape_delay (float)
timeout (int)
max_connections (int)
max_keepalive_connections (int)
keepalive_timeout (int)

__init__(data_home=None, scrape_delay=1.0, timeout=30, max_connections=10, max_keepalive_connections=5, keepalive_timeout=30)[source]¶

Initialize dataset loader with optional custom data directory.

Parameters:

data_home (str or Path, optional) – Directory for storing downloaded datasets. If None, defaults to ~/nxdata.
scrape_delay (float, default=1.0) – Delay between scraping requests to avoid overloading the server.
timeout (int, default=30) – Timeout for HTTP requests in seconds.
max_connections (int, default=10) – Maximum number of concurrent HTTP connections.
max_keepalive_connections (int)
keepalive_timeout (int)

async discover_networks_by_category()[source]¶

Asynchronously scrape network names from networkrepository.com for each category.

Return type:: dict[str, list[str]]

async extract_download_url(soup, name, base_url='https://networkrepository.com/')[source]¶

Return type:

str | None

Parameters:

soup (BeautifulSoup)
name (str)
base_url (str)

async fetch_with_retry(name)[source]¶

Attempt to fetch the metadata URL using alternative naming patterns.

Return type:: str | None
Parameters:: name (str)

get_category_for_network(network_name)[source]¶

Get the category for a given network name.

Parameters:: network_name (str) – Name of the network
Returns:: The category name if found, else None
Return type:: str or None

async get_network_metadata(name, category)[source]¶

Asynchronously fetch and parse the metadata for a specific network.

Parameters:

name (str) – Name of the network
category (str) – Category of the network

Returns:

The metadata object populated with information from the network’s page

Return type:

NetworkMetadata

async list_networks(category=None, collection=None, min_nodes=None, max_nodes=None, directed=None, weighted=None, limit=None)[source]¶

List available networks matching specified criteria asynchronously.

Return type:

list[NetworkMetadata]

Parameters:

category (str | None)
collection (str | None)
min_nodes (int | None)
max_nodes (int | None)
directed (bool | None)
weighted (bool | None)
limit (int | None)

async verify_url(url)[source]¶

Check if the URL is valid and reachable.

Return type:: bool
Parameters:: url (str)

class nxbench.data.repository.NetworkStats(n_nodes, n_edges, density, max_degree, min_degree, avg_degree, assortativity, n_triangles, avg_triangles, max_triangles, avg_clustering, transitivity, max_kcore, max_clique_lb)[source]¶

Bases: object

Network statistics that are consistently reported.

Parameters:

n_nodes (int)
n_edges (int)
density (float)
max_degree (int)
min_degree (int)
avg_degree (float)
assortativity (float)
n_triangles (int)
avg_triangles (float)
max_triangles (int)
avg_clustering (float)
transitivity (float)
max_kcore (int)
max_clique_lb (int)

__init__(n_nodes, n_edges, density, max_degree, min_degree, avg_degree, assortativity, n_triangles, avg_triangles, max_triangles, avg_clustering, transitivity, max_kcore, max_clique_lb)¶

Parameters:

n_nodes (int)
n_edges (int)
density (float)
max_degree (int)
min_degree (int)
avg_degree (float)
assortativity (float)
n_triangles (int)
avg_triangles (float)
max_triangles (int)
avg_clustering (float)
transitivity (float)
max_kcore (int)
max_clique_lb (int)

Return type:

None

assortativity: float¶

avg_clustering: float¶

avg_degree: float¶

avg_triangles: float¶

density: float¶

max_clique_lb: int¶

max_degree: int¶

max_kcore: int¶

max_triangles: int¶

min_degree: int¶

n_edges: int¶

n_nodes: int¶

n_triangles: int¶

transitivity: float¶

nxbench.data.synthesize module¶

nxbench.data.synthesize.generate_graph(generator_name, gen_params, directed=False)[source]¶

Generate a synthetic network using networkx generator functions.

Return type:

Graph

Parameters:

generator_name (str)
gen_params (dict)
directed (bool)

nxbench.data.utils module¶

nxbench.data.utils.detect_delimiter(file_path, sample_size=5)[source]¶

Detect the most common delimiter in the first few lines of a file.

Return type:

str

Parameters:

file_path (Path)
sample_size (int)
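
Delimiter detection of this kind can be sketched by counting candidate characters over the first `sample_size` lines and returning the most frequent one. This is an illustration, not the library's code: the candidate list, the fallback to a single space, and the name `detect_delimiter_sketch` are assumptions.

```python
import tempfile
from collections import Counter
from itertools import islice
from pathlib import Path

# Assumed candidate set; the real function may consider other characters.
CANDIDATES = [",", "\t", " ", ";", "|"]


def detect_delimiter_sketch(file_path: Path, sample_size: int = 5) -> str:
    """Return the candidate that occurs most often in the sampled lines."""
    counts = Counter()
    with file_path.open() as handle:
        # islice stops after sample_size lines without reading the rest.
        for line in islice(handle, sample_size):
            for delimiter in CANDIDATES:
                counts[delimiter] += line.count(delimiter)
    if not counts or max(counts.values()) == 0:
        return " "  # assumed fallback when no candidate appears
    return counts.most_common(1)[0][0]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.edges"
    path.write_text("1,2\n2,3\n3,1\n")
    detected = detect_delimiter_sketch(path)
```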

nxbench.data.utils.fix_matrix_market_file(in_path)[source]¶

Return type:: Path
Parameters:: in_path (Path)

nxbench.data.utils.get_connected_components(G)[source]¶

Retrieve connected components of a graph.

Return type:: list
Parameters:: G (Graph)

nxbench.data.utils.lcc(G)[source]¶

Extract the largest connected component (LCC) of the graph.

Removes self-loops from the extracted subgraph.

Parameters:: G (nx.Graph) – The input graph.
Returns:: A subgraph containing the largest connected component without self-loops. If the input graph has no nodes, it returns the input graph.
Return type:: nx.Graph
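
The behavior described above (largest component, self-loops dropped, empty input returned unchanged) can be sketched without any graph library. The example below is a stand-in, not the library's code: it represents an undirected graph as a dict mapping each node to a set of neighbors instead of an nx.Graph, and the name `lcc_sketch` is hypothetical.

```python
from collections import deque


def lcc_sketch(adj):
    """Largest connected component of an undirected adjacency dict."""
    if not adj:
        return adj  # mirrors "no nodes: return the input graph"
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    # Rebuild adjacency for the winning component, dropping self-loops.
    return {n: {m for m in adj[n] if m != n} for n in best}


toy = {1: {2}, 2: {1, 2, 3}, 3: {2}, 4: {5}, 5: {4}}  # node 2 has a self-loop
largest = lcc_sketch(toy)
```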

nxbench.data.utils.normalize_name(name)[source]¶

Normalize the network name for URL construction. Preserves the original casing and replaces special characters with hyphens. Collapses multiple hyphens into a single hyphen and strips leading/trailing hyphens.

Return type:: str
Parameters:: name (str)
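
The three steps described above (replace special characters, collapse repeated hyphens, strip the ends) map naturally onto two regular-expression substitutions and a strip. This is a sketch under an assumption: the page does not define "special characters", so the example treats every character other than ASCII letters and digits as special. The name `normalize_name_sketch` is hypothetical.

```python
import re


def normalize_name_sketch(name: str) -> str:
    """Hyphenate special characters, collapse runs, strip edge hyphens."""
    # Assumption: anything outside A-Z, a-z, 0-9 counts as special.
    hyphenated = re.sub(r"[^A-Za-z0-9]", "-", name)
    collapsed = re.sub(r"-{2,}", "-", hyphenated)
    return collapsed.strip("-")  # casing is left untouched


normalized = normalize_name_sketch("Soc--Karate (v2)")
```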

nxbench.data.utils.safe_extract(filepath, extracted_path)[source]¶

Module contents¶