nxbench.data package
Subpackages
Submodules
nxbench.data.constants module
nxbench.data.db module
- class nxbench.data.db.BenchmarkDB(db_path=None)
Bases: object
Database interface for storing and querying benchmark results.
- __init__(db_path=None)
Initialize the database connection.
- Parameters:
db_path (str or Path, optional) – Path to the SQLite database file. If None, the default location is used.
- delete_results(algorithm=None, backend=None, dataset=None, before_date=None)
Delete benchmark results matching the given criteria.
- Parameters:
- Returns:
Number of records deleted
- Return type:
int
- get_results(algorithm=None, backend=None, dataset=None, start_date=None, end_date=None, as_pandas=True)
Query benchmark results with optional filters.
- Parameters:
algorithm (str, optional) – Filter by algorithm name
backend (str, optional) – Filter by backend
dataset (str, optional) – Filter by dataset
start_date (str, optional) – Filter results after this date (ISO format)
end_date (str, optional) – Filter results before this date (ISO format)
as_pandas (bool, default=True) – If True, return the results as a pandas DataFrame
- Returns:
Filtered benchmark results
- Return type:
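The optional filters above read as a conjunction in which None means "no constraint". A minimal sketch of that pattern with the standard-library sqlite3 module follows. The `results` table, its columns, and the `query_results` and `make_demo_db` helpers are hypothetical and are not taken from nxbench; treating the date bounds as inclusive is likewise an assumption made for illustration.

```python
import sqlite3


def query_results(conn, algorithm=None, backend=None, dataset=None,
                  start_date=None, end_date=None):
    """Return rows matching every filter that is not None (hypothetical helper)."""
    clauses, params = [], []
    # Equality filters: only add a clause when the caller supplied a value.
    for column, value in (("algorithm", algorithm),
                          ("backend", backend),
                          ("dataset", dataset)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    # Same-format ISO 8601 strings sort chronologically, so plain string
    # comparison is enough for the date range.
    if start_date is not None:
        clauses.append("timestamp >= ?")
        params.append(start_date)
    if end_date is not None:
        clauses.append("timestamp <= ?")
        params.append(end_date)
    sql = "SELECT algorithm, backend, dataset, timestamp FROM results"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY timestamp"
    return conn.execute(sql, params).fetchall()


def make_demo_db():
    """Build an in-memory database with an assumed schema and three rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results "
        "(algorithm TEXT, backend TEXT, dataset TEXT, timestamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?, ?)",
        [
            ("pagerank", "networkx", "karate", "2024-01-05T10:00:00"),
            ("pagerank", "cugraph", "karate", "2024-02-01T09:30:00"),
            ("louvain", "networkx", "jazz", "2024-03-15T12:00:00"),
        ],
    )
    return conn
```

Values are always passed as bound parameters; only the fixed column names are interpolated into the SQL text.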
- save_results(results, git_commit=None, machine_info=None, python_version=None, package_versions=None)
Save benchmark results to database.
- Parameters:
results (BenchmarkResult or list of BenchmarkResult) – Results to save
git_commit (str, optional) – Git commit hash for version tracking
machine_info (dict, optional) – System information
python_version (str, optional) – Python version used
package_versions (dict, optional) – Versions of key packages
- Return type:
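The optional arguments to save_results describe the environment a benchmark ran in. One way such values might be gathered with the standard library is sketched below. The `collect_run_metadata` helper, the keys inside `machine_info`, and the default package list are assumptions for illustration, not part of nxbench.

```python
import platform
import subprocess
from importlib import metadata


def collect_run_metadata(packages=("networkx", "pandas")):
    """Gather values shaped like the optional save_results() arguments."""
    # Git commit: None when git is missing or the cwd is not a repository.
    try:
        git_commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip() or None
    except (OSError, subprocess.CalledProcessError):
        git_commit = None

    # Assumed layout; the source only says this is a dict of system info.
    machine_info = {
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }

    # Record only the packages that are actually installed.
    package_versions = {}
    for name in packages:
        try:
            package_versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue

    return {
        "git_commit": git_commit,
        "machine_info": machine_info,
        "python_version": platform.python_version(),
        "package_versions": package_versions,
    }
```

Because the dictionary keys match the keyword names in the signature, the result could be unpacked as `db.save_results(results, **collect_run_metadata())`, assuming `db` is a BenchmarkDB instance.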
nxbench.data.loader module
nxbench.data.repository module
- class nxbench.data.repository.NetworkMetadata(name, category='Unknown', description=None, source='Unknown', directed=False, weighted=False, vertex_type='Unknown', edge_type='Unknown', collection='Unknown', tags=<factory>, citations=<factory>, network_statistics=None, download_url=None)
Bases: object
Flexible metadata container for network datasets.
- Parameters:
- __init__(name, category='Unknown', description=None, source='Unknown', directed=False, weighted=False, vertex_type='Unknown', edge_type='Unknown', collection='Unknown', tags=<factory>, citations=<factory>, network_statistics=None, download_url=None)
- Parameters:
- Return type:
None
- network_statistics: NetworkStats | None = None
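The `<factory>` placeholders shown for tags and citations in the signature are how Python dataclasses display fields declared with a default_factory. The sketch below uses a trimmed-down, hypothetical `DatasetMeta` class to show the pattern; it is not the actual NetworkMetadata definition, and the list types chosen for tags and citations are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMeta:
    """Hypothetical, trimmed-down stand-in for a metadata container."""

    name: str
    category: str = "Unknown"
    directed: bool = False
    weighted: bool = False
    # default_factory builds a fresh list per instance; dataclasses reject
    # a bare `= []` default because it would be shared between instances.
    tags: list = field(default_factory=list)
    citations: list = field(default_factory=list)


a = DatasetMeta(name="karate")
b = DatasetMeta(name="jazz", directed=True, tags=["music"])
a.tags.append("social")  # mutates only a.tags, never b.tags
```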
- class nxbench.data.repository.NetworkRepository(data_home=None, scrape_delay=1.0, timeout=30, max_connections=10, max_keepalive_connections=5, keepalive_timeout=30)
Bases: object
Asynchronous interface for downloading and working with networks from networkrepository.com.
- Parameters:
- __init__(data_home=None, scrape_delay=1.0, timeout=30, max_connections=10, max_keepalive_connections=5, keepalive_timeout=30)
Initialize dataset loader with optional custom data directory.
- Parameters:
data_home (str or Path, optional) – Directory for storing downloaded datasets. If None, defaults to ~/nxdata
scrape_delay (float, default=1.0) – Delay between scraping requests to avoid overloading the server.
timeout (int, default=30) – Timeout for HTTP requests in seconds.
max_connections (int, default=10) – Maximum number of concurrent HTTP connections.
max_keepalive_connections (int, default=5)
keepalive_timeout (int, default=30)
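Parameters such as scrape_delay and max_connections describe request throttling. The idea can be illustrated with asyncio alone, as in the sketch below, where a semaphore caps concurrency and a sleep stands in for the delay plus the HTTP request. The `fetch_all` function is hypothetical and makes no claim about how NetworkRepository implements its limits.

```python
import asyncio


async def fetch_all(names, max_connections=2, scrape_delay=0.01):
    """Simulate throttled fetching; returns (results, peak_concurrency)."""
    semaphore = asyncio.Semaphore(max_connections)
    active = 0
    peak = 0

    async def fetch_one(name):
        nonlocal active, peak
        async with semaphore:  # at most max_connections tasks pass at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(scrape_delay)  # stand-in for delay + request
            active -= 1
            return name.upper()

    # gather() preserves input order regardless of completion order.
    results = await asyncio.gather(*(fetch_one(n) for n in names))
    return results, peak
```

A call such as `asyncio.run(fetch_all(["a", "b", "c"]))` runs the simulation; the reported peak never exceeds max_connections.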
- async discover_networks_by_category()
Asynchronously scrape network names from networkrepository.com for each category.
- async fetch_with_retry(name)
Attempt to fetch the metadata URL using alternative naming patterns.
- async get_network_metadata(name, category)
Asynchronously fetch and parse the metadata for a specific network.
- Parameters:
- Returns:
The metadata object populated with information from the network’s page
- Return type:
- class nxbench.data.repository.NetworkStats(n_nodes, n_edges, density, max_degree, min_degree, avg_degree, assortativity, n_triangles, avg_triangles, max_triangles, avg_clustering, transitivity, max_kcore, max_clique_lb)
Bases: object
Network statistics that are consistently reported.
- Parameters:
- __init__(n_nodes, n_edges, density, max_degree, min_degree, avg_degree, assortativity, n_triangles, avg_triangles, max_triangles, avg_clustering, transitivity, max_kcore, max_clique_lb)
nxbench.data.synthesize module
nxbench.data.utils module
- nxbench.data.utils.detect_delimiter(file_path, sample_size=5)
Detect the most common delimiter in the first few lines of a file.
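One plausible reading of "most common delimiter in the first few lines" is a simple frequency count over a fixed set of candidates. The sketch below implements that reading with the standard library only. The `guess_delimiter` name, the candidate set, and the error raised when nothing matches are assumptions and may differ from what detect_delimiter actually does.

```python
from collections import Counter

CANDIDATES = (",", "\t", ";", "|", " ")  # assumed candidate set


def guess_delimiter(file_path, sample_size=5):
    """Return the candidate delimiter seen most often in the first lines."""
    counts = Counter()
    with open(file_path, encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if i >= sample_size:
                break
            # Tally every candidate's occurrences on this line.
            for delim in CANDIDATES:
                counts[delim] += line.count(delim)
    if not counts or max(counts.values()) == 0:
        raise ValueError("no candidate delimiter found")
    return counts.most_common(1)[0][0]
```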
- nxbench.data.utils.lcc(G)
Extract the largest connected component (LCC) of the graph.
Removes self-loops from the extracted subgraph.
- Parameters:
G (nx.Graph) – The input graph.
- Returns:
A subgraph containing the largest connected component without self-loops. If the input graph has no nodes, it returns the input graph.
- Return type:
nx.Graph
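To make the documented behaviour concrete, here is a dependency-free sketch in which a plain dict of neighbour sets stands in for nx.Graph. The `largest_component` function is hypothetical and only mirrors the three documented points: pick the largest component, drop self-loops, and return an empty input unchanged. It assumes an undirected graph in which every node appears as a key.

```python
from collections import deque


def largest_component(adj):
    """Return the largest connected component of an undirected graph.

    `adj` maps each node to a set of neighbours. Self-loops are dropped
    from the result; an empty graph is returned unchanged.
    """
    if not adj:
        return adj
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    # Induced subgraph on the winning component, minus self-loops.
    return {u: {v for v in adj[u] if v != u} for u in best}
```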