sbws.lib package¶
Submodules¶
sbws.lib.circuitbuilder module¶
-
class
sbws.lib.circuitbuilder.
CircuitBuilder
(args, conf, controller, relay_list, close_circuits_on_exit=True)¶ Bases:
object
The CircuitBuilder interface.
Subclasses must implement their own build_circuit() function. Subclasses may keep additional state if they’d find it helpful.
The primary way to use a CircuitBuilder of any type is to simply create it and then call cb.build_circuit(…) with any options that your CircuitBuilder type needs.
It might be good practice to close circuits as you find you no longer need them, but CircuitBuilder will keep track of existing circuits and close them when it is deleted.
-
build_circuit
(*a, **kw)¶ Implementations of this method should build the circuit and return its (str) ID. If it cannot be built, it should return None.
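A minimal sketch of this contract, assuming a stand-in base class and a fake controller instead of the real sbws and stem objects. `FakeController`, `SketchCircuitBuilder` and `FixedPathBuilder` are hypothetical names used only for illustration.

```python
class FakeController:
    """Hypothetical stand-in for a Tor controller (not the real stem API)."""

    def __init__(self):
        self._next_id = 1
        self.open_circuits = set()

    def new_circuit(self, path):
        # Pretend that paths shorter than two hops cannot be built.
        if len(path) < 2:
            raise ValueError("path too short")
        circ_id = str(self._next_id)
        self._next_id += 1
        self.open_circuits.add(circ_id)
        return circ_id

    def close_circuit(self, circ_id):
        self.open_circuits.discard(circ_id)


class SketchCircuitBuilder:
    """Stand-in base class: tracks built circuits so they can be closed."""

    def __init__(self, controller):
        self.controller = controller
        self.built_circuits = set()

    def build_circuit(self, *a, **kw):
        raise NotImplementedError("subclasses implement build_circuit()")

    def close_circuit(self, circ_id):
        self.controller.close_circuit(circ_id)
        self.built_circuits.discard(circ_id)

    def close_all(self):
        for circ_id in list(self.built_circuits):
            self.close_circuit(circ_id)


class FixedPathBuilder(SketchCircuitBuilder):
    def build_circuit(self, path):
        """Return the circuit ID as a str, or None if it cannot be built."""
        try:
            circ_id = self.controller.new_circuit(path)
        except ValueError:
            return None
        self.built_circuits.add(circ_id)
        return circ_id
```

Callers only need to check the return value for None; they never see the controller's exception.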
-
close_circuit
(circ_id)¶
-
relays
¶
-
-
class
sbws.lib.circuitbuilder.
GapsCircuitBuilder
(*a, **kw)¶ Bases:
sbws.lib.circuitbuilder.CircuitBuilder
The build_circuit member function takes a list. Falsey values in the list will be replaced with relays chosen uniformly at random; truthy values will be assumed to be relays.
-
build_circuit
(path)¶ <path> is a list of relays and Falsey values. Relays can be specified by fingerprint or nickname, and fingerprint is highly recommended. Falsey values (like None) will be replaced with relays chosen uniformly at random. A relay will not be in a circuit twice.
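The gap-filling behaviour can be sketched as follows. This is an illustration only, not the sbws implementation: `fill_gaps` and its `candidates` argument are hypothetical, and relays are represented as plain fingerprint strings.

```python
import random


def fill_gaps(path, candidates, rng=random):
    """Replace falsey entries in `path` with distinct random candidates."""
    chosen = [r for r in path if r]
    if len(set(chosen)) != len(chosen):
        raise ValueError("a relay appears twice in the requested path")
    # Deduplicate candidates and drop relays already in the path,
    # so that no relay can end up in the circuit twice.
    pool = [r for r in dict.fromkeys(candidates) if r not in chosen]
    num_gaps = sum(1 for r in path if not r)
    if num_gaps > len(pool):
        raise ValueError("not enough relays to fill the gaps")
    # random.sample picks without replacement, uniformly at random.
    fillers = iter(rng.sample(pool, num_gaps))
    return [r if r else next(fillers) for r in path]
```

For example, `fill_gaps(["AAAA", None, "CCCC"], all_fingerprints)` keeps the first and last hops and picks the middle hop from the remaining fingerprints.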
-
-
exception
sbws.lib.circuitbuilder.
PathLengthException
(message=None, errors=None)¶ Bases:
Exception
-
sbws.lib.circuitbuilder.
valid_circuit_length
(path)¶
sbws.lib.relaylist module¶
-
class
sbws.lib.relaylist.
Relay
(fp, cont, ns=None, desc=None)¶ Bases:
object
-
address
¶
-
average_bandwidth
¶
-
bandwidth
¶
-
can_exit_to
(host, port)¶ Returns if this relay can MOST LIKELY exit to the given host:port. host can be a hostname, but be warned that we will resolve it locally and use the first (arbitrary/unknown order) result when checking exit policies, which is different than what other parts of the code may do (leaving it up to the exit to resolve the name).
-
exit_policy
¶
-
fingerprint
¶
-
flags
¶
-
master_key_ed25519
¶ Obtain ed25519 master key of the relay in server descriptors.
Returns: str, the ed25519 master key base 64 encoded without trailing ‘=’s.
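A small sketch of the encoding described above, using only the standard library. The helper names are hypothetical; the key here is arbitrary bytes rather than one read from a server descriptor.

```python
import base64


def ed25519_key_unpadded(raw_key: bytes) -> str:
    """Base64-encode a key and drop the trailing '=' padding."""
    return base64.b64encode(raw_key).decode("ascii").rstrip("=")


def ed25519_key_repad(unpadded: str) -> bytes:
    """Restore the padding so the standard decoder accepts the string."""
    padding = "=" * (-len(unpadded) % 4)
    return base64.b64decode(unpadded + padding)
```

A 32-byte key encodes to 44 base64 characters including one `=`, so the unpadded form is 43 characters long.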
-
nickname
¶
-
observed_bandwidth
¶
-
-
class
sbws.lib.relaylist.
RelayList
(args, conf, controller)¶ Bases:
object
Keeps a list of all relays in the current Tor network and updates it transparently in the background. Provides useful interfaces for getting only relays of a certain type.
-
REFRESH_INTERVAL
= 300¶
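One way such a periodic refresh could work is to re-fetch the list on access once it is older than the interval. The sketch below assumes REFRESH_INTERVAL is in seconds; `SketchRelayList`, `fetch` and `clock` are hypothetical and do not reflect how sbws talks to the controller.

```python
import time


class SketchRelayList:
    REFRESH_INTERVAL = 300  # assumed to be seconds

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # callable returning the current relays
        self._clock = clock      # injectable clock, handy for testing
        self._relays = []
        self._last_refresh = None

    @property
    def relays(self):
        now = self._clock()
        stale = (self._last_refresh is None
                 or now - self._last_refresh >= self.REFRESH_INTERVAL)
        if stale:
            # Callers never trigger this explicitly; it happens on access.
            self._relays = list(self._fetch())
            self._last_refresh = now
        return self._relays
```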
-
bad_exits
¶
-
exits
¶
-
fast
¶
-
guards
¶
-
non_exits
¶
-
random_relay
()¶
-
relays
¶
-
sbws.lib.relayprioritizer module¶
-
class
sbws.lib.relayprioritizer.
RelayPrioritizer
(args, conf, relay_list, result_dump)¶ Bases:
object
-
best_priority
()¶ Return a generator containing the best priority relays.
NOTE: A lower value for priority means better priority. Remember your data structures class in university and consider this something like a min-priority queue.
Priority is calculated as the sum of the “freshness” of each result for a relay. First we determine <oldest_allowed>, the time at which we stop considering results to be valid. From there, a result’s freshness is determined to be the amount of time between when the measurement was made and <oldest_allowed>. Therefore, you should see that a measurement made more recently will have a higher freshness.
We adjust down the freshness for results containing errors. If we ignored errors and didn’t increase a relay’s priority value for them, then we would get stuck trying to measure a few relays that have the best priority but are having issues getting measured. If we treated errors with equal weight as successful results, then it would take a while to get around to giving the relay another chance at getting a successful measurement.
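The calculation described above can be sketched as follows. Results are represented as hypothetical `(timestamp, reduction_factor)` pairs, with a factor of 0 for successful results. Scaling the freshness by `1 - reduction_factor` is an assumption made for this example, chosen so that a higher factor gives a lower freshness.

```python
import time


def relay_priority(results, fresh_seconds, now=None):
    """Sum the freshness of a relay's results; lower means measured sooner."""
    now = time.time() if now is None else now
    oldest_allowed = now - fresh_seconds
    priority = 0.0
    for timestamp, reduction_factor in results:
        if timestamp < oldest_allowed:
            continue  # no longer valid, contributes nothing
        freshness = timestamp - oldest_allowed
        # Assumption: error results count for less than successful ones.
        freshness *= max(1.0 - reduction_factor, 0.0)
        priority += freshness
    return priority
```

With `now=1000` and `fresh_seconds=100`, a relay with no results gets 0, a relay with one error at t=990 and factor 0.5 gets 45, and a relay with one success at t=990 gets 90. The unmeasured relay is therefore picked first and the failing relay before the successfully measured one.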
-
sbws.lib.resultdump module¶
-
class
sbws.lib.resultdump.
Result
(relay, circ, dest_url, scanner_nick, t=None)¶ Bases:
object
A simple struct to pack a measurement result into so that other code can be confident it is handling a well-formed result.
-
class
Relay
(fingerprint, nickname, address, master_key_ed25519, average_bandwidth=None, observed_bandwidth=None)¶ Bases:
object
Implements just enough of a stem RouterStatusEntryV3 for this Result class to be happy
-
address
¶
-
circ
¶
-
dest_url
¶
-
fingerprint
¶
-
static
from_dict
(d)¶ Given a dict, returns the Result* subtype that is represented by the dict. If we don’t know how to parse the dict into a Result and it’s likely because the programmer forgot to implement something, raises NotImplementedError. If we can’t parse the dict for some other reason, return None.
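The three outcomes (a parsed subtype, NotImplementedError, or None) can be illustrated with a small dispatch table. The classes, type strings and dictionary keys below are hypothetical and much simpler than the real Result classes.

```python
class SketchResult:
    def __init__(self, fingerprint):
        self.fingerprint = fingerprint


class SketchSuccess(SketchResult):
    type = "success"


class SketchError(SketchResult):
    type = "error"


# Map each known type string to the class that represents it.
_SUBTYPES = {cls.type: cls for cls in (SketchSuccess, SketchError)}


def result_from_dict(d):
    # Malformed input: nothing the programmer could have implemented.
    if "type" not in d or "fingerprint" not in d:
        return None
    try:
        cls = _SUBTYPES[d["type"]]
    except KeyError:
        # Well-formed, but a type nobody has written a class for yet.
        raise NotImplementedError("unknown result type %r" % d["type"])
    return cls(d["fingerprint"])
```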
-
master_key_ed25519
¶
-
nickname
¶
-
relay_average_bandwidth
¶
-
relay_observed_bandwidth
¶
-
scanner
¶
-
time
¶
-
to_dict
()¶
-
type
¶
-
version
¶
-
-
class
sbws.lib.resultdump.
ResultDump
(args, conf, end_event)¶ Bases:
object
Runs the enter() method in a new thread and collects new Results on its queue. Writes them to daily result files in the data directory
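The thread-plus-queue pattern can be sketched with the standard library. `SketchResultDump` is hypothetical: it appends to an in-memory list where the real class writes daily result files, and the 0.05 second poll timeout is an arbitrary choice.

```python
import queue
import threading


class SketchResultDump:
    def __init__(self, end_event):
        self.end_event = end_event
        self.queue = queue.Queue()
        self.stored = []
        self.thread = threading.Thread(target=self.enter, daemon=True)
        self.thread.start()

    def enter(self):
        """Main loop: drain the queue until asked to stop and it is empty."""
        while not (self.end_event.is_set() and self.queue.empty()):
            try:
                result = self.queue.get(timeout=0.05)
            except queue.Empty:
                continue
            self.stored.append(result)  # stands in for writing to disk
```

Other threads hand over results with `dump.queue.put(result)` and never touch the storage themselves.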
-
enter
()¶ Main loop for the ResultDump thread
-
handle_result
(result)¶ Call from ResultDump thread. If we are shutting down, ignores ResultError* types
-
results_for_relay
(relay)¶
-
store_result
(result)¶ Call from ResultDump thread
-
-
class
sbws.lib.resultdump.
ResultError
(*a, msg=None, **kw)¶ Bases:
sbws.lib.resultdump.Result
-
freshness_reduction_factor
¶ When the RelayPrioritizer encounters this Result, how much should it adjust its freshness? (See RelayPrioritizer.best_priority() for more information about “freshness”)
A higher factor makes the freshness lower (making the Result seem older). A lower freshness leads to the relay having better priority, and better priority means it will be measured again sooner.
The value 0.5 was chosen somewhat arbitrarily, but a few weeks of live network testing verifies that sbws is still able to perform useful measurements in a reasonable amount of time.
-
static
from_dict
(d)¶ Given a dict, returns the Result* subtype that is represented by the dict. If we don’t know how to parse the dict into a Result and it’s likely because the programmer forgot to implement something, raises NotImplementedError. If we can’t parse the dict for some other reason, return None.
-
msg
¶
-
to_dict
()¶
-
type
¶
-
-
class
sbws.lib.resultdump.
ResultErrorAuth
(*a, **kw)¶ Bases:
sbws.lib.resultdump.ResultError
-
freshness_reduction_factor
¶ Override the default ResultError.freshness_reduction_factor because a ResultErrorAuth is most likely not the measured relay’s fault, so we shouldn’t hurt its priority as much. A higher reduction factor means a Result’s effective freshness is reduced more, which makes the relay’s priority better.
The value 0.9 was chosen somewhat arbitrarily.
-
static
from_dict
(d)¶ Given a dict, returns the Result* subtype that is represented by the dict. If we don’t know how to parse the dict into a Result and it’s likely because the programmer forgot to implement something, raises NotImplementedError. If we can’t parse the dict for some other reason, return None.
-
to_dict
()¶
-
type
¶
-
-
class
sbws.lib.resultdump.
ResultErrorCircuit
(*a, **kw)¶ Bases:
sbws.lib.resultdump.ResultError
-
freshness_reduction_factor
¶ There are a few instances when it isn’t the relay’s fault that the circuit failed to get built. Maybe someday we’ll try detecting whose fault it most likely was and subclassing ResultErrorCircuit. But for now we don’t. So reduce the freshness slightly more than ResultError does by default so priority isn’t hurt quite as much.
A (hopefully very very rare) example of when a circuit would fail to get built is when the sbws client machine suddenly loses Internet access.
-
static
from_dict
(d)¶ Given a dict, returns the Result* subtype that is represented by the dict. If we don’t know how to parse the dict into a Result and it’s likely because the programmer forgot to implement something, raises NotImplementedError. If we can’t parse the dict for some other reason, return None.
-
to_dict
()¶
-
type
¶
-
-
class
sbws.lib.resultdump.
ResultErrorStream
(*a, **kw)¶ Bases:
sbws.lib.resultdump.ResultError
-
static
from_dict
(d)¶ Given a dict, returns the Result* subtype that is represented by the dict. If we don’t know how to parse the dict into a Result and it’s likely because the programmer forgot to implement something, raises NotImplementedError. If we can’t parse the dict for some other reason, return None.
-
to_dict
()¶
-
type
¶
-
-
class
sbws.lib.resultdump.
ResultSuccess
(rtts, downloads, *a, **kw)¶ Bases:
sbws.lib.resultdump.Result
-
downloads
¶
-
static
from_dict
(d)¶ Given a dict, returns the Result* subtype that is represented by the dict. If we don’t know how to parse the dict into a Result and it’s likely because the programmer forgot to implement something, raises NotImplementedError. If we can’t parse the dict for some other reason, return None.
-
rtts
¶
-
to_dict
()¶
-
type
¶
-
-
sbws.lib.resultdump.
load_recent_results_in_datadir
(fresh_days, datadir, success_only=False, on_changed_ipv4=False, on_changed_ipv6=False)¶ Given a data directory, read all results files in it that could have results in them that are still valid. Trim them, and return the valid Results as a list
-
sbws.lib.resultdump.
load_result_file
(fname, success_only=False)¶ Reads in all lines from the given file, and parses them into Result structures (or subclasses of Result). Optionally only keeps ResultSuccess. Returns all kept Results as a result dictionary. This function does not care about the age of the results
-
sbws.lib.resultdump.
merge_result_dicts
(d1, d2)¶ Given two dictionaries that contain Result data, merge them. Result dictionaries have keys of relay fingerprints and values of lists of results for those relays.
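A minimal sketch of such a merge, assuming the dictionary shape described above. The function name is hypothetical, and returning a new dictionary rather than modifying an input is a choice made for this example.

```python
def merge_result_dicts_sketch(d1, d2):
    """Merge two {fingerprint: [results]} dictionaries into a new one."""
    # Copy the lists so the inputs are left untouched.
    merged = {fp: list(results) for fp, results in d1.items()}
    for fp, results in d2.items():
        merged.setdefault(fp, []).extend(results)
    return merged
```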
-
sbws.lib.resultdump.
trim_results
(fresh_days, result_dict)¶ Given a result dictionary, remove all Results that are no longer valid and return the new dictionary
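As an illustration, trimming by age might look like the sketch below. It assumes each result is a dict with a `time` key holding a Unix timestamp; that representation and the function name are hypothetical.

```python
import time


def trim_results_sketch(fresh_days, result_dict, now=None):
    """Return a new dict holding only results newer than fresh_days."""
    now = time.time() if now is None else now
    oldest_allowed = now - fresh_days * 24 * 60 * 60
    trimmed = {}
    for fp, results in result_dict.items():
        kept = [r for r in results if r["time"] >= oldest_allowed]
        if kept:  # drop relays that have no valid results left
            trimmed[fp] = kept
    return trimmed
```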
-
sbws.lib.resultdump.
trim_results_ip_changed
(result_dict, on_changed_ipv4=False, on_changed_ipv6=False)¶ When there are results for the same relay with different IPs, create a new results dictionary that omits that relay’s results obtained with an older IP.
Parameters: - result_dict (dict) – a dictionary of results
- on_changed_ipv4 (bool) – whether to trim the results when a relay’s IPv4 changes
- on_changed_ipv6 (bool) – whether to trim the results when a relay’s IPv6 changes
Returns: a new results dictionary
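A rough sketch of the IPv4 case, assuming each result is a dict with `time` and `address` keys (a hypothetical representation). It keeps only the most recent run of results that share the relay's latest address and does not attempt the IPv6 case.

```python
def trim_ip_changed_sketch(result_dict, on_changed_ipv4=False):
    if not on_changed_ipv4:
        return result_dict
    new_dict = {}
    for fp, results in result_dict.items():
        if not results:
            new_dict[fp] = []
            continue
        ordered = sorted(results, key=lambda r: r["time"])
        latest_address = ordered[-1]["address"]
        kept = []
        # Walk backwards in time until the address differs.
        for r in reversed(ordered):
            if r["address"] != latest_address:
                break
            kept.append(r)
        new_dict[fp] = list(reversed(kept))
    return new_dict
```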
-
sbws.lib.resultdump.
write_result_to_datadir
(result, datadir)¶ Can be called from any thread
sbws.lib.v3bwfile module¶
Classes and functions that create the bandwidth measurements document (v3bw) used by bandwidth authorities.
-
class
sbws.lib.v3bwfile.
V3BWFile
(v3bwheader, v3bwlines)¶ Bases:
object
Create a Bandwidth List file following spec version 1.X.X
Parameters: - v3bwheader (V3BWHeader) – header
- v3bwlines (list) – V3BWLines
-
static
bw_kb
(bw_lines, reverse=False)¶
-
bw_line_for_node_id
(node_id)¶ Returns the bandwidth line for a given node fingerprint.
Used to combine data when plotting.
-
static
bw_sbws_scale
(bw_lines, scale_constant=7500, reverse=False)¶ Return a new V3BwLine list scaled using sbws method.
Parameters: - bw_lines (list) – bw lines to scale, not self.bw_lines, since this method will be before self.bw_lines have been initialized.
- scale_constant (int) – the constant to multiply by the ratio and the bandwidth to obtain the new bandwidth
Returns list: V3BwLine list
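A worked sketch of one plausible reading of this scaling, under the assumption that the ratio is `scale_constant` divided by the mean bandwidth, so that the scaled values average to `scale_constant`. The function operates on plain numbers rather than V3BwLine objects and is not taken from the sbws source.

```python
def sbws_scale_sketch(bws, scale_constant=7500):
    """Scale bandwidths so that their mean becomes scale_constant."""
    if not bws or sum(bws) == 0:
        return list(bws)  # nothing meaningful to scale
    mean_bw = sum(bws) / len(bws)
    ratio = scale_constant / mean_bw  # assumed definition of the ratio
    # Keep at least 1 so that no relay ends up with a zero bandwidth.
    return [max(round(bw * ratio), 1) for bw in bws]
```

For `[1000, 2000, 3000]` the mean is 2000 and the ratio is 3.75, giving `[3750, 7500, 11250]`.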
-
static
bw_torflow_scale
(bw_lines, desc_bw_obs_type=1, cap=0.05, num_round_dig=3, reverse=False)¶ Obtain final bandwidth measurements applying Torflow’s scaling method.
From Torflow’s README.spec.txt (section 2.2):
In this way, the resulting network status consensus bandwidth values are effectively re-weighted proportional to how much faster the node was as compared to the rest of the network.
The variables and steps used in Torflow:
strm_bw:
The strm_bw field is the average (mean) of all the streams for the relay identified by the fingerprint field.
    strm_bw = sum(bw stream x)/|n stream|
filt_bw:
The filt_bw field is computed similarly, but only the streams equal to or greater than the strm_bw are counted in order to filter very slow streams due to slow node pairings.
filt_sbw and strm_sbw:
    for rs in RouterStats.query.filter(stats_clause).options(eagerload_all('router.streams.circuit.routers')).all():
      tot_sbw = 0
      sbw_cnt = 0
      for s in rs.router.streams:
        if isinstance(s, ClosedStream):
          skip = False
          #for br in badrouters:
          #  if br != rs:
          #    if br.router in s.circuit.routers:
          #      skip = True
          if not skip:
            # Throw out outliers < mean
            # (too much variance for stddev to filter much)
            if rs.strm_closed == 1 or s.bandwidth() >= rs.sbw:
              tot_sbw += s.bandwidth()
              sbw_cnt += 1
      if sbw_cnt: rs.filt_sbw = tot_sbw/sbw_cnt
      else: rs.filt_sbw = None
filt_avg, and strm_avg:
Once we have determined the most recent measurements for each node, we compute an average of the filt_bw fields over all nodes we have measured.
    filt_avg = sum(map(lambda n: n.filt_bw, nodes.itervalues()))/float(len(nodes))
    strm_avg = sum(map(lambda n: n.strm_bw, nodes.itervalues()))/float(len(nodes))
true_filt_avg and true_strm_avg:
    for cl in ["Guard+Exit", "Guard", "Exit", "Middle"]:
      true_filt_avg[cl] = filt_avg
      true_strm_avg[cl] = strm_avg
In the non-pid case, all types of nodes get the same avg
n.fbw_ratio and n.sbw_ratio:
    for n in nodes.itervalues():
      n.fbw_ratio = n.filt_bw/true_filt_avg[n.node_class()]
      n.sbw_ratio = n.strm_bw/true_strm_avg[n.node_class()]
n.ratio:
These averages are used to produce ratios for each node by dividing the measured value for that node by the network average.
    # Choose the larger between sbw and fbw
    if n.sbw_ratio > n.fbw_ratio:
      n.ratio = n.sbw_ratio
    else:
      n.ratio = n.fbw_ratio
desc_bw:
It is the observed bandwidth in the descriptor, NOT the average bandwidth:
    return Router(ns.idhex, ns.nickname, bw_observed, dead, exitpolicy,
                  ns.flags, ip, version, os, uptime, published, contact,
                  rate_limited, ns.orhash, ns.bandwidth, extra_info_digest,
                  ns.unmeasured)

    self.desc_bw = max(bw,1) # Avoid div by 0
new_bw:
These ratios are then multiplied by the most recent observed descriptor bandwidth we have available for each node, to produce a new value for the network status consensus process.
n.new_bw = n.desc_bw*n.ratio
The descriptor observed bandwidth is multiplied by the ratio.
Limit the bandwidth to a maximum:
NODE_CAP = 0.05
    if n.new_bw > tot_net_bw*NODE_CAP:
      plog("INFO", "Clipping extremely fast "+n.node_class()+" node "+n.idhex+"="+n.nick+
           " at "+str(100*NODE_CAP)+"% of network capacity ("+
           str(n.new_bw)+"->"+str(int(tot_net_bw*NODE_CAP))+") "+
           " pid_error="+str(n.pid_error)+
           " pid_error_sum="+str(n.pid_error_sum))
      n.new_bw = int(tot_net_bw*NODE_CAP)
However, tot_net_bw does not seem to be updated when not using pid. This clipping would give all the faster relays the same value.
All of that can be expressed as:
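A compact form reconstructed from the steps above. The symbol names follow the variables quoted from Torflow, except `final_bw`, which is introduced here for the clipped value. It assumes that `tot_net_bw` is the sum of `new_bw` over all measured nodes.

```latex
\begin{aligned}
\mathrm{ratio}_i &= \max\left(
    \frac{\mathrm{filt\_bw}_i}{\mathrm{filt\_avg}},\;
    \frac{\mathrm{strm\_bw}_i}{\mathrm{strm\_avg}}
  \right) \\
\mathrm{new\_bw}_i &= \mathrm{desc\_bw}_i \cdot \mathrm{ratio}_i \\
\mathrm{final\_bw}_i &= \min\left(
    \mathrm{new\_bw}_i,\;
    \mathrm{NODE\_CAP} \cdot \sum_j \mathrm{new\_bw}_j
  \right),
  \qquad \mathrm{NODE\_CAP} = 0.05
\end{aligned}
```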
-
classmethod
from_results
(results, state_fpath='', scale_constant=7500, scaling_method=None, torflow_obs=0, torflow_cap=0.05, torflow_round_digs=3, secs_recent=None, secs_away=None, min_num=0, consensus_path=None, max_bw_diff_perc=50, reverse=False)¶ Create V3BWFile class from sbws Results.
Parameters: - results (dict) – see below
- state_fpath (str) – path to the state file
- scaling_method (int) – Scaling method to obtain the bandwidth. Possible values: {NONE, SBWS_SCALING, TORFLOW_SCALING} = {0, 1, 2}
- scale_constant (int) – sbws scaling constant
- torflow_obs (int) – method to choose descriptor observed bandwidth
- reverse (bool) – whether to sort the bw lines descending or not
Results are in the form:
{'relay_fp1': [Result1, Result2, ...], 'relay_fp2': [Result1, Result2, ...]}
-
classmethod
from_v100_fpath
(fpath)¶
-
classmethod
from_v1_fpath
(fpath)¶
-
info_stats
¶
-
static
is_max_bw_diff_perc_reached
(bw_lines, max_bw_diff_perc=50)¶
-
is_min_perc
¶
-
max_bw
¶
-
mean_bw
¶
-
static
measured_progress_stats
(bw_lines, number_consensus_relays, min_perc_reached_before)¶ Statistics about measurements progress, to be included in the header.
Parameters: - bw_lines (list) – the bw_lines after scaling and applying filters.
- consensus_path (str) – the path to the cached consensus file.
- state_fpath (str) – the path to the state file
Returns dict, bool: Statistics about the progress made with measurements and whether the percentage of measured relays has been reached.
-
median_bw
¶
-
min_bw
¶
-
num
¶
-
static
read_number_consensus_relays
(consensus_path)¶ Read the number of relays in the Network from the cached consensus file.
-
sum_bw
¶
-
to_plt
(attrs=['bw'], sorted_by=None)¶ Return bandwidth data in a format useful for matplotlib.
Used from external tool to plot.
-
update_progress
(bw_lines, header, number_consensus_relays, state)¶
-
static
warn_if_not_accurate_enough
(bw_lines, scale_constant=7500)¶
-
write
(output)¶
-
class
sbws.lib.v3bwfile.
V3BWHeader
(timestamp, **kwargs)¶ Bases:
object
Create a bandwidth measurements (V3bw) header following bandwidth measurements document spec version 1.X.X.
Parameters: - timestamp (str) – timestamp in Unix Epoch seconds of the most recent generator result.
- version (str) – the spec version
- software (str) – the name of the software that generates this
- software_version (str) – the version of the software
- kwargs (dict) –
extra headers. Currently supported:
- earliest_bandwidth: str, ISO 8601 timestamp in UTC time zone when the first bandwidth was obtained
- generator_started: str, ISO 8601 timestamp in UTC time zone when the generator started
-
add_stats
(**kwargs)¶
-
static
earliest_bandwidth_from_results
(results)¶
-
classmethod
from_lines_v1
(lines)¶ Parameters: lines (list) – list of lines to parse
Returns: tuple of V3BWHeader object and non-header lines
-
classmethod
from_lines_v100
(lines)¶ Parameters: lines (list) – list of lines to parse
Returns: tuple of V3BWHeader object and non-header lines
-
classmethod
from_results
(results, state_fpath='')¶
-
classmethod
from_text_v1
(text)¶ Parameters: text (str) – text to parse
Returns: tuple of V3BWHeader object and non-header lines
-
static
generator_started_from_file
(state_fpath)¶ ISO formatted timestamp for the time when the scanner process most recently started.
-
keyvalue_tuple_ls
¶ Return list of all KeyValue tuples
-
keyvalue_unordered_tuple_ls
¶ Return list of KeyValue tuples that do not have specific order.
-
keyvalue_v1str_ls
¶ Return KeyValue list of strings following spec v1.X.X.
-
keyvalue_v2_ls
¶ Return KeyValue list of strings following spec v2.X.X.
-
static
latest_bandwidth_from_results
(results)¶
-
num_lines
¶
-
strv1
¶ Return header string following spec v1.X.X.
-
strv2
¶ Return header string following spec v2.X.X.
-
class
sbws.lib.v3bwfile.
V3BWLine
(node_id, bw, **kwargs)¶ Bases:
object
Create a Bandwidth List line following the spec version 1.X.X.
Parameters: - node_id (str) –
- bw (int) –
- kwargs (dict) –
extra headers. Currently supported:
- nickname, str
- master_key_ed25519, str
- rtt, int
- time, str
- success, int
- error_stream, int
- error_circ, int
- error_misc, int
-
bw_keyvalue_tuple_ls
¶ Return list of KeyValue Bandwidth Line tuples.
-
bw_keyvalue_v1str_ls
¶ Return list of KeyValue Bandwidth Line strings following spec v1.X.X.
-
static
bw_mean_from_results
(results)¶
-
static
bw_median_from_results
(results)¶
-
bw_strv1
¶ Return Bandwidth Line string following spec v1.X.X.
-
static
desc_bw_obs_last_from_results
(results)¶
-
static
desc_bw_obs_mean_from_results
(results)¶
-
classmethod
from_bw_line_v1
(line)¶
-
classmethod
from_data
(data, fingerprint)¶
-
classmethod
from_results
(results, secs_recent=None, secs_away=None, min_num=0)¶ Convert sbws results to relays’ Bandwidth Lines
bs stands for Bytes/seconds.
bw_mean means the bw is obtained from the mean of all the downloads’ bandwidths. A download’s bandwidth is calculated as the amount of data received divided by the time it took to receive it. bw = data (Bytes) / time (seconds)
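The formula above can be sketched as follows, assuming each download is a dict with `amount` (bytes) and `duration` (seconds) keys. Those key names and the function name are assumptions made for this example.

```python
from statistics import mean


def bw_mean_sketch(downloads):
    """Mean of per-download bandwidths, in bytes per second."""
    speeds = [
        d["amount"] / d["duration"]
        for d in downloads
        if d["duration"] > 0  # skip entries that would divide by zero
    ]
    return mean(speeds) if speeds else None
```

Two downloads of 1000 and 3000 bytes that each took 2 seconds give 500 and 1500 bytes/second, so the mean is 1000 bytes/second.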
-
static
last_time_from_results
(results)¶
-
static
result_types_from_results
(results)¶
-
static
results_away_each_other
(results, secs_away=None)¶
-
static
results_recent_than
(results, secs_recent=None)¶
-
static
rtt_from_results
(results)¶
-
sbws.lib.v3bwfile.
kb_round_x_sig_dig
(bw_bs, digits=3)¶ Convert bw to KB and round to the x most significant digits.
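A sketch of this conversion, assuming 1 KB = 1000 bytes and a minimum returned value of 1. Both are assumptions for this example, and the function is not copied from the sbws source.

```python
def kb_round_sig_sketch(bw_bs, digits=3):
    """Convert bytes/second to KB/second, keeping `digits` significant digits."""
    bw_kb = bw_bs / 1000
    int_digits = len(str(int(bw_kb)))
    if int_digits > digits:
        # A negative ndigits rounds to tens, hundreds, and so on.
        bw_kb = round(bw_kb, digits - int_digits)
    # Truncate the fraction and avoid reporting a zero bandwidth.
    return max(int(bw_kb), 1)
```

For example, 1234567 bytes/second is 1234.567 KB/second, which becomes 1230 with three significant digits.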
-
sbws.lib.v3bwfile.
num_results_of_type
(results, type_str)¶
-
sbws.lib.v3bwfile.
result_type_to_key
(type_str)¶