Tx size estimation allows varied in, out types

Prior to this commit, p2wsh inputs from fidelity bonds caused transaction fees to be miscalculated, even in cases where the exact set of inputs was known (such as a direct send). This commit changes the estimation to a model in which the caller of jmbitcoin.secp256k1_transaction.estimate_tx_size must specify a list of script types, one for each input to the transaction, and likewise one for each output.

In some cases the caller uses the wallet's default script type; in other cases, where the caller can know the exact type of each utxo used as input and of each destination used as output, the types are specified explicitly. In particular, the use of fidelity bond outputs as inputs to transactions can now be accounted for. Currently this is only done in direct send payments; coinjoins still fall back to assuming that all inputs are of the same type (note that it is not possible to use fidelity bond utxos as inputs to coinjoins).

Note also that the burn destination calculation in direct send is removed: it is not used, so the maintenance burden is best avoided.
3 years ago · bffad33b74
4 changed files with 154 additions and 107 deletions
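
As a rough illustration of the per-type model described in the commit message, the following standalone sketch sums byte counts over a list of input types and a list of output types. It is not the code from this commit: the names `IN_SIZES`, `OUT_SIZES`, `SEGWIT_TYPES` and `sketch_tx_size` are hypothetical, and the tables carry only a subset of the script types, with per-type byte counts taken from the diff.

```python
# Hypothetical, standalone sketch; these names are illustrative
# and are not part of jmbitcoin.
IN_SIZES = {
    # "w": witness bytes, "nw": non-witness bytes, per input
    "p2wsh": {"w": 1 + 72 + 43, "nw": 41},  # fidelity bond script only
    "p2wpkh": {"w": 108, "nw": 41},
    "p2pkh": {"w": 0, "nw": 148},
}
OUT_SIZES = {"p2wsh": 43, "p2wpkh": 31, "p2pkh": 34}
SEGWIT_TYPES = ("p2wsh", "p2wpkh")


def sketch_tx_size(in_types, out_types):
    """Return non-witness bytes (int) for a non-segwit transaction,
    or (witness_bytes, non_witness_bytes) if any input is segwit."""
    is_segwit = any(t in SEGWIT_TYPES for t in in_types)
    # nVersion (4) + nLockTime (4) + input count (1) + output count (1)
    nw = 4 + 4 + 2
    if is_segwit:
        nw += 2  # marker and flag bytes
    w = 0
    for t in in_types:
        nw += IN_SIZES[t]["nw"]
        w += IN_SIZES[t]["w"]
    for t in out_types:
        nw += OUT_SIZES[t]  # outputs are entirely non-witness
    return (w, nw) if is_segwit else nw


# One fidelity bond input plus one ordinary p2wpkh input, two p2wpkh outputs:
print(sketch_tx_size(["p2wsh", "p2wpkh"], ["p2wpkh", "p2wpkh"]))  # (224, 156)
# A purely legacy transaction returns a single integer:
print(sketch_tx_size(["p2pkh"], ["p2pkh", "p2pkh"]))  # 226
```

Unlike the function in the diff, this sketch raises a plain `KeyError` for an unknown script type rather than `NotImplementedError`.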
--- a/jmbitcoin/jmbitcoin/secp256k1_transaction.py
+++ b/jmbitcoin/jmbitcoin/secp256k1_transaction.py
@@ -106,70 +106,89 @@ def human_readable_output(txoutput):
        pass # non standard script
    return outdict
-def estimate_tx_size(ins, outs, txtype='p2pkh', outtype=None):
+def there_is_one_segwit_input(x):
+    # note that we need separate input types for
+    # any distinct types of scripthash inputs supported,
+    # since each may have a different size of witness; in
+    # that case, the internal list in this list comprehension
+    # will need updating.
+    return any([y in ["p2sh-p2wpkh", "p2wpkh", "p2wsh"] for y in x])
+def estimate_tx_size(ins, outs):
    '''Estimate transaction size.
-    The txtype field as detailed below is used to distinguish
-    the type, but there is at least one source of meaningful roughness:
-    we assume that the scriptPubKey type of all the outputs are the same as
-    the input, unless `outtype` is specified, in which case *one* of
-    the outputs is assumed to be that other type, with all of the other
-    outputs being of the same type as before.
-    This, combined with a few bytes variation in signature sizes means
-    we will sometimes see small inaccuracies in this estimate.
-
-    Assuming p2pkh:
-    out: 8+1+3+20+2=34, in: 32+4+1+1+~72+1+33+4=148,
-    ver: 4, locktime:4, +2 (len in,out)
-    total = 34*len_out + 148*len_in + 10 (sig sizes vary slightly)
-
-    Assuming p2sh M of N multisig:
-    "ins" must contain M, N so ins= (numins, M, N) (crude assuming all same)
-    73*M + 34*N + 45 per input, so total ins ~ len_ins * (45+73M+34N)
-    so total ~ 32*len_out + (45+73M+34N)*len_in + 10
-    Assuming p2sh-p2wpkh:
-    witness are roughly 1+1+~72+1+33 for each input
-    (txid, vin, 4+20 for witness program encoded as scriptsig, 4 for sequence)
-    non-witness input fields are roughly 32+4+4+20+4=64, so total becomes
-    n_in * 64 + 4(ver) + 2(marker, flag) + 2(n_in, n_out) + 4(locktime) + n_out*32
-    Assuming p2wpkh native:
-    witness as previous case
-    non-witness loses the 24 witnessprogram, replaced with 1 zero,
-    in the scriptSig, so becomes:
-    4 + 1 + 1 + (n_in) + (vin) + (n_out) + (vout) + (witness) + (locktime)
-    non-witness: 4(ver) +2 (marker, flag) + n_in*41 + 4(locktime) +2 (len in, out) + n_out*31
-    witness: 1 + 1 + 72 + 1 + 33
+    Both arguments `ins` and `outs` must be lists of script types,
+    and they must be present in the keys of the dicts `inmults`,
+    `outmults` defined here.
+    Note that variation in ECDSA signature sizes means
+    we will sometimes see small inaccuracies in this estimate, but
+    that this is ameliorated by the existence of the witness discount,
+    in actually estimating fees.
+    The value '72' is used for the most-likely size of these ECDSA
+    signatures, due to 30[1 byte] + len(rest)[1 byte] + type:02 [1 byte] + len(r)[1] + r[32 or 33] + type:02[1] + len(s)[1] + s[32] + sighash_all [1]
+    ... though as can be seen, 71 is also likely:
+    r length 33 occurs when the value is 'negative' (>N/2) and a byte x80 is prepended,
+    but shorter values for r are possible if rare.
+    Returns:
+    Either a single integer, if the transaction will be non-segwit,
+    or a tuple (int, int) for witness and non-witness bytes respectively).
    '''
-    if txtype == 'p2pkh':
-        return 4 + 4 + 2 + ins*148 + 34*outs + (
-            OUTPUT_EXTRA_BYTES[txtype][outtype]
-            if outtype and outtype in OUTPUT_EXTRA_BYTES[txtype] else 0)
-    elif txtype == 'p2sh-p2wpkh':
-        #return the estimate for the witness and non-witness
-        #portions of the transaction, assuming that all the inputs
-        #are of segwit type p2sh-p2wpkh
-        # Note as of Jan19: this misses 2 bytes (trivial) for len in, out
-        # and also overestimates output size by 2 bytes.
-        witness_estimate = ins*108
-        non_witness_estimate = 4 + 4 + 4 + outs*32 + ins*64 + (
-            OUTPUT_EXTRA_BYTES[txtype][outtype]
-            if outtype and outtype in OUTPUT_EXTRA_BYTES[txtype] else 0)
-        return (witness_estimate, non_witness_estimate)
-    elif txtype == 'p2wpkh':
-        witness_estimate = ins*108
-        non_witness_estimate = 4 + 4 + 4 + outs*31 + ins*41 + (
-            OUTPUT_EXTRA_BYTES[txtype][outtype]
-            if outtype and outtype in OUTPUT_EXTRA_BYTES[txtype] else 0)
-        return (witness_estimate, non_witness_estimate)
-    elif txtype == 'p2shMofN':
-        ins, M, N = ins
-        return 4 + 4 + 2 + (45 + 73*M + 34*N)*ins + outs*32 + (
-            OUTPUT_EXTRA_BYTES['p2sh-p2wpkh'][outtype]
-            if outtype and outtype in OUTPUT_EXTRA_BYTES['p2sh-p2wpkh'] else 0)
-    else:
-        raise NotImplementedError("Transaction size estimation not" +
-                                  "yet implemented for type: " + txtype)
+
+    # All non-witness input sizes include: txid, index, sequence,
+    # which is 32, 4 and 4; the remaining is scriptSig which is 1
+    # at minimum, for native segwit (the byte x00). Hence 41 is the minimum.
+    # The witness field for p2wpkh consists of sig, pub so 72 + 33 + 1 byte
+    # for the number of witness elements and 2 bytes for the size of each element,
+    # hence 108.
+    # For p2pkh, 148 comes from 32+4+1+1+~72+1+33+4
+    # For p2sh-p2wpkh there is an additional 23 bytes of witness for the redeemscript.
+    #
+    # Note that p2wsh here is specific to the script
+    # we use for fidelity bonds; 43 is the bytes required for that
+    # script's redeemscript field in the witness, but for arbitrary scripts,
+    # the witness portion could be any other size.
+    # Hence, we may need to modify this later.
+    inmults = {"p2wsh": {"w": 1 + 72 + 43, "nw": 41},
+               "p2wpkh": {"w": 108, "nw": 41},
+               "p2sh-p2wpkh": {"w": 108, "nw": 64},
+               "p2pkh": {"w": 0, "nw": 148}}
+
+    # Notes: in outputs, there is only 1 'scripthash'
+    # type for either segwit/nonsegwit.
+    # p2wsh has structure 8 bytes output, then:
+    # x22,x00,x20,(32 byte hash), so 32 + 3 + 8
+    # note also there is no need to distinguish witness
+    # here, outputs are always entirely nonwitness.
+    outmults = {"p2wsh": 43,
+               "p2wpkh": 31,
+               "p2sh-p2wpkh": 32,
+               "p2pkh": 34}
+    # nVersion, nLockTime, nins, nouts:
+    nwsize =  4 + 4 + 2
+    wsize = 0
+    tx_is_segwit = there_is_one_segwit_input(ins)
+    if tx_is_segwit:
+        # flag and marker bytes
+        nwsize += 2
+    for i in ins:
+        if i not in inmults:
+            raise NotImplementedError(
+                "Script type not supported for transaction size "
+                "estimation: {}".format(i))
+        inmult = inmults[i]
+        nwsize += inmult["nw"]
+        wsize += inmult["w"]
+    for o in outs:
+        if o not in outmults:
+            raise NotImplementedError(
+                "Script type not supported for transaction size "
+                "estimation: {}".format(o))
+        nwsize += outmults[o]
+    if not tx_is_segwit:
+        return nwsize
+    return (wsize, nwsize)
 def pubkey_to_p2pkh_script(pub, require_compressed=False):
    """
--- a/jmclient/jmclient/taker_utils.py
+++ b/jmclient/jmclient/taker_utils.py
@@ -10,9 +10,9 @@ from .schedule import human_readable_schedule_entry, tweak_tumble_schedule,\
    schedule_to_text
 from .wallet import BaseWallet, estimate_tx_fee, compute_tx_locktime, \
    FidelityBondMixin
-from jmbitcoin import make_shuffled_tx, amount_to_str, mk_burn_script,\
+from jmbitcoin import make_shuffled_tx, amount_to_str, \
                       PartiallySignedTransaction, CMutableTxOut,\
-                       human_readable_transaction, Hash160
+                       human_readable_transaction
 from jmbase.support import EXIT_SUCCESS
 log = get_log()
@@ -21,6 +21,15 @@ Utility functions for tumbler-style takers;
 Currently re-used by CLI script tumbler.py and joinmarket-qt
 """
+def get_utxo_scripts(wallet: BaseWallet, utxos):
+    # given a Joinmarket wallet and a set of utxos
+    # as passed from `get_utxos_by_mixdepth` at one mixdepth,
+    # return the list of script types for each utxo
+    script_types = []
+    for k, v in utxos.items():
+        script_types.append(wallet.get_outtype(v["address"]))
+    return script_types
 def direct_send(wallet_service, amount, mixdepth, destination, answeryes=False,
                accept_callback=None, info_callback=None, error_callback=None,
                return_transaction=False, with_final_psbt=False,
@@ -97,38 +106,27 @@ def direct_send(wallet_service, amount, mixdepth, destination, answeryes=False,
                "There are no available utxos in mixdepth: " + str(mixdepth) + ", quitting.")
            return
        total_inputs_val = sum([va['value'] for u, va in utxos.items()])
-
-        if is_burn_destination(destination):
-            if len(utxos) > 1:
-                log.error("Only one input allowed when burning coins, to keep "
-                    + "the tx small. Tip: use the coin control feature to freeze utxos")
-                return
-            address_type = FidelityBondMixin.BIP32_BURN_ID
-            index = wallet_service.wallet.get_next_unused_index(mixdepth, address_type)
-            path = wallet_service.wallet.get_path(mixdepth, address_type, index)
-            privkey, engine = wallet_service.wallet._get_key_from_path(path)
-            pubkey = engine.privkey_to_pubkey(privkey)
-            pubkeyhash = Hash160(pubkey)
-            #size of burn output is slightly different from regular outputs
-            burn_script = mk_burn_script(pubkeyhash)
-            fee_est = estimate_tx_fee(len(utxos), 0, txtype=txtype, extra_bytes=len(burn_script)/2)
-            outs = [{"script": burn_script, "value": total_inputs_val - fee_est}]
-            destination = "BURNER OUTPUT embedding pubkey at " \
-                + wallet_service.wallet.get_path_repr(path) \
-                + "\n\nWARNING: This transaction if broadcasted will PERMANENTLY DESTROY your bitcoins\n"
-        else:
-            #regular sweep (non-burn)
-            fee_est = estimate_tx_fee(len(utxos), 1, txtype=txtype, outtype=outtype)
-            outs = [{"address": destination, "value": total_inputs_val - fee_est}]
+        script_types = get_utxo_scripts(wallet_service.wallet, utxos)
+        fee_est = estimate_tx_fee(len(utxos), 1, txtype=script_types, outtype=outtype)
+        outs = [{"address": destination, "value": total_inputs_val - fee_est}]
    else:
-        #not doing a sweep; we will have change
-        #8 inputs to be conservative
-        initial_fee_est = estimate_tx_fee(8,2, txtype=txtype, outtype=outtype)
-        utxos = wallet_service.select_utxos(mixdepth, amount + initial_fee_est)
+        change_type = wallet_service.get_txtype()
+        if custom_change_addr:
+            change_type = wallet_service.get_outtype(custom_change_addr)
+            if change_type is None:
+                # we don't recognize this type; best we can do is revert to default,
+                # even though it may be inaccurate:
+                change_type = wallet_service.get_txtype()
+        outtypes = [change_type, outtype]
+        # not doing a sweep; we will have change.
+        # 8 inputs to be conservative; note we cannot account for the possibility
+        # of non-standard input types at this point.
+        initial_fee_est = estimate_tx_fee(8,2, txtype=txtype, outtype=outtypes)
+        utxos = wallet_service.select_utxos(mixdepth, amount + initial_fee_est,
+                                            includeaddr=True)
+        script_types = get_utxo_scripts(wallet_service.wallet, utxos)
        if len(utxos) < 8:
-            fee_est = estimate_tx_fee(len(utxos), 2, txtype=txtype, outtype=outtype)
+            fee_est = estimate_tx_fee(len(utxos), 2, txtype=script_types, outtype=outtypes)
        else:
            fee_est = initial_fee_est
        total_inputs_val = sum([va['value'] for u, va in utxos.items()])
--- a/jmclient/jmclient/wallet.py
+++ b/jmclient/jmclient/wallet.py
@@ -57,6 +57,21 @@ def estimate_tx_fee(ins, outs, txtype='p2pkh', outtype=None, extra_bytes=0):
    '''Returns an estimate of the number of satoshis required
    for a transaction with the given number of inputs and outputs,
    based on information from the blockchain interface.
+    Arguments:
+    ins: int, number of inputs
+    outs: int, number of outputs
+    txtype: either a single string, or a list of strings
+    outtype: either None or a list of strings
+    extra_bytes: an int
+    These arguments are intended to allow a kind of 'default', where
+    all the inputs and outputs match a predefined type (that of the wallet),
+    but also allow customization for heterogeneous input and output types.
+    For supported input and output types, see the keys of the dicts
+    `inmults` and `outmults` in jmbitcoin.secp256k1_transaction.estimate_tx_size`.
+    Returns:
+    a single integer number of satoshis as estimate.
    '''
    if jm_single().bc_interface is None:
        raise RuntimeError("Cannot estimate transaction fee " +
@@ -73,18 +88,33 @@ def estimate_tx_fee(ins, outs, txtype='p2pkh', outtype=None, extra_bytes=0):
            btc.fee_per_kb_to_str(fee_per_kb) +
            " greater than absurd value " +
            btc.fee_per_kb_to_str(absurd_fee) + ", quitting.")
-    if txtype in ['p2pkh', 'p2shMofN']:
-        tx_estimated_bytes = btc.estimate_tx_size(ins, outs, txtype, outtype) + extra_bytes
+
+    # See docstring for explanation:
+    if isinstance(txtype, str):
+        ins = [txtype]* ins
+    else:
+        assert isinstance(txtype, list)
+        ins = txtype
+    if outtype is None:
+        outs = [txtype] * outs
+    elif isinstance(outtype, str):
+        outs = [outtype] * outs
+    else:
+        assert isinstance(outtype, list)
+        outs = outtype
+    # Note: the calls to `estimate_tx_size` in this code
+    # block can raise `NotImplementedError` if any of the
+    # strings in (ins, outs) are not known script types.
+    if not btc.there_is_one_segwit_input(ins):
+        tx_estimated_bytes = btc.estimate_tx_size(ins, outs) + extra_bytes
        return int((tx_estimated_bytes * fee_per_kb)/Decimal(1000.0))
-    elif txtype in ['p2wpkh', 'p2sh-p2wpkh']:
+    else:
        witness_estimate, non_witness_estimate = btc.estimate_tx_size(
-            ins, outs, txtype, outtype)
+            ins, outs)
        non_witness_estimate += extra_bytes
        return int(int((
            non_witness_estimate + 0.25*witness_estimate)*fee_per_kb)/Decimal(1000.0))
-    else:
-        raise NotImplementedError("Txtype: " + txtype + " not implemented.")
 def compute_tx_locktime():
    # set locktime for best anonset (Core, Electrum)
--- a/jmclient/test/test_taker.py
+++ b/jmclient/test/test_taker.py
@@ -513,15 +513,15 @@ def test_custom_change(setup_taker):
        for out in taker.latest_tx.vout:
            # input utxo is 200M; amount is 20M; as per logs:
            # totalin=200000000
-            # my_txfee=13050
+            # my_txfee=13680 <- this estimate ignores address type
            # makers_txfee=3000
-            # cjfee_total=12000 => changevalue=179974950
+            # cjfee_total=12000 => changevalue=179974320
            # note that there is a small variation in the size of
            # the transaction (a few bytes) for the different scriptPubKey
-            # type, but this is currently ignored by the Taker, who makes
+            # type, but this is currently ignored in coinjoins by the
-            # fee estimate purely based on the number of ins and outs;
+            # Taker (not true for direct send operations), hence we get
-            # this will never be too far off anyway.
+            # the same value for each different output type.
-            if out.scriptPubKey == script and out.nValue == 179974950:
+            if out.scriptPubKey == script and out.nValue == 179974320:
                # must be only one
                assert not custom_change_found
                custom_change_found = True
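
For readers following the fee arithmetic in the wallet.py hunk, here is a minimal sketch of how a size estimate is turned into a fee, with witness bytes weighted at one quarter. The function name `sketch_fee` and the fee rate used below are hypothetical. It also uses `Decimal` throughout, whereas the code in the diff multiplies by the float 0.25 and truncates twice, so results could differ by a satoshi in edge cases.

```python
from decimal import Decimal


def sketch_fee(size_estimate, fee_per_kb, extra_bytes=0):
    """Convert a size estimate into a fee in satoshis.

    size_estimate: an int (non-segwit transaction) or a tuple
    (witness_bytes, non_witness_bytes), mirroring the two return
    shapes of estimate_tx_size in the diff.
    fee_per_kb: satoshis per 1000 virtual bytes (assumed value).
    """
    if isinstance(size_estimate, tuple):
        witness, non_witness = size_estimate
        # Witness bytes receive the segwit discount: each counts 1/4.
        vbytes = non_witness + extra_bytes + Decimal("0.25") * witness
    else:
        vbytes = Decimal(size_estimate + extra_bytes)
    return int(vbytes * fee_per_kb / Decimal(1000))


# 224 witness bytes and 156 non-witness bytes at 10000 sat/kvB:
print(sketch_fee((224, 156), 10000))  # 2120
# A 226-byte legacy transaction at the same rate:
print(sketch_fee(226, 10000))  # 2260
```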