I think I have an idea of how this is happening, though I don't have a great
way to prove it.

1) I think it requires having a lot of files/directories, such that the 
inventory pages have multiple levels.
2) It requires that something about the 'interesting' filter causes some of
the sub-tree pages not to be loaded. (That in itself is expected: in an
incremental pull of a large tree, some of the subsets won't be touched.)
3) It doesn't have anything to do with _chk_map_pyx.pyx specifically. It has to
do with receiving data that we don't expect.

I'll try to document the program flow, because it is a bit confusing
with nested generators.

_get_filtered_chk_streams(...):
  _filter_id_to_entry()
    interesting_nodes = chk_map.iter_interesting_nodes(chk_bytes,
      self._chk_id_roots, uninteresting_root_keys)
    for record in _filter_text_keys(interesting_nodes, self._text_keys,
        chk_map._bytes_to_text_key):
        if record is not None:
            yield record


And in _filter_text_keys:
def _filter_text_keys(interesting_nodes_iterable, text_keys, bytes_to_text_key):
    """Iterate the result of iter_interesting_nodes, yielding the records
    and adding to text_keys.
    """
    text_keys_update = text_keys.update
    for record, items in interesting_nodes_iterable:
        text_keys_update([bytes_to_text_key(b) for n,b in items])
        yield record
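To make the laziness of the nested generators concrete, here is a minimal, self-contained sketch. fake_interesting_nodes, filter_id_to_entry and the b"file_id\nrevision_id" value format are hypothetical stand-ins, not bzr code; only _filter_text_keys is copied from above. The point is that text_keys is only updated as the outermost loop pulls records, so a bad item surfaces wherever the consumer happens to be in the stream.

```python
def bytes_to_text_key(data):
    # Hypothetical stand-in for chk_map._bytes_to_text_key: assume the
    # value is b"file_id\nrevision_id" and return it as a key tuple.
    file_id, revision_id = data.split(b"\n", 1)
    return (file_id, revision_id)


def _filter_text_keys(interesting_nodes_iterable, text_keys, bytes_to_text_key):
    # Same logic as the bzr function above.
    text_keys_update = text_keys.update
    for record, items in interesting_nodes_iterable:
        text_keys_update([bytes_to_text_key(b) for n, b in items])
        yield record


def fake_interesting_nodes():
    # Hypothetical stand-in for iter_interesting_nodes: a root with no
    # items, then a leaf with one (name, bytes) item.
    yield "root-record", []
    yield "leaf-record", [((b"file-id",), b"file-id\nrev-1")]


def filter_id_to_entry(text_keys):
    # Mirrors the outer loop of _filter_id_to_entry.
    for record in _filter_text_keys(fake_interesting_nodes(), text_keys,
                                    bytes_to_text_key):
        if record is not None:
            yield record


text_keys = set()
stream = filter_id_to_entry(text_keys)
assert text_keys == set()            # nothing has run yet
assert next(stream) == "root-record"
assert text_keys == set()            # roots carry no items
assert next(stream) == "leaf-record"
assert text_keys == {(b"file-id", b"rev-1")}
```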


The key part is that every node yielded by "iter_interesting_nodes" is
expected to be a (record, items) tuple, and each of the items is in turn
expected to be a (name, bytes) tuple.
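One way to narrow this down, sketched here as a hypothetical debugging aid (check_node_shapes is not part of bzr), is to wrap the iterator and fail at the first item that breaks the (name, bytes) contract, reporting which record produced it:

```python
def check_node_shapes(interesting_nodes_iterable):
    """Hypothetical debugging wrapper: pass (record, items) pairs through
    unchanged, but fail at the first item that is not a (name, bytes) pair,
    naming the record that produced it."""
    for record, items in interesting_nodes_iterable:
        for item in items:
            if len(item) != 2 or not isinstance(item[1], bytes):
                raise TypeError("record %r produced malformed item %r"
                                % (record, item))
        yield record, items


good = [("leaf-1", [((b"a",), b"value")])]
assert list(check_node_shapes(good)) == good
```

In _filter_id_to_entry it could be spliced in as _filter_text_keys(check_node_shapes(interesting_nodes), ...), assuming an extra isinstance check per item is acceptable while debugging.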

The code that yields those tuples looks like this:

    def process(self):
        for record in self._read_all_roots():
            yield record, []
        for record, items in self._process_queues():
            yield record, items

The idea is that for InternalNode objects, they should *not* have any items,
and for LeafNode objects, they should have items of the appropriate form.


What appears to be happening is that we get some items of the form (n, b)
where "b" is not a simple string but a tuple.
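A small sketch of what that symptom does inside _filter_text_keys. We assume, for illustration only, that the compiled bytes_to_text_key helper rejects non-string input with the 'bytes must be a string' message from the bug title; the item values are made up. The list comprehension fails before text_keys.update() runs, so none of that node's keys get recorded:

```python
def bytes_to_text_key(data):
    # Hypothetical stand-in for the compiled helper: we assume it rejects
    # non-string input with the message seen in the bug report.
    if not isinstance(data, bytes):
        raise TypeError("bytes must be a string")
    file_id, revision_id = data.split(b"\n", 1)
    return (file_id, revision_id)


items = [((b"file-a",), b"file-a\nrev-1"),
         ((b"file-b",), (b"sha1:deadbeef",))]   # this "b" is a tuple

text_keys = set()
message = None
try:
    text_keys.update([bytes_to_text_key(b) for n, b in items])
except TypeError as e:
    message = str(e)

# The whole batch is lost: even the valid first item was never added.
assert text_keys == set()
```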

The code that grabs the items is in _read_nodes_from_store(), which looks
like this:

def _read_nodes_from_store(self, keys):
    ...
    for record in stream:
        ...
        bytes = record.get_bytes_as('fulltext')
        node = _deserialise(bytes, record.key,
                            search_key_func=self._search_key_func)
        if type(node) is InternalNode:
            # Note we don't have to do node.refs() because we know that
            # there are no children that have been pushed into this node
            ...
            prefix_refs = node._items.items()
            items = []
        else:
            prefix_refs = []
            ...
            items = node._items.items()
        yield record, node, prefix_refs, items


Looking at that, we force items to be an empty list if it is an InternalNode.
So this has to be happening on a LeafNode.
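The branch above can be modeled with two hypothetical stand-in classes (not the real chk_map node types) to show that any item reaching _filter_text_keys must come from the else branch. list() is added because dict.items() returns a view in Python 3, whereas the original Python 2 code got a list. The internal node's values are modeled as key tuples, since they are used as prefix_refs:

```python
class InternalNode(object):
    # Hypothetical stand-in: _items maps a search prefix to a child key.
    def __init__(self, items):
        self._items = items


class LeafNode(object):
    # Hypothetical stand-in: _items maps a key tuple to a value string.
    def __init__(self, items):
        self._items = items


def split_node(node):
    # Same branching as _read_nodes_from_store: internal nodes yield child
    # references and no items; everything else is treated as a leaf.
    if type(node) is InternalNode:
        prefix_refs = list(node._items.items())
        items = []
    else:
        prefix_refs = []
        items = list(node._items.items())
    return prefix_refs, items


internal = InternalNode({b"a": (b"sha1:1111",), b"b": (b"sha1:2222",)})
leaf = LeafNode({(b"file-a",): b"file-a\nrev-1"})

assert split_node(internal)[1] == []
assert split_node(leaf) == ([], [((b"file-a",), b"file-a\nrev-1")])
```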

Looking at the LeafNode parsing code, the 'key' portion is rather complex,
but for the value we have:
    value = PyString_FromStringAndSize(value_start, next_line - value_start)
    ...
    entry_bits = StaticTuple_Intern(entry_bits)
    PyDict_SetItem(items, entry_bits, value)

So I can't see how value could be anything other than a string.
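The same argument in Python terms, as a rough analogue rather than the actual parser: the buffer layout and the read_value helper are hypothetical, but whatever offsets are computed, slicing a byte string always yields a byte string:

```python
def read_value(data, value_start):
    # Python analogue of
    #   PyString_FromStringAndSize(value_start, next_line - value_start)
    # assuming, for illustration, a value that ends at the next newline.
    next_line = data.find(b"\n", value_start)
    if next_line == -1:
        next_line = len(data)
    return data[value_start:next_line]


# Hypothetical buffer: a 7-byte key prefix, then a one-line value.
value = read_value(b"file-a\x00file-a rev-1\nnext entry", 7)
assert isinstance(value, bytes)
```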

-- 
https://bugs.launchpad.net/bugs/721163

Title:
  bzr crashed with ErrorFromSmartServer: ('error', 'bytes must be a
  string')


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
