
I worked on my thesis in one repo, let’s call it thesis. My lab has another repo, pubs, and we have a policy of keeping all submissions (final or in draft) in this repo.

I decided to use subtree merge to maintain the history of my own repo, while moving the contents into another repo. In other words, I grafted a subdirectory of one repo onto a subdirectory of another. For generality, suppose you want to move the path a/b on the repo OLD to the path y/z on the repo NEW.

First, run git subtree split in repo OLD. It synthesizes a history containing only the commits that affect the path a/b, and puts the head of that history on a new branch B:

# (inside repo OLD)
git subtree split -P a/b -b B

The last line of the command’s output is the hash of the new commit. Note that hash, as we’ll use it later when merging into the new repo.

Then switch to the repo NEW and add the repo OLD as a remote. The -f flag fetches from it immediately, so the split commit becomes available in NEW:

git remote add -f old-remote PATH-TO-OLD

Issue the following command, where COMMIT is the commit hash we got in the subtree split:

# (inside repo NEW)
git subtree add --prefix=y/z COMMIT

The result is that the commit history of subdirectory a/b in the OLD repo is spliced onto subdirectory y/z of NEW, which is what we wanted.

I switched to iTerm2 recently from a lifetime of Terminal.app, and I am not missing the old Terminal one bit.

Ever double-clicked a token in your terminal to select it, and wished that Terminal.app knew what you actually meant to select? iTerm2 lets you define what text a quadruple-click will select, by matching the surrounding context against a regex.

Here are some additional Smart Selection rules I’ve found very useful:

  • Capturing the value of key=value: (?<==)[A-Za-z0-9-]+, precision high
  • Hex strings (hashes): [A-Fa-f0-9]+, precision normal
  • Path without initial [ab]/: (?<=\b[ab]/)([[:letter:][:number:]._-]+/+)+[[:letter:][:number:]._-]+/?, precision high

The last one is to get paths from git diff without the initial a/ or b/.
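
To sanity-check rules like these outside iTerm2, here is a rough sketch in Python. It is only an approximation: iTerm2 uses ICU regular expressions and weighs rules by precision, whereas this sketch uses Python’s re module and simply returns the first match that covers the clicked character. The helper select_at, the sample lines, and the rewriting of [[:letter:][:number:]._-] as [\w.-] are my own assumptions for illustration.

```python
import re

# Rules from the list above; the third is approximated because Python's re
# has no [:letter:] / [:number:] classes (\w covers letters, digits and "_").
KEY_VALUE = r"(?<==)[A-Za-z0-9-]+"
HEX_STRING = r"[A-Fa-f0-9]+"
DIFF_PATH = r"(?<=\b[ab]/)([\w.-]+/+)+[\w.-]+/?"


def select_at(pattern, line, click):
    """Return the first match of pattern that covers index click, else None."""
    for match in re.finditer(pattern, line):
        if match.start() <= click < match.end():
            return match.group(0)
    return None


# Hypothetical terminal lines; the click position is looked up by substring.
kv_line = "curl --data region=us-east-1 host"
hex_line = "commit 3f2a9c1d fixes it"
diff_line = "--- a/src/main/app.py"

kv = select_at(KEY_VALUE, kv_line, kv_line.index("east"))
hex_hit = select_at(HEX_STRING, hex_line, hex_line.index("2a9"))
path = select_at(DIFF_PATH, diff_line, diff_line.index("main"))
print(kv, hex_hit, path)
```

Clicking anywhere inside the value, the hash, or the path selects the whole token, and the path comes back without its a/ prefix.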

Constraints of the data model

  • Indexing is not available, so data may have to be denormalised
  • Columns and supercolumns are sorted by key name
    • names are byte strings but interpretation for sorting can be changed
  • Range queries are possible through partitioning
    • RandomPartitioner randomly distributes rows among machines according to MD5 value, leading to load-balancing
      • within a node, rows are sorted by key
    • OrderPreservingPartitioner distributes according to key
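
The difference between the two partitioners can be illustrated with a toy placement function. This is a simplification made up for illustration, not Cassandra’s implementation: real nodes own token ranges on a ring, whereas here the MD5 value is simply reduced modulo the node count, and the split points in BOUNDARIES are hypothetical.

```python
import hashlib
from bisect import bisect_right

NODES = 3
# Hypothetical split points: keys < "h" go to node 0, keys from "h" up to
# (but excluding) "p" go to node 1, everything else goes to node 2.
BOUNDARIES = ["h", "p"]


def random_node(key):
    """Place a row by the MD5 value of its key (spreads load, loses key order)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NODES


def ordered_node(key):
    """Place a row by the key itself, so neighbouring keys stay together."""
    return bisect_right(BOUNDARIES, key)


keys = sorted(["alice", "bob", "henry", "mallory", "peggy", "zoe"])
ordered = [ordered_node(k) for k in keys]
scattered = [random_node(k) for k in keys]
print(ordered)
print(scattered)
```

With the order-preserving placement, a range of keys maps to a contiguous run of nodes, which is what makes range scans across rows practical. The MD5 placement gives no such guarantee.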

Cassandra data model

Column :: key → value

  • similar to a single datum

SuperColumn :: key → { subkey1 → value1, … }

  • a datum whose value is structured

ColumnFamily :: { column1, column2, … } = { key1 → {subkey1 → value1, subkey2 → value2}, … }

  • column families are stored in separate files
  • sorted in key-major order
  • similar to an RDBMS table, except sparse

SuperColumnFamily :: { supercolumn1, supercolumn2, … }

Keyspace :: { columnfamily1, columnfamily2, … }

  • the outermost container, similar to an RDBMS schema
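
As a rough mental model, these structures can be pictured as nested maps. The sketch below uses plain Python dicts. The users and timelines data and the helper columns_in_order are invented for illustration, and sorting on read stands in for the sorted order Cassandra maintains itself.

```python
# ColumnFamily: row key -> {column name -> value}. Rows are sparse, so
# different rows may carry different columns.
users = {
    "alice": {"name": "Alice", "state": "CA", "city": "San Mateo"},
    "bob": {"name": "Bob", "state": "NY"},  # no "city" column at all
}

# SuperColumnFamily: row key -> {supercolumn name -> {subcolumn name -> value}}.
timelines = {
    "alice": {
        "2010-05-02": {"text": "second post"},
        "2010-05-01": {"text": "first post", "tags": "intro"},
    },
}


def columns_in_order(row):
    """Return (name, value) pairs sorted by column name."""
    return sorted(row.items())


print(columns_in_order(users["bob"]))
print([name for name, _ in columns_in_order(timelines["alice"])])
```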

An example

  • User (an RDBMS table, a Cassandra ColumnFamily)
    • maps user attributes to byte array values
  • To do a query on one of those attributes, say state,
    • need to manually create a ColumnFamily { state → { city → { name → username } } }
      • like indexing on state
    • then, where state == ‘CA’ is efficient (since ColumnFamilies are sorted by key)
  • Composite keys
    • correspond to where state == ‘CA’ and city == ‘San Mateo’
      • { state:city → {name → username} }
    • ColumnFamilies are sorted by key
    • we can do where state == ‘CA’ (get all cities)
    • but also where state == ‘CA’ and city == ‘San Mateo’ (get one city)
    • but not range queries on city
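
The hand-built index with composite keys can be sketched as follows. This is an in-memory illustration with invented sample users, not Cassandra client code. build_index and prefix_scan are hypothetical helpers, and the sorted key list stands in for rows kept in key order.

```python
from bisect import bisect_left

# Hypothetical contents of the User ColumnFamily, keyed by username.
users = {
    "asmith": {"name": "Alice Smith", "state": "CA", "city": "San Mateo"},
    "bjones": {"name": "Bob Jones", "state": "CA", "city": "Palo Alto"},
    "cwu": {"name": "Carol Wu", "state": "NY", "city": "Albany"},
}


def build_index(users):
    """Denormalise users into { "state:city" -> { name -> username } }."""
    index = {}
    for username, columns in users.items():
        key = f"{columns['state']}:{columns['city']}"
        index.setdefault(key, {})[columns["name"]] = username
    return index


def prefix_scan(index, prefix):
    """Return the index keys that start with prefix, relying on sorted order."""
    keys = sorted(index)
    start = bisect_left(keys, prefix)
    hits = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        hits.append(key)
    return hits


index = build_index(users)
print(prefix_scan(index, "CA:"))  # where state == 'CA'
print(index["CA:San Mateo"])      # where state == 'CA' and city == 'San Mateo'
```

Because state comes first in the composite key, all cities of one state sit next to each other. A scan keyed on city alone would have to walk every state.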

Cribbed from

  • http://www.slideshare.net/benjaminblack/cassandra-basics-indexing
  • http://wiki.apache.org/cassandra/DataModel/
  • http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model