pandas - Inconsistent behavior of `dataframe.groupby(allcolumns).agg(len)`
For demonstration purposes, first define a couple of simple dataframes, df0 and df1:
    >>> import pandas as pd
    >>> import collections as co
    >>> data = [['a', 1],
    ...         ['b', 2],
    ...         ['a', 3],
    ...         ['b', 1],
    ...         ['a', 2],
    ...         ['a', 3],
    ...         ['b', 1]]
    >>> colnames = tuple('xy')
    >>> df0 = pd.DataFrame(co.OrderedDict([(colnames[i],
    ...                                     [row[i] for row in data])
    ...                                    for i in range(len(colnames))]))
    >>> df0
       x  y
    0  a  1
    1  b  2
    2  a  3
    3  b  1
    4  a  2
    5  a  3
    6  b  1
    >>>
    >>> df1 = df0.ix[:, [0]]  # .ix was removed in pandas 1.0; use df0.iloc[:, [0]] there
    >>> df1
       x
    0  a
    1  b
    2  a
    3  b
    4  a
    5  a
    6  b
Now, here's the result of grouping on all the columns of df0 and aggregating with the len aggregator function:
    >>> df0.groupby(['x', 'y']).agg(len)
    x  y
    a  1    1
       2    1
       3    2
    b  1    2
       2    1
    dtype: int64
Based on this result, I expected the analogous operation on df1, namely df1.groupby(['x']).agg(len), to give this:
    x
    a    4
    b    3
    dtype: int64
But that's not what happens:
    >>> df1.groupby(['x']).agg(len)
    Empty DataFrame
    Columns: []
    Index: [a, b]
My questions are:
- Is this difference in behavior to be expected on the basis of the pandas documentation, or is it a bug in pandas? (If the former is the case, please point me to the relevant documentation.)
- What's the simplest way to get the expected output (as shown above) from df1.groupby(['x']).agg(len)?
See the note at the bottom of the aggregation section: http://pandas.pydata.org/pandas-docs/stable/groupby.html#aggregation. Pandas 'eats' the aggregator column, so there is nothing left to aggregate.
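To make the "eaten" column concrete, here is a minimal sketch on a frame with the same data as df0 from the question. It only assumes that grouping on a single key leaves the remaining columns to be aggregated; the variable names are just for illustration.

```python
import pandas as pd

# Same data as df0 in the question.
df0 = pd.DataFrame({"x": ["a", "b", "a", "b", "a", "a", "b"],
                    "y": [1, 2, 3, 1, 2, 3, 1]})

# Group on "x" only: the key column moves into the index,
# and "y" is the only column left for len to be applied to.
by_x = df0.groupby("x").agg(len)

print(list(by_x.columns))  # the grouping key "x" is not among them
print(by_x["y"].to_dict())
```

A frame like df1, whose only column is the grouping key, has no such leftover column, which is why the result in the question comes back with no columns.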
You need a Series at this point. Do this:
    In [63]: s = df1['x']
    In [64]: s.groupby(s).agg(len)
    Out[64]:
    x
    a    4
    b    3
    Name: x, dtype: int64
Pandas doesn't do this automatically because it's hard to figure out what you want, and it makes the logic more complicated. I suppose you could call this a bug (in that it should raise), but it is technically valid.
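As a possible alternative that the discussion above doesn't cover, `GroupBy.size()` counts the rows in each group and so doesn't depend on any non-key column being left over. A minimal sketch, rebuilding a one-column frame equivalent to df1:

```python
import pandas as pd

# One-column frame equivalent to df1 in the question.
df1 = pd.DataFrame({"x": ["a", "b", "a", "b", "a", "a", "b"]})

# size() returns a Series of row counts per group, indexed by the key.
counts = df1.groupby("x").size()
print(counts)

# The Series-based approach gives the same counts.
s = df1["x"]
via_series = s.groupby(s).agg(len)
print(via_series)
```

Both give one count per distinct value of x. The only difference I'd expect is the name attached to the resulting Series.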