Python: pandas groupby to nested JSON -- don't want calculated fields
I'm working on a d3.js graphic. The data lives in a huge multi-tab .xls file. Since I have to grab the data from each tab, I decided to dump it all into pandas and export to .json.

The original data, spread across many tabs:
demography, area, state, month, rate
on 65, region2, ga, may, 23
on 65, region2, al, may, 25
nan, random_odd_data, mistake, error
18-65, region2, ga, 77
18-65, region2, al, 75
Now, put into pandas, merged, and cleaned up:
  demography     area state month  rate
0      on 65  region2    ga   may    23
1      on 65  region2    al   may    25
2      18-65  region2    ga   may    50
3      18-65  region2    al   may    55
Now, I group it:
group = df.groupby(['state', 'demography'])
yields
<pandas.core.groupby.DataFrameGroupBy object at 0x106939610>
Trying this:
group = df.groupby(['state', 'demography']).count()
yields almost the right structure, except I don't want to count anything; I want the "rate" values:
state demography  area  month  rate
al    on 65          1      1     1
      18-65          1      1     1
ga    on 65          1      1     1
      18-65          1      1     1
Sure enough, it exports "1" for every value, lol:
group.reset_index().to_json("myjson2.json", orient="index")
Dang, so close. How do I export with each state as the parent? Something like:
[
  { "state": "alabama", "over 65": 25, "18-65": 50 },
  { "state": "georgia", "over 65": 23, "18-65": 55 }
]
The count method counts the number of non-NaN entries in each column for each group, which is why they're all 1 here (each group has size 1 and contains no NaNs). (I can't find a specific link, but this is mentioned in the groupby docs.)
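To see this concretely, here's a minimal sketch using a toy frame reconstructed from the cleaned-up table in the question (the frame itself is an assumption built from that table):

```python
import pandas as pd

# Reconstruction of the cleaned-up frame from the question.
df = pd.DataFrame({
    "demography": ["on 65", "on 65", "18-65", "18-65"],
    "area": ["region2"] * 4,
    "state": ["ga", "al", "ga", "al"],
    "month": ["may"] * 4,
    "rate": [23, 25, 50, 55],
})

# count() tallies non-NaN entries per column within each group.
# Every (state, demography) pair here has exactly one row and no
# NaNs, so every cell comes out as 1.
counts = df.groupby(["state", "demography"]).count()
print(counts["rate"].tolist())  # [1, 1, 1, 1]
```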
I think you want pivot_table:

In [11]: res = df.pivot_table('rate', 'state', 'demography')

In [12]: res
Out[12]:
demography  18-65  over65
state
al             55      25
ga             50      23
I think you're looking for orient='records' (you need to reset_index first, though):

In [13]: res.reset_index().to_json(orient='records')
Out[13]: '[{"state":"al","18-65":55,"over65":25},{"state":"ga","18-65":50,"over65":23}]'
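For completeness, a self-contained sketch of the whole approach; the keyword-argument form of pivot_table (values=, index=, columns=) spells out what the positional call above does. Note the default aggfunc is 'mean', so the pivoted rates come back as floats. The frame here is a toy reconstruction from the question's data:

```python
import pandas as pd

# Toy frame reconstructed from the question's cleaned-up data.
df = pd.DataFrame({
    "demography": ["on 65", "on 65", "18-65", "18-65"],
    "state": ["ga", "al", "ga", "al"],
    "rate": [23, 25, 50, 55],
})

# Pivot states onto rows and demography groups onto columns;
# cell values are the rates (aggregated with the default mean).
res = df.pivot_table(values="rate", index="state", columns="demography")

# orient='records' emits one JSON object per row, giving the
# "each state as the parent" shape the question asks for.
json_out = res.reset_index().to_json(orient="records")
print(json_out)
```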