Hey, how do I get access to the mock_data.csv file in the Fundamentals of MLOps Demo: Small to Medium Data lecture?
It's a randomly generated file. Run `mockdata.py` from the course repo, and the script will generate it for you.
Thanks for the help. About 9 minutes in, the instructor asks us to run `df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {})`, which caused this error. What adjustments do I need to make to that line of code?
```
TypeError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {})

File /lib/python3.12/site-packages/pandas/core/series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)
   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800
   (...)
   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924 ).apply()

File /lib/python3.12/site-packages/pandas/core/apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File /lib/python3.12/site-packages/pandas/core/apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a na_action keyword and for backward compat reasons
   1503 # we need to give na_action="ignore" for categorical data.
   1504 # TODO: remove the na_action="ignore" when that default has been changed in
   1505 # Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     # See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File /lib/python3.12/site-packages/pandas/core/base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    918 if isinstance(arr, ExtensionArray):
    919     return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File /lib/python3.12/site-packages/pandas/core/algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

Cell In[21], line 1, in <lambda>(x)
----> 1 df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {})

File /lib/python312.zip/json/__init__.py:339, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337 else:
    338     if not isinstance(s, (bytes, bytearray)):
--> 339         raise TypeError(f'the JSON object must be str, bytes or bytearray, '
    340                         f'not {s.__class__.__name__}')
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344     parse_int is None and parse_float is None and
    345     parse_constant is None and object_pairs_hook is None and not kw):

TypeError: the JSON object must be str, bytes or bytearray, not dict
```
Please put your code into a code block; that will make it easier to test your code. See this page for info on how to format things in this Discord forum.
Also, which lecture are you referring to here? A link would be very welcome.
I'm not seeing a problem with that code. You'll want to make sure that you've set up a virtual environment for the directory, and that you've installed the same modules in it as you did to build the data file. I just did the following, and it works for me:
```
MacBook-Pro:01-data-transformation$ source .vemv/bin/activate
(.vemv) MacBook-Pro:01-data-transformation$ ls
data-cleaning.py        data-transformation.py  mockdata.py
data-exploration.py     mock_data.csv           requirements.txt
(.vemv) MacBook-Pro:01-data-transformation$ python
Python 3.13.5 (main, Jun 11 2025, 15:36:57) [Clang 17.0.0 (clang-1700.0.13.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv("mock_data.csv")
>>> df.head()
   id      name   age   salary   hire_date                                            profile department   bonus
0   1  Name_103  77.0  60000.0  2022-07-28  {"address": "Street 42, City 29", "phone": "67...  Marketing     NaN
1   2  Name_436  62.0  50000.0  2018-04-16  {"address": "Street 86, City 12", "phone": "39...  Marketing  6209.0
2   3  Name_861  61.0  60000.0  2017-11-17  {"address": "Street 92, City 11", "phone": "81...         HR  3924.0
3   4  Name_271  36.0  70000.0  2023-05-27                                                NaN        NaN  4640.0
4   5  Name_107  78.0  60000.0  2018-02-18  {"address": "Street 9, City 21", "phone": "188...         IT  9111.0
>>> import json
>>> df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {})
>>> df['profile'].head()
0    {'address': 'Street 42, City 29', 'phone': '67...
1    {'address': 'Street 86, City 12', 'phone': '39...
2    {'address': 'Street 92, City 11', 'phone': '81...
3                                                   {}
4    {'address': 'Street 9, City 21', 'phone': '188...
Name: profile, dtype: object
```
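For what it's worth, the exact TypeError you posted ("not dict") is what `json.loads` raises when it's handed a value that is already a dict, which typically happens if the cell is run a second time after the column has been parsed once. A defensive variant of that line (just a sketch, with a tiny made-up DataFrame to demonstrate) tolerates re-runs by only parsing strings:

```python
import json
import pandas as pd

# Hypothetical mini-example: a mix of raw JSON text, a missing value,
# and an already-parsed dict (as you'd have after running the cell once).
df = pd.DataFrame({"profile": ['{"a": 1}', None, {"b": 2}]})

# Parse only string values; pass dicts through unchanged; map NaN/None to {}.
df["profile"] = df["profile"].apply(
    lambda x: json.loads(x) if isinstance(x, str) else (x if isinstance(x, dict) else {})
)
print(df["profile"].tolist())  # [{'a': 1}, {}, {'b': 2}]
```

With that guard, running the cell twice is harmless, since the second pass just passes the dicts through.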
VM? Like an Azure VM or EC2 instance (the ones I'm familiar with)? Was that explained in the setup? I've been following along using Jupyter notebooks online (as the instructor did).
No VM. Just using Python on a Mac. You could do the same thing on Windows. A "virtual environment" is just a special directory that holds a Python installation with a known, controlled set of modules. To set one up using the python3 that's installed on my Mac, I just did:
```
python3 -m venv .vemv
source .vemv/bin/activate
pip install numpy
pip install pandas
pip freeze > requirements.txt
```
That sets up the Python binaries, activates the "environment", and uses pip from the environment to install numpy and pandas into it. "freeze" lets me pass the environment's set-up to somebody else, for their use, with the same versions of the modules I'm using.
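If you ever want to double-check that you're actually running the environment's Python (a quick sanity check of my own, not from the course), the `sys` module tells you:

```python
# Confirm which interpreter is running and whether it lives in a venv.
import sys

print(sys.executable)  # path of the interpreter actually running
print(sys.prefix)      # points inside the .vemv directory when the venv is active

# sys.prefix differs from sys.base_prefix only inside a virtual environment.
print(sys.prefix != sys.base_prefix)
```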
Here's a tutorial about how to do this yourself, and why it's a good way to use Python.
Thanks for the solution. While you were answering, I cloned the repo to my Mac and went through the tutorial again. It appears a few steps that were present in the tutorial were excluded from the repo, centered on the json import and the subsequent task related to it. The import isn't even used in the code ( https://github.com/kodekloudhub/Fundamentals-of-MLOps/blob/main/01-data-transformation/data-transformation.py ). I followed the other steps and I ultimately get the points he made in the lecture, as well as yours about not needing a cloud VM or the like. Thanks.