ML Ops Fundamentals

Hey, how do I get access to the mock_data.csv file in the Fundamentals of MLOps Demo: Small to Medium Data lecture?

It’s a randomly generated file. Run the file mockdata.py from the course repo, and the script will generate it for you.

Thanks for the help. About 9 minutes in, the instructor asks us to run df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {}), which caused this error. What adjustments do I need to make to that line of code?


TypeError Traceback (most recent call last)
Cell In[21], line 1
----> 1 df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {})

File /lib/python3.12/site-packages/pandas/core/series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
4789 def apply(
4790 self,
4791 func: AggFuncType,
(…)
4796 **kwargs,
4797 ) -> DataFrame | Series:
4798 """
4799 Invoke function on values of Series.
4800
(…)
4915 dtype: float64
4916 """
4917 return SeriesApply(
4918 self,
4919 func,
4920 convert_dtype=convert_dtype,
4921 by_row=by_row,
4922 args=args,
4923 kwargs=kwargs,
→ 4924 ).apply()

File /lib/python3.12/site-packages/pandas/core/apply.py:1427, in SeriesApply.apply(self)
1424 return self.apply_compat()
1426 # self.func is Callable
→ 1427 return self.apply_standard()

File /lib/python3.12/site-packages/pandas/core/apply.py:1507, in SeriesApply.apply_standard(self)
1501 # row-wise access
1502 # apply doesn’t have a na_action keyword and for backward compat reasons
1503 # we need to give na_action="ignore" for categorical data.
1504 # TODO: remove the na_action="ignore" when that default has been changed in
1505 # Categorical (GH51645).
1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
→ 1507 mapped = obj._map_values(
1508 mapper=curried, na_action=action, convert=self.convert_dtype
1509 )
1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
1512 # GH#43986 Need to do list(mapped) in order to get treated as nested
1513 # See also GH#25959 regarding EA support
1514 return obj._constructor_expanddim(list(mapped), index=obj.index)

File /lib/python3.12/site-packages/pandas/core/base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
918 if isinstance(arr, ExtensionArray):
919 return arr.map(mapper, na_action=na_action)
→ 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File /lib/python3.12/site-packages/pandas/core/algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
1741 values = arr.astype(object, copy=False)
1742 if na_action is None:
→ 1743 return lib.map_infer(values, mapper, convert=convert)
1744 else:
1745 return lib.map_infer_mask(
1746 values, mapper, mask=isna(values).view(np.uint8), convert=convert
1747 )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

Cell In[21], line 1, in <lambda>(x)
----> 1 df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {})

File /lib/python312.zip/json/__init__.py:339, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
337 else:
338 if not isinstance(s, (bytes, bytearray)):
→ 339 raise TypeError(f'the JSON object must be str, bytes or bytearray, '
340 f'not {s.__class__.__name__}')
341 s = s.decode(detect_encoding(s), 'surrogatepass')
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):

TypeError: the JSON object must be str, bytes or bytearray, not dict

Please put your code into a code block,

Like this

This will make it easier to test your code. See this page for info on how to format things in this Discord forum.

Also, which lecture are you referring to here? A link would be very welcome.

Same lecture
https://learn.kodekloud.com/user/courses/fundamentals-of-mlops/module/d72a3430-8b54-48d6-89ad-6a5f8b74f4ab/lesson/9dce8d6d-fed1-4e76-a7b8-f4bca2c5d482

I’m not seeing a problem with that code. You’ll want to make sure that you’ve set up a virtual environment for the directory. You need the same modules installed for it as you did to build the data file. I just did the following, and it works for me:

MacBook-Pro:01-data-transformation$ source .vemv/bin/activate
(.vemv) MacBook-Pro:01-data-transformation$ ls
data-cleaning.py	data-transformation.py	mockdata.py
data-exploration.py	mock_data.csv		requirements.txt
(.vemv) MacBook-Pro:01-data-transformation$ python
Python 3.13.5 (main, Jun 11 2025, 15:36:57) [Clang 17.0.0 (clang-1700.0.13.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv("mock_data.csv")
>>> df.head()
   id      name   age   salary   hire_date                                            profile department   bonus
0   1  Name_103  77.0  60000.0  2022-07-28  {"address": "Street 42, City 29", "phone": "67...  Marketing     NaN
1   2  Name_436  62.0  50000.0  2018-04-16  {"address": "Street 86, City 12", "phone": "39...  Marketing  6209.0
2   3  Name_861  61.0  60000.0  2017-11-17  {"address": "Street 92, City 11", "phone": "81...         HR  3924.0
3   4  Name_271  36.0  70000.0  2023-05-27                                                NaN        NaN  4640.0
4   5  Name_107  78.0  60000.0  2018-02-18  {"address": "Street 9, City 21", "phone": "188...         IT  9111.0
>>> import json
>>> df['profile'] = df['profile'].apply(lambda x: json.loads(x) if pd.notnull(x) else {})
>>> df['profile'].head()
0    {'address': 'Street 42, City 29', 'phone': '67...
1    {'address': 'Street 86, City 12', 'phone': '39...
2    {'address': 'Street 92, City 11', 'phone': '81...
3                                                   {}
4    {'address': 'Street 9, City 21', 'phone': '188...
Name: profile, dtype: object
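
For what it's worth, the TypeError above says json.loads was handed a dict, which is what happens if the profile column has already been parsed once (for example, if the notebook cell is run a second time). The thread doesn't confirm that this was the cause, so treat the following as a sketch under that assumption. The helper name parse_profile and the two-row sample frame are made up for illustration; they are not from the course repo.

```python
import json

import pandas as pd


def parse_profile(value):
    """Return the profile as a dict, whatever form it arrives in."""
    if isinstance(value, dict):
        # Already parsed, e.g. the cell was run a second time.
        return value
    if pd.isna(value):
        # Missing profile: fall back to an empty dict, as in the lecture line.
        return {}
    return json.loads(value)


# Hypothetical sample data standing in for mock_data.csv.
df = pd.DataFrame({"profile": ['{"phone": "123"}', None]})
df["profile"] = df["profile"].apply(parse_profile)
df["profile"] = df["profile"].apply(parse_profile)  # second run is harmless
print(df["profile"].tolist())
```

If that is what happened, reloading the CSV with pd.read_csv and running the original line once should also work without any change to the code.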

VM? Like an Azure VM or EC2 instance (the ones I’m familiar with)? Was that explained in the setup? I’ve been following along using Jupyter notebooks online (as the instructor did).

No VM. Just using python on a Mac. You could do the same thing on Windows. A "virtual environment" is just a special directory that holds its own python with a known, controlled set of modules. So to set one up with the python3 that’s installed on my Mac, I just did:

python3 -m venv .vemv
source .vemv/bin/activate
pip install numpy
pip install pandas
pip freeze > requirements.txt

That sets up the python binaries, activates the "environment", and uses pip from the environment to install numpy and pandas into it. "freeze" lets me pass the set-up of the environment to somebody else, for their use, with the same versions of modules I'm using.
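
If you're ever unsure whether the python you're running is the one from the environment, a small check like this can tell you. It's only a sketch, and the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

After activating the environment, the prefix printed should be the environment's directory rather than the system python's location.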

Here’s a tutorial about how to do this yourself, and why it’s a good way to use python.

Thanks for the solution. While you were answering, I cloned the repo to my Mac and went through the tutorial again. It appears a few steps that were present in the tutorial were left out of the repo, namely the ones centered around the json import and the subsequent task related to it. The import isn’t even used in the code (https://github.com/kodekloudhub/Fundamentals-of-MLOps/blob/main/01-data-transformation/data-transformation.py). I followed the other steps and ultimately get the points he made in the lecture, as well as yours about not needing a cloud VM or something. Thanks.