minnnyiのブログ

Final Presentation about Our Daily Data

How to choose a suitable stage play?

My Idea

I am an international student living in Tokyo, and a stage play enthusiast.

When I was in China, I often watched Japanese stage plays online. After coming to Japan, the sheer variety of productions left me struggling to choose.

So I decided to use data visualization to help me pick the stage plays I would enjoy most, and to avoid the regret of missing a good play because there was too much information to sort through.

At first, I wanted to build a platform where people could customize filters to match their own viewing preferences, because everyone's taste in stage plays is different. Current websites and news accounts only gather information; putting together a personal viewing plan still means spending a lot of time sifting through it.

But I soon realized that I could not carry out such an ambitious plan on my own, so I decided to start with "customizing a viewing plan for myself". That became my topic: "How can I choose stage plays that are suitable for me to watch in Japan?"

My Process

First, I selected 24 stage plays that interested me and that would premiere before I return to China. I recorded their titles, themes, lead actors, source works, ticketing information, theater locations, performance dates, and so on, to build my own data set.

Next, I scored some of the factors according to my own preferences and recorded each play's scores. I divided the data set into two parts: objective factors and personal preferences.

Some objective factors cannot be expressed directly as numbers, so I graded or converted them in different ways. That way every factor is represented numerically, which makes the later visualization easier.
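One way to picture this grading step is a lookup table that turns a categorical factor into a number. The sketch below is in Python for illustration only (the project itself was done in R), and the factor name, categories, and grade values are all hypothetical, not taken from the actual data set.

```python
# Hypothetical grading table: the categories and grades are assumptions
# made up for this example, not the author's real scale.
TICKET_ACCESS_GRADE = {
    "general sale": 3,    # anyone can buy a ticket
    "lottery": 2,         # a ticket lottery has to be won first
    "fan club only": 1,   # requires a paid membership
}

def grade_factor(value, table, default=0):
    """Return the numeric grade for a categorical value, or `default` if unknown."""
    return table.get(value.strip().lower(), default)

plays = [
    {"name": "Play A", "ticket_access": "General sale"},
    {"name": "Play B", "ticket_access": "Lottery"},
    {"name": "Play C", "ticket_access": "unknown"},
]

graded = {p["name"]: grade_factor(p["ticket_access"], TICKET_ACCESS_GRADE) for p in plays}
print(graded)
```

Keeping the scale in one table makes it easy to adjust a grade later without touching the rest of the data.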

In the end, I had a large data set. I had wondered whether 24 plays would be enough for a good visualization and whether I should add more, but the volume of data spoke for itself: any more would have been asking for trouble.

With the data set built, I moved on to the visualization stage. R may look daunting, but it is quick to pick up. Since I have a little Python experience, I could quickly work out simple commands by analogy.

In version 1.0, I used a single chart. I took the mean of the objective-factor scores as the horizontal axis and the mean of the personal-preference scores as the vertical axis, and drew a scatter plot. The plays with high values on both axes were the selected results.
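The version 1.0 idea can be sketched roughly as follows. This is an illustrative Python sketch, not the R code used in the project; the scores, the 1 to 5 scale, and the cut-off value are assumptions.

```python
from statistics import mean

# Hypothetical scores on an assumed 1-5 scale; not the author's data.
plays = {
    "Play A": {"objective": [4, 5, 3], "preference": [5, 4, 4]},
    "Play B": {"objective": [2, 3, 2], "preference": [5, 5, 4]},
    "Play C": {"objective": [4, 4, 4], "preference": [2, 3, 2]},
}

def mean_point(scores):
    """Collapse each group of scores into one (x, y) scatter-plot coordinate."""
    return mean(scores["objective"]), mean(scores["preference"])

points = {name: mean_point(s) for name, s in plays.items()}

# Assumed cut-off: a play counts as "selected" when both means reach it.
THRESHOLD = 3.5
selected = [name for name, (x, y) in points.items()
            if x >= THRESHOLD and y >= THRESHOLD]
print(selected)
```

Each play is reduced to a single pair of numbers here, which is why the individual factor scores no longer influence the result.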

However, this approach selected too many plays, so it did not really filter anything. Averaging also discarded much of the detail in the data, so I abandoned version 1.0.

In version 2.0, I used facet wrap to create multi-panel charts. I converted time costs into money wherever possible and added them to the ticket price to get an objective cost, which became the horizontal axis; the vertical axis was the score for one personal-preference factor. At this stage I made two scatter plots: one colored by play and faceted by play type, and one colored by play type and faceted by theater location.
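The objective cost described above can be sketched as a small function. This is an illustration in Python rather than the project's R code; the conversion rate of 1,000 yen per hour, the travel times, and the ticket prices are assumptions chosen only to make the arithmetic visible.

```python
# Assumed conversion rate: what one hour of time is "worth" in yen.
YEN_PER_HOUR = 1000

def objective_cost(ticket_yen, travel_minutes, yen_per_hour=YEN_PER_HOUR):
    """Ticket price plus travel time converted into yen."""
    time_cost = travel_minutes / 60 * yen_per_hour
    return ticket_yen + time_cost

# Hypothetical plays: same ticket price, different travel time.
near = objective_cost(ticket_yen=9000, travel_minutes=60)
far = objective_cost(ticket_yen=9000, travel_minutes=300)
print(near, far)
```

With these assumed numbers, two plays with the same ticket price end up 4,000 yen apart on the horizontal axis purely because of travel time.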

But the results were too scattered. Faceting by theater location was especially poor: the plays are distributed so unevenly across Tokyo, Osaka, and other places that the chart was almost useless.

In version 3.0, I brought tidyverse back in for plotting, and the faceting became sensible. But with so much data, the text labels all crowded together once I added them, and no amount of enlarging the figure helped, so I had to give up on labels.

Final Work

Building on version 3.0, I produced the final output.

First is the data set used for the visualization. I removed the earlier calculation steps and kept only the simple data the visualization needs, to avoid bugs.

Next, I drew a scatter plot with objective cost on the horizontal axis, faceted by the personal-preference scores and by the objective scores that could not be folded into the objective cost, with color indicating the play.
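Faceting by score type usually means reshaping the table so that each row holds one play, one score type, and one score; in the tidyverse this is commonly done with tidyr's `pivot_longer` before calling `facet_wrap`. The post does not show the actual R code, so the following is only a Python sketch of that reshaping step, with hypothetical column names and values.

```python
def to_long(rows, score_columns):
    """Turn one wide row per play into one row per (play, score type) pair."""
    long_rows = []
    for row in rows:
        for column in score_columns:
            long_rows.append({
                "play": row["play"],
                "cost": row["cost"],    # shared horizontal-axis value
                "facet": column,        # which panel the point belongs to
                "score": row[column],   # vertical-axis value in that panel
            })
    return long_rows

# Hypothetical wide table: one row per play, one column per score type.
wide_rows = [
    {"play": "Play A", "cost": 10000, "story": 5, "cast": 4},
    {"play": "Play B", "cost": 14000, "story": 3, "cast": 5},
]

long_rows = to_long(wide_rows, ["story", "cast"])
for row in long_rows:
    print(row)
```

Each play then appears once in every panel, always at the same horizontal position, so the panels can be compared directly.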

To explain how the visualization helps me decide what to watch, I mocked up the interface of the platform, which can only be built with further work in the future. These screens are for illustration only.

The first screen explains the platform's function: choosing a suitable stage play.

Then the user selects the stage plays to compare.

After selecting, the user scores the chosen plays one by one. At this point the user can edit only the personal-preference scores; the objective scores are calculated automatically by the system.

Once scoring is finished, the system shows the objective scores and the user's own scores for confirmation.

After the scores for every play are confirmed, the system shows all the plays to be compared. Once the user confirms that everything is correct, the visualization can be generated.

The system first outputs the basic visualization, then marks the play most worth watching in each facet.

To help the user see which plays in each facet are more worth watching, the system also outputs a written explanation of the selection.

Finally, the more facets a play appears in, the higher its recommendation index. In this case I took the top three from each facet, so plays that appear three times have the highest recommendation index, followed by those that appear twice and then those that appear once.
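The counting rule above can be sketched as follows. This is a Python illustration with made-up data; the post does not say how plays are ranked inside a facet, so "highest score first, cheaper play wins ties" is an assumption, as are the facet names, scores, and costs.

```python
from collections import Counter

# Hypothetical objective costs (yen) and per-facet scores for five plays.
COSTS = {"Play A": 10000, "Play B": 12000, "Play C": 9000,
         "Play D": 8000, "Play E": 15000}
SCORES = {
    "story":   {"Play A": 5, "Play B": 4, "Play C": 3, "Play D": 2, "Play E": 1},
    "cast":    {"Play A": 4, "Play B": 5, "Play C": 1, "Play D": 3, "Play E": 2},
    "staging": {"Play A": 4, "Play B": 2, "Play C": 3, "Play D": 1, "Play E": 5},
}

def top_n(scores, costs, n=3):
    """Best n plays in one facet: highest score first, lower cost breaks ties (assumed rule)."""
    ranked = sorted(scores, key=lambda play: (-scores[play], costs[play]))
    return ranked[:n]

def recommendation_index(facets, costs, n=3):
    """Count how many facets list each play among their top n."""
    counts = Counter()
    for scores in facets.values():
        counts.update(top_n(scores, costs, n))
    return counts

index = recommendation_index(SCORES, COSTS)
for play, appearances in index.most_common():
    print(play, appearances)
```

With this sample data, one play reaches the top three in all three facets, two plays appear twice, and two appear once.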

Based on the final results, I added the top four plays to my viewing plan, and I have already seen two of them.

Review

I gained a lot from this workshop: it broadened my horizons and taught me new skills. The one shortcoming is that I cannot yet build the platform I have in mind on my own. I hope that in the future I will have the chance to help more stage art lovers.