Discover-E-Transcripts

A legal transcript reader for a more enjoyable transcript reading experience

INTRODUCTION

Discover-E-Transcripts (DET) is a Chrome Extension that offers an improved transcript reading experience compared to traditional transcript reading technology (such as PDF readers). 

Transcripts are used by the legal community to archive what was said during trial or pre-trial examinations. These 'legal transcripts' are created by court reporters who typically follow a pseudo-standard format, namely, a unique and recognizable format to differentiate the dialogue of different speakers. For example, shown below is a snippet from a typical legal transcript with bold line numbers at the left, dialogue from a speaker that is asking questions prefaced with 'Q', and dialogue from a speaker that answers the questions prefaced with an 'A':

DET uses regular expressions (regex) to detect the dialogue of different speakers in a legal transcript. This provides two main benefits over traditional transcript reading technology:

1. the ability to copy and paste text from a transcript with line breaks only at lines with a new speaker (formatted copy), and

2. smarter text-to-speech functions.

Other useful functions in DET include issue flagging and multi-query searching.

Instructions on how to install DET and realize its benefits are explained below.

INSTALLATION AND USER INTERFACE

Discover-E-Transcripts is available on the Chrome Web Store. After installing DET, go to your chrome://extensions dashboard to enable the DET Extension. To access the DET Extension, open the Chrome browser, click the extension icon (jigsaw) to the right of the address bar, and then click the DET icon. To pin the DET Extension to your Chrome toolbar, click the pin button to the right of the DET Extension (see picture below).

[Discover-E-Transcripts Chrome Extension is currently only available for testers (very special people)]

The DET Home Page will open in a new tab when the user clicks the DET Extension icon. The home page displays the DET logo and sponsors, a Top Navigation Bar, a Bottom Navigation Bar, and basic instructions. The Top Navigation Bar is a light blue-grey colour containing three buttons and the Bottom Navigation Bar is a dark blue-grey colour containing five buttons. The buttons on the Top and Bottom Navigation Bars are: 1) the Upload Transcript Button, 2) the Manual Button, 3) the Donate Button, 4) the Menu Button, 5) the Copy Button, 6) the Stop Speaking Button, 7) the Search Button, and 8) the Flag Issue Button. These buttons are labelled in the picture below. The functions of these buttons are explained in more detail throughout this manual.

UPLOAD A TRANSCRIPT

Click the Upload Transcript Button in the Top Navigation Bar to upload a transcript. DET accepts transcripts that are either PDF or TXT file types. In order for DET to properly parse data in the input transcript, the transcript must follow a certain format.

When a properly formatted legal transcript is uploaded to DET, the text is parsed and output to the user's screen. The snapshots to the right (or below on smaller screens) show typical examples of what it looks like when a user uploads a properly formatted legal transcript to DET.

DET's transcript display shows page numbers in the upper left corner of each page formatted as "-p ", followed by 3 to 5 digits, followed by "-". Transcript lines are numbered and separated from transcript text with a black or white vertical border (depending on the display settings). 

As noted above, DET accepts only PDF and TXT transcripts, and a transcript must follow a certain format for DET to parse it properly. Notably, both PDF and TXT transcripts rely on Speaker Identifiers. Accordingly, a discussion of the Speaker Identifiers is provided below, followed by specific formatting details for PDF and TXT file types.

DET's required transcript format was chosen based on patterns identified from transcripts created by a variety of Canadian court reporters. This format may not capture all transcript formats. If DET does not parse your transcript, consider sending the developers of DET details regarding your transcript format. The developers may be able to optimise DET to recognise a wider range of transcript formats.


1. SPEAKER IDENTIFIERS (RELEVANT TO BOTH PDF AND TXT TRANSCRIPTS)

DET uses regex to detect Speaker Identifier patterns. These patterns are the same for both PDF and TXT transcript file types. There are three categories of Speaker Identifiers:

1. The speaker that asks questions. For example, ' Q ' and ' Q. ' are both standard speaker identifiers for this type of speaker. ' Q  MS. CLARK:' and ' Q   MR X:' and ' 33.  Q. ' are also acceptable speaker identifiers for this type of speaker.

2. The speaker that answers questions. For example, ' A ' and ' A. ' are both standard speaker identifiers for this type of speaker. ' A   MS. SHARK:' and ' A   MR Z:' are also acceptable speaker identifiers for this type of speaker.

3. Other speakers. For example, ' MS. SHARK:', ' MR Z:', '  JUSTICE DENNING:', and ' THE REGISTRAR:' are all acceptable speaker identifiers for this type of speaker. Note that if any of these speaker identifiers were prefaced with a ' Q ' or an ' A ', they would be identified as a speaker that asks questions or a speaker that answers questions, respectively.
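The three categories can be illustrated with a short sketch. The patterns below are not DET's actual regex; they are assumptions reconstructed only from the examples in this section, and the names NAME, QUESTION, ANSWER, OTHER and classify are hypothetical. The sketch also assumes that any line number at the left of the transcript has already been removed.

```python
import re

# Assumed shape of a speaker name such as "MS. CLARK:" or "THE REGISTRAR:".
NAME = r"[A-Z][A-Z.' ]*:"

# Question: leading spaces, an optional question number ("33."), then Q or Q.
QUESTION = re.compile(rf"^ +(?:\d+\. +)?Q\.? +(?:{NAME})?")
# Answer: leading spaces, then A or A.
ANSWER = re.compile(rf"^ +A\.? +(?:{NAME})?")
# Other speaker: leading spaces, then an upper-case name ending in a colon.
OTHER = re.compile(rf"^ +{NAME}")


def classify(line):
    """Return 'question', 'answer', 'other', or None for one line of text."""
    # Q and A are tested first, so "Q  MS. CLARK:" counts as a question.
    if QUESTION.match(line):
        return "question"
    if ANSWER.match(line):
        return "answer"
    if OTHER.match(line):
        return "other"
    return None
```

Testing the Q and A patterns before the name pattern mirrors the note above that a name prefaced with ' Q ' or ' A ' is treated as a question or answer speaker.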


2. PDF TRANSCRIPT FORMAT

Shown below is a snapshot of pages 1 and 3 of a typical PDF transcript. You can download this sample transcript here. Below is a list of PDF transcript format requirements that must be met in order for DET to function properly. The descriptions of some of these requirements refer to the labels in the picture below.

3. TXT TRANSCRIPT FORMAT

The picture below shows a snapshot of pages 1 and 3 of a typical TXT transcript. You can download this sample transcript here. Below is a list of TXT transcript format requirements that must be met in order for DET to function properly. The descriptions of some of these requirements refer to the labels in the picture below.

4. SAMPLE TRANSCRIPTS

Download a sample .pdf file transcript.

Download a sample .txt file transcript.

SMART TEXT-TO-SPEECH

DET's text-to-speech function is activated by clicking any Speaker Identifier on the transcript text. Speaker Identifiers are portions of the transcript that preface the substantive transcript content and serve to identify the speaker. As noted above, DET recognises three types of speakers: 1) the speaker that asks questions, 2) the speaker that answers questions, and 3) other speakers. 

DET will print Speaker Identifiers that it recognises in different colours (red for the speaker asking questions, blue for the speaker answering questions, and green for other speakers). To activate DET's text-to-speech function, click the Speaker Identifier on the transcript text. Once clicked, the dialogue that follows the Speaker Identifier will be highlighted and the text will be read out loud according to the user-specified Voice Settings under the Menu Button on the Bottom Navigation Bar. DET will continue reading the transcript text following the recently clicked Speaker Identifier until it reaches the end of the transcript. To stop the vocalization prematurely, click the Stop Speaking Button on the Bottom Navigation Bar.

The image to the right (or below on smaller screens) is a snapshot showing highlighted text of a recently clicked Speaker Identifier. When DET's speaking function is active, the Stop Speaking Button on the Bottom Navigation Bar turns red.

FORMATTED COPY

Users can highlight transcript text and click the Copy Button on the Bottom Navigation Bar to copy the selected text to their clipboard. The benefit of using the DET copy function compared to the general copy function (i.e. Ctrl + C) is twofold:

1. the DET formatted copy function causes the citation of the selected text to be printed as the first line on the user's clipboard, and 

2. the DET formatted copy function only prints line breaks at lines with new speakers.

The DET formatted copy function makes it easier to copy portions of transcripts into new documents, which can save a significant amount of time depending on the task.

Shown to the right (or below on smaller screens) is a snapshot of the transcript text copied to the user's clipboard resulting from the DET formatted copy function and the regular Ctrl + C copy function. The pasted results from a DET formatted copy show the citation of the copied text as the first line and line breaks (\n) only occurring on lines with new speakers.
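The behaviour described above can be approximated in a few lines. This is a sketch only, not DET's code: the NEW_SPEAKER pattern is a simplified assumption, formatted_copy is a hypothetical name, and the citation string passed in is a made-up format because this manual does not specify how DET words its citations.

```python
import re

# Assumed, simplified test for a line that starts a new speaker:
# a Q or A marker (optionally numbered), or an upper-case name plus colon.
NEW_SPEAKER = re.compile(r"^ +(?:(?:\d+\. +)?[QA]\.? +|[A-Z][A-Z.' ]*:)")


def formatted_copy(lines, citation):
    """Join transcript lines, breaking only where a new speaker starts."""
    paragraphs = []
    for line in lines:
        text = " ".join(line.split())  # collapse runs of whitespace
        if not text:
            continue  # skip blank lines
        if NEW_SPEAKER.match(line) or not paragraphs:
            paragraphs.append(text)  # new speaker: start a new paragraph
        else:
            paragraphs[-1] += " " + text  # continuation of the same speaker
    # The citation is placed on the first line, as described above.
    return "\n".join([citation] + paragraphs)
```

For example, a question that wraps across two transcript lines is pasted as a single line, followed by the answer on its own line.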

ISSUE FLAGGING

Issue flagging can be accomplished by one of the following two options:

1. Select text then click the Flag Issue Button on the Bottom Navigation Bar to flag the selected text in accordance with the user specified Issue Flagging Settings (details below);

2. Alternatively: a) highlight text, b) right click on the highlighted text, and c) select an issue category that the highlighted text should be marked as.

When transcript text is flagged, one or more vertical lines corresponding to an issue will appear to the right of the flagged text. When the user hovers their cursor over the line, the issue name will appear beside the vertical line. 

SEARCHING

Since DET is a Chrome-based transcript viewer, users can take advantage of their Chrome search function (Ctrl + F) or other search extensions to search text within the input transcript. DET also has a built-in search function allowing users to search text, a specific transcript page, and flagged issues. All search functions are accessible by clicking on the Search Button on the Bottom Navigation Bar. When a user clicks on the Search Button, a Search Interface will appear below the Bottom Navigation Bar. The Search Interface contains: 1) a Search Input Field, 2) an Initiate Search Button, 3) a Search Down Button, 4) a Search Up Button, 5) a Search Page Button, 6) a Clear Search Button, and 7) a Search Count Display. These seven components of the Search Interface are shown in the picture to the right (or below on smaller screens). Click on the Search Button on the Bottom Navigation Bar again to make the Search Interface disappear.

1. TEXT SEARCH

To search text using DET's built-in search function, type words in the Search Input Field and then click the Initiate Search Button. The DET search function treats separate words as an entire phrase unless separated by the pipe character: '|'. For example, the difference between searching carbon 13 versus carbon|13 is that the latter query will search all instances of "carbon" and all instances of "13" whereas the former query searches "carbon 13" as a phrase. 
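The phrase-versus-pipe behaviour can be sketched as follows. DET's search is built with mark.js, so this Python snippet is only an illustration of the query semantics; find_hits is a hypothetical name, and case-insensitive matching is an assumption that this manual does not confirm.

```python
import re


def find_hits(query, text):
    """Return (start, end) spans for every term in a pipe-separated query."""
    # Each pipe-separated part is searched as a whole phrase.
    terms = [t.strip() for t in query.split("|") if t.strip()]
    if not terms:
        return []
    # re.escape keeps characters such as "." from acting as regex syntax.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]
```

With the text "Carbon 13 is an isotope. Carbon dating uses carbon 14, not 13.", the query carbon 13 matches only the opening phrase, while carbon|13 matches every "carbon" and every "13".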

To scroll to the next search hit above or below the user's current position on the screen, click the Search Up Button or Search Down Button, respectively. DET normally highlights all search hits on the transcript in pink; however, when the user clicks the Search Up Button or Search Down Button, the closest search hit above or below the current position on the user's screen, respectively, is highlighted in blue.

The number of search results is displayed in the Search Count Display. The number following the forward slash in the Search Count Display (i.e. the divisor) shows the total number of search hits. The number before the forward slash in the Search Count Display (i.e. the dividend) indicates the search hit currently viewed by the user, ordered from the beginning to the end of the transcript.

Clear the Search Input Field via the Clear Search Button.

2. PAGE SEARCH

To search for a page using DET's built-in search function, type a page number in the Search Input Field and then click the Search Page Button. If the page exists in the uploaded transcript, DET will automatically scroll the page into view.


3. ISSUE SEARCH

To search issue-labelled text using DET's built-in search function, type "I#" followed by the issue number (1 to 10) in the Search Input Field and then click the Initiate Search Button. DET will highlight the first line of each instance of consecutively labelled transcript text. Search multiple issues by separating issue search queries with a pipe character: '|'. For example, searching I#5|I#7 will search the first line of each instance of consecutively labelled transcript text that is labelled as issue 5 or issue 7. Users can mix issue searching queries with text searching queries. For example, carbon|I#7 is a valid search query that will highlight all instances of "carbon" and the first line of each instance of consecutively labelled transcript text that is labelled as issue 7.
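A mixed query such as carbon|I#7 can be thought of as two lists: issue numbers and plain-text terms. The sketch below shows one way to split such a query and to find the first line of each run of consecutively labelled lines. The function names are hypothetical, and treating an out-of-range term such as I#11 as plain text is an assumption, not documented DET behaviour.

```python
import re

ISSUE_TERM = re.compile(r"I#(\d+)")


def parse_query(query):
    """Split a pipe-separated query into (issue numbers, text terms)."""
    issues, terms = [], []
    for part in query.split("|"):
        part = part.strip()
        if not part:
            continue
        match = ISSUE_TERM.fullmatch(part)
        if match and 1 <= int(match.group(1)) <= 10:
            issues.append(int(match.group(1)))
        else:
            terms.append(part)  # assumption: anything else is plain text
    return issues, terms


def first_lines_of_runs(flagged_lines):
    """Return the first line number of each run of consecutive flagged lines."""
    flagged = set(flagged_lines)
    return sorted(n for n in flagged if n - 1 not in flagged)
```

For instance, if lines 3, 4, 5, 9, 12 and 13 carry issue 7, an I#7 search under this sketch would highlight lines 3, 9 and 12.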

As with text searching, scroll to the next search hit above or below the user's current position on the screen by clicking the Search Up Button or Search Down Button, respectively. DET normally highlights all search hits on the transcript in pink; however, when the user clicks the Search Up Button or Search Down Button, the closest search hit above or below the current position on the user's screen, respectively, is highlighted in blue.

Issue search results are included as part of the Search Count Display. Clear the Search Input Field via the Clear Search Button.

MENU

Between one and six Menu Panels (depending on screen size) will appear below the Bottom Navigation Bar when a user clicks the Menu Button on the Bottom Navigation Bar. The six Menu Panels are: 1. Voice Settings, 2. PDF Settings, 3. Transcript Metadata, 4. Display Settings, 5. Time Tracker, and 6. Issue Flagging Settings. DET will display as many Menu Panels as will fit on the user's screen. If not all six Menu Panels fit on the user's screen, a Left and Right Menu Arrow will appear below the Menu Panels that are displayed, allowing the user to navigate between different Menu Panels.

1. VOICE SETTINGS

The Voice Settings are divided into the following four sub-settings:

1: Voice Type Selection. Users can choose different voices for the three recognizable speaker types. The 'other' speaker type is further subdivided into two user-defined speaker groups and a default group.

2: User Defined Speaker Input. Certain transcripts (for example, judge-alone trial transcripts) may not have many 'question' and 'answer' Speaker Identifiers. So that speakers of the 'other' type can still be read in different voices, users can specify two groups that fall within the 'other' speaker type. For example, users can input the names of individuals from law firm X into Group 1 and individuals from law firm Y into Group 2. By setting the Group 1 and Group 2 voice types to different voices, the user can effectively distinguish between different speakers that fall within the 'other' speaker type. If an 'other' speaker does not fall within Group 1 or Group 2, then the Default voice will be applied.

3: Voice Rate Selection. Users can change the rate at which text is read using the Voice Rate Selection slider.

4: Vocalize Speaker ID. If this setting is enabled then the text-to-speech function will say the speaker type before the dialogue of that speaker. For example, if a line on a transcript contains an answer of "Yes", enabling the Vocalize Speaker ID for the answer speaker will cause DET's text-to-speech function to vocalize "Answer: Yes" (instead of simply "Yes"). This setting, and the User Defined Speaker Input setting, are intended to help eliminate confusion as to who is speaking when users listen to a transcript.
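The interaction between voice types, user-defined groups and the Vocalize Speaker ID setting can be sketched as below. All names here (pick_voice, spoken_text, the dictionary keys) are hypothetical. The sketch also assumes that only 'question' and 'answer' speakers receive a spoken prefix, since this manual gives an example only for the answer speaker.

```python
def pick_voice(speaker_type, speaker_name, voices, group1, group2):
    """Choose a voice for a speaker using the grouping rules described above."""
    if speaker_type in ("question", "answer"):
        return voices[speaker_type]
    # 'other' speakers: check the two user-defined groups, then fall back.
    if speaker_name in group1:
        return voices["group1"]
    if speaker_name in group2:
        return voices["group2"]
    return voices["default"]


def spoken_text(speaker_type, dialogue, vocalize):
    """Prefix the dialogue with the speaker type when the setting is on."""
    labels = {"question": "Question", "answer": "Answer"}
    if vocalize and speaker_type in labels:
        return f"{labels[speaker_type]}: {dialogue}"
    return dialogue
```

With Vocalize Speaker ID enabled for the answer speaker, an answer of "Yes" would be read as "Answer: Yes", matching the example above.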

2. PDF SETTINGS

The PDF Settings include the option to specify the first page of a PDF transcript. As noted under the 'PDF Transcript Format' heading above, DET does not detect page numbers of PDF transcripts. Instead, DET assumes the first transcript page containing line numbers is the first physical page of the transcript. If the first physical page in a PDF transcript is not marked as 'page 1' of the transcript, then users must manually adjust the 'PDF first page number' in DET to correspond with whatever page number is on the first physical page of the transcript. Use '0' or negative values in cases where the uploaded PDF transcript has its first numbered page on a page after the first physical page.

Click the sync button to apply your changes. This function does not do anything for transcripts in TXT format. 

Acceptable input range is -99 to 99999. 
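The effect of the 'PDF first page number' can be shown with simple arithmetic. This sketch assumes that page numbers increase by one per physical page after the first, which the manual implies but does not state; page_label and the constant names are hypothetical.

```python
MIN_FIRST_PAGE, MAX_FIRST_PAGE = -99, 99999  # acceptable input range


def page_label(physical_index, first_page_number):
    """Map a 1-based physical page position to the page number displayed."""
    if not MIN_FIRST_PAGE <= first_page_number <= MAX_FIRST_PAGE:
        raise ValueError("first page number must be between -99 and 99999")
    # The first physical page takes the configured number; later pages count up.
    return first_page_number + physical_index - 1
```

For example, if transcript page 1 sits on the third physical page, a first page number of -1 labels the physical pages -1, 0, 1, and so on.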

3. TRANSCRIPT METADATA

Users can view certain metadata of the uploaded transcript by clicking on the 'metadata' button in the side menu. Once the metadata button is clicked, the following transcript metadata is displayed:

'Page count' means all detected pages in the uploaded transcript. Since DET does not display pages without line numbers, the number of displayed pages may be fewer than the 'Page count'.

'Questions answered' is calculated by dividing the 'answer count' by the 'question count' and multiplying by 100. Theoretically, if all questions have been answered, then the 'question count' and 'answer count' will be the same and the questions answered should be 100%. The shortfall from 100% should give users a rough idea of the percentage of questions objected to. Sometimes DET incorrectly identifies text as an answer. For example, if a line begins with one or more spaces, followed by 'A.' or 'A', followed by one or more spaces, then text following this pattern will be considered an answer even though it may not be. This erroneous answer detection may cause the 'questions answered' to be inaccurate.
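The calculation described above is a single ratio. The sketch below adds one assumption not found in the manual: it returns None when no questions were detected, to avoid dividing by zero. The function name is hypothetical.

```python
def questions_answered(question_count, answer_count):
    """Return the 'questions answered' percentage, or None with no questions."""
    if question_count == 0:
        return None  # assumption: nothing sensible to report
    return answer_count / question_count * 100
```

A transcript with 40 detected questions and 30 detected answers would report 75%, suggesting that roughly a quarter of the questions went unanswered.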

4. DISPLAY SETTINGS

Users can increase or decrease the transcript font size and line spacing. Dark mode will invert screen colours. The 'Page buffer count' check boxes allow users to select the maximum number of pages for display on screen. A lower page display count usually results in better performance, especially on slow computers. 

The search function only highlights content on the pages displayed on screen. This may mislead the user as to the true number of search hits in situations where the uploaded transcript contains more pages than the selected page buffer count. To better inform the user of the true number of search hits, DET reports the number of search hits for each batch of buffered pages under each page buffer button (see buttons circled in red in the figure below).
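The per-batch hit counts can be pictured as sums over consecutive groups of pages. This is an illustrative sketch under the assumption that batches are consecutive, non-overlapping groups of pages; hits_per_batch is a hypothetical name.

```python
def hits_per_batch(hits_by_page, buffer_size):
    """Sum per-page hit counts into batches of buffer_size consecutive pages."""
    return [
        sum(hits_by_page[start:start + buffer_size])
        for start in range(0, len(hits_by_page), buffer_size)
    ]
```

With seven pages holding 2, 0, 1, 4, 0, 3 and 5 hits and a page buffer count of 3, the three batches would report 3, 7 and 5 hits, even though only one batch is highlighted on screen at a time.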

5. TIME TRACKER

The Time Tracker starts from zero when: 1) a user first opens the DET Chrome Extension, 2) the page is refreshed, 3) a new transcript is uploaded, or 4) the clear button [X] is clicked. Pause and play the Time Tracker using the pause/play button.

Warning: the Time Tracker may not track time accurately. If the user has uploaded a large transcript, DET may take a while to perform functions, which could cause the Time Tracker to stop running temporarily, resulting in a delayed time. It is best to rely on this time for emergency purposes only.

6. ISSUE FLAGGING SETTINGS

Renaming issues: Transcript lines may be marked with up to 10 issues. By default, these issues are called "Issue 1" to "Issue 10". To change an issue name, click the default issue name text under the 'Issue name' column (to the right of the check boxes), and start typing new text. Only alphanumeric characters, spaces, underscores, and dashes are allowed in issue names.
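The character rule for issue names corresponds to a simple pattern. The sketch below is an assumption about how such a check could look, not DET's implementation: it treats "alphanumeric" as ASCII letters and digits and rejects empty names, neither of which the manual confirms.

```python
import re

# Letters, digits, spaces, underscores and dashes only.
VALID_ISSUE_NAME = re.compile(r"[A-Za-z0-9 _-]+")


def is_valid_issue_name(name):
    """Return True when every character in the name is allowed."""
    return VALID_ISSUE_NAME.fullmatch(name) is not None
```

Under this sketch, "Chain of custody" and "Exhibit_12-B" are accepted, while "Damages (special)" is rejected because of the parentheses.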

Marking issues: As noted under the 'Issue Flagging' heading above, transcript lines may be flagged with an issue by selecting transcript text and then clicking the Flag Issue Button on the Bottom Navigation Bar. The selected text will be marked with an issue if that issue has its 'flag checkbox' checked under these Issue Flagging Settings. Transcript text will be stripped of an issue if the issue has its 'trash checkbox' checked. If the 'flag checkbox' and 'trash checkbox' for a given issue are both unchecked, then selecting text and clicking the Flag Issue Button on the Bottom Navigation Bar will maintain the status quo of that issue (i.e. it will remain marked or unmarked).
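The checkbox rules above amount to set operations on the issues attached to a line. In this sketch, apply_flagging is a hypothetical name, and letting the 'trash checkbox' win when both boxes are checked for the same issue is an assumption, since the manual does not describe that case.

```python
def apply_flagging(current, flag_checked, trash_checked):
    """Return the issue numbers on a selected line after Flag Issue is clicked."""
    # Add every issue whose flag checkbox is checked, then strip every issue
    # whose trash checkbox is checked; all other issues are left unchanged.
    return (set(current) | set(flag_checked)) - set(trash_checked)
```

For example, a line already marked with issues 1 and 2 ends up marked with issues 1 and 3 when issue 3 has its flag checkbox checked and issue 2 has its trash checkbox checked.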

Printing issues: Check the 'print checkbox' and then click the Print Issues Button to print all transcript text marked as a particular issue to a separate page. 

Downloading issues: Users can save their issue markup by clicking the Download Issues Button. This button will cause the citations corresponding to each issue-flagged transcript line to be downloaded to the user's computer. The resulting DET Issue Flags File is organised as JSON-compatible data. For example, to the right (or below on small screens) is an example of a DET Issue Flags File with comments prefaced with two red forward slashes and where 'USER_DEFINED_ISSUE_NAME' is the name given to the issue by the user. Note: all issue markup is deleted on upload of a new transcript or when a user refreshes or exits the page. Accordingly, users must download a DET Issue Flags File to save their issue markup.

Uploading issues: Issue flags can be added to DET in bulk by clicking the Upload Issues Button and selecting a DET Issue Flags File.

{
  "issue 1": [
    "USER_DEFINED_ISSUE_NAME",
    "FC1",
    "FC2",
    "FC3"
  ],
  "issue 2": [
    "USER_DEFINED_ISSUE_NAME",
    "FC26",
    "FC27",
    "FC28"
  ],

  // issues 3 to 8 would be printed here

  "issue 9": [  // example of an issue with no flags
    "USER_DEFINED_ISSUE_NAME"
  ],
  "issue 10": [
    "USER_DEFINED_ISSUE_NAME",
    "FC55"
  ]
}
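A file with this layout can be read with any JSON parser, provided the illustrative // comments shown above are not present in the real file (comments are not valid JSON). The sketch below assumes the layout shown in the example, where the first entry of each list is the issue name and the remaining entries are citations. load_issue_flags is a hypothetical name, and the citation strings are treated as opaque text because this manual does not describe their format.

```python
import json


def load_issue_flags(text):
    """Parse a DET Issue Flags File into {issue key: (name, [citations])}."""
    data = json.loads(text)
    # First list entry: user-defined issue name; the rest: flagged citations.
    return {key: (entries[0], entries[1:]) for key, entries in data.items()}
```

An issue with no flags, such as "issue 9" in the example, yields its name paired with an empty list of citations.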

PRICING

Users are encouraged to donate to an organisation they care about. Enterprise users are encouraged to become sponsors. See the sponsorship policy for details. 


PRIVACY POLICY

The only data stored by DET are the user options in the settings menu, namely:

1. all Voice Settings options,

2. all PDF Settings options, and

3. the dark mode toggle option in the Display Settings.

Transcripts are processed dynamically at run time. Accordingly, DET does not store any information resulting from a transcript upload.  


LICENCE

End-user licence agreement for Discover-E-Transcripts version 1.x:

1. DET version 1 (including all sub-versions) is available for use in Canada for research purposes only.

2. The Licensor offers Discover-E-Transcripts as-is and as-available, and makes no representations or warranties of any kind concerning Discover-E-Transcripts, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to users of Discover-E-Transcripts.

3. To the extent possible, in no event will the Licensor be liable to users of Discover-E-Transcripts on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this license or use of Discover-E-Transcripts, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to users of Discover-E-Transcripts.

4. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.


SPONSORSHIP POLICY

Fill out the sponsorship form to become a sponsor. DET has the right to decline a sponsorship request. If DET accepts a sponsorship request and payment thereof, the sponsor-sponsee relationship will be governed by the following terms:

1. Sponsor's business name and logo, hyperlinked to a website of their choice, shall be displayed in a 75 pixel by 214 pixel rectangle under the 'Supported by:' subheading (located below the DET logo) on the DET Chrome Extension for 365 days after receipt of sponsorship payment.

2. DET can accept more than one sponsor. DET will display sponsors on the DET Chrome Extension home page in accordance with the following ranking principles: 

a) primary display priority is based on sponsorship tier (see the sponsorship form for information regarding sponsorship tiers), 

b) secondary display priority is based on seniority (consecutive length of sponsorship), and 

c) tertiary display priority is based on alphabetical order of sponsor name. 

3. All sponsorship proceeds will be allocated by Legal Informatics in the following order:

a) towards the development and maintenance of the Discover-E-Transcripts Chrome Extension,

b) towards the development and maintenance of a Discover-E-Transcripts Android app,

c) towards the development and maintenance of a Discover-E-Transcripts Apple app, and

d) towards the development of other Legal Informatics projects.

4. Sponsors have no ownership (or rights to a licence) in anything created by Legal Informatics, including any products cited in clause 3.

5. Money paid to DET by sponsors is non-refundable.


BUG REPORTS AND FEATURE REQUESTS

Any problems or inaccuracies can be reported to the developers through this form. Please provide a return e-mail address if you would like a reply.


CREDITS

Below is a list of libraries that help make DET what it is. The creators of the content below do not endorse DET in any way.

[1] DET's PDF processing is built with pdf.js distribution version 2.9.359 as-is (no modifications made). pdf.js is copyright protected work licensed under the Apache License, Version 2.0. For ease of reference, disclaimer of warranties and limitation of liability from sections 7 to 9 of the Apache License, Version 2.0 are reproduced below:

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

[2] DET's search function is built with mark.js version 9.0.0 as-is (no modifications made). mark.js is copyright protected work licensed under the MIT License. For ease of reference, disclaimer of warranties and limitation of liability from the MIT License are reproduced below:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

[3] DET uses jQuery version 3.5.1 as-is (no modifications made). jQuery is copyright protected work licensed under the MIT License. Disclaimer of warranties and limitation of liability from the MIT License are reproduced above (same as the mark.js license).

[4] DET uses Font Awesome icons (version 5.15.4), various components of which are provided under the following licenses:

Icons: CC BY 4.0 License

Fonts: SIL OFL 1.1 License, and

Code: MIT License


Discover-E-Transcripts 2021 ©